
GPT-SW3

ML Model by AI Sweden

GPT-SW3 is a 3.5B-parameter autoregressive language model trained on a 100 GB Swedish corpus.

Model details

GPT-SW3 follows the GPT architecture, as implemented in the Megatron-LM framework. The model consists of 30 Transformer layers, each with 32 attention heads. The embedding dimension is 3072 and the feed-forward dimension is 12288. The tokenizer is byte-pair encoding (BPE) with a vocabulary size of 50304.
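As a sanity check, the stated dimensions are consistent with the reported 3.5B parameter count. The sketch below is a back-of-the-envelope estimate assuming standard GPT blocks (four attention projection matrices plus two feed-forward matrices per layer, ignoring biases and layer norms); it is not the authors' exact accounting.

```python
# Rough parameter count from the published GPT-SW3 configuration.
n_layers = 30        # Transformer layers
d_model = 3072       # embedding dimension
d_ff = 12288         # feed-forward dimension
vocab = 50304        # BPE vocabulary size

# Per layer: Q/K/V/output projections (4 * d^2) plus the two FFN matrices.
per_layer = 4 * d_model**2 + 2 * d_model * d_ff
# Token embedding table (typically tied with the output projection).
embeddings = vocab * d_model

total = n_layers * per_layer + embeddings
print(f"{total / 1e9:.2f}B parameters")  # -> 3.55B parameters
```

The estimate lands at roughly 3.55B, matching the reported 3.5B figure once rounding is taken into account.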

Attributes

- Language
- DNN, Transformer
- Textual Data