GPT-SW3
ML Model
by AI Sweden
GPT-SW3 is a 3.5B-parameter autoregressive language model trained on a 100 GB Swedish corpus.
Model details
GPT-SW3 follows the GPT architecture, as implemented in the Megatron-LM framework. The model consists of 30 Transformer layers with 32 attention heads each; the embedding dimension is 3072 and the feedforward dimension is 12288. Tokenization uses byte-pair encoding (BPE) with a vocabulary size of 50304.
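As an illustrative sketch only, the hyperparameters above can be written down as a standard GPT-2-style configuration (using the transformers library's GPT2Config as a stand-in for the Megatron-LM implementation, which is an assumption, not the official training config), and a back-of-the-envelope count confirms they add up to roughly 3.5B parameters:

    # Sketch: the stated hyperparameters in a GPT-2-style config.
    # GPT2Config is an assumption standing in for the Megatron-LM setup.
    from transformers import GPT2Config

    config = GPT2Config(
        vocab_size=50304,  # BPE vocabulary size
        n_layer=30,        # Transformer layers
        n_head=32,         # attention heads per layer
        n_embd=3072,       # embedding dimension
        n_inner=12288,     # feedforward (MLP) dimension
    )

    # Rough weight count (ignores biases and LayerNorm, ties embeddings):
    d, ffn = config.n_embd, config.n_inner
    per_layer = 4 * d * d + 2 * d * ffn            # attention (Q,K,V,out) + MLP
    total = config.n_layer * per_layer + config.vocab_size * d
    print(f"~{total / 1e9:.2f}B parameters")       # ~3.55B, matching "3.5B"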
Tags: Language, DNN, Transformer, Textual Data