GPT-SWE

Project owned by AI Sweden

AI Sweden, together with RISE, is developing large-scale language models for the Swedish language, enabling a range of applications of specific relevance to both the public and private sectors in Sweden.

GPT-SWE is the first truly large-scale generative language model for the Swedish language. Based on the same technical principles as the much-discussed GPT-3, GPT-SWE will help Swedish organizations build language applications that were not previously possible.

Preparations are already underway to train an even larger model on more data. The aim is to train a significantly larger Nordic language model on data from all the Nordic languages. Such a model would be able to handle all the Nordic languages, and the variation in the combined training data would likely improve performance for each individual language.

The current GPT-SWE model is trained on Linköping University's supercomputer, Berzelius, using the Megatron-LM framework from NVIDIA.

GPT-SW3 is a 3.5B parameter autoregressive language model, trained on a 100 GB Swedish corpus.
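
As an autoregressive model, GPT-SW3 generates text one token at a time, with each step conditioned on everything produced so far. The following is a minimal Python sketch of that decoding loop, assuming the model were exposed through a Hugging Face-style causal-LM interface; the model identifier used here is a hypothetical placeholder, not a confirmed release name.

```python
# Minimal sketch of autoregressive (left-to-right) generation.
# The model ID below is a hypothetical placeholder for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "AI-Sweden/gpt-sw3"  # placeholder, not a confirmed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

prompt = "Stockholm är huvudstaden i"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Greedy decoding: at each step, append the single most probable next token.
with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids).logits   # shape (1, seq_len, vocab)
        next_id = logits[0, -1].argmax()   # most likely next token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```

In practice one would call the library's built-in `model.generate()` with sampling rather than write this loop by hand; it is spelled out here only to show what "autoregressive" means.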

Model details

GPT-SW3 follows the GPT architecture, as implemented in the Megatron-LM framework. The model consists of 30 Transformer layers, with 32 attention heads each. The embedding dimension is 3072 and the dimension of the feedforward layer is 12288. The tokenizer used is BPE, with a vocabulary size of 50304.
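
These hyperparameters are consistent with the stated 3.5B total, which can be checked with standard GPT parameter accounting: four d×d projection matrices in each attention block, two feedforward matrices per layer, plus the embedding tables. The sketch below does exactly that; the 2048-token sequence length assumed for the position embeddings is not stated above, and biases and layer norms are ignored as negligible.

```python
# Back-of-the-envelope parameter count from the hyperparameters above.
# Biases and layer norms are ignored (negligible); the sequence length
# assumed for the position embeddings is not stated in the text.
n_layers = 30      # Transformer layers
d_model = 3072     # embedding dimension
d_ff = 12288       # feedforward dimension
vocab = 50304      # BPE vocabulary size
seq_len = 2048     # assumption, for learned position embeddings

attention = 4 * d_model * d_model    # Q, K, V and output projections
feedforward = 2 * d_model * d_ff     # up- and down-projection matrices
per_layer = attention + feedforward  # ~113M parameters per layer

embeddings = vocab * d_model + seq_len * d_model

total = n_layers * per_layer + embeddings
print(f"{total / 1e9:.2f}B parameters")  # ~3.56B, matching the quoted 3.5B
```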
