AI Sweden, together with RISE, is developing large-scale language models for the Swedish language, enabling a number of applications with particular relevance to both the public and private sectors in Sweden.
GPT-SWE is the first truly large-scale generative language model for the Swedish language. Based on the same technical principles as the much-discussed GPT-3, GPT-SWE will help Swedish organizations build language applications not previously possible.
Preparations are already underway to train an even larger model on more data. The aim is to train a significantly larger Nordic language model on data from all the Nordic languages. Such a model would be able to handle each of these languages, and would likely perform better on every individual language thanks to the variation present in the combined training data.
The current GPT-SWE model was trained on Linköping University's supercomputer, Berzelius, using NVIDIA's Megatron-LM framework.
GPT-SW3 is a 3.5B-parameter autoregressive language model, trained on a 100 GB Swedish corpus.

Model details
GPT-SW3 follows the GPT architecture, as implemented in the Megatron-LM framework. The model consists of 30 Transformer layers, with 32 attention heads each. The embedding dimension is 3072 and the dimension of the feedforward layer is 12288. The tokenizer used is BPE, with a vocabulary size of 50304.
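To put the stated hyperparameters in one place, here is a minimal Python sketch that records them in a config object and sanity-checks the 3.5B figure with a rough parameter count. The sequence length and embedding tying are not stated in the text, so the values used here (2048 tokens, tied input/output embeddings, as in GPT-2) are assumptions.

```python
from dataclasses import dataclass

@dataclass
class GPTSW3Config:
    """Architecture hyperparameters as stated in the model description."""
    n_layers: int = 30        # Transformer layers
    n_heads: int = 32         # attention heads per layer
    d_model: int = 3072       # embedding dimension
    d_ff: int = 12288         # feedforward (inner) dimension
    vocab_size: int = 50304   # BPE vocabulary size
    seq_len: int = 2048       # ASSUMPTION: not given in the text

def approx_params(cfg: GPTSW3Config) -> int:
    """Rough parameter count for a GPT-style decoder.

    Counts weight matrices only; biases and layer norms contribute
    well under 0.1% at this scale. Assumes tied input/output
    embeddings, as in GPT-2.
    """
    embeddings = cfg.vocab_size * cfg.d_model + cfg.seq_len * cfg.d_model
    attention = 4 * cfg.d_model ** 2      # Q, K, V and output projections
    ffn = 2 * cfg.d_model * cfg.d_ff      # up- and down-projection
    return embeddings + cfg.n_layers * (attention + ffn)

cfg = GPTSW3Config()
print(f"~{approx_params(cfg) / 1e9:.2f}B parameters")  # ~3.56B
```

Running the sketch gives roughly 3.56B parameters, consistent with the stated 3.5B; the small gap comes from rounding and the assumed sequence length.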