GPT-SWE

Project owned by AI Sweden

AI Sweden, together with RISE, is developing large-scale language models for Swedish, enabling a range of applications with specific relevance to both the public and private sectors in Sweden.

GPT-SWE is the first truly large-scale generative language model for the Swedish language. Based on the same technical principles as the much-discussed GPT-3, GPT-SWE will help Swedish organizations build language applications not previously possible. 
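Concretely, an application built on such a model drives it through ordinary autoregressive generation: the model continues a Swedish prompt token by token. The sketch below illustrates this pattern with the Hugging Face transformers library; the model identifier is a placeholder of ours, since the page does not name a published checkpoint.

    # A minimal sketch of Swedish text generation with a GPT-style model,
    # using the Hugging Face transformers library. The model identifier is
    # hypothetical; substitute the actual GPT-SWE checkpoint name.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_ID = "AI-Sweden/gpt-swe"  # placeholder identifier, for illustration only

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Autoregressive generation: continue a Swedish prompt.
    prompt = "Sverige är ett land som"  # "Sweden is a country that"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))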

Preparations are already underway to train an even larger model on more data. The aim is a significantly larger Nordic language model trained on data from all the Nordic languages. Such a model would be able to handle every Nordic language, and would likely perform better on each individual language thanks to the variation present in the combined training data.

The current GPT-SWE model was trained on Linköping University’s supercomputer, Berzelius, using NVIDIA’s Megatron-LM framework.


Resources

Linked ML Model
GPT-SW3 is a 3.5B-parameter autoregressive language model trained on a 100 GB Swedish corpus.

Model details

GPT-SW3 follows the GPT architecture as implemented in the Megatron-LM framework. The model consists of 30 Transformer layers with 32 attention heads each. The embedding dimension is 3072 and the feedforward dimension is 12288. The tokenizer uses byte-pair encoding (BPE) with a vocabulary size of 50304.
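As a sanity check, these architecture numbers are consistent with the stated 3.5B parameter count. The sketch below tallies the weights of a standard GPT-style decoder with these dimensions (the breakdown is ours, not from a released config; biases and layer norms are omitted for brevity):

    # Rough parameter count for a GPT-style decoder with the dimensions
    # given above. Exact totals vary slightly with implementation details.
    n_layers = 30
    d_model, d_ffn = 3072, 12288
    vocab_size = 50304

    attention = 4 * d_model * d_model    # Q, K, V and output projections
    feedforward = 2 * d_model * d_ffn    # up- and down-projections
    per_layer = attention + feedforward  # ~113M parameters per layer
    embeddings = vocab_size * d_model    # token embedding table

    total = n_layers * per_layer + embeddings
    print(f"{total / 1e9:.2f}B parameters")  # prints 3.55B, matching the stated 3.5B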
