Context size/length: the maximum number of preceding tokens the model takes into account when predicting the next token.
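As a minimal sketch of what this means in practice (the 4,096-token window below is an arbitrary illustrative value, not tied to any particular model):

    # Only the most recent `context_size` tokens are visible to the model
    # when it predicts the next token; anything earlier is dropped.
    def visible_context(tokens: list[int], context_size: int) -> list[int]:
        """Keep only the tokens that fit in the context window."""
        return tokens[-context_size:]

    tokens = list(range(10_000))             # a long token-ID sequence
    context = visible_context(tokens, 4096)  # illustrative window size
    print(len(context))                      # 4096 -- earlier tokens are gone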
Models
Cerebras-GPT
Cerebras-GPT is a family of seven GPT models ranging from 111 million to 13 billion parameters.
These models were trained on Cerebras CS-2 systems (the Andromeda AI supercomputer) following the compute-optimal Chinchilla formula.
Their weights and checkpoints are available on Hugging Face and GitHub under the Apache 2.0 license.
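The Chinchilla formula is the compute-optimal rule of thumb from DeepMind's Chinchilla paper: train on roughly 20 tokens per parameter. A back-of-the-envelope sketch for three of the Cerebras-GPT sizes (the token counts are estimates derived from that ratio, not official Cerebras training figures):

    # Rough compute-optimal token counts per the Chinchilla rule of thumb
    # (~20 training tokens per parameter). Parameter counts are published
    # Cerebras-GPT sizes; token counts are estimates from the ratio only.
    TOKENS_PER_PARAM = 20

    for params in (111e6, 1.3e9, 13e9):
        tokens = TOKENS_PER_PARAM * params
        print(f"{params / 1e9:6.3f}B params -> ~{tokens / 1e9:.0f}B tokens")
    # 0.111B params -> ~2B tokens
    # 1.300B params -> ~26B tokens
    # 13.000B params -> ~260B tokens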
GPT-4
GPT-4 was released without any public information about its architecture, model size, training data, training hardware, or hyperparameters.
Chinchilla (DeepMind)
Chinchilla was trained with the same compute budget as Gopher but with 70B parameters and roughly 4 times more training data.
Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a wide range of downstream evaluation tasks. It also uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage.
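The "same budget" claim can be checked with the standard approximation for dense-transformer training compute, C ≈ 6·N·D, where N is the parameter count and D the number of training tokens. Using the published figures (Gopher: 280B parameters on 300B tokens; Chinchilla: 70B parameters on 1.4T tokens):

    # Approximate training compute via C ≈ 6 * N * D (N = parameters,
    # D = training tokens), a common rule of thumb for dense transformers.
    def train_flops(params: float, tokens: float) -> float:
        return 6 * params * tokens

    gopher = train_flops(280e9, 300e9)       # ~5.0e23 FLOPs
    chinchilla = train_flops(70e9, 1.4e12)   # ~5.9e23 FLOPs
    print(f"Gopher:     {gopher:.2e} FLOPs")
    print(f"Chinchilla: {chinchilla:.2e} FLOPs")
    # Comparable budgets: 4x fewer parameters traded for roughly 4x more data.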
Gemini
Gemini was built from the ground up to be multimodal, to integrate efficiently with tools and APIs, and to enable future innovations such as memory and planning.