| Params | n_layer | n_head | n_embd | n_ctx | n_vocab |
| --- | --- | --- | --- | --- | --- |
| 124 M | 12 | 12 | 768 | 1024 | 50257 |
| 355 M | 24 | 16 | 1024 | 1024 | 50257 |
| 774 M | 36 | 20 | 1280 | 1024 | 50257 |
| 1.5 B | 48 | 25 | 1600 | 1024 | 50257 |
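As a sanity check on the table above, the parameter counts can be recomputed from the hyper-parameters. This is a sketch assuming the standard GPT-2 layer layout (learned token and position embeddings; per block a fused QKV projection, an attention output projection, a 4×-wide MLP, and two LayerNorms; the output head tied to the token-embedding matrix); the function name is my own:

```python
def gpt2_param_count(n_vocab, n_ctx, n_embd, n_head, n_layer):
    """Count learnable parameters in a GPT-2-style transformer.

    Assumes the output head shares the token-embedding matrix,
    so it adds no extra parameters.
    """
    assert n_embd % n_head == 0  # each head gets n_embd // n_head dims
    wte = n_vocab * n_embd                     # token embedding
    wpe = n_ctx * n_embd                       # position embedding
    # per transformer block:
    attn = n_embd * 3 * n_embd + 3 * n_embd    # fused QKV projection (+ bias)
    attn += n_embd * n_embd + n_embd           # attention output projection
    mlp = n_embd * 4 * n_embd + 4 * n_embd     # MLP up-projection
    mlp += 4 * n_embd * n_embd + n_embd        # MLP down-projection
    lns = 2 * 2 * n_embd                       # two LayerNorms (scale + bias)
    block = attn + mlp + lns
    ln_f = 2 * n_embd                          # final LayerNorm
    return wte + wpe + n_layer * block + ln_f

print(gpt2_param_count(50257, 1024, 768, 12, 12))    # → 124439808, i.e. "124 M"
print(gpt2_param_count(50257, 1024, 1600, 25, 48))   # → 1557611200, i.e. "1.5 B"
```

The second result also explains the `1558M` name used in the download URLs below.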
The vocabulary size of 50257 covers the byte-pair-encoding tokens plus one special end-of-text token, `<|endoftext|>`.

Each model checkpoint consists of the following files:

| File | Description |
| --- | --- |
| checkpoint | TensorFlow checkpoint metadata naming the latest checkpoint |
| encoder.json | The vocabulary (token string → token id) |
| hparams.json | The model's hyper-parameters: n_vocab (number of tokens in the vocabulary), n_ctx (the maximum input sequence length), n_embd (embedding dimension, the width of the network), n_head (number of attention heads; n_embd must be divisible by n_head), and n_layer (the depth of the network) |
| model.ckpt.data-00000-of-00001 | The model weights (TensorFlow checkpoint data) |
| model.ckpt.index | Index mapping tensor names into the checkpoint data file |
| model.ckpt.meta | The serialized TensorFlow graph |
| vocab.bpe | Byte-pair merges |
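For example, `hparams.json` for the 124 M model contains the values from the table above:

```json
{
  "n_vocab": 50257,
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_layer": 12
}
```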
All files for all four models can be fetched with two nested loops:

for $model in '124M', '355M', '774M', '1558M' loop
  for $filename in 'checkpoint', 'encoder.json', 'hparams.json', 'model.ckpt.data-00000-of-00001', 'model.ckpt.index', 'model.ckpt.meta', 'vocab.bpe' loop
    download(https://openaipublic.blob.core.windows.net/gpt-2/models/$model/$filename)
  end loop
end loop
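The download loop can be sketched in Python using only the standard library. The URL scheme and file names are as given above; the local `models/` destination directory and the helper names are my own assumptions:

```python
import os
import urllib.request

BASE_URL = "https://openaipublic.blob.core.windows.net/gpt-2/models"
MODELS = ["124M", "355M", "774M", "1558M"]
FILES = [
    "checkpoint", "encoder.json", "hparams.json",
    "model.ckpt.data-00000-of-00001", "model.ckpt.index",
    "model.ckpt.meta", "vocab.bpe",
]

def checkpoint_urls():
    """Return (url, local_path) pairs for every file of every model."""
    return [
        (f"{BASE_URL}/{model}/{fname}", os.path.join("models", model, fname))
        for model in MODELS
        for fname in FILES
    ]

def download_all():
    """Fetch every checkpoint file (several gigabytes in total)."""
    for url, dest in checkpoint_urls():
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        urllib.request.urlretrieve(url, dest)

# download_all()  # uncomment to actually fetch everything
```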