Generative pre-trained transformers (GPT)

GPT models are a family of Large Language Models (LLMs) introduced by OpenAI in 2018.
GPT is called generative because it generates text (given a prompt). It is pre-trained because it was trained on a large corpus of text before any task-specific fine-tuning. And transformer refers to the decoder-only transformer architecture (introduced in 2017 with the paper Attention Is All You Need).
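As a rough illustration of the generative part, here is a minimal sketch of autoregressive decoding: the model scores all possible next tokens, the most likely one is appended, and the loop repeats on the grown sequence. next_token_logits is a hypothetical stand-in for a real GPT forward pass, not an actual API.

    import numpy as np

    def next_token_logits(tokens, vocab_size=256):
        # Hypothetical stand-in: a real GPT would run its transformer
        # over `tokens` here and return logits for the next token.
        rng = np.random.default_rng(sum(tokens))
        return rng.standard_normal(vocab_size)

    def generate(prompt_tokens, max_new_tokens=10):
        tokens = list(prompt_tokens)
        for _ in range(max_new_tokens):
            logits = next_token_logits(tokens)
            tokens.append(int(np.argmax(logits)))  # greedy decoding
        return tokens

    print(generate([72, 105]))  # two made-up prompt token IDs

Real GPTs usually sample from the softmax distribution over the logits (with temperature, top-k, etc.) rather than always taking the argmax.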
TODO
Relation to BERT
Unlike BERT models, which use bidirectional attention, GPT models are unidirectional: each position can only attend to earlier positions (causal attention); see the sketch below.
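The unidirectionality comes from a causal mask that blocks attention to future positions. A minimal sketch in plain NumPy (in the spirit of picoGPT, but without the learned query/key/value projections a real model has):

    import numpy as np

    def causal_self_attention(x):
        # Position i may only attend to positions j <= i.
        T, d = x.shape
        scores = x @ x.T / np.sqrt(d)                # (T, T) similarity scores
        mask = np.triu(np.ones((T, T)), k=1)         # 1s strictly above diagonal
        scores = np.where(mask == 1, -1e10, scores)  # hide future positions
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ x                           # weighted sum of values

    x = np.random.randn(5, 8)              # 5 tokens, 8-dim embeddings
    print(causal_self_attention(x).shape)  # (5, 8)

BERT-style bidirectional attention would simply drop the mask, letting every position attend to the whole sequence.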
minGPT etc.

Minimal GPT reimplementations: Andrej Karpathy's minGPT and nanoGPT, as well as Jay Mody's picoGPT (see also his blog post).
nanoGPT

https://github.com/karpathy/nanoGPT; requires tqdm, tiktoken, datasets, and numpy.
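These can be installed in one go, for example (using the same py launcher as below):

    py -m pip install tqdm tiktoken datasets numpy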
When running prepare.py (in the directory data/openwebtext, on Windows), I got the error message ImportError: numpy.core.multiarray failed to import. I was able to solve that by running py -m pip install numpy -I (the -I, i.e. --ignore-installed, flag forces a clean reinstall, which presumably replaced a binary-incompatible numpy installation).
See also

GPT-2, GPT-3, GPT-4, ChatGPT
Links
Improving Language Understanding by Generative Pre-Training (Radford et al., 2018), the original GPT paper