Generative pre-trained transformers (GPT)

Generative pre-trained transformer (GPT) models are a family of large language models (LLMs) introduced by OpenAI in 2018.
GPT is called generative because it generates text (continuing a given prompt).
It is pre-trained because it is first trained on a large corpus of text, before any task-specific fine-tuning.
And transformer refers to the decoder-only transformer architecture (introduced in 2017 in the paper Attention Is All You Need).
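
How "generative" looks in practice: the model produces text one token at a time, each new token conditioned on the prompt plus everything generated so far. A minimal sketch, assuming the Hugging Face transformers package and the released GPT-2 weights (not otherwise part of these notes):

    # Minimal sketch: autoregressive text generation with GPT-2
    # (assumes the Hugging Face `transformers` package is installed).
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    prompt = "Generative pre-trained transformers are"
    input_ids = tokenizer.encode(prompt, return_tensors="pt")

    # The model predicts one token at a time, each conditioned on all previous tokens.
    output_ids = model.generate(input_ids, max_new_tokens=20, do_sample=False)
    print(tokenizer.decode(output_ids[0]))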

TODO

Relation to BERT

Unlike BERT models, which are bidirectional (each token attends to context on both sides), GPT models are unidirectional: each token attends only to the tokens before it (causal, autoregressive attention).
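
Unidirectional concretely means a causal attention mask: position i may only attend to positions j <= i. A minimal NumPy sketch (names and shapes are illustrative, not taken from any particular implementation):

    # Minimal sketch of a causal (unidirectional) attention mask with NumPy.
    # Position i may only attend to positions j <= i; future positions are masked out.
    import numpy as np

    def causal_mask(seq_len: int) -> np.ndarray:
        # Lower-triangular matrix: 1 where attention is allowed, 0 where it is blocked.
        return np.tril(np.ones((seq_len, seq_len)))

    scores = np.random.randn(4, 4)                  # raw attention scores for 4 tokens
    mask = causal_mask(4)
    masked = np.where(mask == 1, scores, -np.inf)   # block attention to future tokens

    # Softmax over each row: masked (future) positions get zero attention weight.
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    print(np.round(weights, 2))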

minGPT etc.

Minimal educational GPT implementations: Andrej Karpathy's minGPT and nanoGPT, as well as Jay Mody's picoGPT (see also his blog post).

nanoGPT

When running prepare.py (in the directory data/openwebtext, on Windows), I got the error message ImportError: numpy.core.multiarray failed to import.
This usually indicates an incompatible NumPy installation; I was able to fix it by force-reinstalling NumPy with py -m pip install numpy -I (-I ignores the already installed package).
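
After the reinstall, a quick sanity check (plain Python, nothing nanoGPT-specific) is to import NumPy from the same interpreter:

    # Sanity check after reinstalling: this import fails with the
    # "numpy.core.multiarray failed to import" error when the installed
    # NumPy build is incompatible.
    import numpy
    print(numpy.__version__)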

See also

GPT-2, GPT-3, GPT-4, ChatGPT

Links

Improving Language Understanding by Generative Pre-Training (the original GPT paper, Radford et al., 2018)

Index