Python library: datasets
The two main purposes of datasets are to
download public datasets (from Hugging Face)
efficiently pre-process these public datasets as well as local ones
datasets seems to support
CSV
JSON
Apache Parquet
Image formats: png, jpeg
Audio formats: wav, mp3
Ordinary text
See also
Data exchange formats
datasets is a prerequisite for nanoGPT