Data lake

According to Dull, a data lake is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured and unstructured data.

This implies that any and all data can be stored in a data lake.

The goal of a data lake is to quickly load large volumes of data from changing, heterogeneous data sources.

The data lake stores data until it is needed.
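Loading data "in its native format" means the lake stores the incoming bytes without parsing or transforming them first. The following is a minimal sketch of that idea, assuming the lake is just a local directory. The helper `ingest_raw` and the `raw/<source>/` layout are hypothetical. A production lake would more likely sit on distributed or object storage.

```python
import shutil
import tempfile
from pathlib import Path


def ingest_raw(lake_root: Path, source: str, file_path: Path) -> Path:
    """Copy a file into the lake unchanged (hypothetical helper).

    No parsing, validation or schema is applied: the bytes are stored
    exactly as they arrived, whatever their format.
    """
    target_dir = lake_root / "raw" / source
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / file_path.name
    shutil.copyfile(file_path, target)  # copies the file contents byte for byte
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        lake = tmp_path / "lake"

        # Three heterogeneous inputs: structured, semi-structured, unstructured.
        inputs = {
            "orders.csv": b"id,amount\n1,9.99\n2,15.00\n",
            "clicks.json": b'{"user": "a", "page": "/home"}\n',
            "scan.png": bytes([0x89, 0x50, 0x4E, 0x47]),
        }
        for name, content in inputs.items():
            src = tmp_path / name
            src.write_bytes(content)
            stored = ingest_raw(lake, "demo", src)
            # The stored copy is identical to the input.
            assert stored.read_bytes() == content
            print(stored.relative_to(lake))
```

Because nothing is interpreted on the way in, loading stays fast and no format is rejected. The cost is that understanding the data is deferred to whoever reads it later.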

Because the data is diverse and kept in its native format, the concept of a data lake contrasts with that of a data warehouse, which stores data that has already been transformed into a predefined structure.

Because a data lake stores all kinds of data formats, even unstructured ones, it is important to design the data lake itself in a structured fashion so that it does not turn into a data swamp.
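One way to keep a lake structured is to record metadata about every dataset when it is loaded, so that data can be found and attributed later. The sketch below assumes a simple in-memory catalog. The `DatasetEntry` and `Catalog` names and their fields are hypothetical; a real deployment would typically use a dedicated metadata or catalog service.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class DatasetEntry:
    """Metadata describing one dataset stored in the lake (hypothetical schema)."""
    name: str
    source: str
    data_format: str
    path: str
    owner: str
    loaded_at: datetime


@dataclass
class Catalog:
    """Minimal in-memory catalog that refuses entries lacking required metadata."""
    entries: dict[str, DatasetEntry] = field(default_factory=dict)

    def register(self, entry: DatasetEntry) -> None:
        # Data with no known owner or origin is what makes a lake a swamp,
        # so such entries are rejected up front.
        if not entry.owner or not entry.source:
            raise ValueError(f"dataset {entry.name!r} is missing owner or source")
        if entry.name in self.entries:
            raise ValueError(f"dataset {entry.name!r} is already registered")
        self.entries[entry.name] = entry

    def find(self, *, source: str | None = None,
             data_format: str | None = None) -> list[DatasetEntry]:
        """Return all entries matching every filter that was given."""
        return [
            e for e in self.entries.values()
            if (source is None or e.source == source)
            and (data_format is None or e.data_format == data_format)
        ]


if __name__ == "__main__":
    catalog = Catalog()
    now = datetime.now(timezone.utc)
    catalog.register(DatasetEntry("orders", "erp", "csv", "raw/erp/orders.csv", "sales-team", now))
    catalog.register(DatasetEntry("clickstream", "web", "json", "raw/web/clicks.json", "web-team", now))
    print([e.name for e in catalog.find(data_format="json")])  # ['clickstream']
    try:
        catalog.register(DatasetEntry("mystery", "", "bin", "raw/x", "", now))
    except ValueError as err:
        print(err)
```

The raw files themselves stay untouched; only the catalog imposes structure, which keeps loading cheap while still making the contents discoverable.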

Advantages

Advantages of data lakes include

Data is rarely thrown away, which makes it possible to find significant insights that were not foreseeable when the data was loaded
Analysts can explore the data the way they want and need to
A data lake might be populated faster than a DWH fed by traditional ETL pipelines, because the data does not have to be transformed before it is loaded.
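Storing raw data and deciding how to interpret it only when it is read is often called schema-on-read, as opposed to the schema-on-write approach of a DWH loaded through ETL. The sketch below illustrates the idea with two hypothetical analyses that read the same raw JSON Lines records in different ways. The record fields and function names are assumptions chosen for illustration.

```python
from __future__ import annotations

import json

# Raw events as they might sit in the lake: no schema was enforced on load,
# so the records differ in which fields they carry.
RAW_EVENTS = """\
{"user": "alice", "action": "view", "item": "book-1", "ts": "2024-01-01T10:00:00"}
{"user": "bob", "action": "buy", "item": "book-1", "price": 12.5, "ts": "2024-01-01T10:05:00"}
{"user": "alice", "action": "buy", "item": "pen-7", "price": 1.2}
"""


def read_records(raw: str) -> list[dict]:
    """Parse JSON Lines; structure is imposed here, at read time."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def revenue_per_item(records: list[dict]) -> dict[str, float]:
    """Analysis A: only purchases matter; records without a price are skipped."""
    totals: dict[str, float] = {}
    for r in records:
        if r.get("action") == "buy" and "price" in r:
            totals[r["item"]] = totals.get(r["item"], 0.0) + r["price"]
    return totals


def actions_per_user(records: list[dict]) -> dict[str, int]:
    """Analysis B: counts every action per user, ignoring prices and timestamps."""
    counts: dict[str, int] = {}
    for r in records:
        counts[r["user"]] = counts.get(r["user"], 0) + 1
    return counts


if __name__ == "__main__":
    records = read_records(RAW_EVENTS)
    print(revenue_per_item(records))  # {'book-1': 12.5, 'pen-7': 1.2}
    print(actions_per_user(records))  # {'alice': 2, 'bob': 1}
```

Neither analysis had to be anticipated when the events were loaded, which is the flexibility the advantages above describe.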

Use cases

Typical use cases for data lakes are

Often, data lakes are also found in event streaming or IoT scenarios.
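In event streaming or IoT scenarios, data tends to arrive as a continuous flow of small events rather than as files. A common pattern, assumed here rather than taken from these notes, is to append the events unchanged to files partitioned by event time so that later readers can select a time range cheaply. The `append_events` function, the `sensor`/`ts` field names and the `date=.../hour=...` directory layout are hypothetical.

```python
from __future__ import annotations

import json
import tempfile
from collections import defaultdict
from datetime import datetime
from pathlib import Path


def partition_for(event: dict) -> str:
    """Derive a partition path such as 'date=2024-01-01/hour=10' from the event time."""
    ts = datetime.fromisoformat(event["ts"])
    return f"date={ts.date().isoformat()}/hour={ts.hour:02d}"


def append_events(lake_root: Path, events: list[dict]) -> dict[str, int]:
    """Append each event, unchanged, as one JSON line to its hourly partition.

    Returns the number of events written per partition.
    """
    by_partition: dict[str, list[dict]] = defaultdict(list)
    for event in events:
        by_partition[partition_for(event)].append(event)

    written: dict[str, int] = {}
    for partition, batch in by_partition.items():
        target_dir = lake_root / "raw" / "iot" / partition
        target_dir.mkdir(parents=True, exist_ok=True)
        # Mode "a" appends, so repeated calls add to existing files
        # instead of overwriting them.
        with open(target_dir / "events.jsonl", "a", encoding="utf-8") as fh:
            for event in batch:
                fh.write(json.dumps(event) + "\n")
        written[partition] = len(batch)
    return written


if __name__ == "__main__":
    events = [
        {"sensor": "t1", "ts": "2024-01-01T10:15:00", "celsius": 21.4},
        {"sensor": "t2", "ts": "2024-01-01T10:45:00", "celsius": 19.8},
        {"sensor": "t1", "ts": "2024-01-01T11:05:00", "celsius": 21.9},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        print(append_events(Path(tmp), events))
        # {'date=2024-01-01/hour=10': 2, 'date=2024-01-01/hour=11': 1}
```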