
Big Data

Job families:

Internet of Things (IoT)

A special subset of Big Data is the so-called Internet of Things.

Lambda Architecture

Nathan Marz proposed handling Big Data with a so-called Lambda Architecture, which rests on three layers:
Batch layer: redundant and distributed processing of the complete data set, aimed at accurate results. Examples: Hadoop, Snowflake, Redshift, Synapse, BigQuery
Speed layer: processes real-time data faster than the batch layer, but with less accuracy. As soon as the batch layer has finished its computation, its results supersede the corresponding speed-layer results, which can then be discarded. Examples: Apache Storm, Spark, SQLstream, Azure Stream Analytics, Azure Cosmos DB
Serving layer: stores the results of the batch and speed layers so that they can be queried, for example with ad-hoc queries. Examples: Apache Cassandra, HBase, Hive, Impala, Elasticsearch, Azure Cosmos DB, MongoDB, VoltDB, ElephantDB, SAP HANA
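The interplay of the three layers can be made concrete with a toy, single-process Python sketch. It is only an illustration under stated assumptions, not how Hadoop, Storm or Cassandra actually work: the class LambdaSketch, its methods and the (user_id, page) page-view events are hypothetical. The master dataset is append-only, the batch layer recomputes its view from all data, the speed layer updates its view incrementally, and a query combines both views.

```python
from collections import Counter

# Hypothetical event: (user_id, page) for a single page view.
Event = tuple[str, str]


class LambdaSketch:
    """Toy, in-memory model of the three layers, counting page views per page."""

    def __init__(self) -> None:
        self.master: list[Event] = []                 # master dataset: append-only
        self.batch_view: Counter[str] = Counter()     # output of the batch layer
        self.realtime_view: Counter[str] = Counter()  # output of the speed layer

    def ingest(self, event: Event) -> None:
        # New data is appended to the master dataset and fed to the speed layer.
        self.master.append(event)
        self.realtime_view[event[1]] += 1

    def run_batch(self) -> None:
        # Batch layer: recompute the view from scratch over the complete master dataset.
        self.batch_view = Counter(page for _, page in self.master)
        # The batch view now covers every event, so the real-time view is discarded.
        self.realtime_view = Counter()

    def query(self, page: str) -> int:
        # Serving layer: answer a query by combining the batch and real-time views.
        return self.batch_view[page] + self.realtime_view[page]


lam = LambdaSketch()
for e in [("u1", "/home"), ("u2", "/home"), ("u1", "/about")]:
    lam.ingest(e)
lam.run_batch()
lam.ingest(("u3", "/home"))
print(lam.query("/home"))  # 3: 2 from the batch view + 1 from the real-time view
```

Recomputing the batch view from scratch, rather than patching it, is what lets the batch layer correct any inaccuracy the speed layer may have introduced in the meantime.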
The term Lambda is borrowed from functional programming, where a lambda function does not modify data in place but returns a modified copy for further processing.
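The copy-modify idea can be shown with a minimal Python example. The (user_id, clicks) records are purely illustrative; the example demonstrates a lambda producing new values while the input stays unchanged, not anything about the Lambda Architecture itself.

```python
# Original records: a tuple of (user_id, clicks) pairs; tuples cannot be modified in place.
records = (("u1", 3), ("u2", 5))

# The lambda builds a new pair for each record instead of changing the existing one.
doubled = tuple(map(lambda r: (r[0], r[1] * 2), records))

print(records)  # (('u1', 3), ('u2', 5))  the original is unchanged
print(doubled)  # (('u1', 6), ('u2', 10))
```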

Open source systems

The major open source systems for Big Data include

See also

Azure Data Lake Analytics
