Search notes:

Hadoop MapReduce

Hadoop MapReduce is a framework for processing large datasets in parallel across a cluster of many computers,
which behaves as if it were one big computer.
Hadoop's MapReduce is a disk-based, two-stage paradigm. Apache Spark, in contrast, uses in-memory primitives.
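A minimal single-process sketch of the two-stage flow, using word counting as the classic example. The function names (map_phase, shuffle, reduce_phase) are illustrative, not Hadoop API; in real Hadoop the shuffle/grouping step is performed by the framework across the cluster, and intermediate results are written to disk between the stages.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit (word, 1) pairs from each input document
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group values by key (done by the framework in Hadoop)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big cluster", "big data"]
print(reduce_phase(shuffle(map_phase(docs))))
# {'big': 3, 'data': 2, 'cluster': 1}
```

Because each map task only needs its own input split and each reduce task only needs its own key group, the stages parallelize naturally across machines.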

Architecture

batch-based: therefore weak when dealing with real-time data.
shared-nothing: can be problematic for algorithms that need shared state between nodes.

See also

MapReduce

Index