Search notes:

Hadoop MapReduce

Hadoop MapReduce is a framework for processing large datasets in parallel across a cluster of many computers,
which behaves as if it were one big computer.
Hadoop's MapReduce is a disk-based, two-stage paradigm. Apache Spark, in contrast, uses in-memory primitives.
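A minimal single-process sketch of the two-stage flow, using word counting as the classic example. The function names (map_phase, shuffle, reduce_phase) are illustrative, not Hadoop API; in real Hadoop the shuffle/grouping step is performed by the framework across the cluster, and intermediate results are written to disk between the stages.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit (word, 1) pairs from each input document
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group values by key (done by the framework in Hadoop)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big cluster", "big data"]
print(reduce_phase(shuffle(map_phase(docs))))
# {'big': 3, 'data': 2, 'cluster': 1}
```

Because each map task only needs its own input split and each reduce task only needs its own key group, the stages parallelize naturally across machines.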

Architecture

batch-based: therefore weak when dealing with real-time data.
shared-nothing: can be problematic for algorithms that need shared state between nodes.

See also

MapReduce

Index