Search notes:

Apache Spark

Spark is a fast and general cluster computing system for Big Data.
High-level APIs in Scala, Java, Python, and R
In contrast to Hadoop MapReduce, which writes intermediate results to disk, Spark keeps working data in memory; this especially benefits iterative workloads such as machine learning algorithms.
Spark runs on Hadoop YARN, Apache Mesos, Kubernetes, standalone, or in the cloud.
Spark accesses diverse data sources, including HDFS, Apache Cassandra, Apache HBase, and Amazon S3.
Originally developed in 2009 in UC Berkeley's AMPLab.
Fully open sourced in 2010 - now a top-level project at the Apache Software Foundation.

Documentation

http://spark.apache.org/documentation.html
https://cwiki.apache.org/confluence/display/SPARK

Concepts

RDD: Resilient Distributed Dataset - an immutable, partitioned collection of records that can be recomputed on failure
Transformations: lazy operations (e.g. map, filter) that define a new RDD from an existing one
Actions: operations (e.g. collect, count, reduce) that trigger actual computation and return a result
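The transformation/action distinction can be sketched with a toy Python class. This is not the real Spark API - `ToyRDD` and its methods are illustrative stand-ins - but it shows the key idea: transformations only record a pipeline, and nothing runs until an action forces evaluation.

```python
# Toy illustration of the RDD concept: transformations are lazy
# (they only record what to do), actions trigger actual computation.
# NOT the Spark API -- names loosely mimic it for illustration.

class ToyRDD:
    def __init__(self, data, ops=None):
        self._data = data          # the underlying records
        self._ops = ops or []      # recorded (lazy) transformations

    # --- transformations: return a new ToyRDD, compute nothing yet ---
    def map(self, f):
        return ToyRDD(self._data, self._ops + [("map", f)])

    def filter(self, p):
        return ToyRDD(self._data, self._ops + [("filter", p)])

    # --- actions: run the recorded pipeline and return a plain value ---
    def collect(self):
        out = list(self._data)
        for kind, f in self._ops:
            if kind == "map":
                out = [f(x) for x in out]
            else:  # filter
                out = [x for x in out if f(x)]
        return out

    def count(self):
        return len(self.collect())


rdd = ToyRDD(range(10))
evens_squared = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
# Nothing has been computed yet -- only a pipeline was recorded.
print(evens_squared.collect())  # [0, 4, 16, 36, 64]
print(evens_squared.count())    # 5
```

In real Spark the same chain would be e.g. sc.parallelize(range(10)).filter(...).map(...).collect(), with the data partitioned across the cluster.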

TODOs

MLlib: machine learning
GraphX: graph processing
Spark Streaming

See also

SQL Server: Big Data Clusters
Microsoft offers Spark as a managed service in Azure HDInsight.
Compare also with Azure Databricks and Synapse Analytics.

Links

http://spark.apache.org
http://github.com/apache/spark

Index