Search notes:
Apache Spark
Spark is a fast, general-purpose cluster computing system for Big Data.
High-level APIs in Scala, Java, Python, and R.
In contrast to Hadoop MapReduce, which writes intermediate results to disk, Spark keeps working data in memory. This is especially useful for iterative workloads such as machine learning algorithms.
Spark runs on Hadoop YARN, Apache Mesos, Kubernetes, standalone, or in the Cloud.
Spark accesses various data sources.
Originally developed in 2009 in UC Berkeley's AMPLab.
Fully open sourced in 2010 - now a top-level project at the Apache Software Foundation.
Documentation
http://spark.apache.org/documentation.html
https://cwiki.apache.org/confluence/display/SPARK
Concepts
RDD (Resilient Distributed Dataset) - an immutable, partitioned collection of records
Transformations - lazy operations that define a new RDD (e.g. map, filter)
Actions - operations that trigger execution and return a result (e.g. count, collect)
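The key point is that transformations are lazy: they only describe a computation, and nothing runs until an action forces evaluation. A rough sketch of this idea using plain Python generators (an analogy only, no Spark installation needed; the PySpark calls in the comments are just reference points):

```python
# Lazy "transformations" vs. eager "actions", illustrated with
# plain Python generators - an analogy, not actual Spark code.

data = range(1, 6)  # stand-in for a distributed dataset

# "Transformations": build a lazy pipeline; nothing is computed yet.
squared = (x * x for x in data)             # like rdd.map(lambda x: x * x)
evens = (x for x in squared if x % 2 == 0)  # like .filter(lambda x: x % 2 == 0)

# "Action": forces evaluation of the whole pipeline at once.
result = sum(evens)                         # like .sum()
print(result)  # 4 + 16 = 20
```

In real Spark the same laziness lets the scheduler see the whole pipeline before running it, so it can pipeline steps and recompute lost partitions from the lineage of transformations.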
TODOs
MLlib: machine learning
GraphX: graph processing
Spark Streaming
See also
SQL Server: Big Data Clusters
Microsoft offers managed Spark through Azure HDInsight.
Compare also with Azure Databricks and Azure Synapse Analytics.
Links
http://spark.apache.org
http://github.com/apache/spark