Spark: Difference between revisions

From NovaOrdis Knowledge Base

Revision as of 20:56, 7 December 2021

External

Internal

Overview

Spark is a third-generation unified analytics engine for large-scale data processing. It natively supports both batch processing and stream processing. Stream processing is implemented as micro-batching: the incoming stream is cut into small batches, and each batch is processed by the batch engine. Streaming state uses HDFS as its backend.
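The micro-batching idea can be illustrated with a minimal pure-Python sketch. This is not Spark's API: the names `micro_batches`, `run_stream` and `word_count` are hypothetical, and the batches are cut by record count for simplicity, whereas a real engine would typically cut them by a time trigger.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def micro_batches(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Cut a (possibly unbounded) stream into small, bounded batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return  # stream exhausted
        yield batch


def run_stream(stream: Iterable[T], batch_size: int,
               batch_job: Callable[[List[T]], R]) -> List[R]:
    """Apply an ordinary batch job to each micro-batch, in arrival order."""
    return [batch_job(batch) for batch in micro_batches(stream, batch_size)]


def word_count(batch: List[str]) -> Dict[str, int]:
    """A plain batch job: count the words in one batch of lines."""
    counts: Dict[str, int] = {}
    for line in batch:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    events = ["a b", "b c", "c", "a"]
    # Each printed dict is the result of one micro-batch of two lines.
    for result in run_stream(events, 2, word_count):
        print(result)
```

The point of the sketch is that the streaming path reuses the same batch job unchanged; only the slicing of the input differs. Results here are per batch; carrying totals across batches is what requires a state backend.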

Subjects

  • Concepts

Organizatorium

  • Spark SQL
  • PySpark/Spark SQL in interactive mode on JupyterHub.