Spark: Difference between revisions

From NovaOrdis Knowledge Base
* [[Stream Processing]]
* [[Flink]]
* [[Beam]]
* [[Iceberg]]
* [[Alluxio]]

Revision as of 01:27, 8 December 2021

External

Internal

Overview

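Spark is a third-generation unified analytics engine for large-scale data processing. It natively supports both batch processing and stream processing, with stream processing implemented as micro-batching: the incoming stream is cut into small batches, and each one is processed as a batch job. Streaming state is persisted to HDFS (or an HDFS-compatible file system), which acts as the state backend.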
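To make the micro-batching idea concrete, here is a minimal conceptual sketch in plain Python. It is not Spark code and does not use any Spark API: the names `micro_batches` and `run_streaming_job` are hypothetical, and batches are cut by record count for simplicity, whereas a real engine would typically cut them on a time-based trigger.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def micro_batches(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Cut a (possibly unbounded) stream into small, finite batches.

    Hypothetical helper: batches are cut by record count, which is an
    assumption made to keep the sketch deterministic.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(stream)
    while True:
        # Pull at most batch_size records from the stream.
        batch = list(islice(it, batch_size))
        if not batch:
            return  # the stream is exhausted
        yield batch


def run_streaming_job(
    stream: Iterable[T],
    batch_size: int,
    batch_job: Callable[[List[T]], R],
) -> Iterator[R]:
    """Apply an ordinary batch job to each micro-batch, one after another."""
    for batch in micro_batches(stream, batch_size):
        yield batch_job(batch)


if __name__ == "__main__":
    # Seven records, batches of three: the last batch is a partial one.
    for result in run_streaming_job(range(7), 3, sum):
        print(result)
```

The point of the sketch is that the same `batch_job` function serves both modes: a batch run calls it once on the whole data set, while a streaming run calls it repeatedly on small slices.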

Subjects

Organizatorium

  • Spark SQL
  • PySpark/Spark SQL in interactive mode on JupyterHub
  • Spark batch and streaming
  • Spark job
  • Spark UI
  • Spark history server
  • Spark remote shuffle service
  • Spark K8s Operator