Revision as of 01:27, 8 December 2021

External

Internal

Overview

Spark is a third-generation unified analytics engine for large-scale data processing. It natively supports both batch and stream processing. Stream processing is implemented as micro-batching, and HDFS serves as the state backend.
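Spark's own API is not shown here. The plain-Python snippet below is only a conceptual sketch of micro-batching with a durable state backend. It groups an event stream into small batches, processes each batch as a unit, and checkpoints the running state to a file after every batch. The local JSON file stands in for an HDFS checkpoint directory (an assumption), and all function names are hypothetical. Real Spark Structured Streaming manages batching, state and checkpoints itself.

```python
import json
import os
import tempfile
from collections import Counter
from typing import Iterable, Iterator, List


def micro_batches(events: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group a (potentially unbounded) event stream into fixed-size micro-batches."""
    batch: List[str] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the trailing partial batch
        yield batch


def run_stateful_count(events: Iterable[str], batch_size: int, state_path: str) -> Counter:
    """Hypothetical running word count; state is checkpointed after every batch."""
    # Recover previous state if a checkpoint exists (stand-in for an HDFS checkpoint).
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = Counter(json.load(f))
    else:
        state = Counter()

    for batch in micro_batches(events, batch_size):
        state.update(batch)  # process the whole micro-batch as one small batch job
        # Write to a temp file, then rename, so a crash never leaves a half-written checkpoint.
        tmp_path = state_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, state_path)
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "state.json")
        run_stateful_count(["a", "b", "a"], batch_size=2, state_path=path)
        # A "restarted" job resumes from the checkpointed state.
        print(dict(run_stateful_count(["b", "c"], batch_size=2, state_path=path)))
        # prints {'a': 2, 'b': 2, 'c': 1}
```

The second call picks up the counts written by the first, which is the property a durable state backend provides: a restarted streaming job continues from its last completed micro-batch instead of from scratch.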

Subjects

Concepts

Organizatorium

Spark SQL
PySpark/Spark SQL in interactive mode on JupyterHub.
Spark batch and streaming.
Spark job.
Spark UI
Spark history server
Spark remote shuffle service
Spark K8s Operator

* [[Stream Processing]]
* [[Flink]]
* [[Beam]]
* [[Iceberg]]
* [[Alluxio]]

Spark: Difference between revisions