Latest revision as of 16:25, 17 May 2022

External

Internal

Overview

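Spark is a third-generation unified analytics engine for large-scale data processing. It natively supports both batch processing and stream processing. Stream processing is implemented as micro-batching: the incoming stream is cut into small batches, and each batch runs through the same engine as a batch job. By default, streaming state is persisted to an HDFS-compatible file system, which acts as the state backend.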
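
The micro-batching idea can be illustrated with a minimal sketch in plain Python. This is not Spark's API: `micro_batches` and `run_micro_batched` are hypothetical helper names, and batches are cut by element count for determinism, whereas a real engine would typically cut them on a time-based trigger.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def micro_batches(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Cut a (possibly unbounded) stream into small finite batches.

    Assumption: batches are cut by count, not by time interval.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(stream)
    while True:
        # Pull at most batch_size elements; an empty list means the stream ended.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def run_micro_batched(
    stream: Iterable[T], batch_size: int, batch_job: Callable[[List[T]], R]
) -> List[R]:
    """Apply the same batch job to every micro-batch, in arrival order."""
    return [batch_job(batch) for batch in micro_batches(stream, batch_size)]


# Hypothetical input: seven events processed three at a time.
events = [3, 1, 4, 1, 5, 9, 2]
totals = run_micro_batched(events, batch_size=3, batch_job=sum)
print(totals)  # [8, 15, 2]
```

The point of the sketch is that the streaming path reuses the batch job unchanged, which is what lets one engine serve both workloads; the last batch may be smaller than the others.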

Subjects

Concepts

Organizatorium

Spark SQL
PySpark/Spark SQL in interactive mode on JupyterHub.
Spark batch and streaming.
Spark job.
Spark UI
Spark history server
Spark remote shuffle service
Spark Operator

@@ Line 6: / Line 6: @@
 =Internal=
 * [[Distributed_Systems#Distributed_Computation|Distributed Systems]]
+* [[Stream Processing]]
 * [[Flink]]
+* [[Beam]]
 * [[Iceberg]]
 * [[Alluxio]]
-* [[Spark K8s Operator]]
+* [[Spark Operator]]
+* [[Genie]]
+* [[Livy]]
+* [[dbt]]
 =Overview=
@@ Line 17: / Line 22: @@
 * [[Spark Concepts|Concepts]]
 =Organizatorium=
-* Spark SQL
+* <span id='Spark_SQL'></span>Spark SQL
 * PySpark/Spark SQL in interactive mode on [[JupyterHub]].
 * Spark batch and streaming.
@@ Line 24: / Line 29: @@
 * Spark history server
 * Spark remote shuffle service
-* [[Spark K8s Operator]]
+* [[Spark Operator]]

Spark: Difference between revisions
