Latest revision as of 16:25, 17 May 2022

=External=
* https://spark.apache.org
* https://spark.apache.org/docs/latest/index.html
* https://www.macrometa.com/event-stream-processing/spark-vs-flink
=Internal=
* [[Distributed_Systems#Distributed_Computation|Distributed Systems]]
* [[Stream Processing]]
* [[Flink]]
* [[Beam]]
* [[Iceberg]]
* [[Alluxio]]
* [[Spark Operator]]
* [[Genie]]
* [[Livy]]
* [[dbt]]
=Overview=
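Spark is a third-generation unified analytics engine for large-scale data processing. It natively supports [[System_Design#Batch_Processing|batch processing]] and [[System_Design#Stream_Processing|stream processing]]. Stream processing is implemented as micro-batching: the incoming stream is cut into small batches that are processed one after another as ordinary batch jobs. By default, streaming state is persisted to [[HDFS]] (or another HDFS-compatible file system), which acts as the state backend.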
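The micro-batching model described above can be illustrated with a toy sketch in plain Python. This is not Spark's API: `split_into_micro_batches`, `run_stream`, `run_batch`, `load_checkpoint`, `save_checkpoint` and the `state.json` checkpoint file are hypothetical names chosen for illustration, and a local temporary directory stands in for HDFS. The stream is cut into fixed trigger intervals, each micro-batch updates a running word count, and the state is committed to the checkpoint directory after every batch, so replaying the same input after a restart does not double count. Running the same aggregation once over the whole input (batch mode) gives the same result, loosely mirroring how one engine can serve both modes.

```python
import json
import os
import tempfile
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Toy model of micro-batch stream processing in plain Python; NOT Spark's API.
# All names here are hypothetical. A local directory stands in for HDFS.

Event = Tuple[float, str]  # (event time in seconds, word)
CHECKPOINT_FILE = "state.json"  # hypothetical checkpoint file name


def split_into_micro_batches(events: Iterable[Event], trigger_secs: float) -> List[List[str]]:
    """Group events into consecutive fixed-length trigger intervals (micro-batches)."""
    buckets: Dict[int, List[str]] = {}
    for ts, word in events:
        buckets.setdefault(int(ts // trigger_secs), []).append(word)
    # Only non-empty intervals become batches, in time order.
    return [buckets[key] for key in sorted(buckets)]


def load_checkpoint(checkpoint_dir: str) -> Tuple[int, Counter]:
    """Return (last committed batch id, state); (-1, empty) if nothing is committed yet."""
    path = os.path.join(checkpoint_dir, CHECKPOINT_FILE)
    if not os.path.exists(path):
        return -1, Counter()
    with open(path) as f:
        data = json.load(f)
    return data["last_batch_id"], Counter(data["counts"])


def save_checkpoint(checkpoint_dir: str, batch_id: int, counts: Counter) -> None:
    """Commit state for batch_id: write a temp file, then atomically rename it."""
    path = os.path.join(checkpoint_dir, CHECKPOINT_FILE)
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"last_batch_id": batch_id, "counts": dict(counts)}, f)
    os.replace(tmp_path, path)


def run_stream(events: Iterable[Event], trigger_secs: float, checkpoint_dir: str) -> Counter:
    """Process each micro-batch as a small batch job, checkpointing state after each one."""
    last_batch_id, counts = load_checkpoint(checkpoint_dir)
    for batch_id, words in enumerate(split_into_micro_batches(events, trigger_secs)):
        if batch_id <= last_batch_id:
            continue  # committed before a restart; skipping avoids double counting
        counts.update(words)  # stateful aggregation: running word count
        save_checkpoint(checkpoint_dir, batch_id, counts)
    return counts


def run_batch(events: Iterable[Event]) -> Counter:
    """Batch mode: the same aggregation over the whole input at once."""
    return Counter(word for _, word in events)


if __name__ == "__main__":
    events = [(0.1, "spark"), (0.4, "flink"), (1.2, "spark"), (2.7, "spark")]
    with tempfile.TemporaryDirectory() as checkpoint_dir:
        first = run_stream(events, 1.0, checkpoint_dir)
        replay = run_stream(events, 1.0, checkpoint_dir)  # simulated restart
    print(first == replay == run_batch(events))  # prints True
```

Real Spark keeps offset logs, commit logs and state files in separate subdirectories of the checkpoint location; a single JSON file is used here only to keep the sketch short.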
=Subjects=
* [[Spark Concepts|Concepts]]
=Organizatorium=
* <span id='Spark_SQL'></span>Spark SQL
* PySpark/Spark SQL in interactive mode on [[JupyterHub]]
* Spark batch and streaming
* Spark job
* Spark UI
* Spark history server
* Spark remote shuffle service
* [[Spark Operator]]

Spark: Difference between revisions
