Spark

From NovaOrdis Knowledge Base
=External=
* https://spark.apache.org
* https://spark.apache.org/docs/latest/index.html
* https://www.macrometa.com/event-stream-processing/spark-vs-flink
=Internal=
* [[Distributed_Systems#Distributed_Computation|Distributed Systems]]
* [[Stream Processing]]
* [[Flink]]
* [[Beam]]
* [[Iceberg]]
* [[Alluxio]]
* [[Spark Operator]]
* [[Genie]]
* [[Livy]]
* [[dbt]]
=Overview=
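Spark is a third-generation unified analytics engine for large-scale data processing. It natively supports [[System_Design#Batch_Processing|batch processing]] and [[System_Design#Stream_Processing|stream processing]]. Stream processing is implemented as micro-batching: the incoming stream is cut into small batches, and each batch is processed as a regular batch job. By default, streaming state is checkpointed to an HDFS-compatible file system such as [[HDFS]], which acts as the state backend.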
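
The micro-batching idea can be illustrated with a small conceptual sketch in plain Python. This is not Spark's API. The names <code>micro_batches</code>, <code>process_batch</code> and <code>run</code> are hypothetical, batches are cut by element count (an assumption made for simplicity, whereas Spark typically triggers on a time interval), and the "checkpoint" is only an in-memory copy of the state standing in for a durable state backend.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple


def micro_batches(events: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Cut a (possibly unbounded) event stream into small finite batches."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # the stream is exhausted
        yield batch


def process_batch(batch: List[str], state: Dict[str, int]) -> Dict[str, int]:
    """Apply ordinary batch logic (a word count) and fold it into the running state."""
    new_state = dict(state)
    for word in batch:
        new_state[word] = new_state.get(word, 0) + 1
    return new_state


def run(events: Iterable[str], batch_size: int) -> Tuple[Dict[str, int], List[Dict[str, int]]]:
    """Process the stream one micro-batch at a time, snapshotting state after each batch."""
    state: Dict[str, int] = {}
    checkpoints: List[Dict[str, int]] = []  # stand-in for a durable state backend
    for batch in micro_batches(events, batch_size):
        state = process_batch(batch, state)
        checkpoints.append(dict(state))
    return state, checkpoints


if __name__ == "__main__":
    final_state, snapshots = run(["a", "b", "a", "c", "a"], batch_size=2)
    print(final_state)
    print(len(snapshots), "checkpoints written")
```

The point of the sketch is that the per-batch logic is the same code a batch job would run. Only the driver loop and the state carried between batches make it a streaming computation.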


=Subjects=
* [[Spark Concepts|Concepts]]
=Organizatorium=
* <span id='Spark_SQL'></span>Spark SQL
* PySpark/Spark SQL in interactive mode on [[JupyterHub]]
* Spark batch and streaming
* Spark job
* Spark UI
* Spark history server
* Spark remote shuffle service
* [[Spark Operator]]

Latest revision as of 16:25, 17 May 2022
