Spark
Revision as of 20:00, 9 December 2021
External
- https://spark.apache.org
- https://spark.apache.org/docs/latest/index.html
- https://www.macrometa.com/event-stream-processing/spark-vs-flink
Internal
Overview
Spark is a third-generation unified analytics engine for large-scale data processing. It natively supports both batch processing and stream processing. Stream processing is implemented as micro-batching, which is the default execution mode of Spark Structured Streaming. It uses HDFS (or an HDFS-compatible file system) as its state backend.
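The micro-batching model described in the overview can be illustrated with a minimal plain-Python sketch. This is not Spark code and not Spark's API: the names `micro_batches` and `run_word_count` are hypothetical, batches are cut by event count (whereas Spark's default trigger starts the next micro-batch as soon as the previous one finishes), and the in-memory snapshot list only stands in for the state checkpoint that Spark would write to HDFS-compatible storage.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(events: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group a stream of events into small batches (hypothetical helper).

    Assumption: batches are cut by count; Spark cuts them by trigger timing.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[T] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the trailing, partially filled batch
        yield batch


def run_word_count(events: Iterable[str], batch_size: int) -> List[Dict[str, int]]:
    """Stateful running word count processed one micro-batch at a time."""
    state: Counter = Counter()  # running state carried across micro-batches
    snapshots: List[Dict[str, int]] = []
    for batch in micro_batches(events, batch_size):
        state.update(batch)  # each micro-batch is processed as a small batch job
        # Stand-in for checkpointing state after every batch (Spark would
        # persist this to its state backend, e.g. HDFS).
        snapshots.append(dict(state))
    return snapshots


if __name__ == "__main__":
    for i, snap in enumerate(run_word_count(["a", "b", "a", "c", "a"], 2)):
        print(f"after batch {i}: {snap}")
```

Running it on five events with a batch size of 2 yields three micro-batches; the state after the last one counts `a` three times. The point of the design is that the streaming query is just a loop of small batch jobs plus state that survives between them, which is why the same engine can serve both batch and streaming workloads.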
Subjects
- Concepts
Organizatorium
- Spark SQL
- PySpark/Spark SQL in interactive mode on JupyterHub
- Spark batch and streaming
- Spark job
- Spark UI
- Spark history server
- Spark remote shuffle service
- Spark K8s Operator