Revision as of 21:14, 1 May 2023

External

https://www.cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics by Michael Armbrust, Ali Ghodsi, Reynold Xin, Matei Zaharia

Internal

Overview

A lakehouse is an architectural pattern for data management that is based on open direct-access data formats (such as Apache Parquet and ORC), has first-class support for machine learning and data science, and offers state-of-the-art performance. It builds on the concept of the Data Lake.

Related Concepts

Data Warehouse. Schema-on-write. Business Intelligence (BI). Unstructured data. Data Lake. Schema-on-read. ETL, ELT, Machine Learning, data management, zero-copy cloning. DataFrame, data pipeline, batch job, streaming pipeline, SQL engines: Spark SQL, PrestoDB, Hive, AWS Athena.
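
Two of the concepts listed above, schema-on-write (usually associated with the Data Warehouse) and schema-on-read (usually associated with the Data Lake), can be contrasted with a minimal sketch. It uses only the Python standard library. The SCHEMA mapping, the function names, and the JSON-lines storage are assumptions made for illustration and do not come from any particular lakehouse implementation.

```python
import json

# Hypothetical schema for illustration: column name -> Python type.
SCHEMA = {"user_id": int, "amount": float}


def conforms(record, schema):
    """Return True if record has exactly the schema's columns with matching types."""
    return set(record) == set(schema) and all(
        isinstance(record[name], typ) for name, typ in schema.items()
    )


def write_with_schema(table, record, schema):
    """Schema-on-write: validate before storing and reject non-conforming records."""
    if not conforms(record, schema):
        raise ValueError(f"record does not match schema: {record!r}")
    table.append(record)


def read_with_schema(raw_lines, schema):
    """Schema-on-read: raw text is stored untouched; structure is applied at query time."""
    rows = []
    for line in raw_lines:
        raw = json.loads(line)
        try:
            # Project onto the schema's columns and coerce each value.
            rows.append({name: typ(raw[name]) for name, typ in schema.items()})
        except (KeyError, TypeError, ValueError):
            continue  # skip records that cannot be interpreted under this schema
    return rows
```

In this sketch the write path fails fast on bad data, while the read path accepts anything at ingestion time and pays the interpretation cost on every query. The sketch says nothing about how a real lakehouse table format handles schemas.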

Implementations

Apache Iceberg


Lakehouse: Difference between revisions
