Lakehouse: Difference between revisions

From NovaOrdis Knowledge Base
Line 12:


=Related Concepts=
- [[Data Warehouse]]. Schema-on-write. Business Intelligence (BI). Unstructured data. [[Data Lake]]. Schema-on-read. ETL, ELT, [[Machine Learning]], data management, zero-copy cloning. DataFrame, data pipeline, batch job, streaming pipeline, SQL engines: Spark SQL, [[Presto]], [[Hive]], AWS Athena.
+ [[Data Warehouse]]. Schema-on-write. Business Intelligence (BI). Unstructured data. [[Data Lake]]. Schema-on-read. ETL, ELT, [[Machine Learning]], data management, zero-copy cloning. DataFrame, data pipeline, batch job, streaming pipeline, SQL engines: Spark SQL, [[PrestoDB]], [[Hive]], AWS Athena.


=Implementations=
* [[Apache Iceberg]]

Revision as of 21:14, 1 May 2023

External

Internal

Overview

A lakehouse is an architectural pattern for data access that is built on open, direct-access data formats (such as Apache Parquet and ORC), supports machine learning and data science workloads directly, and aims for state-of-the-art query performance. It builds on the concept of the Data Lake.

Lakehouse.png

Related Concepts

Data Warehouse. Schema-on-write. Business Intelligence (BI). Unstructured data. Data Lake. Schema-on-read. ETL, ELT, Machine Learning, data management, zero-copy cloning. DataFrame, data pipeline, batch job, streaming pipeline, SQL engines: Spark SQL, PrestoDB, Hive, AWS Athena.
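
Two of the concepts listed above, schema-on-write and schema-on-read, can be contrasted with a small sketch. Schema-on-write (typical for a Data Warehouse) validates records against a schema before storing them; schema-on-read (typical for a Data Lake) stores raw data unchanged and applies the schema only when the data is queried. The example below is an illustration only, using plain Python and JSON lines rather than any real lakehouse engine; the names `SCHEMA`, `write_with_schema` and `read_with_schema` are hypothetical.

```python
import json

# Hypothetical schema for illustration: column name -> expected Python type.
SCHEMA = {"user_id": int, "country": str}


def write_with_schema(table, record):
    """Schema-on-write: validate the record before it is stored."""
    if set(record) != set(SCHEMA):
        raise ValueError(f"unexpected columns: {sorted(record)}")
    for column, expected in SCHEMA.items():
        if not isinstance(record[column], expected):
            raise ValueError(f"{column} must be {expected.__name__}")
    table.append(record)


def read_with_schema(raw_lines):
    """Schema-on-read: raw text is stored as-is, the schema is applied at query time."""
    for line in raw_lines:
        raw = json.loads(line)
        try:
            # Project onto the schema columns and cast each value.
            yield {column: cast(raw[column]) for column, cast in SCHEMA.items()}
        except (KeyError, TypeError, ValueError):
            # Records that do not fit the schema are skipped at read time.
            continue


# Warehouse-style table: only validated records are ever stored.
warehouse_table = []
write_with_schema(warehouse_table, {"user_id": 1, "country": "RO"})

# Lake-style storage: anything can be written, structure is imposed on read.
lake_files = [
    '{"user_id": "2", "country": "US", "extra": true}',
    '{"country": "FR"}',
]
lake_rows = list(read_with_schema(lake_files))
```

In this sketch the malformed second record is accepted into the lake and only dropped when read, whereas the warehouse-style writer would have rejected it up front. Skipping bad records on read is one design choice among several; a real system might instead surface them as nulls or route them to a quarantine location.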

Implementations