Lakehouse

From NovaOrdis Knowledge Base

External

Internal

Overview

A Lakehouse is an architectural pattern for data access that is based on open, direct-access data formats (such as Apache Parquet and ORC), has support for machine learning and data science, and offers state-of-the-art performance. An implementation of the Lakehouse pattern is a data management system that provides the management and performance features of a traditional analytical DBMS, such as ACID transactions, data versioning, auditing, indexing, caching and query optimization. It sits on top of a Data Lake and combines the lake's key benefits with those of a Data Warehouse. The pattern works well in cloud environments, where compute and storage are separated.

Lakehouse.png
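As an illustration of how ACID-style commits and data versioning can be layered over plain files in a data lake, here is a minimal toy sketch. It is not how any particular Lakehouse implementation works. The class name ToyLakehouseTable, the _log directory and the part-*.json file naming are all hypothetical, and JSON files stand in for Parquet or ORC so that the example needs only the Python standard library. The idea shown is that data files are immutable, and a table version is defined by an append-only log of commits that add or remove files.

```python
import json
import os
import tempfile
import uuid


class ToyLakehouseTable:
    """Hypothetical toy table: immutable data files plus an append-only commit log."""

    def __init__(self, root):
        self.root = root
        self.log_dir = os.path.join(root, "_log")  # assumed layout, for illustration only
        os.makedirs(self.log_dir, exist_ok=True)

    def _versions(self):
        # Commit files are named 00000000.json, 00000001.json, ...
        return sorted(int(name.split(".")[0]) for name in os.listdir(self.log_dir))

    def latest_version(self):
        versions = self._versions()
        return versions[-1] if versions else -1

    def commit(self, rows, remove=()):
        """Write rows to a new data file, then publish it as the next version."""
        data_file = f"part-{uuid.uuid4().hex}.json"
        with open(os.path.join(self.root, data_file), "w") as f:
            json.dump(rows, f)
        version = self.latest_version() + 1
        entry = {"add": [data_file], "remove": list(remove)}
        log_path = os.path.join(self.log_dir, f"{version:08d}.json")
        # Mode "x" raises FileExistsError if another writer already claimed this
        # version. A real system would also have to make the write itself atomic.
        with open(log_path, "x") as f:
            json.dump(entry, f)
        return version

    def files_at(self, version=None):
        """Replay the log up to `version` to find the data files that are live."""
        if version is None:
            version = self.latest_version()
        live = []
        for v in self._versions():
            if v > version:
                break
            with open(os.path.join(self.log_dir, f"{v:08d}.json")) as f:
                entry = json.load(f)
            live = [name for name in live if name not in entry["remove"]] + entry["add"]
        return live

    def read(self, version=None):
        """Return all rows visible at the given version (latest by default)."""
        rows = []
        for name in self.files_at(version):
            with open(os.path.join(self.root, name)) as f:
                rows.extend(json.load(f))
        return rows


table = ToyLakehouseTable(tempfile.mkdtemp())
v0 = table.commit([{"id": 1}, {"id": 2}])
v1 = table.commit([{"id": 3}])
# Rewrite: drop the first data file and replace it with a smaller one.
v2 = table.commit([{"id": 1}], remove=table.files_at(v0))

snapshot_v0 = table.read(v0)  # older version is still readable
snapshot_latest = table.read()
```

Because no data file is ever modified in place, reading an older version only requires replaying a shorter prefix of the log. Auditing falls out of the same log, since every change to the table is recorded as a commit entry.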

Related Concepts

Data Warehouse, Schema-on-write, Business Intelligence (BI), Unstructured data, Data Lake, Schema-on-read, ETL, ELT, Machine Learning, data management, zero-copy cloning, DataFrame, data pipeline, batch job, streaming pipeline. SQL engines: Spark SQL, PrestoDB, Hive, AWS Athena.

Implementations