Lakehouse
Revision as of 20:54, 1 May 2023
External
- https://www.cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics by Michael Armbrust, Ali Ghodsi, Reynold Xin, Matei Zaharia
Internal
Overview
A lakehouse is an architectural pattern for data access that is built on open, direct-access data formats (such as Apache Parquet and ORC), supports machine learning and data science workloads, and offers state-of-the-art performance. It is based on the concept of the Data Lake.
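The idea of direct access to an open format can be illustrated with a small sketch. It is only an analogy under stated assumptions: JSON Lines stands in for a columnar format such as Parquet so that the example needs nothing beyond the Python standard library, and the file name, field names, and helper functions (`write_table`, `revenue_by_region`, `feature_matrix`) are hypothetical. The point it shows is that a BI-style aggregate and a data-science-style feature extraction can both read the same file directly, without going through a separate warehouse copy.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def write_table(path, rows):
    """Write rows to an open, self-describing file (JSON Lines as a stand-in)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def read_table(path):
    """Read the file directly; no intermediate database is involved."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def revenue_by_region(path):
    """BI-style consumer: aggregate amounts per region."""
    totals = defaultdict(float)
    for row in read_table(path):
        totals[row["region"]] += row["amount"]
    return dict(totals)


def feature_matrix(path):
    """Data-science-style consumer: turn the same rows into numeric features."""
    return [[row["amount"], float(row["region"] == "EU")] for row in read_table(path)]


def demo():
    rows = [
        {"region": "EU", "amount": 10.0},
        {"region": "US", "amount": 5.0},
        {"region": "EU", "amount": 2.5},
    ]
    with tempfile.TemporaryDirectory() as d:
        path = Path(d) / "sales.jsonl"  # hypothetical file name
        write_table(path, rows)
        return revenue_by_region(path), feature_matrix(path)


if __name__ == "__main__":
    print(demo())
```

A real lakehouse would typically use a columnar format and a metadata layer on top of the files; this sketch leaves both out and only shows two kinds of workload sharing one directly readable file.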
Related Concepts
Data warehouse. Schema-on-write. Business Intelligence (BI). Unstructured data. Data Lake. Schema-on-read.
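The difference between schema-on-write and schema-on-read, both listed above, may be easier to see in a minimal sketch. The schema, the two store classes, and the `validate` helper are all hypothetical names introduced for illustration; they are not part of any lakehouse product. Schema-on-write checks records when they are stored (the data warehouse style), while schema-on-read stores raw records and applies the schema only when they are queried (the data lake style).

```python
SCHEMA = {"id": int, "amount": float}  # hypothetical schema for illustration


def validate(record, schema=SCHEMA):
    """Return the record coerced to the schema, or raise ValueError."""
    out = {}
    for field, typ in schema.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        try:
            out[field] = typ(record[field])
        except (TypeError, ValueError) as exc:
            raise ValueError(f"bad value for {field}: {record[field]!r}") from exc
    return out


class SchemaOnWriteStore:
    """Validates at write time, so stored rows always match the schema."""

    def __init__(self):
        self.rows = []

    def write(self, record):
        self.rows.append(validate(record))  # bad records are rejected here

    def read(self):
        return list(self.rows)


class SchemaOnReadStore:
    """Stores raw records as-is and applies the schema only when reading."""

    def __init__(self):
        self.raw = []

    def write(self, record):
        self.raw.append(record)  # nothing is checked at write time

    def read(self, schema=SCHEMA):
        good = []
        for record in self.raw:
            try:
                good.append(validate(record, schema))
            except ValueError:
                continue  # skip records that do not fit this schema
        return good
```

With the same two inputs, one valid and one with a non-numeric `id`, the schema-on-write store raises on the second write, whereas the schema-on-read store keeps both raw records and returns only the valid one on read.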