NoSQL

From NovaOrdis Knowledge Base
Revision as of 23:46, 7 November 2021

Overview

NoSQL databases are commonly grouped into four categories according to their data model: column stores, document databases, graph databases and key-value stores.

Different kinds of NoSQL databases address different requirements, as described below. What they have in common is the lack of a predefined schema. A NoSQL database can be a good choice when the application's data is unstructured, or when its structure is not known in advance or changes frequently. A NoSQL database may also improve development productivity. One drawback of relational databases is the effort needed to map data between in-memory structures, which are object-oriented in most cases, and tables and rows. A NoSQL database may offer a data model that fits the application more closely. That reduces the mapping effort and leaves less code to write, debug and evolve.
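The following is a minimal sketch of the mapping effort described above. It is not taken from any particular framework. The `Order` and `Item` classes are hypothetical. Python's built-in `sqlite3` stands in for a relational database, and a JSON string stands in for what a document store would hold. On the relational side, one object graph has to be split across two tables and then reassembled. On the document side, the whole aggregate round-trips as a single value.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field


@dataclass
class Item:  # hypothetical line item
    sku: str
    qty: int


@dataclass
class Order:  # hypothetical aggregate with nested items
    order_id: int
    customer: str
    items: list = field(default_factory=list)  # list of Item


def create_schema(conn):
    # The nested object graph needs two tables linked by a foreign key.
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
    conn.execute(
        "CREATE TABLE order_items (order_id INTEGER REFERENCES orders(id), sku TEXT, qty INTEGER)"
    )


def save_relational(conn, order):
    # Flatten: one parent row plus one child row per item.
    conn.execute("INSERT INTO orders (id, customer) VALUES (?, ?)", (order.order_id, order.customer))
    conn.executemany(
        "INSERT INTO order_items (order_id, sku, qty) VALUES (?, ?, ?)",
        [(order.order_id, i.sku, i.qty) for i in order.items],
    )


def load_relational(conn, order_id):
    # Reassemble: query both tables and rebuild the objects by hand.
    oid, customer = conn.execute(
        "SELECT id, customer FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT sku, qty FROM order_items WHERE order_id = ? ORDER BY rowid", (order_id,)
    )
    return Order(oid, customer, [Item(sku, qty) for sku, qty in rows])


def save_document(order):
    # The whole aggregate serializes to one nested JSON document.
    return json.dumps(asdict(order))


def load_document(doc):
    d = json.loads(doc)
    return Order(d["order_id"], d["customer"], [Item(**i) for i in d["items"]])
```

In practice an ORM automates much of the relational side, but it still has to perform this split and reassembly behind the scenes.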

Some NoSQL stores can be tuned for low latency. Others can store large amounts of data in a replicated and partitioned manner. A traditional relational database is typically designed to run on a single machine, and that machine's capacity may be insufficient for the amount of data to store. Many NoSQL databases are designed to run on clusters of commodity hardware and to scale out to large amounts of data by adding machines.

NoSQL Databases

Column Stores

Google Bigtable introduced a data model in which rows can be added with any set of columns. The columns do not need to be predefined; only the column families that group them do. This lack of a predefined schema makes these databases attractive for applications where the attributes of objects are not known in advance or change frequently.
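The following toy model illustrates the "any set of columns per row" idea. It is a minimal in-memory sketch and not Bigtable's API. The class `SparseTable` and its methods are hypothetical. Column names such as `info:name` imitate Bigtable's `family:qualifier` naming, but this sketch does not enforce column families. It also leaves out timestamps/versions and persistence.

```python
from collections import defaultdict


class SparseTable:
    """Toy model of a Bigtable-style table: each row key maps to a sparse
    dict of column name -> value, with no schema enforced."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # Any column may be written to any row; nothing is declared up front.
        self._rows[row_key][column] = value

    def get(self, row_key, column=None):
        # .get() on the defaultdict does not create empty rows for missing keys.
        row = self._rows.get(row_key, {})
        return dict(row) if column is None else row.get(column)

    def scan(self, start, end):
        # Rows come back in lexicographic row-key order, within [start, end).
        for key in sorted(self._rows):
            if start <= key < end:
                yield key, dict(self._rows[key])
```

For example, `user#1` can carry `info:name` and `info:email`, while `user#2` carries `info:name` and `prefs:theme`. Neither row has to declare the other's columns.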

Document Databases

Document databases are conceptually similar to Google Bigtable, and their data model is related: a Bigtable row, with its arbitrary number of columns/attributes, corresponds to a document. A document is a tree of objects containing attribute values and lists, and it is often mapped to JSON or XML. JSON stored as an opaque value in a relational database is just data to the database. A document database, by contrast, understands the structure of its documents: it can extract, index, aggregate and filter on attribute values inside them.
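The following in-memory sketch shows what "working with the structure of the documents" can mean. It is not modeled on any specific product's query API. `get_path`, `Collection` and the dotted-path syntax (`address.city`) are all hypothetical. The collection follows a path into nested documents, builds a secondary index on the values found there, filters by value (using the index when one exists), and aggregates counts.

```python
from collections import Counter, defaultdict


def get_path(doc, path):
    """Follow a dotted path such as 'address.city' into a nested document."""
    node = doc
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


class Collection:
    def __init__(self):
        self.docs = {}
        self.indexes = {}  # path -> {scalar value -> set of doc ids}

    def insert(self, doc_id, doc):
        # Insert-only sketch: updates would also need to remove stale index entries.
        if doc_id in self.docs:
            raise ValueError(f"duplicate id {doc_id!r}")
        self.docs[doc_id] = doc
        for path, index in self.indexes.items():
            self._index_doc(index, path, doc_id, doc)

    def create_index(self, path):
        index = defaultdict(set)
        for doc_id, doc in self.docs.items():
            self._index_doc(index, path, doc_id, doc)
        self.indexes[path] = index

    @staticmethod
    def _index_doc(index, path, doc_id, doc):
        value = get_path(doc, path)
        # Only scalar (hashable) values are indexed in this sketch.
        if value is not None and not isinstance(value, (list, dict)):
            index[value].add(doc_id)

    def find(self, path, value):
        # Use a secondary index when one exists, otherwise scan every document.
        if path in self.indexes:
            ids = self.indexes[path].get(value, set())
        else:
            ids = {i for i, d in self.docs.items() if get_path(d, path) == value}
        return sorted(ids)

    def count_by(self, path):
        # Simple aggregation: how many documents share each value at `path`.
        values = (get_path(d, path) for d in self.docs.values())
        return Counter(v for v in values if v is not None)
```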

The main limitation of Bigtable and document databases is that they typically cannot perform joins, or transactions that span several rows or documents. This limitation is deliberate. Because each row or document is self-contained, the database can partition the data across machines automatically.
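The following sketch illustrates the trade-off under stated assumptions. The partition count, key naming and helper functions are hypothetical. Placement uses simple modulo hashing, which is only one possible scheme; Bigtable itself partitions by row-key ranges. A single-document read touches exactly one partition. A "join" has to be done by the application, fetching each referenced document separately and possibly from several partitions.

```python
import hashlib

NUM_PARTITIONS = 4  # hypothetical cluster size


def partition_for(key):
    # Stable hash; Python's built-in hash() of str is randomized per process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


# Each partition holds an independent subset of the documents.
partitions = [{} for _ in range(NUM_PARTITIONS)]


def put(key, doc):
    partitions[partition_for(key)][key] = doc


def get(key):
    # A single-document read touches exactly one partition.
    return partitions[partition_for(key)].get(key)


def orders_with_customer_names(order_keys):
    """Application-side join: fetch each order, then the customer it refers to.
    Returns the joined records and the set of partitions that had to be contacted."""
    touched = set()
    joined = []
    for order_key in order_keys:
        order = get(order_key)
        touched.add(partition_for(order_key))
        customer = get(order["customer_id"])
        touched.add(partition_for(order["customer_id"]))
        joined.append({**order, "customer_name": customer["name"]})
    return joined, touched
```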

Graph Databases

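Graph databases focus on the relationships between items and are appropriate for highly interconnected data models. A typical query follows transitive relationships, i.e. variable-length chains of joins that continue until some condition is reached. Ordinary SQL joins cannot express this, because the number of joins is fixed in the query text. Recursive common table expressions (WITH RECURSIVE, introduced in SQL:1999) can express it, but such queries tend to be verbose. Graph databases, on the other hand, are optimized precisely for this kind of traversal.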
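The following minimal sketch shows the kind of variable-length traversal described above, over an in-memory adjacency list. The `manages` graph and the `reachable` function are hypothetical illustrations, not a graph database API. The traversal is breadth-first and follows edges for any number of hops. The `stop` predicate models "until some condition is reached": a node that matches it is reported but not expanded further.

```python
from collections import deque


def reachable(graph, start, stop=lambda node: False):
    """Breadth-first traversal following edges from `start` for any number of hops.
    Nodes satisfying `stop` are included in the result but not expanded."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt in seen:  # guards against cycles
                continue
            seen.add(nxt)
            order.append(nxt)
            if not stop(nxt):
                queue.append(nxt)
    return order


# Hypothetical "manages" relationships in an organization.
manages = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["erin", "frank"],
}
```

For example, `reachable(manages, "alice")` finds everyone under `alice` at any depth. Passing `stop=lambda n: n == "carol"` halts the descent at `carol`.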

Distributed Key-Value Stores

A distributed key-value store is essentially a distributed hash table designed for scalability. A document database or a graph database can provide a useful data model even for small-scale applications. Distributed key-value stores, by contrast, only make sense for truly vast amounts of data, much more than a single server could hold. These databases can transparently partition and replicate data across many machines in a cluster. A key-value store can be optimized either for low latency, which speeds up the application's request/response cycle, or for high throughput, which suits batch processing jobs.
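The following sketch shows one common way to partition and replicate keys across a cluster: consistent hashing with replication to successor nodes on the ring. It is an in-memory illustration under stated assumptions and does not describe any specific product's placement algorithm. The node names, the replication factor of 3, the 64 virtual positions per node and the `HashRing` class are all hypothetical.

```python
import bisect
import hashlib


def _hash(value):
    # Stable 64-bit hash; Python's built-in hash() is salted per process for strings.
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hashing ring: each key is stored on the first `replicas`
    distinct nodes found walking clockwise from the key's position."""

    def __init__(self, nodes, replicas=3, vnodes=64):
        if replicas > len(nodes):
            raise ValueError("replication factor exceeds number of nodes")
        self.replicas = replicas
        # Each physical node gets several virtual positions to spread load more evenly.
        self._ring = sorted((_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))
        self._positions = [pos for pos, _ in self._ring]
        self._store = {node: {} for node in nodes}  # stands in for each machine's storage

    def nodes_for(self, key):
        start = bisect.bisect(self._positions, _hash(key))
        owners = []
        for i in range(len(self._ring)):
            node = self._ring[(start + i) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == self.replicas:
                    break
        return owners

    def put(self, key, value):
        # Write the value to every replica responsible for the key.
        for node in self.nodes_for(key):
            self._store[node][key] = value

    def get(self, key, down=()):
        # Read from the first reachable replica; `down` simulates failed nodes.
        for node in self.nodes_for(key):
            if node not in down:
                return self._store[node].get(key)
        raise LookupError(f"no live replica for {key!r}")
```

A useful property of this scheme is that adding or removing a node only moves the keys adjacent to that node's positions on the ring, rather than reshuffling everything, as plain modulo hashing would.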

Like document databases, distributed key-value stores lack transactions and joins. They rely on eventual consistency: replicas may temporarily disagree, but the data eventually reaches a consistent state. These stores should be used only if the data items are independent, so that consistent update of two or more items is not required, and if availability and performance matter more than ACID guarantees.
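The following minimal sketch illustrates eventual consistency with one common conflict-resolution strategy, last-writer-wins (LWW). Two replicas accept writes independently. Each write is tagged with a logical counter plus the replica name, and an anti-entropy merge keeps the version with the greater tag. The `Replica` class and the cart example are hypothetical, and real stores differ in how they timestamp, propagate and reconcile writes.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a key-value store that accepts writes locally and syncs later."""
    name: str
    clock: int = 0
    data: dict = field(default_factory=dict)  # key -> (timestamp, writer, value)

    def put(self, key, value):
        # Logical timestamp plus replica name gives a total order for tie-breaking.
        self.clock += 1
        self.data[key] = (self.clock, self.name, value)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[2]

    def merge_from(self, other):
        # Anti-entropy: keep whichever version has the greater (timestamp, writer) pair.
        for key, theirs in other.data.items():
            mine = self.data.get(key)
            if mine is None or theirs[:2] > mine[:2]:
                self.data[key] = theirs
        self.clock = max(self.clock, other.clock)
```

If replica `a` writes `["book"]` and replica `b` concurrently writes `["pen"]` to the same key, the two replicas disagree until they merge, after which both hold the same value. LWW converges, but it silently discards one of the concurrent writes. That is acceptable only when, as stated above, items are independent and availability matters more than strict consistency.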