NoSQL

From NovaOrdis Knowledge Base

Overview

NoSQL databases are commonly grouped into four categories:

  1. document databases
  2. graph databases
  3. key-value stores
  4. column stores

While different kinds of stores address different requirements, as described below, their unifying feature is the lack of a predefined schema.

NoSQL Databases

Document Databases

Google Bigtable introduced a data model that allows rows to be added with any set of columns. The columns do not need to be predefined. The lack of a predefined schema makes these databases attractive for applications where the attributes of objects are not known in advance or change frequently.
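A minimal sketch of the "rows with any set of columns" idea, assuming a plain in-memory dictionary (row key mapped to a dictionary of column values) as a stand-in for a table. The names put, columns_in_use and the user rows are hypothetical, and the model ignores Bigtable details such as column families and timestamped cell versions.

```python
from typing import Any

# Hypothetical in-memory model: row key -> {column name: value}.
# No schema is declared up front; each row carries only the columns it uses.
Table = dict[str, dict[str, Any]]


def put(table: Table, row_key: str, **columns: Any) -> None:
    """Add or update columns on a row, creating the row if needed."""
    table.setdefault(row_key, {}).update(columns)


def columns_in_use(table: Table) -> set[str]:
    """The effective 'schema' is just the union of the columns seen so far."""
    return {name for row in table.values() for name in row}


users: Table = {}
put(users, "user:1", name="Ada", email="ada@example.com")
put(users, "user:2", name="Alan", twitter="@alan")  # a different set of columns
put(users, "user:1", phone="555-0100")              # new column, no migration

print(sorted(users["user:1"]))          # ['email', 'name', 'phone']
print(sorted(columns_in_use(users)))    # ['email', 'name', 'phone', 'twitter']
```

Adding the phone column required no schema change; the second row simply never has it.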

Document databases are conceptually similar to Google Bigtable and have a related data model: a Bigtable row with its arbitrary number of columns/attributes corresponds to a document. The document is a tree of objects containing attribute values and lists, often with a mapping to JSON or XML. Unlike JSON simply dumped into a relational database, the documents are not opaque to a document database: it can extract, index, aggregate and filter based on the attribute values in these documents.
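The operations listed above (extract, filter, aggregate, index) can be illustrated on nested JSON-like documents. This is a sketch in plain Python, not the query language of any particular document database; the orders collection, the dotted-path helper get_path and the build_index function are hypothetical.

```python
from collections import defaultdict
from typing import Any

# Documents are nested trees of objects and lists, as they would be in JSON.
orders = [
    {"_id": 1, "customer": {"name": "Ada", "country": "UK"},
     "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
    {"_id": 2, "customer": {"name": "Grace", "country": "US"},
     "items": [{"sku": "A", "qty": 5}]},
    {"_id": 3, "customer": {"name": "Alan", "country": "UK"},
     "items": []},
]


def get_path(doc: dict[str, Any], path: str) -> Any:
    """Extract a nested attribute by dotted path, e.g. 'customer.country'."""
    value: Any = doc
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def build_index(docs: list[dict[str, Any]], path: str) -> dict[Any, list[int]]:
    """Secondary index: attribute value -> ids of the documents holding it."""
    index: dict[Any, list[int]] = defaultdict(list)
    for doc in docs:
        index[get_path(doc, path)].append(doc["_id"])
    return dict(index)


# Filter on a nested attribute value.
uk_orders = [d["_id"] for d in orders if get_path(d, "customer.country") == "UK"]

# Aggregate over a list nested inside each document: total quantity per SKU.
qty_per_sku: dict[str, int] = defaultdict(int)
for d in orders:
    for item in d["items"]:
        qty_per_sku[item["sku"]] += item["qty"]

country_index = build_index(orders, "customer.country")

print(uk_orders)           # [1, 3]
print(dict(qty_per_sku))   # {'A': 7, 'B': 1}
print(country_index)       # {'UK': [1, 3], 'US': [2]}
```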

The problem with Bigtable and document databases is that they traditionally cannot perform joins or transactions spanning several rows or documents. This behavior is deliberate, because it allows the database to partition data automatically.
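The reason single-row/single-document operations make automatic partitioning easy can be sketched as follows. Hash partitioning over a fixed number of partitions is an assumption made here for simplicity (Bigtable itself splits a table into ranges of row keys); NUM_PARTITIONS, partition_for and the example keys are hypothetical.

```python
import hashlib

NUM_PARTITIONS = 4  # hypothetical cluster size


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a row/document key to a partition using a stable hash
    (Python's built-in hash() is salted per process, so it is not used)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partitions_touched(keys: list[str]) -> set[int]:
    """Partitions that an operation on these keys would have to coordinate."""
    return {partition_for(k) for k in keys}


# A single-document write always stays on exactly one partition.
print(partitions_touched(["order:1"]))
# A transfer between two documents may land on two partitions, which would
# need a distributed commit protocol; avoiding such operations keeps every
# request local to one partition.
print(partitions_touched(["account:alice", "account:bob"]))
```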

Graph Databases

Graph databases focus on the relationships between items and are appropriate for highly interconnected data models. Transitive relationships, i.e. variable-length chains of joins that continue until some condition is reached, are awkward to express in standard SQL, where they require recursive queries rather than ordinary joins. Graph databases, on the other hand, are optimized precisely for this kind of query.
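A sketch of what a transitive query does, written as a breadth-first traversal over an in-memory adjacency list. It illustrates the "follow edges until a condition holds" pattern only; it is not the query language or storage engine of any graph database, and the reports graph, transitive_reports and path_until are hypothetical.

```python
from collections import deque
from collections.abc import Callable
from typing import Optional

# Hypothetical "manages" graph stored as adjacency lists: manager -> direct reports.
reports = {
    "ceo": ["cto", "cfo"],
    "cto": ["eng_lead"],
    "cfo": [],
    "eng_lead": ["dev1", "dev2"],
    "dev1": [],
    "dev2": [],
}


def transitive_reports(graph: dict[str, list[str]], start: str) -> list[str]:
    """Everyone reachable from start, following edges to any depth (breadth-first)."""
    seen = {start}
    order: list[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:  # guards against cycles
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def path_until(graph: dict[str, list[str]], start: str,
               condition: Callable[[str], bool]) -> Optional[list[str]]:
    """Shortest chain of edges from start to the first node satisfying condition."""
    parent: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if condition(node):
            path = []
            current: Optional[str] = node
            while current is not None:  # walk back to the start node
                path.append(current)
                current = parent[current]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


print(transitive_reports(reports, "ceo"))  # ['cto', 'cfo', 'eng_lead', 'dev1', 'dev2']
print(path_until(reports, "ceo", lambda n: n.startswith("dev")))  # ['ceo', 'cto', 'eng_lead', 'dev1']
```

The depth of the chain is not known in advance, which is exactly what a fixed number of SQL joins cannot express.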

Distributed Key-Value Stores

A key-value store is a distributed hash table designed for scalability. While a document database or a graph database can provide a useful data model for small-scale applications, distributed key-value stores only make sense for truly vast amounts of data, much more than a single server could hold. These databases can transparently partition and replicate data across many machines in a cluster. Key-value stores can be optimized either for low latency, which speeds up the request/response cycle of the application, or for high throughput, which is useful for batch processing jobs.
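One common way to build such a distributed hash table is consistent hashing, used by Dynamo-style stores; the text above does not say which scheme a given store uses, so this is an illustrative assumption. The sketch places each node at several points ("virtual nodes") on a hash ring and stores every key on the first few distinct nodes clockwise from the key's position. HashRing, nodes_for, the replicas and vnodes parameters, and the node names are hypothetical.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Stable 64-bit position on the ring (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring: a key is stored on the first `replicas` distinct
    nodes found clockwise from the key's position."""

    def __init__(self, nodes: list[str], replicas: int = 3, vnodes: int = 64) -> None:
        self.replicas = replicas
        # Each physical node owns `vnodes` points, which evens out the load.
        self._ring = sorted(
            (ring_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def nodes_for(self, key: str) -> list[str]:
        """Preference list for a key: its primary node followed by its replicas."""
        start = bisect.bisect_right(self._points, ring_hash(key))
        owners: list[str] = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == self.replicas:
                    break
        return owners


ring = HashRing(["node-a", "node-b", "node-c", "node-d"], replicas=3)
print(ring.nodes_for("user:42"))  # three distinct nodes: a primary and two replicas
```

The design choice matters when the cluster grows: after adding a fifth node, a key changes its primary only if the new node takes it over, whereas partitioning by hash(key) % N would remap most keys when N changes.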

Key-value stores:

As with document databases, distributed key-value stores lack transactions and joins, and rely on eventual consistency to ensure that the data eventually reaches a consistent state. These stores should be used only if the data items are independent, so that consistently updating two or more items together is not a requirement, and if availability and performance are more important than ACID guarantees.
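Eventual consistency can be sketched with two replicas that accept writes independently and later exchange their data. Last-writer-wins conflict resolution with a caller-supplied logical timestamp is an assumption made for brevity; real stores may instead use vector clocks or return conflicting versions to the application. Replica, Versioned, sync_from and the cart key are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # hypothetical logical clock supplied by the writer
    origin: str     # replica name, used only to break timestamp ties


@dataclass
class Replica:
    name: str
    data: dict[str, Versioned] = field(default_factory=dict)

    def put(self, key: str, value: str, timestamp: int) -> None:
        self._apply(key, Versioned(value, timestamp, self.name))

    def get(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return entry.value if entry else None

    def _apply(self, key: str, incoming: Versioned) -> None:
        current = self.data.get(key)
        # Last-writer-wins: keep the version with the higher (timestamp, origin).
        if current is None or (incoming.timestamp, incoming.origin) > (
            current.timestamp, current.origin
        ):
            self.data[key] = incoming

    def sync_from(self, other: "Replica") -> None:
        """Anti-entropy: pull every entry from another replica and merge it."""
        for key, entry in other.data.items():
            self._apply(key, entry)


a, b = Replica("a"), Replica("b")
a.put("cart:7", "book", timestamp=1)
b.put("cart:7", "lamp", timestamp=2)   # concurrent write on another replica
print(a.get("cart:7"), b.get("cart:7"))  # book lamp  (replicas disagree)
a.sync_from(b)
b.sync_from(a)
print(a.get("cart:7"), b.get("cart:7"))  # lamp lamp  (replicas converged)
```

The replicas converge, but the "book" write is silently discarded. This is acceptable only when items are independent, which is why such stores suit workloads that do not need coordinated multi-item updates.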

Organizatorium