Data Structures: Difference between revisions

From NovaOrdis Knowledge Base
Jump to navigation Jump to search
 
(103 intermediate revisions by the same user not shown)
Line 1: Line 1:
=Internal=
=Internal=
* [[Software Engineering]]


* [[Programming#Data_Structures|Programming]]
=Overview=


=Overview=
A '''data structure''' is an arrangement of data in computer's memory or external storage, designed to facilitate a specific way to access or modify the data, quickly and usefully. Data structures include arrays, linked lists, stacks, queues, binary trees, search trees, heaps, hash tables, bloom filters, union-find, etc.


A '''data structure''' is an arrangement of data in computer's memory or external storage, designed to facilitate a specific way to access or modify it. Data structures include arrays, linked lists, stacks, binary trees, hash tables, heaps, etc.  
Data structures are crucial ingredients in the design of fast [[#Algorithms|algorithms]]. A data structure is responsible for organizing data in a way that supports fast queries. Different data structures support different types of [[#Data_Structure_Characteristics|operations]] and are suited for different kinds of tasks.
=Data Structure Characteristics=
A data structure can be described based on several characteristics:
* space requirements.
* what '''operations''' the data structure supports.
* what is the '''running time''' of those operations.
In general, fewer operations a data structure supports, the faster the operations and smallest the space overhead required by the data structure will be.


All data structures used to represent data in the memory of a computer are at essentially <span id='Set'></span>'''sets'''. Unlike a mathematical [[Set|set]], the sets manipulated by computers are '''finite''', and their composition change in time, so they are called <span id='Dynamic_Set'></span>'''dynamic'''. We know from mathematics that the elements of a set are distinct. Data structure sets also consist of distinct elements, if we conceptualize the fact that the elements are maintained in distinct memory locations. In a typical implementation of a dynamic set, each element is represented by an object (memory location) whose attributes can be examined and manipulated if we have a pointer or a reference to the objects. Different objects have different references.
=<span id='Set'></span>Data Structures as Sets=
All data structures used to represent data in the memory of a computer are essentially '''sets'''. Unlike a mathematical [[Set|set]], the sets manipulated by computers are '''finite''', and their composition change in time, so they are called <span id='Dynamic_Set'></span>'''dynamic sets'''. We know from mathematics that the elements of a set are distinct. Data structure sets also consist of distinct elements, if we conceptualize the fact that the elements are maintained in distinct memory locations. In a typical implementation of a dynamic set, each element is represented by an object (memory location) whose attributes can be examined and manipulated if we have a pointer or a reference to the objects. Different objects have different references.


Some dynamic sets assume that one of the element's attributes is an identifying <span id='Key'></span>'''key'''. If the keys are all different, we can think of the dynamic set as being a set of key values. Some dynamic sets assume that the keys are drawn from a ''totally ordered set'', such as the real numbers, or the set of all worlds under the usual alphabetic ordering. A total ordering allows defining a minimum element of the set, or to speak of the next element larger than the given element.
Some dynamic sets assume that one of the element's attributes is an identifying <span id='Key'></span>'''key'''. If the keys are all different, we can think of the dynamic set as being a set of key values. Some dynamic sets assume that the keys are drawn from a [[Relations#Total_Order|totally ordered set]], such as the real numbers, or the set of all words under the usual alphabetic ordering. A total ordering allows defining a minimum element of the set, or to speak of the next element larger than the given element.
=Augmenting a Data Structure=
Augmenting a data structure means modifying the data structure to maintain additional information about the data structure itself. An example is maintaining the number of nodes contained by the subtree of a certain node in a binary search tree, which is useful to implement [[Binary_Search_Trees#SELECT_Implementation|SELECT()]] and [[Binary_Search_Trees#RANK_Implementation|RANK()]] on binary search trees, or the "color" of a red-black tree. Care should be taken to update the augmented state in case of any operation that modifies the data structure.


<span id='Dictionary'></span>A dynamic set that allows ''insertion'' of new elements, ''deletion'' of existing elements and ''membership testing'' (or search) for a specific element is called a '''dictionary'''. Of these, the insertion and deletion modifies the set, and they are called ''modifying operations'', whereas the search does not. As such, the search operation belongs to the ''query'' category of operations.
=Dictionary=
A dynamic set that allows '''insertion''' of new elements, '''deletion''' of existing elements and '''membership testing''' (or search) for a specific element is called a '''dictionary'''. Insertion and deletion modify the set, and they are called [[#Modifying_Dictionary_Operations|modifying operations]]. Search does not modify the set and it belongs to the [[#Queries|query]] category of operations. Depending on the contexts, the concept of dictionary may imply an underlying total ordered set and the [[#n43ysL|associated queries on totally ordered sets]].


Modifying dictionary operations:
==Modifying Dictionary Operations==
Assuming that X is a pointer, or a reference, not a [[#Key|key]], a dictionary supports the following modifying operations:
===<span id='INSERT'></span><tt>INSERT(X)</tt>===
Add the element pointed by X to the set.
===<span id='DELETE'></span><tt>DELETE(X)</tt>===
Remove the element pointed by X.


* <span id='INSERT'></span>INSERT(X) - adds the element pointed by X to the set.
==Queries==
* <span id='DELETE'></span>DELETE(X) - removes the element pointed by X (X is a pointer, or a reference, not a [[#Key|key]]).
===<span id='SEARCH'></span><tt>SEARCH(K)</tt>===
A query that given a [[#Key|key]] value K, returns a pointer to an element with the given key, or NULL if there is no such element. Different data structures implement search in different ways, resulting in different running times relative to the size of the set. For example, a [[Sorting_Algorithms#Static_Sorted_Array|static sorted array]] or a [[Binary_Search_Trees#SEARCH|binary search tree]] implement the search operation in O(log n). Also see: {{Internal|Binary Search|Binary Search}}


Queries:
==<span id='n43ysL'></span>Queries on Totally Ordered Sets==
<span id='Totally_Ordered_Set'></span>A '''totally ordered''' set, also known as a '''linearly ordered set''', is [[Relations#Total_Order|formally defined here]].


* <span id='SEARCH'></span>SEARCH(K) - a query that given a [[#Key|key]] value K, returns a pointer to an element with the given key, or NULL if there is no such element.
We usually measure the time taken the execute an operation in terms of the size of the set n. [[Algorithm_Complexity#Asymptotic_Notation|Asymptotic notation]] is used to describe the behavior of these operations, as a function of the set size.


If the set is '''totally ordered''', the following operations may be exposed:
If the set is '''totally ordered''', the following operations may be exposed:


* MINIMUM() - returns the pointer to the element with the smallest key.
===<span id='SELECT'></span><tt>SELECT</tt>(i<sup>th</sup> order statistics)===
* MAXIMUM() - returns the pointer to the element with the largest key.
Return the i<sup>th</sup> order statistics in the set. The i<sup>th</sup> order statistics of a set of n numbers is the i<sup>th</sup> smallest number in the set.  
* SUCCESSOR(X) - a query that, given an element X whose key is from a totally ordered set, returns a pointer to the next larger element, or NULL if X is the maximum element.
* PREDECESSOR(X) - a query that, given an element X whose key is from a totally ordered set, returns a pointer to the next smaller element, or NULL if X is the minimum element.


We usually measure the time taken the execute an operation in terms of the size of the set n. [[Algorithm_Complexity#Asymptotic_Notation|Asymptotic notation]] is used to describe the behavior of these operations, as a function of the set size.
For more details about the selection problem see:
{{Internal|Selection_Problem#Overview|Selection Problem}}


A '''totally ordered''' set is <font color=darkgray>TODO: [[Relations]]</font>.
===<span id='MINIMUM'></span><tt>MINIMUM()</tt>===
Return the pointer to the element with the smallest key. MINIMUM() is the particular case of the 0<sup>th</sup> order statistic in a sorted set, returned by [[#SELECT|SELECT()]].


A '''successor''' in a totally ordered set is <font color=darkgray>TODO: [[Relations]]</font>.
===<span id='MAXIMUM'></span><tt>MAXIMUM()</tt>===
Returns the pointer to the element with the largest key. MAXIMUM() is the particular case of the n-1<sup>th</sup> order statistic in a sorted set, returned by [[#SELECT|SELECT()]].


A '''predecesor''' in a totally ordered set is <font color=darkgray>TODO: [[Relations]]</font>.
===<span id='PREDECESSOR'></span><tt>PREDECESSOR(X)</tt>===
A query that, given an element X whose key is from a totally ordered set, returns a pointer to the next smaller element, or NULL if X is the minimum element.
===<span id='SUCCESSOR'></span><tt>SUCCESSOR(X)</tt>===
A query that, given an element X whose key is from a totally ordered set, returns a pointer to the next larger element, or NULL if X is the maximum element.
===<span id='RANK'></span><tt>RANK(X)</tt>===
How many keys stored in the data structure are less or equal than the given value.


=Arrays=
=Arrays=
An array is a fundamental data structure in computer science. Arrays are how data is physically stored in memory. An array consists in a contiguous set of memory locations. The elements are maintained in linear order, which is determined by the array index. Given the pointer to the first element of the array, accessing any element is a Θ(1) operation that consists in adding the index multiplied with the element size to the first element pointer to obtain the pointer of the desired element.


An array is a fundamental data structure in computer science, as it maps on how data is physically stored in memory. An array consists in a contiguous set of memory locations. The elements are maintained in linear order, which is determined by the array index. Given the pointer to the first element of the array, accessing any element is a Θ(1) operation that consists in adding the index multiplied with the element size to the first element pointer to obtain the pointer of the desired element.
<font color=DarkKhaki>
<font color=darkgray>
In Java, fast copies for primitive arrays with <code>System.arraycopy(source, sourcePosition, destination, destinationPosition, count)</code>. Also, process this: https://github.com/jwasham/coding-interview-university#arrays</font>
In Java, fast copies for primitive arrays with System.arraycopy(source, sourcePosition, destination, destinationPosition, count);
==Static Sorted Arrays==
</font>
{{Internal|Sorting_Algorithms#Static_Sorted_Array|Static Sorted Arrays}}


=<span id='Simple_Dynamic_Sets'></span>Dynamic Sets=
=<span id='Simple_Dynamic_Sets'></span>Dynamic Sets=
 
==<span id='Stacks'></span>Stack==
* <span id='Stack'></span><span id='Stacks'></span>[[Stack]]
{{Internal|Stack#Overview|Stack}}
* <span id='Queue'></span><span id='Queues'></span>[[Queue]]
==<span id='Queues'></span>Queue==
{{Internal|Queue#Overview|Queue}}
==Others==
* <span id='Deque'></span><span id='Deque'></span>[[Deque]]
* <span id='Deque'></span><span id='Deque'></span>[[Deque]]
* <span id='List'></span><span id='Lists'></span>[[List|Lists]] (Linked lists, doubly-linked lists, array lists, etc.)
* <span id='List'></span><span id='Lists'></span>[[List|Lists]] (Linked lists, doubly-linked lists, array lists, etc.)
* <span id='Graph'></span><span id='Graphs'></span>[[Graph|Graphs]]
==<span id='Graph'></span>Graphs==
* <span id='Tree'></span><span id='Trees'></span>[[Tree|Trees]]
{{Internal|Graphs#Overview|Graphs}}
* <span id='Hash_Table'></span>[[Hash Table]] (Hash Map)
==<span id='Tree'></span>Trees==
* <span id='Heap'></span>[[Heap#Overview|Heaps]]
{{Internal|Trees#Overview|Trees}}
===<span id='Balanced_Binary_Search_Tree'></span><span id='Balanced_Binary_Search_Trees'></span>Binary Search Trees===
{{Internal|Binary_Search_Trees#Overview|Binary Search Trees}}
===<span id='Heap'></span>Heaps===
{{Internal|Heap#Overview|Heaps}}
 
==<span id='Hash_Table'></span><span id='Hash_Map'></span>Hash Tables (Hash Maps)==
{{Internal|Hash Table|Hash Tables}}
==Bloom Filters==
{{Internal|Bloom Filters|Bloom Filters}}


=Other Data Structures=
=Other Data Structures=
 
==Union-Find==
* <span id='Caches'></span>[[Caches#Overview|Caches]]
{{Internal|Union-Find#Overview|Union-Find}}
* <span id='Logs'></span>[[Log#Overview|Logs]]
==<span id='Caches'></span>Cache==
{{Internal|Caches#Overview|Caches}}
==<span id='Logs'></span>Log==
{{Internal|Log#Overview|Logs}}

Latest revision as of 23:43, 9 November 2021

Internal

Overview

A data structure is an arrangement of data in computer's memory or external storage, designed to facilitate a specific way to access or modify the data, quickly and usefully. Data structures include arrays, linked lists, stacks, queues, binary trees, search trees, heaps, hash tables, bloom filters, union-find, etc.

Data structures are crucial ingredients in the design of fast algorithms. A data structure is responsible for organizing data in a way that supports fast queries. Different data structures support different types of operations and are suited for different kinds of tasks.

Data Structure Characteristics

A data structure can be described based on several characteristics:

  • space requirements.
  • what operations the data structure supports.
  • what is the running time of those operations.

In general, fewer operations a data structure supports, the faster the operations and smallest the space overhead required by the data structure will be.

Data Structures as Sets

All data structures used to represent data in the memory of a computer are essentially sets. Unlike a mathematical set, the sets manipulated by computers are finite, and their composition change in time, so they are called dynamic sets. We know from mathematics that the elements of a set are distinct. Data structure sets also consist of distinct elements, if we conceptualize the fact that the elements are maintained in distinct memory locations. In a typical implementation of a dynamic set, each element is represented by an object (memory location) whose attributes can be examined and manipulated if we have a pointer or a reference to the objects. Different objects have different references.

Some dynamic sets assume that one of the element's attributes is an identifying key. If the keys are all different, we can think of the dynamic set as being a set of key values. Some dynamic sets assume that the keys are drawn from a totally ordered set, such as the real numbers, or the set of all words under the usual alphabetic ordering. A total ordering allows defining a minimum element of the set, or to speak of the next element larger than the given element.

Augmenting a Data Structure

Augmenting a data structure means modifying the data structure to maintain additional information about the data structure itself. An example is maintaining the number of nodes contained by the subtree of a certain node in a binary search tree, which is useful to implement SELECT() and RANK() on binary search trees, or the "color" of a red-black tree. Care should be taken to update the augmented state in case of any operation that modifies the data structure.

Dictionary

A dynamic set that allows insertion of new elements, deletion of existing elements and membership testing (or search) for a specific element is called a dictionary. Insertion and deletion modify the set, and they are called modifying operations. Search does not modify the set and it belongs to the query category of operations. Depending on the contexts, the concept of dictionary may imply an underlying total ordered set and the associated queries on totally ordered sets.

Modifying Dictionary Operations

Assuming that X is a pointer, or a reference, not a key, a dictionary supports the following modifying operations:

INSERT(X)

Add the element pointed by X to the set.

DELETE(X)

Remove the element pointed by X.

Queries

SEARCH(K)

A query that given a key value K, returns a pointer to an element with the given key, or NULL if there is no such element. Different data structures implement search in different ways, resulting in different running times relative to the size of the set. For example, a static sorted array or a binary search tree implement the search operation in O(log n). Also see:

Binary Search

Queries on Totally Ordered Sets

A totally ordered set, also known as a linearly ordered set, is formally defined here.

We usually measure the time taken the execute an operation in terms of the size of the set n. Asymptotic notation is used to describe the behavior of these operations, as a function of the set size.

If the set is totally ordered, the following operations may be exposed:

SELECT(ith order statistics)

Return the ith order statistics in the set. The ith order statistics of a set of n numbers is the ith smallest number in the set.

For more details about the selection problem see:

Selection Problem

MINIMUM()

Return the pointer to the element with the smallest key. MINIMUM() is the particular case of the 0th order statistic in a sorted set, returned by SELECT().

MAXIMUM()

Returns the pointer to the element with the largest key. MAXIMUM() is the particular case of the n-1th order statistic in a sorted set, returned by SELECT().

PREDECESSOR(X)

A query that, given an element X whose key is from a totally ordered set, returns a pointer to the next smaller element, or NULL if X is the minimum element.

SUCCESSOR(X)

A query that, given an element X whose key is from a totally ordered set, returns a pointer to the next larger element, or NULL if X is the maximum element.

RANK(X)

How many keys stored in the data structure are less or equal than the given value.

Arrays

An array is a fundamental data structure in computer science. Arrays are how data is physically stored in memory. An array consists in a contiguous set of memory locations. The elements are maintained in linear order, which is determined by the array index. Given the pointer to the first element of the array, accessing any element is a Θ(1) operation that consists in adding the index multiplied with the element size to the first element pointer to obtain the pointer of the desired element.

In Java, fast copies for primitive arrays with System.arraycopy(source, sourcePosition, destination, destinationPosition, count). Also, process this: https://github.com/jwasham/coding-interview-university#arrays

Static Sorted Arrays

Static Sorted Arrays

Dynamic Sets

Stack

Stack

Queue

Queue

Others

  • Deque
  • Lists (Linked lists, doubly-linked lists, array lists, etc.)

Graphs

Graphs

Trees

Trees

Binary Search Trees

Binary Search Trees

Heaps

Heaps

Hash Tables (Hash Maps)

Hash Tables

Bloom Filters

Bloom Filters

Other Data Structures

Union-Find

Union-Find

Cache

Caches

Log

Logs