Data Structures
Internal
Overview
A data structure is an arrangement of data in computer's memory or external storage, designed to facilitate a specific way to access or modify it. Data structures include arrays, linked lists, stacks, binary trees, hash tables, heaps, etc.
All data structures used to represent data in the memory of a computer are at essentially sets. Unlike a mathematical set, the sets manipulated by computers are finite, and their composition change in time, so they are called dynamic. We know from mathematics that the elements of a set are distinct. Data structure sets also consist of distinct elements, if we conceptualize the fact that the elements are maintained in distinct memory locations. In a typical implementation of a dynamic set, each element is represented by an object (memory location) whose attributes can be examined and manipulated if we have a pointer or a reference to the objects. Different objects have different references.
A set element, and consequently all its attributes can be accessed if we have a reference to it. Some dynamic sets assume that one of the element's attributes is an identifying key. If the keys are all different, we can think of the dynamic set as being a set of key values. Some dynamic sets assume that the keys are drawn from a totally ordered set, such as the real numbers, or the set of all worlds under the usual alphabetic ordering. A total ordering allows defining a minimum element of the set, or to speak of the next element larger than the given element.
A dynamic set that allows insertion of new elements, deletion of existing elements and membership testing (or search) for a specific element is called a dictionary. Of these, the insertion and deletion modifies the set, and they are called modifying operations, whereas the search does not. As such, the search operation belongs to the query category of operations.
Modifying dictionary operations:
- INSERT(X) - adds the element pointed by X to the set.
- DELETE(X) - removes the element pointed by X (X is a pointer, or a reference, not a key).
Queries:
- SEARCH(K) - a query that given a key value K, returns a pointer to an element with the given key, or NULL if there is no such element.
If the set is totally ordered, the following operations may be exposed:
- MINIMUM() - returns the pointer to the element with the smallest key.
- MAXIMUM() - returns the pointer to the element with the largest key.
- SUCCESSOR(X) - a query that, given an element X whose key is from a totally ordered set, returns a pointer to the next larger element, or NULL if X is the maximum element.
- PREDECESSOR(X) - a query that, given an element X whose key is from a totally ordered set, returns a pointer to the next smaller element, or NULL if X is the minimum element.
A totally ordered set, also known as linearly ordered set is defined here.
A successor in a totally ordered set is TODO: Relations.
A predecesor in a totally ordered set is TODO: Relations.
We usually measure the time taken the execute an operation in terms of the size of the set n. Asymptotic notation is used to describe the behavior of these operations, as a function of the set size.
Arrays
An array is a fundamental data structure in computer science, as it maps on how data is physically stored in memory. An array consists in a contiguous set of memory locations. The elements are maintained in linear order, which is determined by the array index. Given the pointer to the first element of the array, accessing any element is a Θ(1) operation that consists in adding the index multiplied with the element size to the first element pointer to obtain the pointer of the desired element. In Java, fast copies for primitive arrays with System.arraycopy(source, sourcePosition, destination, destinationPosition, count);
Dynamic Sets
- Stack
- Queue
- Deque
- Lists (Linked lists, doubly-linked lists, array lists, etc.)
- Graphs
- Trees
- Hash Table (Hash Map)
- Heaps