Data Structures and Algorithms
- Based on Introduction to Algorithms, 3rd edition by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein (CLRS)
- Dynamic programming and memoization.
- Greedy algorithms.
- Matrix multiplication (NOKB and code).
- Random variable analysis. Indicator random variable.
- Probability Distribution.
- Consider starting upside down, with the bottom (math) sections, and NOKB concepts from them first.
- Mathematics: Understand and document induction.
- Need to understand aggregate analysis (Section 17.1).
Data Structures
A data structure is an arrangement of data in a computer's memory or in external storage, designed to facilitate a specific way of accessing or modifying it. Data structures include arrays, linked lists, stacks, binary trees, hash tables, heaps, etc. A top-down exploration of various data structures used by computers is available in:
An algorithm is any well-defined computational procedure, consisting of a sequence of steps, that takes some value or set of values, called the input, and produces a value or set of values, called the output. An algorithm can be viewed as a tool for solving a well-specified computational problem. In this context, a specific set of input values provided to the algorithm is called an instance of the problem. Algorithms manipulate data structures in various ways.
Algorithms should be considered a technology, the same as computer hardware or object-oriented programming. Total system performance depends on choosing efficient algorithms as much as on choosing fast hardware. Having a solid base of algorithmic knowledge and techniques is one of the factors that separates a skilled programmer from a novice.
One of the most important characteristics of an algorithm is its correctness. An algorithm is said to be correct if, for every input instance, it halts with the correct output. It is almost always desirable for an algorithm to be correct. However, in some cases even incorrect algorithms are useful, if we can control their error rate. An example of such an algorithm is the Miller-Rabin primality test. One of the techniques that can be used to demonstrate that an algorithm is correct is the loop invariant method.
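To illustrate the loop invariant method, here is a sketch of insertion sort in Python (the function name is ours), with the three parts of the invariant argument (initialization, maintenance, termination) stated as comments:

```python
def insertion_sort(a):
    """Sort the list a in place and return it.

    Loop invariant: at the start of each iteration of the outer loop,
    a[0:i] consists of the elements originally in a[0:i], in sorted order.
    - Initialization: before the first iteration, a[0:1] holds a single
      element, which is trivially sorted.
    - Maintenance: the inner loop shifts elements greater than key one
      position to the right and inserts key in its place, extending the
      sorted prefix to a[0:i+1].
    - Termination: the loop ends when i == len(a), and the invariant
      then says that the entire list a[0:len(a)] is sorted.
    """
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]  # shift a larger element to the right
            j -= 1
        a[j + 1] = key       # insert key into the sorted prefix
    return a
```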
Algorithm Efficiency
Another characteristic of algorithms is efficiency. The obvious reason to analyze the efficiency of an algorithm is that computing time and space in memory are bounded resources, and they must be utilized efficiently. The efficiency of an algorithm can be analyzed through formal methods and expressed using a special notation, called asymptotic notation. Asymptotic notation uses functions that bound the algorithm's running time from above and from below. To say that the running time is asymptotically bounded from above by a specific function, say n², we use the "big-O" notation: O(n²). All these notions are expanded upon in the algorithm complexity section.
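As a concrete illustration of an O(n²) bound, the sketch below (function name is ours) counts how many times the body of a doubly nested loop executes: n(n−1)/2 times, which is at most c·n² for a suitable constant c, hence O(n²):

```python
def count_pair_comparisons(a):
    """Count the comparisons made by a naive all-pairs scan.

    The nested loops run the body n*(n-1)/2 times for input size n,
    so the running time is bounded from above by c*n**2 for some
    constant c -- that is, it is O(n**2).
    """
    n = len(a)
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            count += 1  # one comparison of a[i] against a[j]
    return count

# For n = 10 the body runs 10*9/2 = 45 times.
```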
Iterative vs. Recursive Algorithms
Algorithms can be coarsely categorized as iterative (or incremental) and recursive.
A recursive algorithm solves a problem by calling itself recursively one or more times to deal with closely related sub-problems. The running time of recursive algorithms can be estimated using recurrences, or recurrence equations. A common recursive technique is divide-and-conquer, which consists of three steps, applied at each level of the recursion:
- Divide the problem into several sub-problems that are similar to the original problem but smaller in size.
- Conquer the sub-problems by solving them recursively. If the sub-problem size is small enough, it may be solved through iterative techniques, or it may be a trivial case that does not need any work. When we reach this stage, we say that the recursion "bottomed out" and we have gotten down to the base case.
- Combine the solutions to the sub-problems to create a solution to the original problem. Merge sort is an example of a sorting algorithm that uses the divide-and-conquer strategy.
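The three steps above can be sketched in Python with merge sort, where each step is marked in the comments (this returns a new list rather than sorting in place, for brevity):

```python
def merge_sort(a):
    """Sort a list using divide-and-conquer; returns a new sorted list."""
    # Base case: a list of length 0 or 1 is already sorted --
    # the recursion "bottoms out" here.
    if len(a) <= 1:
        return a
    # Divide: split the problem into two sub-problems of half the size.
    mid = len(a) // 2
    # Conquer: solve each sub-problem recursively.
    left = merge_sort(a[:mid])
    right = merge_sort(a[mid:])
    # Combine: merge the two sorted halves into one sorted list.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```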
Sorting a sequence of numbers into nondecreasing order is a problem that arises frequently in practice. The class of algorithms that addresses this problem is that of the sorting algorithms. Sorting algorithms may or may not rely on key comparisons. When analyzing sorting algorithms, characteristics such as whether the algorithm sorts in place or whether the algorithm is stable may be discussed. Examples of sorting algorithms are insertion sort and merge sort.
The Selection Problem
The ith order statistic problem requires selecting the ith smallest element of a set. Finding the median is a particular case of an ith order statistic problem. These problems are collectively known as the selection problem. They can be solved generically by sorting the entire set and then selecting the desired element. However, comparison sorting cannot be done more efficiently than Ω(n lg n), and more specialized, faster algorithms for the selection problem exist.
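One such faster algorithm is randomized selection (quickselect), which runs in expected linear time. A minimal sketch, assuming 1-indexed order statistics (the function name is ours; this copies sublists for clarity rather than partitioning in place):

```python
import random

def quickselect(a, i):
    """Return the ith order statistic (the ith smallest element) of a.

    Expected running time is O(n), beating the Omega(n lg n) lower
    bound that applies to comparison sorting.
    """
    assert 1 <= i <= len(a)
    pivot = random.choice(a)
    # Partition the elements around the randomly chosen pivot.
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    if i <= len(less):
        return quickselect(less, i)          # answer is among the smaller elements
    if i <= len(less) + len(equal):
        return pivot                         # the pivot itself is the answer
    return quickselect(greater, i - len(less) - len(equal))
```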
Algorithms whose behavior is determined not only by the input but also by the values produced by a random-number generator are called randomized algorithms. A randomized algorithm implies an inherent probability distribution for one or more variables, so the running time of such an algorithm may differ on different inputs of the same size. Probabilistic analysis is used to analyze the running time of randomized algorithms.
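A common way to obtain a randomized algorithm is to randomly permute the input before processing it, so that the running time depends on the random choices rather than on the original input order. A sketch of a uniform in-place shuffle (the Fisher-Yates shuffle, presented in CLRS as RANDOMIZE-IN-PLACE; the function name here is ours):

```python
import random

def randomize_in_place(a):
    """Permute the list a uniformly at random, in place, and return it.

    At each step, position i receives an element chosen uniformly from
    the positions i..n-1 that have not been fixed yet, which makes all
    n! permutations equally likely.
    """
    n = len(a)
    for i in range(n - 1):
        j = random.randint(i, n - 1)  # uniform choice from i..n-1 inclusive
        a[i], a[j] = a[j], a[i]
    return a
```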
Number-theoretic algorithms are important in large part due to the invention of cryptographic schemes based on large prime numbers. Algorithms in this category are used to generate large prime numbers. Some of these algorithms, for example the Miller-Rabin primality test, are not entirely correct, in the sense that there is a very small chance of error, but the chance of error is so small that it is considered acceptable.
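A sketch of the Miller-Rabin test in Python shows how the error rate is controlled: each round either proves the number composite or passes, and a composite survives all rounds only with a vanishingly small probability (the function name and the small-prime pre-check are ours):

```python
import random

def is_probably_prime(n, rounds=20):
    """Miller-Rabin primality test (sketch).

    Returning False means n is definitely composite.  Returning True
    means n is prime except with probability at most 4**-rounds -- a
    small, controllable error rate, which is the sense in which the
    algorithm is "incorrect but acceptable".
    """
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):   # quick check against small primes
        if n % p == 0:
            return n == p
    # Write n - 1 = 2**r * d with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)       # random witness candidate
        x = pow(a, d, n)                     # modular exponentiation
        if x in (1, n - 1):
            continue                         # this round proves nothing
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                     # a witnesses that n is composite
    return True
```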
Almost all the algorithms mentioned so far have been polynomial-time algorithms, which is to say that on an input of size n, their worst-case running time is O(nᵏ) for some constant k. Generally, we think of a problem that is solvable by a polynomial-time algorithm as tractable, or easy. A problem that requires super-polynomial time is designated intractable, or hard. There are also problems whose status is unknown: no polynomial-time algorithm has yet been discovered for them, nor has anyone been able to prove that no polynomial-time algorithm can exist for any of them. This class of problems is known as the NP-complete problems. The set of NP-complete problems has the property that if an efficient algorithm exists for any one of them, then efficient algorithms exist for all of them. There are methods to show that a problem is NP-complete, and if that is the case, an approximation algorithm, rather than a polynomial-time exact algorithm, can be developed for it.
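As an example of an approximation algorithm, here is a sketch of the classic greedy 2-approximation for vertex cover (presented in CLRS as APPROX-VERTEX-COVER; the function name here is ours). Vertex cover is NP-complete, yet this polynomial-time algorithm always finds a cover at most twice the optimal size:

```python
def approx_vertex_cover(edges):
    """Greedy 2-approximation for vertex cover.

    edges is a list of (u, v) pairs.  Repeatedly pick an uncovered
    edge and add both of its endpoints to the cover.  Every cover must
    contain at least one endpoint of each chosen edge, and the chosen
    edges share no endpoints, so the result is at most twice optimal.
    """
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover
```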
Multicore processors require algorithms designed with parallelism in mind. These are the multithreaded algorithms.