Algorithms
External
- Based on Introduction to Algorithms, 3rd edition by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein (CLRS)
Internal
Overview
A data structure is an arrangement of data in a computer's memory or in external storage, designed to facilitate a specific way of accessing or modifying it. Data structures include arrays, linked lists, stacks, binary trees, hash tables, etc.
Algorithms manipulate the data in these structures in various ways. An algorithm is any well-defined computational procedure, consisting of a sequence of steps, which takes some value or set of values, called the input, and produces some value or set of values, called the output. An algorithm can be viewed as a tool for solving a well-specified computational problem. In this context, a specific set of input values provided to the algorithm is called an instance of the problem.
Algorithms should be considered a technology, the same as computer hardware or object-oriented programming. Total system performance depends on choosing efficient algorithms as much as on choosing fast hardware. Having a solid base of algorithmic knowledge and techniques is one of the factors that separates a skilled programmer from a novice.
One of the most important characteristics of an algorithm is its correctness. An algorithm is said to be correct if, for every input instance, it halts with the correct output. It is almost always desirable for an algorithm to be correct. However, in some cases even incorrect algorithms are useful, provided we can control their error rate. An example of such an algorithm is the Miller-Rabin primality test. One technique that can be used to demonstrate that an algorithm is correct is the loop invariant method.
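As a minimal sketch of the loop invariant method (my own example, not taken from CLRS verbatim), insertion sort below documents the invariant maintained by its outer loop; correctness follows from showing the invariant holds before the first iteration, is preserved by each iteration, and implies the desired result at termination.

```python
def insertion_sort(a):
    """Sort the list a in place, in nondecreasing order."""
    for j in range(1, len(a)):
        # Loop invariant: a[0..j-1] consists of the elements originally
        # in a[0..j-1], but in sorted order.
        key = a[j]
        i = j - 1
        # Shift larger elements one slot to the right to make room for key.
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    # Termination: j has reached len(a), so the invariant says the whole
    # list a[0..len(a)-1] is sorted.
    return a
```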
Another characteristic of algorithms is efficiency. The obvious reason to analyze the efficiency of an algorithm is that computing time and memory space are bounded resources, and they must be utilized efficiently. The efficiency of an algorithm can be analyzed through formal methods and expressed using a special notation, called asymptotic notation. Asymptotic notation uses functions that bound the algorithm's running time from above and from below. To say that the running time is asymptotically bounded from above by a specific function, say n^2, we use "big-O" notation: O(n^2). All these notions are expanded upon in the algorithm complexity section.
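As a toy illustration of an upper bound (my own example): counting the basic operations of a naive all-pairs loop gives exactly n*n steps, so the count is bounded above by c * n^2 with c = 1 for all n >= 1, i.e. it is O(n^2).

```python
def count_pairs_steps(n):
    """Count the basic operations performed by a naive all-pairs loop."""
    steps = 0
    for i in range(n):
        for j in range(n):
            steps += 1  # one basic operation per inner iteration
    return steps

# steps(n) == n*n, so steps(n) <= 1 * n**2 for every n >= 1: O(n^2).
```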
Algorithms can be coarsely categorized as iterative (incremental) or recursive. A recursive algorithm solves a problem by calling itself recursively one or more times to deal with closely related sub-problems. The running time of recursive algorithms can be estimated using recurrences. A common recursive technique is divide-and-conquer, which consists of three steps: 1) divide the problem into several sub-problems that are similar to the original problem but smaller in size, 2) conquer the sub-problems by solving them recursively; if a sub-problem is small enough, it may be solved directly through iterative techniques, and 3) combine the solutions to the sub-problems to create a solution to the original problem.
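The three divide-and-conquer steps can be sketched with merge sort (my own rendering of the classic algorithm); its running time satisfies the recurrence T(n) = 2T(n/2) + O(n), which solves to O(n log n).

```python
def merge_sort(a):
    """Divide-and-conquer sort; returns a new sorted list."""
    if len(a) <= 1:               # base case: small enough to solve directly
        return a[:]
    mid = len(a) // 2
    left = merge_sort(a[:mid])    # divide + conquer the left half
    right = merge_sort(a[mid:])   # divide + conquer the right half
    # Combine: merge the two sorted halves into one sorted list.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```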
Sorting a sequence of numbers into nondecreasing order is a problem that arises frequently in practice. The class of algorithms that addresses this problem is that of sorting algorithms. Sorting algorithms may or may not be based on key comparison. When analyzing a sorting algorithm, characteristics such as whether it sorts in place or whether it is stable may be discussed. Examples of sorting algorithms are insertion sort and merge sort.
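Stability can be shown in two lines (my own example, using Python's built-in sort, which is documented as stable): records with equal keys keep their original relative order.

```python
# Records share keys; a stable sort keeps equal-key records in input order.
records = [("alice", 2), ("bob", 1), ("carol", 2), ("dave", 1)]
by_grade = sorted(records, key=lambda r: r[1])  # Python's sort is stable
# bob stays before dave, and alice stays before carol, as in the input.
```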
Algorithms whose behavior is determined not only by their input, but also by values produced by a random-number generator, are called randomized algorithms. A randomized algorithm implies an inherent probability distribution for one or more variables, so the running time of such an algorithm may differ between runs on inputs of the same size. Probabilistic analysis is used to analyze the running time of randomized algorithms.
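A standard example is quicksort with a random pivot (sketched here in a simple out-of-place form): the randomness comes from the generator, not the input, so no single input is always a worst case, and the expected running time is O(n log n).

```python
import random

def randomized_quicksort(a):
    """Quicksort with a randomly chosen pivot; expected time O(n log n)."""
    if len(a) <= 1:
        return a[:]
    pivot = random.choice(a)  # behavior depends on the RNG, not just input
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return randomized_quicksort(less) + equal + randomized_quicksort(greater)
```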
Number-theoretic algorithms are important due in large part to the invention of cryptographic schemes based on large prime numbers. Algorithms in this category are used to generate large prime numbers. Some of these algorithms, for example the Miller-Rabin primality test, are not entirely correct, in the sense that there is a very small chance of error, but the chance is so small that it is considered acceptable.
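A sketch of the Miller-Rabin test (a standard formulation, written from memory rather than copied from CLRS): each round either proves n composite or reports "probably prime"; a composite survives one round with probability at most 1/4, so the error after k rounds is at most 4^-k.

```python
import random

def is_probably_prime(n, rounds=20):
    """Miller-Rabin test: composites fail with prob. >= 1 - 4**(-rounds)."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7):          # dispose of small cases deterministically
        if n % p == 0:
            return n == p
    # Write n - 1 as 2^r * d with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)            # a^d mod n, by fast modular exponentiation
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False            # a is a witness that n is composite
    return True                     # probably prime
```

Note that the test can declare a composite "probably prime" with tiny probability, but it never declares a true prime composite.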
Almost all the algorithms mentioned so far have been polynomial-time algorithms, which is to say that on an input of size n, their worst-case running time is O(n^k) for some constant k. Generally, we think of a problem that is solvable by a polynomial-time algorithm as tractable or easy. A problem that requires super-polynomial time is designated intractable or hard. There are also problems whose status is unknown: no polynomial-time algorithm has yet been discovered for them, nor has anyone been able to prove that no polynomial-time algorithm can exist for any of them. This is the class of NP-complete problems. The set of NP-complete problems has the property that if an efficient algorithm exists for any one of them, then efficient algorithms exist for all of them. There are methods to show that a problem is NP-complete, and if that is the case, an approximation algorithm, rather than an exact polynomial-time algorithm, can be developed for it.
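A classic approximation example (CLRS's Approx-Vertex-Cover, sketched here from memory): minimum vertex cover is NP-complete, but the greedy routine below runs in polynomial time and returns a cover at most twice the optimal size, because the edges it picks are disjoint and any cover must take at least one endpoint of each.

```python
def approx_vertex_cover(edges):
    """2-approximation for minimum vertex cover.

    Repeatedly pick an edge not yet covered and add both its endpoints.
    """
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover
```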
Multicore processors require algorithms designed with parallelism in mind. These are the multithreaded algorithms.
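A minimal sketch of the parallel pattern (my own example; note that in CPython the GIL prevents true CPU parallelism for pure-Python threads, so this illustrates the structure of split / work in parallel / combine rather than a real speedup):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data, workers=4):
    """Split the input into chunks, sum each chunk in its own thread,
    then combine the partial sums (a map-reduce style parallel pattern)."""
    chunk = max(1, len(data) // workers)
    parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum, parts)   # "spawn" one task per chunk
    return sum(partials)                  # "sync" and combine the results
```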
- Associative Array
- Lists (singly linked and doubly linked). Mathematical List
- Stack
- Queue. Difference between a list and a queue.
- Algorithm complexity, bounds. Understand and document O, Omega and Theta notations. Perform complexity analysis on all the algorithms I examine. Expected running time, worst-case running time, average-case running time.
- Set
- Map
- Hash Map
- Distributed Hash Map. Insist on this as it is key to systems that scale.
- Collision-resistant hash function.
- https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo
- Consistent Hashing
- Tree
- Binary Trees. Binary tree height.
- Sorting trees. Difference between a binary search tree and a red-black tree. Red-Black Tree
- BTrees
- Tree walking algorithms. Difference between depth-first and breadth-first.
- Graphs
- Directed Acyclic Graph
- Graph algorithms
- Dynamic programming and memoization.
- Greedy algorithms.
- Matrix multiplication (NOKB and code).
- Random variable analysis. Indicator random variable.
- Probability Distribution.
- Consider starting upside down, with the bottom (math) sections, and NOKB concepts from them first.
- Mathematics: Understand and document induction.
- Need to understand aggregate analysis (CLRS Section 17.1).