=<span id='Algorithm'></span><span id='Algorithms'></span><span id='Computational_Problem'></span><span id='Instance_of_the_problem'></span>Algorithm Definition=
{{Note|An '''algorithm''' is any well-defined computational procedure consisting of a sequence of steps, which takes some value or set of values, called the '''input''', and produces a value or set of values, called the '''output'''. The algorithm solves a well-specified '''computational problem'''.}}

Another definition is a set of well-defined rules - a tool - for solving a well-specified '''computational problem'''. In this context, a specific set of input values provided to the algorithm is called an '''instance of the problem'''. Algorithms manipulate [[Data_Structures#Overview|data structures]] in various ways.


Algorithms should be considered a '''technology''', the same as computer hardware or object-oriented programming. Total system performance depends on choosing efficient algorithms as much as on choosing fast hardware. Having a solid base of algorithmic knowledge and techniques is one of the factors that separates a skilled programmer from a novice.


An algorithm should be [[#Algorithm_Correctness|correct]], in that it should produce the correct solutions to the computational problem. Algorithm correctness can be formally demonstrated using various mathematical tools and techniques. Additionally, an algorithm should be practically usable, in that it should complete within a finite amount of time, the faster the better, and should use a finite amount of computational resources. The completion time and resource requirements are analyzed as part of [[#Algorithm_Efficiency|algorithm efficiency]] analysis. A good predictor of how much time and resources an algorithm needs is its [[Algorithms#Algorithm_Complexity|complexity]].

==Algorithm Correctness==
One of the most important characteristics of an algorithm is its '''correctness'''. An algorithm is said to be '''correct''' if, for every input instance, it halts with the correct output. It is almost always desirable for an algorithm to be correct. However, in some cases, even incorrect algorithms are useful, if we can control the error rate. An example of such an algorithm is the [[Miller-Rabin Primality Test Algorithm|Miller-Rabin primality test]]. <span id='Loop_Invariant'></span>One of the techniques that can be used to demonstrate that an algorithm is correct is the [[Loop_Invariant#Overview|loop invariant]] method.
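For illustration, below is a minimal insertion sort sketch with its loop invariant stated as a comment and spot-checked with an assertion. The assertion is an illustrative aid for this sketch, not part of the formal proof technique itself:

<syntaxhighlight lang='python'>
def insertion_sort(a):
    """Sort list a in place, in nondecreasing order."""
    for j in range(1, len(a)):
        # Loop invariant: a[0..j-1] contains the elements originally in
        # a[0..j-1], in sorted (nondecreasing) order.
        assert all(a[k] <= a[k + 1] for k in range(j - 1))
        key = a[j]
        i = j - 1
        # Shift the elements of a[0..j-1] greater than key one slot right.
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
</syntaxhighlight>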


==Algorithm Efficiency==
Another characteristic of algorithms is '''efficiency'''. The obvious reason to analyze the efficiency of an algorithm is that computing time and memory space are bounded resources, and they must be utilized efficiently. [[#Algorithm_Complexity|Algorithm complexity]] analysis provides good predictions of an algorithm's efficiency.

===Algorithm Complexity===
The efficiency of an algorithm can be analyzed through formal methods and expressed using a special notation called [[Algorithm_Complexity#Asymptotic_Notation|asymptotic notation]]. The asymptotic notation uses functions that bound the algorithm's running time from above and from below. To say that the running time is asymptotically bounded from above by a specific function, say n<sup>2</sup>, we use the "big-O" notation: O(n<sup>2</sup>). For more details see: <span id='v7K9ly'></span>{{Internal|Algorithm Complexity#Overview|Algorithm Complexity}}
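For reference, the standard formal statement behind the "big-O" notation is: f(n) = O(g(n)) if there exist positive constants c and n<sub>0</sub> such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ n<sub>0</sub>. For example, 3n<sup>2</sup> + 10n = O(n<sup>2</sup>), witnessed by c = 4 and n<sub>0</sub> = 10.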


=Algorithm Paradigms=
Individual algorithms may belong to a number of paradigms, which are general techniques that apply to different problems from different domains. Examples of such paradigms are [[#Divide_and_Conquer|divide and conquer]], [[#Randomized_Algorithms|randomized algorithms]], [[#Greedy_Algorithms|greedy algorithms]] and [[#Dynamic_Programming_Algorithms|dynamic programming]].
=Iterative vs. Recursive Algorithms=
Algorithms can be coarsely categorized as [[#Iterative_Algorithm|iterative]] or [[#Recursive_Algorithms|recursive]].
==<span id='Iterative_Algorithm'></span>Iterative Algorithms==
Iterative algorithms are also called '''incremental''': they build the solution step by step, typically in a loop, as insertion sort does.


==Recursive Algorithms==
 
<span id='Recursive_Algorithm'></span>A '''recursive''' algorithm solves a problem by calling itself recursively one or more times to deal with closely related sub-problems, usually of smaller size. A recursive algorithm requires a '''base case''', where the input is sufficiently small that the output can be computed immediately, rather than recursing further. The running time of recursive algorithms can be estimated using <span id='Recurrence'></span>[[Algorithm_Complexity#Recurrence|recurrences]], or [[Algorithm_Complexity#Recurrence_Equation|recurrence equations]]. A special class of recursive algorithms are the "[[Recursive_Algorithms#Divide_and_Conquer|divide-and-conquer]]" algorithms, but not all recursive algorithms belong to this class.
{{Internal|Recursive_Algorithms#Overview|Recursive Algorithms}}
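To make the contrast with iterative algorithms concrete, here is a minimal sketch of the same computation (summing a list) written both incrementally and recursively; the function names are illustrative:

<syntaxhighlight lang='python'>
def sum_iterative(values):
    """Incremental approach: accumulate the result in a loop."""
    total = 0
    for v in values:
        total += v
    return total

def sum_recursive(values):
    """Recursive approach: a base case plus a smaller sub-problem."""
    if not values:          # base case: empty input
        return 0
    return values[0] + sum_recursive(values[1:])

print(sum_iterative([1, 2, 3, 4]))  # 10
print(sum_recursive([1, 2, 3, 4]))  # 10
</syntaxhighlight>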
===Divide and Conquer===
A common recursive technique is '''divide-and-conquer''', which consists of three steps, applied at each level of the recursion:
# '''Divide''' the problem into several sub-problems that are similar to the original problem but smaller in size.
# '''Conquer''' the sub-problems by solving them recursively. If the sub-problem size is small enough, it may be solved through iterative techniques, or it may be a trivial case that does not need any work. When we reach this stage, we say that the recursion "bottomed out" and we have gotten down to the '''base case'''.  
# '''Combine''' the solutions to the sub-problems to create a solution to the original problem. [[Merge_Sort#Overview|Merge sort]] is an example of a sorting algorithm that uses the divide-and-conquer strategy; a minimal sketch follows this list.
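The merge sort sketch below makes the three steps explicit. It is an illustration of the strategy under simplified assumptions (out-of-place, top-down), not the canonical implementation from any particular textbook:

<syntaxhighlight lang='python'>
def merge_sort(a):
    # Base case: a list of 0 or 1 elements is already sorted.
    if len(a) <= 1:
        return a
    # Divide: split the problem into two sub-problems of half the size.
    mid = len(a) // 2
    # Conquer: solve each sub-problem recursively.
    left = merge_sort(a[:mid])
    right = merge_sort(a[mid:])
    # Combine: merge the two sorted halves into one sorted list.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
</syntaxhighlight>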
 
Examples of algorithms that use the divide-and-conquer method are:
* <span id='q23wLp'></span>[[Karatsuba Multiplication|Karatsuba multiplication]], which multiplies two n-digit integers in time better than n<sup>2</sup>
* [[Merge_Sort#Overview|Merge sort]]
* <span id='b3slp3'></span>[[Inversions in an Array|Finding the number of inversions in an array]]
* [[Strassen's Algorithm for Matrix Multiplication|Strassen's algorithm for matrix multiplication]]
* [[Maximum Subarray Problem|Maximum subarray problem]]
* [[Closest Pair Problem|Closest pair problem]]
 
<font color=darkkhaki>Are there other kinds of recursive algorithms besides divide-and-conquer?</font>


=Randomized Algorithms=


Algorithms whose behavior is determined not only by the input, but also by the values produced by a random-number generator invoked in the algorithm code, are called <span id='Randomized_Algorithm'></span>'''randomized algorithms'''. Randomized algorithms are a relatively common [[#Algorithm_Paradigms|algorithm design paradigm]], which often leads to simpler, more practical or more elegant algorithms. A randomized algorithm implies an inherent probability distribution for one or more variables, so the running time of such an algorithm may differ on different inputs of the same size. [[Randomized_Algorithms#Probabilistic_Analysis|Probabilistic analysis]] is used to analyze the running time of randomized algorithms. The canonical example of a randomized algorithm is randomized [[Quicksort#Overview|QuickSort]]; other applications include primality testing, graph partitioning and hashing. See: {{Internal|Randomized Algorithms#Overview|Randomized Algorithms}}
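A minimal randomized QuickSort sketch, where the random pivot choice is the only source of randomness. It partitions out-of-place for brevity, whereas typical implementations partition in place:

<syntaxhighlight lang='python'>
import random

def randomized_quicksort(a):
    if len(a) <= 1:
        return a
    pivot = random.choice(a)  # the randomized step: pivot chosen uniformly
    # Partition around the pivot.
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return randomized_quicksort(less) + equal + randomized_quicksort(greater)

print(randomized_quicksort([3, 8, 2, 5, 1, 4, 7, 6]))  # [1, 2, 3, 4, 5, 6, 7, 8]
</syntaxhighlight>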


=Sorting=


<span id='Sorting_Algorithms'></span>'''Sorting''' a sequence of numbers into nondecreasing order is a problem that arises frequently in practice. The class of algorithms that addresses this problem is that of the [[Sorting Algorithms#Overview|sorting algorithms]]. Sorting algorithms may perform [[Sorting Algorithms#Comparison_Sort|key comparison]] or [[Sorting_Algorithms#Non-Comparison_Sort|not]]. When analyzing sorting algorithms, characteristics such as whether the algorithm is [[Sorting_Algorithms#In-place|in place]] or whether the algorithm is [[Sorting_Algorithms#Stability|stable]] may be discussed. Examples of sorting algorithms are [[Insertion Sort#Overview|insertion sort]] and [[Merge Sort#Overview|merge sort]].<span id='b6YU43'></span>
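As an illustration of stability, the sketch below sorts records by one key and relies on the fact that Python's built-in sort is stable, so records that compare equal on the key keep their original relative order:

<syntaxhighlight lang='python'>
# Records: (name, grade). A stable sort by grade keeps Alice before Carol,
# because the two records compare equal on the grade key.
records = [("Alice", 2), ("Bob", 1), ("Carol", 2), ("Dave", 1)]
by_grade = sorted(records, key=lambda r: r[1])
print(by_grade)  # [('Bob', 1), ('Dave', 1), ('Alice', 2), ('Carol', 2)]
</syntaxhighlight>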


{{Internal|Sorting Algorithms#Overview|Sorting Algorithms}}
==Partitioning==
{{Internal|The Partition Subroutine|The Partition Subroutine}}


=The Selection Problem=


<span id='Selection_Problem'></span>The i<sup>th</sup> order statistic problems require selecting the i<sup>th</sup> smallest element of a set. Finding the median is a particular case of an i<sup>th</sup> order statistic problem. These problems are known as the [[Selection Problem#Overview|selection problem]]. They can be solved generically by sorting the entire set and then selecting the desired element. However, key comparison sorting [[Comparison_Sorting_Algorithms_Complexity|cannot be done more efficiently than Ω(n lg n)]], and more specialized, faster O(n) algorithms for the selection problem [[Selection Problem#Overview|exist]].
{{Internal|Selection Problem|Selection Problem}}
Also see: {{Internal|The Median Maintenance Problem|The Median Maintenance Problem}}
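One such algorithm is randomized selection (quickselect), which runs in O(n) expected time. A minimal sketch, assuming 1-based order statistics:

<syntaxhighlight lang='python'>
import random

def quickselect(a, i):
    """Return the i-th smallest element of a (1-based), O(n) expected time."""
    assert 1 <= i <= len(a)
    pivot = random.choice(a)
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    if i <= len(less):
        return quickselect(less, i)               # answer is left of the pivot
    if i <= len(less) + len(equal):
        return pivot                              # the pivot is the answer
    greater = [x for x in a if x > pivot]
    return quickselect(greater, i - len(less) - len(equal))

print(quickselect([9, 1, 7, 3, 5], 3))  # 5 (the median)
</syntaxhighlight>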
 
=Graph Algorithms=
{{Internal|Graphs#Graph_Algorithms|Graph Algorithms}}
==Tree Algorithms==
{{Internal|Trees#Tree_Algorithms|Tree Algorithms}}
 
=Greedy Algorithms=
Greedy algorithms use a design paradigm that involves making, at each step of the algorithm, a myopic decision - doing something that seems to be a good idea at the time. The decision made by a greedy algorithm at a certain step is '''irrevocable''': it cannot be changed later as the algorithm progresses. Interestingly enough, this approach sometimes works out just fine - greedy local decisions sometimes lead to an overall optimal solution - but this is not a given. The correctness of a greedy algorithm must be formally proven to ensure that the algorithm is any good.
 
A strength and a weakness of the greedy algorithm paradigm is just how easy it is to apply. It is often quite easy to come up with plausible greedy algorithms for a problem, even multiple, different plausible greedy algorithms. This is a point of contrast with the [[Algorithms#Divide_and_Conquer|divide-and-conquer algorithms]], which are non-trivial to come up with.


<span id='Greedy_Score'></span><span id='Greedy_Criterion'></span>A general approach to designing greedy algorithms has two steps. The first step involves looking at a few particular cases of the problem, where it is reasonably intuitive what the optimal thing to do should be. While doing so, attempt to come up with a single '''greedy score''' (or criterion) that aggregates the various parameters of the individual elements of the problem. Then attempt to use the score and prove that it optimizes the <span id='Objective_Function'></span>'''objective function''' we are trying to compute in the algorithm. The weakness of greedy algorithm design, however, is that it is really easy to come up with alternative "obvious" scores that might do the wrong thing. It is not obvious that a greedy algorithm that uses a particular score always optimizes the objective function, so the correctness must be proven formally.
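A sketch of this pattern on a scheduling problem: jobs have weights and lengths, the objective function is the weighted sum of completion times, and the greedy score weight/length decides the order. For this particular problem, the ratio score is provably optimal, by an exchange argument:

<syntaxhighlight lang='python'>
def schedule(jobs):
    """jobs: list of (weight, length). Returns the jobs ordered by the
    greedy score weight/length, highest score first."""
    return sorted(jobs, key=lambda j: j[0] / j[1], reverse=True)

def weighted_completion_time(ordering):
    """Objective function: sum over jobs of weight * completion time."""
    total, elapsed = 0, 0
    for weight, length in ordering:
        elapsed += length
        total += weight * elapsed
    return total

jobs = [(3, 5), (1, 2), (2, 4)]
print(schedule(jobs))                            # [(3, 5), (1, 2), (2, 4)]
print(weighted_completion_time(schedule(jobs)))  # 44
</syntaxhighlight>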
 
Greedy algorithms are usually easy to analyze from a running time perspective. However, they are generally much harder to prove correct; doing so takes a lot of creativity and has a bit of an ad hoc flavor. Greedy algorithms can be proven correct by [[Mathematical_Induction|induction]], as in the case of [[Dijkstra's Shortest-Path Algorithm#Overview|Dijkstra's shortest-path algorithm]]. Some textbooks call this method "greedy stays ahead", meaning that you prove that the greedy algorithm does the right thing iteration by iteration. <span id='Exchange_Argument'></span>The second method of proving greedy algorithms correct is the '''exchange argument'''.
 
{{Warn|⚠️ Most greedy algorithms are NOT correct, unless they are formally proven correct.}}
 
Examples of greedy algorithms, and of computational problems greedy algorithms are useful for:
* The [[The Optimal Caching Problem|optimal caching problem]]
* The [[The Job Scheduling Problem|job scheduling problem]]
* Graph algorithms:
** [[Dijkstra's Shortest-Path Algorithm#Overview|Dijkstra's shortest-path algorithm]]
** The [[The Minimum Spanning Tree Problem|minimum spanning tree problem]]:
*** [[Prim's Algorithm]]
*** [[Kruskal's Algorithm]]
* The [[Clustering_Concepts#The_Clustering_Problem|clustering problem]]. Also see [[#Clustering_Algorithms|clustering algorithms]] below.
* Compression. Greedy algorithms can be used to compute [[Binary_Codes#The_Optimal_Variable-Length_Binary_Code_Problem|optimal variable-length binary codes]]; the canonical example is the computation of Huffman codes with the [[Binary_Codes#The_Huffman_Algorithm|Huffman algorithm]], sketched after this list.
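A compact Huffman code construction, using a heap of (frequency, tree) pairs. The tie-breaking counter and the sample symbol frequencies are illustrative choices, not part of the algorithm proper:

<syntaxhighlight lang='python'>
import heapq

def huffman_codes(freqs):
    """freqs: dict symbol -> frequency. Returns dict symbol -> bit string."""
    # Heap entries: (frequency, tiebreaker, tree), where a tree is either
    # a symbol (leaf) or a (left, right) pair (internal node).
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Greedy step: merge the two least frequent trees.
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (t1, t2)))
        counter += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):      # internal node: recurse on children
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                            # leaf: a symbol
            codes[tree] = prefix or "0"  # single-symbol edge case
    walk(heap[0][2], "")
    return codes

print(huffman_codes({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}))
</syntaxhighlight>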


==Clustering Algorithms==
Clustering algorithms aim to identify the best clustering of a set of objects according to some objective function, which is to say they aim to solve the [[Clustering_Concepts#The_Clustering_Problem|clustering problem]]. Some of the best known clustering algorithms are [[#Greedy_Algorithms|greedy]].
{{Internal|Clustering#Overview|Clustering}}


=Dynamic Programming Algorithms=
Dynamic programming algorithms are optimized recursive algorithms, in which we store the solutions of smaller subproblems and use them in computing the solutions of larger subproblems, avoiding duplicate work and yielding superior running times. The classic example where the difference between the straightforward recursive solution and the corresponding dynamic programming solution is obvious is computing [[Fibonacci_Numbers|Fibonacci numbers]].
{{Internal|Dynamic Programming#Overview|Dynamic Programming}}
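A minimal sketch of this difference: the naive recursion recomputes the same subproblems exponentially many times, while the dynamic programming version stores each subproblem's solution once and runs in linear time:

<syntaxhighlight lang='python'>
def fib_naive(n):
    """Straightforward recursion: exponential time, recomputes subproblems."""
    if n <= 1:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

def fib_dp(n):
    """Dynamic programming: store subproblem solutions, O(n) time."""
    if n <= 1:
        return n
    table = [0, 1]
    for i in range(2, n + 1):
        table.append(table[i - 1] + table[i - 2])
    return table[n]

print(fib_naive(10), fib_dp(10))  # 55 55
</syntaxhighlight>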


=Number-Theoretic Algorithms=


'''Number-theoretic algorithms''' are important in large part due to the invention of cryptographic schemes based on large prime numbers. Algorithms in this category are used to generate large prime numbers. Some of these algorithms, for example the [[Miller-Rabin Primality Test Algorithm|Miller-Rabin primality test algorithm]], are not entirely correct, in the sense that there is a very small chance of error, but the chance of error is so small that it is considered acceptable.
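A compact Miller-Rabin sketch follows. The number of rounds k is an illustrative choice; real implementations use deterministic witness sets or more rounds depending on the required error bound, since a single round errs with probability at most 1/4:

<syntaxhighlight lang='python'>
import random

def miller_rabin(n, k=20):
    """Return False if n is composite, True if n is probably prime.
    A composite passes all k rounds with probability at most 4**(-k)."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7):               # handle small primes directly
        if n % p == 0:
            return n == p
    # Write n - 1 as 2^r * d with d odd.
    r, d = 0, n - 1
    while d % 2 == 0:
        r += 1
        d //= 2
    for _ in range(k):
        a = random.randrange(2, n - 1)    # random base (witness candidate)
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                  # a is a witness: n is composite
    return True                           # probably prime

print(miller_rabin(561))  # False (561 is a Carmichael number)
print(miller_rabin(97))   # True
</syntaxhighlight>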
=<span id='NP-complete_Problems'></span>NP Completeness=
Almost all the algorithms mentioned so far have been '''polynomial-time algorithms''': on an input of size n, their worst-case running time is O(n<sup>k</sup>) for some constant k. Generally, we think of a problem that is solvable by a polynomial-time algorithm as '''tractable''', or '''easy'''. A problem that requires super-polynomial time is designated '''intractable''', or '''hard'''. There are also problems whose status is unknown: no polynomial-time algorithm has yet been discovered for them, nor has anyone yet been able to prove that no polynomial-time algorithm can exist for any of them. This class of problems is called the '''[[NP Completeness#Overview|NP-complete problems]]'''. The set of NP-complete problems has the property that if an efficient algorithm exists for any one of them, then efficient algorithms exist for all of them. There are methods to show that a problem is NP-complete, and if that is the case, an '''approximation algorithm''', instead of a polynomial-time algorithm, can be developed for it.
{{Internal|NP Completeness#Overview|NP Completeness}}


=Multithreaded Algorithms=


Multicore processors require algorithms designed with parallelism in mind. These are the '''multithreaded algorithms''':


{{Internal|Multithreaded Algorithms#Overview|Multithreaded Algorithms}}
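A minimal fork-join style sketch, using the Python standard library's process pool to sum chunks of a list in parallel. The chunk size and worker count are arbitrary illustrative choices:

<syntaxhighlight lang='python'>
from concurrent.futures import ProcessPoolExecutor

def chunk_sum(chunk):
    """Work done by each worker: sum one slice of the input."""
    return sum(chunk)

def parallel_sum(values, workers=4):
    size = (len(values) + workers - 1) // workers
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(chunk_sum, chunks)  # "divide" the work across cores
    return sum(partial)                        # "combine" the partial sums

if __name__ == "__main__":                     # guard required for process pools
    print(parallel_sum(list(range(1, 101))))   # 5050
</syntaxhighlight>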
=Optimization Algorithms=
{{Internal|Optimization Algorithms|Optimization Algorithms}}

=Organizatorium=
* CS261 Stanford: https://www.youtube.com/playlist?list=PLEGCF-WLh2RJh2yDxlJJjnKswWdoO8gAc
<font color=darkkhaki>
* Dynamic programming and memoization.
* Greedy algorithms.
* Matrix multiplication (NOKB and code).
* Random variable analysis. Indicator random variable.
* Probability Distribution.
* Consider starting upside down, with the bottom (math) sections, and NOKB concepts from them first.
* [[Mathematics]]: Understand and document induction.
* Need to understand aggregate analysis Section 17.1.
</font>
<center>&#91;[[Software_Engineering#Algorithms|Next]]]</center>
