Algorithm Complexity: Difference between revisions

From NovaOrdis Knowledge Base
Jump to navigation Jump to search
Line 59: Line 59:
We start by expressing the algorithm in [[#Random-Access_Machine_.28RAM.29|random-access machine]] instructions. As mentioned above, each type of instruction takes a constant time c<sub>i</sub> to execute, referred to as the '''cost''' of the instruction.
We start by expressing the algorithm in [[#Random-Access_Machine_.28RAM.29|random-access machine]] instructions. As mentioned above, each type of instruction takes a constant time c<sub>i</sub> to execute, referred to as the '''cost''' of the instruction.


The running time of the algorithm is the sum of running times for each statement executed. A statement with a cost c<sub>i</sub> that executes n times will contribute c<sub>i</sub>⋅n to the total running time. When adding various costs, the terms of the same order can be coalesced: c<sub>1</sub>n + c<sub>2</sub>n + ... c<sub>k</sub>n can be expressed as c⋅n.
The running time of the algorithm is the sum of running times for each statement executed. A statement with a cost c<sub>i</sub> that executes n times will contribute c<sub>i</sub>⋅n to the total running time. When adding various costs, the terms of the same order can be coalesced: c<sub>1</sub>n + c<sub>2</sub>n + ... c<sub>k</sub>n can be expressed as c⋅n. More informally, if an algorithm is in the form "do this, and when you are done, do that", the corresponding running times are added. However, if an algorithm is in the form "do this for each time you do that", the corresponding running times are multiplied.


Some statements may execute or not depending on which input of size n is given. From this perspective, an input may be the worst, leading to the longest running time, the best, leading to the shortest running time, or average. Additional complications in calculating the running time arise when we deal with recursive algorithms. Those aspects will be discussed below in the [[#Recursive_Algoritms_and_Recurrences|Recursive Algorithms and Recurrences]] section.
Some statements may execute or not depending on which input of size n is given. From this perspective, an input may be the worst, leading to the longest running time, the best, leading to the shortest running time, or average. Additional complications in calculating the running time arise when we deal with recursive algorithms. Those aspects will be discussed below in the [[#Recursive_Algoritms_and_Recurrences|Recursive Algorithms and Recurrences]] section.

Revision as of 00:25, 13 September 2018

External

Internal

Overview

Algorithm complexity analysis is an essential subject if the amount of resources, such as time and memory, consumed by an algorithm, is important. For practical applications, and with the exception of the most trivial cases, resource consumption is important. Algorithms that differ slightly may come with significant differences in processing time, or memory consumption, when the size of the input is large. Algorithm complexity analysis is used to approximate, or bound, the amount of resources consumed by an algorithm while solving a computational problem. In most cases, we are interested in running time, but memory requirements, network bandwidth and other resources can also be estimated.

Algorithm complexity analysis usually involves the following sequence:

  1. We establish the relevant input and input size of the problem. The input size may be multi-dimensional.
  2. The algorithm is analyzed and its complexity, or the quantity of consumed resources, such as time and memory, is expressed as a function of the input size and the constant costs of elementary operations of an idealized random access machine. Different techniques exist for iterative algorithms or recursive algorithms. At this stage, situations like worst case and average case are usually taken into consideration. Worst case analysis is common, as it occurs fairly often and its complexity is usually comparable with the average case.
  3. The resulting functions are approximated using their order of growth. If we compare two algorithms, the one with a lower order of growth is usually better for a large input size. This is not necessarily true for small input sizes.

"Big O" is the language and the metric used when discussing algorithm complexity. "Big O" is an informal name for the formalism known as asymptotic notation.

Intuition behind Time Complexity

The intuition behind algorithm complexity analysis can be introduced using the following example: let's assume that we created a computer file, which is currently stored locally, and we want to share it with a friend in a remote location. There are at least two different "algorithms" to do that: send the content over the network as part of a FTP or scp transfer, or copy the file on a large enough drive and ship it with FedEx. The natural input size in this case is the size of the file, in bytes. It is interesting to analyze the transfer time as a function of the file size and determine what algorithm is more efficient.

The network transfer is subject to bandwidth limitation. Assuming that we can transfer data with 1 Mbs (1 Megabit per second, which is about 131072 bytes per second, an arbitrary value to simplify the calculations), 1GB file can be transferred in roughly 2 hours and 15 minutes, and 1 TB file in a thousand times that, which is about 98 days. The transfer time is proportional with the file size. We can describe the time complexity of the network transfer "algorithm" as O(n) - the details of the asymptotic notation will be explained below. In case of transfer by courier, it will take 24 hours to overnight a drive containing the file, regardless of size of the file stored on the drive. The asymptotic notation for constant time is O(1), which says the transfer time is a constant that does not depend on the input size.

We will argue that for an arbitrarily large input size, drive shipping, assuming that the drive is large enough, has a better time complexity than the file transfer. Even if the constant involved by the shipping "algorithm" is relatively large (24 hours), shipping will be faster than the network transfer for any file of approximately 10 GB size and up. Obviously, if the file has 1KB, shipping is terribly inefficient, and the O(1) algorithm performs far worse than the O(n) algorithm, because of the large built-in constants. However, no matter how big the O(1) constant is, and no matter how slow the linear increase is, the linear algorithm will at some point surpass the constant time algorithm.

Input Size

The concept of input size depends on the problem being addressed. In case of sorting algorithms, the most natural measure of input size is the number of items in the input. For the problem of multiplying two integers, the best measure of input size is the total number of bits needed to represent the input in binary notation.

Input size can be multi-dimensional: In the case of graph algorithms, there are two numbers that describe the input size: the graph's number of vertices and edges. If the problem involves two vectors whose size is unrelated, then the input size is also two-dimensional and consists of the lengths of both vectors.

Algorithm Complexity Measures

An algorithm's complexity is expressed as quantity of consumed resources, such as time and memory. The amount of resources is expressed with a function of the input size and the cost of elementary operations of an idealized random access machine. If we are interested in the behavior of the algorithm for large input sizes, it is sufficient to approximate the function by its order of growth.

Time Complexity

The running time, or computational time of an algorithm on a particular input is usually very well predicted by the number of random-access machine primitive operations, or steps, executed.

Space Complexity

Space complexity refers to the algorithm's requirements for memory.

Random-Access Machine (RAM)

An algorithm is usually analyzed in the context of a random-access machine (RAM), which is a theoretical model for the computing resources typically used by algorithms. A random-access machine has only one processor. In this model, instructions are executed one after another, with no concurrent operations, though extensions of the model exist for the case where we need to analyze multithreaded algorithms. Algorithms are implemented as computer programs for such a machine. The random-access machine mimics how real computers are designed, so it has instructions commonly found in real computers: arithmetic (add, subtract, multiply, divide, remainder, floor, ceiling), data movement (load, store, copy) and control (conditional branch, subroutine call and return). It is important to emphasize that each type of instruction takes a constant amount of time - this assumption is essential in algorithm analysis. This time is marked as ci when analyzing a specific algorithm.

In accordance with how real computers are implemented, operations as multiplication with 2k are also considered constant-time, because they are implemented by shifting left a number representation for k times, as long as k is no more than the number of bits in a computer word. Also, a statement calling a subroutine takes a constant amount of time, although the execution of the subroutine will usually not.

The data types in the RAM model are integer and floating point. The model assumes a limit on the size of each word of data: if the word size could grow arbitrarily, we could store large amounts of data in one word and operate on it in constant time, which is an unrealistic scenario.

A random-access machine does not attempt to model a memory hierarchy (caches or virtual memory). In some cases, memory hierarchy's effect is significant, but usually, RAM-model analyses are good predictors of performance on actual machines.

Time Complexity Analysis Method

We start by expressing the algorithm in random-access machine instructions. As mentioned above, each type of instruction takes a constant time ci to execute, referred to as the cost of the instruction.

The running time of the algorithm is the sum of running times for each statement executed. A statement with a cost ci that executes n times will contribute ci⋅n to the total running time. When adding various costs, the terms of the same order can be coalesced: c1n + c2n + ... ckn can be expressed as c⋅n. More informally, if an algorithm is in the form "do this, and when you are done, do that", the corresponding running times are added. However, if an algorithm is in the form "do this for each time you do that", the corresponding running times are multiplied.

Some statements may execute or not depending on which input of size n is given. From this perspective, an input may be the worst, leading to the longest running time, the best, leading to the shortest running time, or average. Additional complications in calculating the running time arise when we deal with recursive algorithms. Those aspects will be discussed below in the Recursive Algorithms and Recurrences section.

Worst-case Running Time

Time complexity analysis focuses on worst-case running time. The worst-case running time is the longest running time for any input of size n. The worst-case running time is important because it gives us an upper bound for the running time for any input, and knowing it provides a guarantee that the algorithm cannot do any worse. For some algorithms, the worst case occurs fairly often: for example, searching in a database when the information is not present. Also, the average case is often roughly as bad as the worst case. As we will see below, the O-notation provides a way to express asymptotic upper bounds to functions, so it is used when talking about running time for any situation, including worst-case running time.

Best-case Running Time

The best-case running time for an algorithm is the running time when the input is arranged in the best possible way, so the algorithm has to perform the minimal number of steps to solve the problem. An example is an already sorted array for insertion sort. As we will see below, we use the Ω-notation to describe the best-case running time of an algorithm.

Average-case Running Time

The average-case running time is the running time of an algorithm, in average, for a large number of different inputs. The average-case running time can be determined using probabilistic analysis.

More explanations on the difference between average-case running time and expected running time.

Expected Running Time

In general we expect that all inputs of a given size are equally likely, but in practice this may not be the case. In such situations we can use a randomized algorithm to allow probabilistic analysis and yield an expected running time.

Order of Growth

The result of the time complexity analysis method described above is a mathematical function of the input size.

A very common case, but by all means not the only case, is that the running time can approximated by a polynomial function in n, such as, for example a⋅n2 + b⋅n + c. In this situation, we can consider only the highest-order term of the formula: a⋅n2. We can also ignore the leading term's constant coefficient, since the constant factors are less significant than the other factor of the term in determining computational efficiency for large inputs.

The rate of growth or the order of growth, or the rate of increase of the algorithm's running time is the highest-order term, without coefficient. In this case of our example, the rate of growth is n2.

One algorithm is considered to be more efficient than another if its worst-case running time has a lower order of growth, so that is why when we are analyzing the function describing the running time, we are usually interested in its order of growth.

Due to constant factors and lower-order terms, an algorithm whose running time has a higher order of growth might take less time for small inputs than an algorithm whose running time has a lower order of growth. However, for large enough inputs, the algorithm with lower order of growth will aways run faster. For example, comparing running time for two algorithms of which one is c1n2 and the other is c2 n lg n, even if c2 > c1, no matter how much smaller c1 is than c2, there will always be a crossover point beyond which the c2n lg n algorithm is faster.

Determining the Order of Growth of a Generic Function

The asymptotic behavior of a function is the same as the asymptotic behavior of its order of growth, which can be obtained from the function by:

  • Coalescing the terms of the same order: the function c1n + c2n + ... ckn can be collapsed to c⋅n. Note that if the input size is multi-dimensional, the terms corresponding to independent input sizes cannot be collapsed, c1n + c2m cannot be approximated with c⋅n (or c⋅m) if m and n are independent.
  • Dropping smaller order (non-dominant) terms: the function c1n2 + c2n can be collapsed to c1n2. Note that if the input size is multi-dimensional, the terms corresponding to independent input sizes cannot dropped, even if their order is lower. c1n2 + c2m cannot be approximate with c1n2 if m and n are independent.
  • Dropping the constant of the highest order term: the function c⋅n can be collapsed to n.

Asymptotic Efficiency

When we look at input sizes large enough to make only the order of growth of an algorithm relevant, we say that we are analyzing the asymptotic efficiency of the algorithm. Asymptotic efficiency is concerned with how the running time of the algorithm increases with the size input in the limit, as the size of the input increases without bound. Usually, an algorithm that is asymptotically more efficient is the best choice, for all but very small inputs.

Asymptotic Notation

To make easier to reason about asymptotic efficiency of various function expressing algorithms' running times, we can use the asymptotic notation. Asymptotic notation essentially designates classes, or sets of functions that behave asymptotically in the same way as the function used in the notation. Adding two function with the same asymptotic notation results in a function with the same asymptotic notation, and so on, and this simplifies reasoning when we estimate the algorithm's efficiency. While we mostly use the notation to describe algorithms running times, it actually applies to functions whose domain is the set of natural numbers. We can use asymptotic notation to function that describe space complexity, or to functions that have nothing to do with algorithms.

The notation abstracts away some details of the actual function: the lower-order terms of an asymptotically positive function can be ignored when determining the asymptotical bounds, because they are insignificant for large n. The terms of the same order can be coalesced. The coefficient (constant) of the highest-order term can be dropped, since it only changes the constants c1 and c2 in the definitions presented below by just a constant factor. All these can be done while the essence of the function - its asymptotic efficiency - is preserved.

Technically, a function f(n) belongs to a set such as O(g(n)) or Θ(g(n)), so it should be written as f(n) ∈ O(g(n)), but we conventionally write "f(n) = O(g(n))". For more details, see Using Asymptotic Notation in Equations and Inequalities.

Any constant function can be expressed as Θ(1). This notation is a minor abuse, because it does not indicate what variable is tending to infinity, but is accepted by convention.

Θ-notation

For a given function g(n), we denote by Θ(g(n)) the set of functions f(n) as follows:


Θ(g(n)) = {all f(n) : there exist positive constants c1, c2 and a integer n0 such that 0 ≤ c1g(n) ≤ f(n) ≤ c2g(n) for all n ≥ n0}

A function f(n) belongs to the set Θ(g(n)) if there exist positive constants c1 and c2 so f(n) is bounded by c1g(n) and c2g(n) for sufficiently large n. In other words, for all n ≥ n0, function f(n) is equal to g(n) within a constant factor. We also say that g(n) is an asymptotically tight bound for f(n).

Θ-notation is a stronger notion that O-notation and Ω-notation, described below. That means f(n) = Θ(g(n)) implies f(n) = O(g(n)) and f(n) = Ω(g(n)).

Theta Notation.png

For all values of n at and to the right of n0, the value of f(n) lies at or above c1g(n) and at or below c2g(n).

O-notation

Unlike the Θ notation, the O notation is used to express an asymptotic upper bound. For a given function g(n), we denote by O(g(n)) the set of functions f(n) as follows:


O(g(n)) = {all f(n) : there exist a positive constant c and a integer n0 such that 0 ≤ f(n) ≤ c⋅g(n) for all n ≥ n0}

For all n ≥ n0, function g(n) provides an upper bound to function f(n) within a constant factor.

Since O-notation provides an upper bound, if we use it to describe worst-case running time, then it characterizes the running time for any input. Worst-case running time is the running time we're most interested in, as argued above, and that is the reason the O-notation is most commonly used.

O Notation.png

For all values n at and to the right of n0, the value of the function f(n) is on or below c⋅g(n).

Note that the industry uses the "O-notation" or "Big O" notion with a semantics similar to Θ-notation. For more details see Big O in Industry.

Ω-notation

The Ω notation is used to express an asymptotic lower bound. For a given function g(n), we denote by Ω(g(n)) the set of functions f(n) as follows:


Ω(g(n)) = {all f(n) : there exist a positive constant c and a integer n0 such that 0 ≤ c⋅g(n) ≤ f(n) for all n ≥ n0}

In other words, for all n ≥ n0, function g(n) provides a lower bound to function f(n) within a constant factor. Informally, we say that under any circumstance, the algorithm cannot do better that. The Ω is used to describe the best-case running time of an algorithm.

Omega Notation.png

For all values n at and to the right of n0, the value of the function f(n) is on or above c⋅g(n).

Using Asymptotic Notation in Equations and Inequalities

Technically, a function f(n) belongs to a set such as O(g(n)), Θ(g(n)) or Ω(g(n)), so it should be written as f(n) ∈ O(g(n)), but we conventionally write "f(n) = O(g(n))".

When the asymptotic notation stands alone on the right-hand side of an equation, as in the above example, or inequality, the equal sign means set membership. Example: f(n) = O(n2) means f(n) ∈ O(n2).

When the asymptotic notation appears in a formula, it is a stand-in for some anonymous function. The main reason to use the notation this way is to eliminate inessential details in an expression.

Because transitivity, reflexivity and symmetry apply to asymptotic notation, we can draw an analogy between the asymptotic comparison of two functions f() and g()( and the comparison of two real numbers:

  • f(n) = O(g(n)) is like a ≤ b
  • f(n) = Θ(g(n)) is like a = b
  • f(n) = Ω(g(n)) is like a ≥ b

Big O in Industry

In industry, people seem to have merged Θ and O together. Industry's meaning of big O is closer to what the asymptotic notation formalism defines as Θ-notation: the tightest description of the complexity.

Recursive Algoritms and Recurrences

The running time of recursive algorithms can be analyzed using recurrences.

A recurrence, or a recurrence equation, is an equation or an inequality that describe the run time of a recursive algorithm for a problem of size n in terms of the running time on smaller inputs. Once the run time is expressed this way, we can use mathematical tools or other methods to solve the recurrence, which means obtaining asymptotic bounds on the run time of the algorithm.

Solving Recurrences

There are at least three method to solve recurrences:

Substitution Method

The method consists in guessing a bound and then using mathematical induction on the recurrence to prove the solution correct. We can guess a solution using a recursion tree. Examples on how to use mathematical induction to prove the bound is correct are available in CLRS page 83.

Recursion-Tree Method

The method consists in converting the recurrence into graphical tree representation whose nodes represent costs incurred at various level of the recursion. After we lay out the tree, we sum the costs within each level of the tree to obtain a set of per-level costs, then we sum all the per-level costs to determine the total cost of the recursion. The bound such guessed can be proven with the substitution method. Examples on how to build recursion trees are available in CLRS page 37 and page 88.

The Master Theorem

The master method is a method of analyzing running time of recursive algorithms by using the "master theorem". The master theorem applies to algorithms whose running times can be expressed as a recurrence equation. A generic form of a recurrence equation is:

T(n) = a⋅T(n/b) + f(n)

where a ≥ 1, b > 1 and f(n) is a function that approximates the non-recursive cost. Note that the above expression does include a boundary condition for n = 1. The reasons is that T(1) does not significantly change the solution to the recurrence, it may change it with a constant factor, but the order of growth will stay unchanged.

TODO CLRS page 93.

Analyzing Divide-and-Conquer Algorithms

For divide-and-conquer recursive problems, where solving the problem involves dividing the problem into a subproblems of size n/b of the original', applying the algorithm recursively a times on n/b-size problems and then combining the result, the run time can be expressed by the generic recurrence presented below. If the problem size is small enough (n ≤ c) the straightforward solution takes constant time, which is expressed as a constant function Θ(1).

       | Θ(1)                     if n ≤ c
T(n) = |
       | aT(n/b) + D(n) + C(n)    otherwise

where the D(n) is the cost of dividing the n-size problem into a subproblems, and the C(n) is the cost of combining the results.

Examples of Time Complexity Analysis