Algorithm Complexity

From NovaOrdis Knowledge Base

Revision as of 18:21, 7 August 2018

External

Internal

Overview

Algorithm analysis aims at determining, or predicting, the amount of resources consumed by an algorithm while solving a computational problem as a function of its input size. In most cases the resource being consumed is running time. An analysis can also refer to memory, network bandwidth, etc.

The concept of input size depends on the problem being addressed. In case of sorting algorithms, the most natural measure of input size is the number of items in the input. For the problem of multiplying two integers, the best measure of input size is the total number of bits needed to represent the input in binary notation. In the case of graph algorithms, there are two numbers that describe the input size: the graph's number of vertices and edges.

The running time (or computational time) of an algorithm on a particular input is the number of random-access machine primitive operations, or steps, executed.

Random-Access Machine (RAM)

An algorithm is usually analyzed in the context of a random-access machine (RAM), a theoretical model of the computing resources typically used by algorithms and of their costs. A random-access machine has only one processor. In this model, instructions are executed one after another, with no concurrent operations, though extensions of the model exist for analyzing multithreaded algorithms. Algorithms are implemented as computer programs for such a machine. The random-access machine mimics how real computers are designed, so it has instructions commonly found in real computers: arithmetic (add, subtract, multiply, divide, remainder, floor, ceiling), data movement (load, store, copy) and control (conditional branch, subroutine call and return). It is important to emphasize that each type of instruction takes a constant amount of time - this assumption is essential in algorithm analysis. This constant time is denoted ci when analyzing a specific algorithm.

In accordance with how real computers are implemented, operations such as multiplication by 2^k are also considered constant-time, because they are implemented by shifting the number's binary representation left k times, as long as k is no more than the number of bits in a computer word. Also, a statement calling a subroutine takes a constant amount of time, though the execution of the subroutine itself usually does not.
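As a quick illustration (a sketch, not part of the original article), the equivalence between multiplying by 2^k and shifting left k times can be verified directly:

```python
# Multiplying an integer by 2**k is equivalent to shifting it left by k bits,
# which hardware performs in constant time as long as k fits in a word.
def multiply_by_power_of_two(x: int, k: int) -> int:
    return x << k  # same result as x * (2 ** k)

print(multiply_by_power_of_two(5, 3))  # 5 * 8 = 40
```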

The data types in the RAM model are integer and floating point. The model assumes a limit on the size of each word of data (if the word size could grow arbitrarily, we could store large amounts of data in one word and operate on it in constant time, which is an unrealistic scenario).

A random-access machine does not attempt to model a memory hierarchy (caches or virtual memory). In some cases, memory hierarchy effects are significant, but usually RAM-model analyses are good predictors of performance on actual machines.

Time Complexity Analysis Method

We start by expressing the algorithm in random-access machine instructions. As mentioned above, each type of instruction takes a constant time ci to execute, referred to as the cost of the instruction (statement).

The running time of the algorithm is the sum of running times for each statement executed. A statement with a cost ci that executes n times will contribute ci*n to the total running time. When adding the various costs, terms of the same order can be coalesced: c1*n + c2*n + ... + ck*n can be expressed as c*n.
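As a hedged sketch of this accounting (the loop and the unit costs are illustrative, not from the article), consider summing an array of n elements and counting how often each statement executes:

```python
# Sketch: symbolic cost accounting for a simple summation loop.
# c1, c2, c3 stand for the constant per-statement costs; with unit costs
# the total collapses to 2n + 2, a linear function of n.
def running_time_of_sum(n: int, c1: float = 1, c2: float = 1, c3: float = 1) -> float:
    # total = 0            -> executes once,          contributes c1
    # for i in range(n):   -> loop test runs n + 1 times, contributes c2*(n+1)
    #     total += a[i]    -> body runs n times,      contributes c3*n
    return c1 + c2 * (n + 1) + c3 * n

print(running_time_of_sum(10))  # 2*10 + 2 = 22
```

The per-statement constants never matter for the order of growth: only the fact that the dominant terms are multiplied by n does.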

Some statements may execute or not depending on which input of size n is given. From this perspective, an input may be the worst, leading to the longest running time, the best, leading to the shortest running time, or average.

Time complexity analysis focuses on worst-case running time. The worst-case running time is the longest running time for any input of size n. The worst-case running time is important because it gives us an upper bound for the running time for any input, and knowing it provides a guarantee that the algorithm cannot do any worse. For some algorithms, the worst case occurs fairly often (for example, searching in a database when the information is not present). Also, the average case is often roughly as bad as the worst case.

The best-case running time for an algorithm is the running time when the input is arranged in the best possible way, so the algorithm has to perform the minimal number of steps to solve the problem. An example is an already sorted array for insertion sort.

The average-case running time is the running time of an algorithm, on average, over a large number of different inputs. The average-case running time can be determined using probabilistic analysis.

In general we expect that all inputs of a given size are equally likely, but in practice this may not be the case. In such situations we can use a randomized algorithm to allow probabilistic analysis and yield an expected running time.
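The best-case/worst-case distinction can be observed experimentally. Below is a sketch (assumed for illustration, not part of the article) that instruments insertion sort with a comparison counter: a sorted input costs n-1 comparisons, a reverse-sorted input costs n(n-1)/2.

```python
# Sketch: count key comparisons performed by insertion sort, to contrast
# the best-case input (already sorted) with the worst case (reverse sorted).
def insertion_sort_comparisons(a: list) -> int:
    a = list(a)  # work on a copy
    comparisons = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0:
            comparisons += 1          # one comparison per inner-loop step
            if a[i] > key:
                a[i + 1] = a[i]       # shift the larger element right
                i -= 1
            else:
                break
        a[i + 1] = key
    return comparisons

n = 8
best = insertion_sort_comparisons(list(range(n)))          # already sorted
worst = insertion_sort_comparisons(list(range(n, 0, -1)))  # reverse sorted
print(best, worst)  # 7 and 28: n-1 versus n(n-1)/2
```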

Order of Growth of an Algorithm's Running Time

If the running time can be approximated by a polynomial function in n, such as, for example, a*n^2 + b*n + c, we can consider only the highest-order term of the formula: a*n^2. We can also ignore the leading term's constant coefficient, since the constant factors are less significant than the rest of the term in determining computational efficiency for large inputs. The rate of growth, or order of growth, of the algorithm's running time is the highest-order term, without its coefficient. In the case of our example, the rate of growth is n^2.
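A quick numerical check (with arbitrary illustrative coefficients, not taken from the article) shows why the lower-order terms can be dropped: the ratio of the full polynomial to its highest-order term approaches 1 as n grows.

```python
# Sketch: for large n, a*n^2 dominates b*n + c, even when b and c are
# much larger than a. The coefficients below are arbitrary.
a, b, c = 2, 100, 1000

def full(n):
    return a * n**2 + b * n + c

def leading(n):
    return a * n**2

for n in (10, 1000, 100000):
    print(n, full(n) / leading(n))  # ratio tends to 1 as n grows
```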

Comparing Running Time

One algorithm is considered to be more efficient than another if its worst-case running time has a lower order of growth.

Due to constant factors and lower-order terms, an algorithm whose running time has a higher order of growth might take less time for small inputs than an algorithm whose running time has a lower order of growth. However, for large enough inputs, the algorithm with the lower order of growth will always run faster.

When comparing two algorithms whose running times are c1*n^2 and c2*n*lg n, with c2 ≥ c1, no matter how much smaller c1 is than c2, there will always be a crossover point beyond which the c2*n*lg n algorithm is faster.
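The crossover point can be located numerically. The sketch below (constants c1 = 1 and c2 = 50 are arbitrary, chosen only to hand the quadratic algorithm a large head start) walks n upward until the n*lg n algorithm wins:

```python
import math

# Sketch: find the crossover between c1*n^2 and c2*n*lg(n) when the
# quadratic algorithm has a much smaller constant factor.
c1, c2 = 1, 50

def quadratic(n):
    return c1 * n * n

def linearithmic(n):
    return c2 * n * math.log2(n)

n = 2
while quadratic(n) <= linearithmic(n):
    n += 1
print(n)  # first n at which the n*lg(n) algorithm is strictly faster
```

Past this point the gap only widens: the quadratic term grows without bound relative to n*lg n, regardless of the constants.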

Examples of Time Complexity Analyses

Recurrences

The running time of recursive algorithms can be analyzed using recurrences. A recurrence, or recurrence equation, is a mathematical expression that describes the run time of a recursive algorithm for a problem of size n in terms of the running time on smaller inputs. Once the run time is expressed this way, we can use mathematical tools, such as the master theorem, to provide bounds on the performance of the algorithm.

For divide-and-conquer recursive problems, where solving the problem involves dividing the problem into a subproblems, each of size n/b, applying the algorithm recursively on the a subproblems and then combining the results, the run time can be expressed by the following generic recurrence:

       | Θ(1)                     if n ≤ c
T(n) = |
       | aT(n/b) + D(n) + C(n)    otherwise

where D(n) is the cost of dividing the n-size problem into the a subproblems and C(n) is the cost of combining the results.
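The recurrence can be evaluated numerically for a concrete divide-and-conquer algorithm. The sketch below assumes merge sort's parameters (a = 2 subproblems of size n/2, constant D(n), linear C(n)) and unit constants, neither of which is fixed by the article:

```python
from functools import lru_cache

# Sketch: T(n) = 2*T(n/2) + n for n > 1, T(1) = 1 -- the generic recurrence
# instantiated with merge sort's parameters and unit constants.
@lru_cache(maxsize=None)
def T(n: int) -> int:
    if n <= 1:
        return 1
    return 2 * T(n // 2) + n

# For n a power of two, T(n) = n*lg(n) + n, consistent with Θ(n lg n).
print(T(8))  # 8*3 + 8 = 32
```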

The Master Theorem

A method of analyzing the running time of recursive algorithms is the "master theorem". The master theorem applies to algorithms whose running times can be expressed as a recurrence equation.
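In its commonly used simplified form (assumed here; the full theorem handles more general driving functions), the theorem bounds T(n) = a*T(n/b) + Θ(n^d) by comparing d against log_b(a). A sketch of that case analysis:

```python
import math

# Sketch of the simplified master theorem for T(n) = a*T(n/b) + Θ(n^d):
# the bound depends on how d compares to the critical exponent log_b(a).
def master_theorem(a: int, b: int, d: float) -> str:
    critical = math.log(a, b)  # log_b(a)
    if d < critical:
        return f"Θ(n^{critical:.2f})"  # leaves dominate: recursion wins
    if d == critical:
        return f"Θ(n^{d:g} lg n)"      # every level contributes equally
    return f"Θ(n^{d:g})"               # root dominates: combine step wins

print(master_theorem(2, 2, 1))  # merge sort:    Θ(n^1 lg n)
print(master_theorem(4, 2, 1))  # a=4, b=2, d=1: Θ(n^2.00)
```

For example, merge sort's recurrence (a = 2, b = 2, d = 1) hits the balanced middle case, giving the familiar Θ(n lg n) bound.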