Complexity


How to Measure

  1. We could code the algorithm, run the program, time the execution. This is a bad option: the timing depends on the speed and current load on the computer. To compare to other algorithms, we need to code them and run them on the same computer under the same load.
  2. We could look at the algorithm (detailed pseudocode or code) and count the number of times each statement executes. Sometimes we choose a particular operation that is critical to the algorithm and count that operation rather than all statements. Then we can compare this count with the counts for existing algorithms.

We choose option 2. The sketch below shows what such counting looks like; however, as the next section shows, the data has a major impact on the count.
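To make option 2 concrete, here is a minimal sketch (not part of the original notes): a simple summing loop instrumented with a hypothetical count variable, so the program reports how many statements it executed. The printed total is 3N + 3; counting the return statement too gives 3N + 4, the same shape of formula derived below.

        // A minimal sketch (not from the original notes): instrument a loop so
        // the program reports its own statement count. The variable "count" is
        // hypothetical bookkeeping, not part of the algorithm being measured.
        int sumWithCount(int[] arr, int size) {
           int count = 0;                      // the instrumentation itself
           int sum = 0;   count++;             // 1 assignment
           int i = 0;     count++;             // 1 assignment
           while (i < size) {
              count++;                         // loop test, true N times
              sum += arr[i];  count++;         // N additions
              i++;            count++;         // N increments
           }
           count++;                            // the final, false loop test
           System.out.println("statements: " + count);   // prints 3N + 3
           return sum;                         // counting this too gives 3N + 4
        }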

Effect of Data

The same algorithm may do different amounts of work on different arrangements of the same amount of data, so we can ask about the best case, the average case, or the worst case. In general we will use the worst case, since it is easy to calculate and at least gives us a bound on the amount of work required.

To recap, we will use a variable for the size of the problem (the amount of data), and we will use the worst case arrangement of data. Remember, the worst case is not when N is large; for any given value of N, the worst case is the arrangement of data that results in more work than any other data set of the same size.

Examples

Linear Search

        int search(int[] arr, int size, int value) {
           int loc = -1;              // -1 means "not found"
           int i = 0;
           while (i < size) {         // always runs all N iterations
              if (arr[i] == value)
                 loc = i;             // remember the match, but keep looping
              i++;
           }
           return loc;
        }


Best Case: The value is not in the array, so loc = i never executes. Notice that the loop still runs all N iterations, since nothing stops it early. Count: 1 (loc = -1) + 1 (i = 0) + (N+1) loop tests + N if tests + N increments + 1 return = 3N + 4.

Worst Case: Essentially the same. Each matching element adds one execution of loc = i, so even if every element matched, the count would be 4N + 4. For this version the arrangement of the data barely matters, because the loop never stops early.

Another version, which stops looping as soon as the value is found:

        int search(int[] arr, int size, int value) {
           int loc = -1;                      // -1 means "not found"
           int i = 0;
           while (i < size && loc == -1) {    // stop as soon as the value is found
              if (arr[i] == value)
                 loc = i;
              i++;
           }
           return loc;
        }


Best Case: The value is the first element. The loop body runs once, then loc == -1 fails. Count: 1 (loc = -1) + 1 (i = 0) + 2 loop tests + 1 if test + 1 (loc = i) + 1 (i++) + 1 return = 8 statements, a constant, no matter how large N is.

Worst Case: The value is not in the array (or is in the last position), so the loop runs all N times. Count: 1 + 1 + (N+1) loop tests + N if tests + N increments + 1 return = 3N + 4. This is the "seq search" formula used in the tables below.

Binary Search (array must be sorted!)

        int binsearch(int[] arr, int size, int value) {
           int low = 0;
           int high = size - 1;
           int mid;
           while (low <= high) {         // the range low..high holds all candidates
              mid = (low + high) / 2;
              if (arr[mid] > value)
                 high = mid - 1;         // value can only be in the left half
              else if (arr[mid] < value)
                 low = mid + 1;          // value can only be in the right half
              else
                 return mid;             // found it
           }
           return -1;                    // range is empty: value is not in the array
        }


Best Case: The value is at the middle position, so the very first probe returns. Count: 2 initializations + 1 loop test + 1 assignment to mid + 2 comparisons + 1 return = about 7 statements, a constant.

Worst Case: Suppose N = 2^k, and suppose the value is not in the array. Every pass through the loop that does not return throws away half of the remaining range, so the number of elements still under consideration goes

        N, N/2, N/4, ..., 4, 2, 1

and after about k = log2(N) passes the range is empty and the loop stops. Each pass executes about 5 statements (the loop test, the assignment to mid, two comparisons, and one boundary update), and there are 3 statements outside the loop, for a total of about 5logN + 3. This is the "bin search" formula used in the tables below.

Big O

Our analysis gives us a formula in terms of N for the amount of work required by the algorithm. But we don't need an exact count: the goal is to distinguish between algorithms with a big difference in efficiency, not between one that is slightly faster than another. Also, we only care when N is large; a small problem never takes long to process, even if the algorithm is slow. This table shows the counts for the worst case of sequential search and binary search for some values of N:


        size (N)      seq search (3N+4)      bin search (5logN+3)
        2             10                     8
        4             16                     13
        8             28                     18
        16            52                     23
        32            100                    28
        1024          3076                   53
        1,000,000     3,000,004              98

Using the exact formula is not necessary. Since we only care about large N, the high order term of the formula is the one that has the most impact on the count: 3N affects the count much more than 4 does, especially when N is large (look at the last row of the table). Also, it's really the N that counts, not the fact that it is multiplied by 3. Look at the following table, which is like the previous one except that it uses only the high order terms, without the constant multipliers.


        size (N)      seq search (N)      bin search (logN)
        2             2                   1
        4             4                   2
        8             8                   3
        16            16                  4
        32            32                  5
        1024          1024                10
        1,000,000     1,000,000           20
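The pattern in this table is easy to check by experiment. Here is a small sketch (not part of the original notes; the method name compareCounts is made up for illustration) that counts just the loop iterations of each search when the value is absent: the sequential count grows like N, while the binary count grows like logN.

        // Count loop iterations of both searches, worst case (value absent),
        // for several array sizes. Prints N iterations for the sequential
        // search and floor(log2 N) + 1 for the binary search.
        void compareCounts() {
           for (int n = 2; n <= 1024; n *= 2) {
              int[] arr = new int[n];
              for (int i = 0; i < n; i++)
                 arr[i] = i;                        // sorted, so binsearch is legal

              int seqCount = 0;                     // sequential loop: full scan
              int i = 0;
              while (i < n) { seqCount++; i++; }

              int binCount = 0;                     // binary loop: search for n,
              int low = 0, high = n - 1;            // which is not in the array
              while (low <= high) {
                 binCount++;
                 int mid = (low + high) / 2;
                 if (arr[mid] > n)      high = mid - 1;
                 else if (arr[mid] < n) low = mid + 1;
                 else break;
              }
              System.out.println(n + ": seq " + seqCount + ", bin " + binCount);
           }
        }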

Thus when we measure the complexity of an algorithm we count the number of steps, then use the high order term of the resulting formula, without any constant multiplier. For example, 3N + 4 becomes O(N), and 5logN + 3 becomes O(logN). This is called the "order of magnitude", "order of growth", "rate of growth", or "big O". Here are some common orders of magnitude:


        1 (constant)   logN (logarithmic)   N (linear)   NlogN    N^2 (quadratic)   N^3 (cubic)   2^N (exponential)
        1              1                    2            2        4                 8             4
        1              2                    4            8        16                64            16
        1              3                    8            24       64                512           256
        1              4                    16           64       256               4,096         65,536
        1              5                    32           160      1,024             32,768        4,294,967,296
        1              7                    128          896      16,384            2,097,152     fast computer: trillion billion years
        1              8                    256          2,048    65,536            16,777,216    don't ask
        1              10                   1,000        10,000   10^6              10^9
        1              20                   10^6         2*10^7   10^12             10^18
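To connect these orders of magnitude back to code, here are some hypothetical methods (not from the notes) whose loop shapes produce a few of the columns above: no loop gives constant time, a halving loop gives logarithmic time, one pass over the data gives linear time, and a nested pair of loops gives quadratic time.

        int first(int[] arr) {                 // 1: constant time, no loop at all
           return arr[0];
        }

        int halvings(int n) {                  // logN: the problem is halved each pass
           int count = 0;
           while (n > 1) {
              n = n / 2;
              count++;
           }
           return count;
        }

        int sum(int[] arr) {                   // N: one pass over the data
           int total = 0;
           for (int i = 0; i < arr.length; i++)
              total += arr[i];
           return total;
        }

        int countDuplicatePairs(int[] arr) {   // N^2: a loop nested inside a loop
           int pairs = 0;
           for (int i = 0; i < arr.length; i++)
              for (int j = i + 1; j < arr.length; j++)
                 if (arr[i] == arr[j])
                    pairs++;
           return pairs;
        }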


