Struktur Data & Algoritme (Data Structures & Algorithms ) Analisa Algoritma
Denny
[email protected] Fakultas Ilmu Komputer Universitas Indonesia Semester Genap - 2000/2001
Version 1.0 - Internal Use Only
Algoritme n
Suatu set instruksi yang harus diikuti oleh computer untuk memecahkan suatu masalah.
SDA/ALG-AN/DN/V1.0/2
Fundamental Data Types: String, Math, Casting
1
Analisa Algoritme: motivasi n n
perlu diketahui berapa banyak resource (time & space) yang diperlukan oleh sebuah algoritme Menggunakan teknik-teknik untuk mengurangi waktu yang dibutuhkan oleh sebuah algoritme
SDA/ALG-AN/DN/V1.0/3
Objectives n
dapat memperkirakan waktu yang dibutuhkan
n
mempelajari teknik yang secara drastis menurunkan running time
n
mempelajari kerangka matematik yang secara rigorously menggambarkan running time
SDA/ALG-AN/DN/V1.0/4
Fundamental Data Types: String, Math, Casting
2
Outline n
Apa itu analisa algoritme?- what
n
Bagaimana cara untuk analisa/mengukur? - how
n
Notasi Big-Oh
n
Case Study: The Maximum Contiguous Subsequence Sum Problem n
Algorithm 1: A cubic algorithm
n
Algorithm 2: A quadratic algorithm
n
Algorithm 3: An N log N algorithm
n
Algorithm 4: A linear algorithm
SDA/ALG-AN/DN/V1.0/5
Analisa Algoritme: what? n n
n n
Mengukur jumlah sumber daya (time dan space) yang diperlukan oleh sebuah algoritme Waktu yang diperlukan (running time) oleh sebuah algorithm cenderung tergantung pada jumlah input yang diproses. Running time dari sebuah algoritme adalah fungsi dari jumlah inputnya Selalu tidak terikat pada platform, bahasa pemrograman atau bahkan metodologi (mis. Procedural vs OO)
SDA/ALG-AN/DN/V1.0/6
Fundamental Data Types: String, Math, Casting
3
Worst, Best and Average Case n n n
Worst-case running time is a bound over all inputs of a certain size N. (Guarantee - Pesimistic) Average-case running time is an average over all inputs of a certain size N. (Prediction) Best-case running time is defined by the minimum number of steps taken on any instance of size n. (Optimistic)
SDA/ALG-AN/DN/V1.0/7
Worst, Best and Average Case (2) running time
worst case average case
best case
input size
n
SDA/ALG-AN/DN/V1.0/8
Fundamental Data Types: String, Math, Casting
4
Analisa Algoritme: how? n
n
Bagaimana jika kita menggunakan jam? n Jumlah waktu yang digunakan bervariasi tergantung pada beberapa faktor lain: kecepatan mesin, sistem operasi, kualitas kompiler, dan bahasa pemrograman. n Sehingga kurang memberikan gambaran yang tepat tentang algoritme Notasi O (sering disebut sebagai “notasi big-Oh”) n Digunakan sebagai bahasa untuk membahas efisiensi dari sebuah algoritme: log n, linier, n log n, n 2, n3, ... n Contoh: run time algoritme Insertion Sort adalah O(n2) n Dari hasil run-time, dapat kita buat grafik dari waktu eksekusi dan jumlah data.
SDA/ALG-AN/DN/V1.0/9
Big Oh n
n
Sebuah fungsi kubic adalah sebuah fungsi yang suku dominannya (dominant term) adalah sebuah konstan dikalikan dengan n3. Contoh: n 10 n 3 + n 2 + 40 n + 80 n n3 + 1000 n 2 + 40 n + 80 n n3 + 10000
SDA/ALG-AN/DN/V1.0/10
Fundamental Data Types: String, Math, Casting
5
Dominant Term n n n
n n
Mengapa hanya suku yang memiliki pangkat tertinggi/dominan saja yang diperhatikan? Contoh: kita estimasi n3 + 350n2 + n dengan n3. Untuk n = 10000 n Nilai aktual 1,003,500,010,000 n Nilai estimasi 1,000,000,000,000 n Kesalahan dalam estimasi 0.35%, dapat diabaikan. Untuk n yang besar, suku dominan lebih mengindikasikan perilaku dari algoritme. Untuk n yang kecil, suku dominan tidak selalu mengindikasikan perilakunya, tetapi program dengan input kecil umumnya berjalan sangat cepat sehingga kita tidak perlu perhatikan. SDA/ALG-AN/DN/V1.0/11
Big Oh: issues n
Apakah fungsi linier selalu lebih kecil dari fungsi kubic? n Untuk n yang kecil, bisa saja fungsi linier > fungsi kubic n Tetapi untuk n yang besar, fungsi kubic > fungsi linier n Contoh: • 10 n3 + 20 n + 10 • 1000 n + 100
n
n
Mengapa nilai konstan/koefesien pada setiap suku tidak diperhatikan? n Nilai konstan/koefesien tidak berarti pada mesin yang berbeda ∴ Big Oh digunakan untuk merepresentasikan laju pertumbuhan (growth rate) SDA/ALG-AN/DN/V1.0/12
Fundamental Data Types: String, Math, Casting
6
Orde Fungsi Running-Time Function c log N log 2 N N N log N N2 N3 2N N!
Name Constant Logarithmic Log-squared Linear N log N Quadratic Cubic Exponential Factorial SDA/ALG-AN/DN/V1.0/13
Fungsi Running-Time n
Untuk input yang sedikit, beberapa fungsi lebih cepat dibandingkan dengan yang lain.
SDA/ALG-AN/DN/V1.0/14
Fundamental Data Types: String, Math, Casting
7
Fungsi Running-Time (2) n
Untuk input yang besar, beberapa fungsi runningtime sangat lambat - tidak berguna.
SDA/ALG-AN/DN/V1.0/15
Contoh n
n
Mencari elemen terkecil dalam sebuah array n Algoritme: sequential scan n Orde-nya: O(n) - linear Mencari dua titik yang memiliki jarak terpendek dalam sebuah bidang (koordinat X-Y) n Masalah dalam komputer grafis. n Algoritme: • hitung jarak dari semua pasangan titik • cari jarak yang terpendek n n n
Jika jumlah titik adalah n, maka jumlah pasangan ada n * (n - 1) / 2 Orde-nya: O(n2) - kuadratik Ada solusi yang O(n log n) bahkan O(n) dengan cara algoritme lain. SDA/ALG-AN/DN/V1.0/16
Fundamental Data Types: String, Math, Casting
8
Contoh (2) n
Tentukan apakah ada tiga titik dalam sebuah bidang yang segaris (colinier). n Algoritme: • periksa semua pasangan titik yang terdiri dari 3 titik. n n n
Jumlah pasangan: n * (n - 1) * (n - 2) / 6 Orde-nya: O(n3) - kubik. Sangat tidak berguna untuk 10.000 titik. Ada algoritme yang kuadratik.
SDA/ALG-AN/DN/V1.0/17
Studi Kasus n n
n n
Mengamati sebuah masalah dengan beberapa solusi. Masalah Maximum Contiguous Subsequence Sum n Diberikan (angka integer negatif dimungkinkan) A1, A2, …, AN, cari nilai maksimum dari (Ai + Ai+1 + …+ Aj ). maximum contiguous subsequence sum adalah nol jika semua integer adalah negatif. (Why?) Contoh (maximum subsequences digarisbawahi) n -2, 11, -4, 13, -4, 2 n 1, -3, 4, -2, -1, 6
SDA/ALG-AN/DN/V1.0/18
Fundamental Data Types: String, Math, Casting
9
Brute Force Algorithm (1) n
Algoritme: n Hitung jumlah dari semua sub-sequence yang mungkin n Cari nilai maksimumnya
SDA/ALG-AN/DN/V1.0/19
Brute Force Algorithm (2) public static int MaxSubSum1( int [] A ) { int MaxSum = 0; for(int i = 0; i < A.length; i++) { for( int j = i; j < A.length; j++) { int ThisSum = 0; for(int k = i; k <= j; k++) ThisSum += A[ k ]; if( ThisSum > MaxSum ) MaxSum = ThisSum; } } return MaxSum; } SDA/ALG-AN/DN/V1.0/20
Fundamental Data Types: String, Math, Casting
10
Analysis n n
Loop of size N inside of loop of size N inside of loop of size N means O(N3), or cubic algorithm. Slight over-estimate that results from some loops being of size less than N is not important.
SDA/ALG-AN/DN/V1.0/21
Actual Running Time n n
n n n
For N = 100, actual time is 0.47 seconds on a particular computer. Can use this to estimate time for larger inputs: T(N) = cN 3 T(10N) = c(10N)3 = 1000cN 3 = 1000T(N) Inputs size increases by a factor of 10 means that running time increases by a factor of 1,000. For N = 1000, estimate an actual time of 470 seconds. (Actual was 449 seconds). For N = 10,000, estimate 449000 seconds (6 days).
SDA/ALG-AN/DN/V1.0/22
Fundamental Data Types: String, Math, Casting
11
How To Improve n n n
Remove a loop; not always possible. Here it is: innermost loop is unnecessary because it throws away information. ThisSum for next j is easily obtained from old value of ThisSum: n Need Ai + A i+1 + … + A j-1 + Aj n Just computed Ai +A i+1 + …+ A j-1 n What we need is what we just computed + Aj
SDA/ALG-AN/DN/V1.0/23
The Better Algorithm public static int MaxSubSum2( int [ ] A ) { int MaxSum = 0; for( int i = 0; i < A.length; i++ ) { int ThisSum = 0; for( int j = i; j < A.length; j++ ) { ThisSum += A[ j ]; if( ThisSum > MaxSum ) MaxSum = ThisSum; } } return MaxSum; } SDA/ALG-AN/DN/V1.0/24
Fundamental Data Types: String, Math, Casting
12
Analysis n n n
Same logic as before: now the running time is quadratic, or O(N2) As we will see, this algorithm is still usable for inputs in the tens of thousands. Recall that the cubic algorithm was not practical for this amount of input.
SDA/ALG-AN/DN/V1.0/25
Actual running time n n
n n n
For N = 100, actual time is 0.011 seconds on the same particular computer. Can use this to estimate time for larger inputs: T(N) = cN 2 T(10N) = c(10N)2 = 100cN 2 = 100T(N) Inputs size increases by a factor of 10 means that running time increases by a factor of 100. For N = 1000, estimate a running time of 1.11 seconds. (Actual was 1.12 seconds). For N = 10,000, estimate 111 seconds (= actual).
SDA/ALG-AN/DN/V1.0/26
Fundamental Data Types: String, Math, Casting
13
Recursive Algorithm n
Use a divide-and-conquer approach. n
Membagi (divide) permasalahan ke dalam bagian yang lebih kecil
n
Menyelesaikan (conquer) masalah per bagian secara recursive
n
Menggabung penyelesaian per bagian menjadi solusi masalah awal
SDA/ALG-AN/DN/V1.0/27
Recursive Algorithm (2) n
n n
The maximum subsequence either n lies entirely in the first half n lies entirely in the second half n starts somewhere in the first half, goes to the last element in the first half, continues at the first element in the second half, ends somewhere in the second half. Compute all three possibilities, and use the maximum. First two possibilities easily computed recursively.
SDA/ALG-AN/DN/V1.0/28
Fundamental Data Types: String, Math, Casting
14
Computing the Third Case n n
n
Easily done with two loops; see the code For maximum sum that starts in the first half and extends to the last element in the first half, use a right-to-left scan starting at the last element in the first half. For the other maximum sum, do a left-to-right scan, starting at the first element in the first half.
4
-3
5
-2
-1
2
6
-2
4*
0
3
-2
-1
1
7*
5
SDA/ALG-AN/DN/V1.0/29
Recursion version static private int maxSumRec (int[] a, int left, int right) { int center = (left + right) / 2; if(left == right) // Base case return a[left] > 0 ? a[left] : 0; int maxLeftSum = maxSumRec (a, left, center); int maxRightSum = maxSumRec (a, center+1, right); for(int i = center; i >= left; i--) { leftBorderSum += a[i]; if(leftBorderSum > maxLeftBorderSum) maxLeftBorderSum = leftBorderSum; } ... SDA/ALG-AN/DN/V1.0/30
Fundamental Data Types: String, Math, Casting
15
Recursion Version ... for(int j = center + 1; j <= right; j++) { rightBorderSum += a[j]; if(rightBorderSum > maxRightBorderSum) maxRightBorderSum = rightBorderSum; } return max3 (maxLeftSum, maxRightSum, maxLeftBorderSum + maxRightBoderSum); } // publicly visible routine static public int maxSubSum (int [] a) { return maxSumRec (a, 0, a.length-1); } SDA/ALG-AN/DN/V1.0/31
Coding Details n n n n
The code is more involved Make sure you have a base case that handles zeroelement arrays. Use a public static driver with a private recursive routine. Recursion rules: n Have a base case n Make progress to the base case n Assume it works n Avoid computing the same solution twice
SDA/ALG-AN/DN/V1.0/32
Fundamental Data Types: String, Math, Casting
16
Analysis n n n
Let T( N ) = the time for an algorithm to solve a problem of size N. Then T( 1 ) = 1 (1 will be the quantum time unit; remember that constants don't matter). T( N ) = 2 T( N / 2 ) + N n Two recursive calls, each of size N / 2. The time to solve each recursive call is T( N / 2 ) by the above definition n Case three takes O( N ) time; we use N, because we will throw out the constants eventually.
SDA/ALG-AN/DN/V1.0/33
Bottom Line T(1) = 1 = 1 * 1 T(2) = 2 * T(1) + 2 = 4 = 2 * 2 T(4) = 2 * T(2) + 4 = 12 = 4 * 3 T(8) = 2 * T(4) + 8 = 32 = 8 * 4 T(16) = 2 * T(8) + 16 = 80 = 16 * 5 T(32) = 2 * T(16) + 32 = 192 = 32 * 6 T(64) = 2 * T(32) + 64 = 448 = 64 * 7 T(N) = N(1 + log N) = O(N log N)
SDA/ALG-AN/DN/V1.0/34
Fundamental Data Types: String, Math, Casting
17
N log N n
n n
Any recursive algorithm that solves two half-sized problems and does linear non-recursive work to combine/split these solutions will always take O( N log N ) time because the above analysis will always hold. This is a very significant improvement over quadratic. It is still not as good as O( N ), but is not that far away either. There is a linear-time algorithm for this problem; see the online code. The running time is clear, but the correctness is non-trivial.
SDA/ALG-AN/DN/V1.0/35
Linear Algorithms n n n n
Linear algorithm would be best. Remember: linear means O( N ). Running time is proportional to amount of input. Hard to do better for an algorithm. If input increases by a factor of ten, then so does running time.
SDA/ALG-AN/DN/V1.0/36
Fundamental Data Types: String, Math, Casting
18
Idea
n
Let A i,j be any sequence with S i,j < 0. If q > j, then A i,q is not the maximum contigous subsequence.
n
Proof: n
Si,q = S i,j + Sj+1,q
n
Si,j < 0 => S i,q < S j+1,q
SDA/ALG-AN/DN/V1.0/37
The Code - version 1 static public int maximumSubSequenceSum3 (int a[]) { int maxSum = 0; int thisSum = 0; for (int j = 0; j < a.length; j++) { thisSum += a[j]; if (thisSum > maxSum) { maxSum = thisSum; } else if (thisSum < 0) { thisSum = 0; } } return maxSum; } SDA/ALG-AN/DN/V1.0/38
Fundamental Data Types: String, Math, Casting
19
The Code - version 2 static public int maximumSubSequenceSum3b (int data[]) { int thisSum = 0, maxSum = 0; for (int i = 0; i < data.length; i++) { thisSum = Max(0, data[i] + thisSum); maxSum = Max(thisSum, maxSum); } return maxSum; }
SDA/ALG-AN/DN/V1.0/39
Running Time
SDA/ALG-AN/DN/V1.0/40
Fundamental Data Types: String, Math, Casting
20
Running Time: on different machines n
Cubic Algorithm on Alpha 21164 at 533 Mhz using C compiler
n
Linear Algorithm on Radio Shack TRS-80 Model III (a 1980 personal computer with a Z-80 processor running at 2.03 Mhz) using interpreted Basic
n
10 100 1,000 10,000 100,000 1,000,000
Alpha 21164A, C compiled, Cubic Algorithm 0.6 microsecs 0.6 milisecs 0.6 secs 10 mins 7 days 19 yrs
TRS-80, Basic interpreted, Linear Algorithm 200 milisecs 2.0 secs 20 secs 3.2 mins 32 mins 5.4 hrs SDA/ALG-AN/DN/V1.0/41
Running Time: moral story n n
Even the most clever programming tricks cannot make an ineffiecient algorithm fast. Before we waste effort attempting to optimize code, we need to optimize the algorithm.
SDA/ALG-AN/DN/V1.0/42
Fundamental Data Types: String, Math, Casting
21
The Logarithm n
n
n
Formal Definition n For any B, N > 0, logBN = K if B K = N. n If (the base) B is omitted, it defaults to 2 in computer science. Examples: n log 32 = 5 (because 25 = 32) n log 1024 = 10 n log 1048576 = 20 n log 1 billion = about 30 The logarithm grows much more slowly than N, and slower than the square root of N.
SDA/ALG-AN/DN/V1.0/43
Examples of the Logarithm n
n
n
n
BITS IN A BINARY NUMBER n How many bits are required to represent N consecutive integers? REPEATED DOUBLING n Starting from X = 1, how many times should X be doubled before it is at least as large as N? REPEATED HALVING n Starting from X = N, if N is repeatedly halved, how many iterations must be applied to make N smaller than or equal to 1 (Halving rounds up). Answer to all of the above is log N (rounded up).
SDA/ALG-AN/DN/V1.0/44
Fundamental Data Types: String, Math, Casting
22
Why log N? n
n
B bits represents 2 B integers. Thus 2B is at least as big as N, so B is at least log N. Since B must be an integer, round up if needed. Same logic for the other examples.
SDA/ALG-AN/DN/V1.0/45
Repeated Halving Principle n
n
An algorithm is O( log N ) if it takes constant time to reduce the problem size by a constant fraction (which is usually 1/2). Reason: there will be log N iterations of constant work.
SDA/ALG-AN/DN/V1.0/46
Fundamental Data Types: String, Math, Casting
23
Static Searching n
n
Given an integer X and an array A, return the position of X in A or an indication that it is not present. If X occurs more than once, return any occurrence. The array A is not altered. If input array is not sorted, solution is to use a sequential search. Running times: n Unsuccessful search: O( N ); every item is examined n Successful search: • Worst case: O( N ); every item is examined • Average case: O( N ); half the items are examined
n
Can we do better if we know the array is sorted?
SDA/ALG-AN/DN/V1.0/47
Binary Search n n
n n
Yes! Use a binary search. Look in the middle n Case 1: If X is less than the item in the middle, then look in the subarray to the left of the middle n Case 2: If X is greater than the item in the middle, then look in the subarray to the right of the middle n Case 3: If X is equal to the item in the middle, then we have a match n Base Case: If the subarray is empty, X is not found. This is logarithmic by the repeated halving principle. The first binary search was published in 1946. The first published binary search without bugs did not appear until 1962. SDA/ALG-AN/DN/V1.0/48
Fundamental Data Types: String, Math, Casting
24
/** Performs the standard binary search using two comparisons per level. @param a the array @param x the key @exception ItemNotFound if appropriate. @return index where item is found. */ public static int binarySearch (Comparable [ ] a, Comparable x ) throws ItemNotFound { int low = 0; int high = a.length - 1; int mid; while( low <= high ) { mid = (low + high) / 2; if (a[mid].compares (x) < 0) { low = mid + 1; } else if (a[mid].compares (x) > 0) { high = mid - 1; } else { return mid; } } throw new ItemNotFound( "BinarySearch fails" ); } SDA/ALG-AN/DN/V1.0/49
Binary Search n n n
Can do one comparison per iteration instead of two by changing the base case. See online code for details. Average case and worst case in revised algorithm are identical. 1 + log N comparisons (rounded down to the nearest integer) are used. Example: If N = 1,000,000, then 20 element comparisons are used. Sequential search would be 25,000 times more costly on average.
SDA/ALG-AN/DN/V1.0/50
Fundamental Data Types: String, Math, Casting
25
Big-Oh Rules (1) n
Mathematical expression relative rates of growth n
n
n
n
DEFINITION: (Big-Oh) T(N) = O(F(N)) if there are positive constants c and N0 such that T(N) ≤ c F(N) when N ≥ N0. DEFINITION: (Big-Omega) T(N) = Ω(F(N)) if there are constants c and N0 such that T(N) ≥ c F(N) when N ≥ N0. DEFINITION: (Big-Theta) T(N) = Θ(F(N)) if and only if T(N) = O(F(N)) and T(N) = Ω(F(N)). DEFINITION: (Little-Oh) T(N) = o(F(N)) if and only if T(N) = O(F(N)) and T(N) ≠ Θ (F(N)). SDA/ALG-AN/DN/V1.0/51
Big-Oh Rules (2) n
Meanings of the various growth functions
T(N) = O(F(N)) T(N) = Ω(F(N)) T(N) = Θ(F(N)) T(N) = o(F(N))
Growth of T(N) is ≤ growth of F( N) Growth of T(N) is ≥ growth of F( N) Growth of T(N) is = growth of F(N ) Growth of T(N) is < growth of F( N)
SDA/ALG-AN/DN/V1.0/52
Fundamental Data Types: String, Math, Casting
26
Limitation of Big Oh n n
Tidak dapat digunakan untuk N yang kecil Faktor X n Memory access & disk access n Memory size & disk cache
SDA/ALG-AN/DN/V1.0/53
Summary n n
more data means the program takes more time Big-Oh tidak dapat digunakan untuk N yang kecil
SDA/ALG-AN/DN/V1.0/54
Fundamental Data Types: String, Math, Casting
27
Exercises n
Urutkan fungsi berikut berdasarkan laju pertumbuhan (growth rate) n
n
N, √N, N1.5, N2, N log N, N log log N, N log2 N, N log (N 2), 2/N, 2N, 2 N/2, 37, N 3, N2 log N
A5 + B5 + C 5 + D 5 + E5 = F5 n
0 < A ≤ B ≤ C ≤ D ≤ E ≤ F ≤ 75
n
hanya memiliki satu solusi. berapa?
SDA/ALG-AN/DN/V1.0/55
Assignment n
Critical Mass n Secara umum dapat dilihat di http://pala.acad.cs.ui.ac.id/contest/ OnlineJudge/v5/580.html n Lengkapnya akan diposting di forum.iki.struktur n Due date: 2 Maret 2001
SDA/ALG-AN/DN/V1.0/56
Fundamental Data Types: String, Math, Casting
28
Further Reading n n
n
Chapter 5 http://pala.acad.cs.ui.ac.id/contest/ mirror/evo.apm.tuwien.ac.at/ AlgDesignManual/INDEX.HTM http://pala.acad.cs.ui.ac.id/contest/ OnlineJudge/v1/108.html
SDA/ALG-AN/DN/V1.0/57
What’s Next n
Recursion (Chapter 7)
SDA/ALG-AN/DN/V1.0/58
Fundamental Data Types: String, Math, Casting
29