LEARNING
Jiawei Han and Micheline Kamber. 2006. Data Mining: Concepts and Techniques. San Francisco: Elsevier.
M. Tim Jones. Artificial Intelligence: A Systems Approach.
Yudi Wibisono. Data Mining Lecture Slides: Classification.
Computer Science Study Program, FPMIPA UPI
RNI - IK460 (Artificial Intelligence)
Approaches & Algorithms
• Supervised learning: perceptrons, backpropagation, decision tree, naive Bayes
• Unsupervised learning: Hebbian learning, vector quantization, adaptive resonance theory (ART), nearest neighbor
DECISION TREE
Tree Modeling
Modeling & Classification Process
[Figure: training data (attributes A1, A2, ..., An and Class) → classification algorithm → classification model; test data (attributes A1, A2, ..., An, Class unknown "?") → classification model → prediction]
TRAINING DATA

| age | income | student | credit_rating | buys_computer |
|---|---|---|---|---|
| youth | high | no | fair | no |
| youth | high | no | excellent | no |
| middle_aged | high | no | fair | yes |
| senior | medium | no | fair | yes |
| senior | low | yes | fair | yes |
| senior | low | yes | excellent | no |
| middle_aged | low | yes | excellent | yes |
| youth | medium | no | fair | no |
| youth | low | yes | fair | yes |
| senior | medium | yes | fair | yes |
| youth | medium | yes | excellent | yes |
| middle_aged | medium | no | excellent | yes |
| middle_aged | high | yes | fair | yes |
| senior | medium | no | excellent | no |
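To make the later calculations easy to check, the table can be written down as plain Python data. This is a minimal sketch; the names `COLUMNS`, `ROWS`, and `value_class_counts` are illustrative and not from the slides. It recovers the class composition (9 yes, 5 no) and the per-value class counts that appear in the attribute tables that follow.

```python
from collections import Counter

# Column names of the training table; the last column is the class label.
COLUMNS = ["age", "income", "student", "credit_rating", "buys_computer"]

# The 14 training records, one tuple per row, in table order.
ROWS = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]

# Class composition of the whole training set D.
class_counts = Counter(row[-1] for row in ROWS)


def value_class_counts(column):
    """Count (attribute value, class label) pairs for one attribute column."""
    idx = COLUMNS.index(column)
    return Counter((row[idx], row[-1]) for row in ROWS)


age_counts = value_class_counts("age")
print(class_counts)  # Counter({'yes': 9, 'no': 5})
print(age_counts[("youth", "yes")], age_counts[("youth", "no")])  # 2 3
```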
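Entropy and Information Gain
• Entropy measures the impurity of a set S of data records with respect to the class attribute. Homogeneous data, in which one class clearly outnumbers the others, has low entropy; a set containing a single class has entropy 0.
$$\mathrm{Entropy}(S) = I(S) = -\sum_{j=1}^{m} p_j \log_2(p_j)$$
where m is the number of classes and p_j is the fraction of records in S that belong to class j.
• Information gain is the reduction in entropy obtained by partitioning the data on an attribute A. The higher the information gain, the more homogeneous the resulting partitions.
$$\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D), \qquad \mathrm{Info}_A(D) = \sum_{v} \frac{|D_v|}{|D|}\, I(D_v)$$
where D_v is the subset of records in D whose value of A is v.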
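The entropy formula translates directly into a small function. Below is a minimal sketch that takes raw class counts, mirroring the I(9,5) notation used on the following slides; the function name `info` is our own choice. Zero counts are skipped, which implements the usual convention 0 · log2(0) = 0.

```python
import math


def info(*counts):
    """I(c1, ..., cm) = -sum_j p_j * log2(p_j), with p_j = c_j / total.

    Zero counts are skipped, implementing the convention 0 * log2(0) = 0.
    """
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Entropy of the whole training set: 9 "yes" and 5 "no" records.
print(f"Info(D) = I(9,5) = {info(9, 5):.3f}")  # Info(D) = I(9,5) = 0.940
```

The same call evaluates every I(·,·) term on the next slides, e.g. `info(2, 3)` for the youth partition of age.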
CLASS INFO
Class composition: yes = 9, no = 5
$$\mathrm{Info}(D) = I(9,5) = -\frac{9}{14}\log_2\left(\frac{9}{14}\right) - \frac{5}{14}\log_2\left(\frac{5}{14}\right) = -\frac{9}{14}(-0.637) - \frac{5}{14}(-1.485) = 0.410 + 0.531 = 0.940$$
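INFO FOR ATTRIBUTE “AGE”

| age | Yes | No | Info |
|---|---|---|---|
| youth | 2 | 3 | I(2,3) |
| middle_aged | 4 | 0 | I(4,0) |
| senior | 3 | 2 | I(3,2) |

$$I(2,3) = -\frac{2}{5}\log_2\left(\frac{2}{5}\right) - \frac{3}{5}\log_2\left(\frac{3}{5}\right) = -\frac{2}{5}(-1.322) - \frac{3}{5}(-0.737) = 0.529 + 0.442 = 0.971$$
$$I(4,0) = -\frac{4}{4}\log_2\left(\frac{4}{4}\right) - \frac{0}{4}\log_2\left(\frac{0}{4}\right) = -(1)(0) - 0 = 0$$
using the convention 0 · log2(0) = 0.
$$I(3,2) = -\frac{3}{5}\log_2\left(\frac{3}{5}\right) - \frac{2}{5}\log_2\left(\frac{2}{5}\right) = -\frac{3}{5}(-0.737) - \frac{2}{5}(-1.322) = 0.442 + 0.529 = 0.971$$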
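INFO FOR ATTRIBUTE “AGE” (summary)

| age | youth | middle_aged | senior | Total |
|---|---|---|---|---|
| Yes | 2 | 4 | 3 | 9 |
| No | 3 | 0 | 2 | 5 |
| Records | 5 | 4 | 5 | 14 |
| Info | I(2,3) = 0.971 | I(4,0) = 0 | I(3,2) = 0.971 | |

$$\mathrm{Info}_{age}(D) = \frac{5}{14} I(2,3) + \frac{4}{14} I(4,0) + \frac{5}{14} I(3,2) = (0.357)(0.971) + (0.286)(0) + (0.357)(0.971) = 0.347 + 0 + 0.347 = 0.694$$
$$\mathrm{Gain}(age) = \mathrm{Info}(D) - \mathrm{Info}_{age}(D) = 0.940 - 0.694 = 0.246$$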
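The weighted sum for Info_age(D) can be checked with a few lines of code. This sketch assumes only the (yes, no) counts from the table above; the dictionary name `AGE_COUNTS` is our own.

```python
import math


def info(*counts):
    """I(c1, ..., cm) = -sum_j (c_j/N) log2(c_j/N); zero counts contribute 0."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


# (yes, no) counts per value of "age", taken from the table above.
AGE_COUNTS = {"youth": (2, 3), "middle_aged": (4, 0), "senior": (3, 2)}

n = sum(sum(c) for c in AGE_COUNTS.values())  # 14 records in total
info_d = info(9, 5)  # Info(D) for the whole training set
# Info_age(D): entropy of each partition, weighted by its share of records.
info_age = sum(sum(c) / n * info(*c) for c in AGE_COUNTS.values())
gain_age = info_d - info_age

print(f"Info_age(D) = {info_age:.3f}, Gain(age) = {gain_age:.3f}")
```

Computed without intermediate rounding, Gain(age) comes out as 0.2467, which rounds to 0.247; the slide's 0.246 results from subtracting the already rounded values 0.940 and 0.694.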
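INFO FOR ATTRIBUTE “INCOME”

| income | Yes | No | Info |
|---|---|---|---|
| high | 2 | 2 | I(2,2) |
| medium | 4 | 2 | I(4,2) |
| low | 3 | 1 | I(3,1) |

$$I(2,2) = -\frac{2}{4}\log_2\left(\frac{2}{4}\right) - \frac{2}{4}\log_2\left(\frac{2}{4}\right) = -\frac{2}{4}(-1) - \frac{2}{4}(-1) = 1$$
$$I(4,2) = -\frac{4}{6}\log_2\left(\frac{4}{6}\right) - \frac{2}{6}\log_2\left(\frac{2}{6}\right) = -\frac{4}{6}(-0.585) - \frac{2}{6}(-1.585) = 0.390 + 0.528 = 0.918$$
$$I(3,1) = -\frac{3}{4}\log_2\left(\frac{3}{4}\right) - \frac{1}{4}\log_2\left(\frac{1}{4}\right) = -\frac{3}{4}(-0.415) - \frac{1}{4}(-2) = 0.311 + 0.5 = 0.811$$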
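INFO FOR ATTRIBUTE “INCOME” (summary)

| income | high | medium | low | Total |
|---|---|---|---|---|
| Yes | 2 | 4 | 3 | 9 |
| No | 2 | 2 | 1 | 5 |
| Records | 4 | 6 | 4 | 14 |
| Info | I(2,2) = 1 | I(4,2) = 0.918 | I(3,1) = 0.811 | |

$$\mathrm{Info}_{income}(D) = \frac{4}{14} I(2,2) + \frac{6}{14} I(4,2) + \frac{4}{14} I(3,1) = (0.286)(1) + (0.429)(0.918) + (0.286)(0.811) = 0.286 + 0.394 + 0.232 \approx 0.911$$
$$\mathrm{Gain}(income) = \mathrm{Info}(D) - \mathrm{Info}_{income}(D) = 0.940 - 0.911 = 0.029$$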
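INFO FOR ATTRIBUTE “STUDENT”

| student | Yes | No | Info |
|---|---|---|---|
| yes | 6 | 1 | I(6,1) |
| no | 3 | 4 | I(3,4) |

$$I(6,1) = -\frac{6}{7}\log_2\left(\frac{6}{7}\right) - \frac{1}{7}\log_2\left(\frac{1}{7}\right) = -\frac{6}{7}(-0.222) - \frac{1}{7}(-2.807) = 0.191 + 0.401 = 0.592$$
$$I(3,4) = -\frac{3}{7}\log_2\left(\frac{3}{7}\right) - \frac{4}{7}\log_2\left(\frac{4}{7}\right) = -\frac{3}{7}(-1.222) - \frac{4}{7}(-0.807) = 0.524 + 0.461 = 0.985$$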
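INFO FOR ATTRIBUTE “STUDENT” (summary)

| student | yes | no | Total |
|---|---|---|---|
| Yes | 6 | 3 | 9 |
| No | 1 | 4 | 5 |
| Records | 7 | 7 | 14 |
| Info | I(6,1) = 0.592 | I(3,4) = 0.985 | |

$$\mathrm{Info}_{student}(D) = \frac{7}{14} I(6,1) + \frac{7}{14} I(3,4) = (0.5)(0.592) + (0.5)(0.985) = 0.296 + 0.493 \approx 0.788$$
$$\mathrm{Gain}(student) = \mathrm{Info}(D) - \mathrm{Info}_{student}(D) = 0.940 - 0.788 = 0.152$$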
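INFO FOR ATTRIBUTE “CREDIT-RATING”

| credit_rating | Yes | No | Info |
|---|---|---|---|
| fair | 6 | 2 | I(6,2) |
| excellent | 3 | 3 | I(3,3) |

$$I(6,2) = -\frac{6}{8}\log_2\left(\frac{6}{8}\right) - \frac{2}{8}\log_2\left(\frac{2}{8}\right) = -\frac{6}{8}(-0.415) - \frac{2}{8}(-2) = 0.311 + 0.5 = 0.811$$
$$I(3,3) = -\frac{3}{6}\log_2\left(\frac{3}{6}\right) - \frac{3}{6}\log_2\left(\frac{3}{6}\right) = -\frac{3}{6}(-1) - \frac{3}{6}(-1) = 0.5 + 0.5 = 1$$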
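INFO FOR ATTRIBUTE “CREDIT-RATING” (summary)

| credit_rating | fair | excellent | Total |
|---|---|---|---|
| Yes | 6 | 3 | 9 |
| No | 2 | 3 | 5 |
| Records | 8 | 6 | 14 |
| Info | I(6,2) = 0.811 | I(3,3) = 1 | |

$$\mathrm{Info}_{credit\_rating}(D) = \frac{8}{14} I(6,2) + \frac{6}{14} I(3,3) = (0.571)(0.811) + (0.429)(1) = 0.464 + 0.429 \approx 0.892$$
$$\mathrm{Gain}(credit\_rating) = \mathrm{Info}(D) - \mathrm{Info}_{credit\_rating}(D) = 0.940 - 0.892 = 0.048$$

Age has the highest information gain (0.246, versus 0.029 for income, 0.152 for student, and 0.048 for credit_rating), so it is chosen as the splitting attribute at the root of the tree.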
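The four gain calculations can also be done straight from the raw training records rather than from the hand-built count tables. The sketch below assumes the table is encoded as tuples (the names `COLUMNS`, `ROWS`, `entropy`, and `gain` are our own) and simply picks the attribute with the largest gain.

```python
import math
from collections import Counter

# Attribute columns; each row below ends with the class label buys_computer.
COLUMNS = ["age", "income", "student", "credit_rating"]
ROWS = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]


def entropy(labels):
    """I(S) = -sum_j p_j log2 p_j; Counter only yields classes that occur."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain(rows, col):
    """Gain(A) = Info(D) - Info_A(D) for the attribute at index col."""
    info_a = 0.0
    for value in set(r[col] for r in rows):
        part = [r[-1] for r in rows if r[col] == value]
        info_a += len(part) / len(rows) * entropy(part)
    return entropy([r[-1] for r in rows]) - info_a


gains = {name: gain(ROWS, i) for i, name in enumerate(COLUMNS)}
for name, g in gains.items():
    print(f"Gain({name}) = {g:.3f}")
best = max(gains, key=gains.get)
print("Splitting attribute at the root:", best)
```

Without intermediate rounding this gives about 0.247, 0.029, 0.152, and 0.048, so age is selected, matching the hand calculation.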
Classification Model
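One way to turn the gain comparison into a complete model is an ID3-style recursion. It picks the attribute with the highest gain, splits on it, and repeats on each subset until a subset contains a single class or no attributes remain, in which case it falls back to the majority class. This is a sketch under that assumption; the slide's tree figure is not preserved here, and the nested `(attribute, {value: subtree})` representation and the function names are our own.

```python
import math
from collections import Counter

# Attribute columns; each row below ends with the class label buys_computer.
COLUMNS = ["age", "income", "student", "credit_rating"]
ROWS = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]


def entropy(labels):
    """I(S) = -sum_j p_j log2 p_j over the class labels in S."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain(rows, col):
    """Gain(A) = Info(D) - Info_A(D) for the attribute at index col."""
    info_a = 0.0
    for value in set(r[col] for r in rows):
        part = [r[-1] for r in rows if r[col] == value]
        info_a += len(part) / len(rows) * entropy(part)
    return entropy([r[-1] for r in rows]) - info_a


def id3(rows, cols):
    """Return a class label (leaf) or (attribute name, {value: subtree})."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:  # pure subset: make a leaf
        return labels[0]
    if not cols:  # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(cols, key=lambda c: gain(rows, c))
    rest = [c for c in cols if c != best]
    branches = {}
    for value in sorted(set(r[best] for r in rows)):
        subset = [r for r in rows if r[best] == value]
        branches[value] = id3(subset, rest)
    return (COLUMNS[best], branches)


tree = id3(ROWS, list(range(len(COLUMNS))))
print(tree)
```

On this data the sketch puts age at the root; middle_aged becomes a pure "yes" leaf, youth is split further on student, and senior on credit_rating.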
BAYES' THEOREM
Conditional Probability

| | Attends college (K) | Does not attend college (¬K) |
|---|---|---|
| Female (P) | 300 | 300 |
| Male (L) | 300 | 100 |

Universe = 300 + 300 + 300 + 100 = 1000
Attends college (K) = 300 + 300 = 600
Female (P) = 300 + 300 = 600
$$P(P \cap K) = \frac{|P \cap K|}{1000} = \frac{300}{1000}$$
Using conditional probability:
$$P(P \cap K) = P(P \mid K) \cdot P(K) = \frac{300}{600} \cdot \frac{600}{1000} = \frac{300}{1000}$$
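The product rule above can be verified with exact fractions. This small sketch only restates the table's counts; the variable names are our own.

```python
from fractions import Fraction

# Counts from the table above (P = female, L = male, K = attends college).
female_k, female_not_k = 300, 300
male_k, male_not_k = 300, 100
universe = female_k + female_not_k + male_k + male_not_k  # 1000

p_k = Fraction(female_k + male_k, universe)  # P(K) = 600/1000
p_p_and_k = Fraction(female_k, universe)  # P(P ∩ K) = 300/1000
p_p_given_k = Fraction(female_k, female_k + male_k)  # P(P | K) = 300/600

# Product rule: P(P ∩ K) = P(P | K) · P(K)
assert p_p_and_k == p_p_given_k * p_k
print(p_p_and_k)  # 3/10
```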
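Bayes' Theorem
Here P(P | K) is the posterior probability and P(K) is the prior (a priori) probability.
[Figure: Venn diagram of events K and P overlapping in P ∩ K]
$$P(P \mid K) = \frac{P(P \cap K)}{P(K)}$$
Since P ∩ K = K ∩ P, we also have P(P ∩ K) = P(K | P) · P(P), so
$$P(P \mid K) = \frac{P(K \mid P)\, P(P)}{P(K)}$$
Because P and L partition the universe, P(K) = P(K | P) · P(P) + P(K | L) · P(L), which gives
$$P(P \mid K) = \frac{P(K \mid P)\, P(P)}{P(K \mid P)\, P(P) + P(K \mid L)\, P(L)}$$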
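Applying the last form of Bayes' theorem to the college table shows that it recovers the same P(P | K) as reading it directly from the counts (300/600). The sketch uses exact fractions; the variable names are our own.

```python
from fractions import Fraction

# Counts from the table: rows P (female) / L (male), columns K / not K.
female_k, female_not_k = 300, 300
male_k, male_not_k = 300, 100
total = female_k + female_not_k + male_k + male_not_k

p_p = Fraction(female_k + female_not_k, total)  # P(P) = 600/1000
p_l = Fraction(male_k + male_not_k, total)  # P(L) = 400/1000
p_k_given_p = Fraction(female_k, female_k + female_not_k)  # P(K | P) = 300/600
p_k_given_l = Fraction(male_k, male_k + male_not_k)  # P(K | L) = 300/400

# Denominator by total probability: P(K) = P(K|P)P(P) + P(K|L)P(L)
p_k = p_k_given_p * p_p + p_k_given_l * p_l
# Bayes' theorem: P(P | K) = P(K | P) P(P) / P(K)
p_p_given_k = p_k_given_p * p_p / p_k

print(p_k, p_p_given_k)  # 3/5 1/2
```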
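Naive Bayes
There are several classes; C_i denotes the i-th class. For a record with attribute values A = (a_1, a_2, ..., a_n), Bayes' theorem gives
$$P(C_i \mid A) = \frac{P(A \mid C_i)\, P(C_i)}{P(A)}$$
Because P(A) is the same for every C_i, it suffices to find the class C_i that maximizes
$$P(C_i \mid A) \propto P(A \mid C_i)\, P(C_i)$$
Naive Bayes assumes the attributes are conditionally independent given the class, so
$$P(A \mid C_i) = \prod_{k=1}^{n} P(a_k \mid C_i) = P(a_1 \mid C_i) \cdot P(a_2 \mid C_i) \cdots P(a_n \mid C_i)$$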
TRAINING DATA: the same 14 records (age, income, student, credit_rating, buys_computer) used in the decision tree section.
Exercise
Classify the following record using the naive Bayes algorithm.
Classes: C1 = yes, C2 = no
Attributes: A1 (age) = youth, A2 (income) = medium, A3 (student) = yes, A4 (credit_rating) = fair
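A worked sketch of the exercise, assuming the probabilities P(C_i) and P(a_k | C_i) are estimated as simple relative frequencies from the 14 training records with no smoothing (the slides do not mention smoothing). The function name `naive_bayes_scores` is our own; exact fractions keep the arithmetic transparent.

```python
from collections import Counter
from fractions import Fraction

# Training records: (age, income, student, credit_rating, buys_computer).
ROWS = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]
# The record from the exercise: A1..A4.
QUERY = ("youth", "medium", "yes", "fair")


def naive_bayes_scores(rows, query):
    """Return P(A | C_i) * P(C_i) for each class C_i, estimated by counting."""
    class_counts = Counter(r[-1] for r in rows)
    scores = {}
    for cls, n_cls in class_counts.items():
        score = Fraction(n_cls, len(rows))  # prior P(C_i)
        for k, value in enumerate(query):
            n_match = sum(1 for r in rows if r[-1] == cls and r[k] == value)
            score *= Fraction(n_match, n_cls)  # P(a_k | C_i)
        scores[cls] = score
    return scores


scores = naive_bayes_scores(ROWS, QUERY)
for cls, s in scores.items():
    print(cls, s, round(float(s), 3))
print("Prediction:", max(scores, key=scores.get))
```

On this data the score for yes is 16/567 ≈ 0.028 and for no is 6/875 ≈ 0.007, so the record is classified as yes. With pure frequency estimates, an attribute value never seen together with a class drives that class's score to zero; Laplace smoothing is a common remedy, but it is not part of these slides.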
Group Assignment
• Reinforcement Learning
• ANN: Hopfield Auto-Associative Model
• ANN: Hebbian
• Particle Swarm Optimization
• Artificial Immune System
• Fuzzy Quantification