Classification. Decision Tree. Decision Tree. Konsep Decision Tree. Penggunaan Decision Tree. When To Consider Decision Tree?

Classification

Decision Tree Sesi 09 Dosen Pembina : Danang Junaedi IF-UTAMA

1

Konsep Decision Tree

IF-UTAMA

2

Decision Tree

• Mengubah data menjadi pohon keputusan (decision tree) dan aturan-aturan keputusan (rule)

IF-UTAMA

3

IF-UTAMA

4

Penggunaan Decision Tree

When To Consider Decision Tree?

• Diagnosa penyakit tertentu, seperti hipertensi, kanker, stroke dan lain-lain • Pemilihan produk seperti rumah, kendaraan, komputer dan lain-lain • Pemilihan pegawai teladan sesuai dengan kriteria tertentu • Deteksi gangguan pada komputer atau jaringan komputer seperti Deteksi Entrusi, deteksi virus (trojan dan varians) • dll IF-UTAMA

5

IF-UTAMA

6

1

Gambaran Pemakaian Decision Tree

IF-UTAMA

Sample

7

Information Theory

IF-UTAMA

IF-UTAMA

8

Information Theory (contd)

9

Information Theory (contd)

IF-UTAMA

10

Konsep Data dalam Decision Tree • Data dinyatakan dalam bentuk tabel dengan atribut dan record. • Atribut menyatakan suatu parameter yang dibuat sebagai kriteria dalam pembentukan tree. Misalkan untuk menentukan main tenis, kriteria yang diperhatikan adalah cuaca, angin dan temperatur. Salah satu atribut merupakan atribut yang menyatakan data solusi per-item data yang disebut dengan target atribut. • Atribut memiliki nilai-nilai yang dinamakan dengan instance. Misalkan atribut cuaca mempunyai instance berupa cerah, berawan dan hujan.

IF-UTAMA

11

IF-UTAMA

12

2

Mengubah Data Î Tree

Proses Dalam Decision Tree 1. Mengubah bentuk data (tabel) menjadi model tree. – ID3 Algorithm – C.45 Algorithm – etc

2. Mengubah model tree menjadi rule – Disjunction (v Æ OR) – Conjunction (^ Æ AND)

3. Menyederhanakan Rule (Pruning) IF-UTAMA

13

Tree Æ Rule

IF-UTAMA

IF-UTAMA

14

Tree Æ Rule (contd)

15

IF-UTAMA

16

How Decision Tree Induction For Classification

IF-UTAMA

17

IF-UTAMA

18

3

ID3 Algorithm

IF-UTAMA

ID3 Algorithm (contd)

19

ID3 Algorithm

IF-UTAMA

20

ID3 Algorithm Ilustration Diagram

Given a set of examples, S, categorised in categories ci, then: 1. Choose the root node to be the attribute, A, which scores the highest for information gain relative to S. 2. For each value v that A can possibly take, draw a branch from the node. 3. For each branch from A corresponding to value v, calculate Sv. Then: – If Sv is empty, choose the category cdefault which contains the most examples from S, and put this as the leaf node category which ends that branch. – If Sv contains only examples from a category c, then put c as the leaf node category which ends that branch. – Otherwise, remove A from the set of attributes which can be put into nodes. Then put a new node in the decision tree, where the new attribute being tested in the node is the one which scores highest for information gain relative to Sv (note: not relative to S). This new node starts the cycle again (from 2), with S replaced by Sv in the calculations and the tree gets built iteratively like this.

• The algorithm terminates either when all the attributes have been exhausted, or the decision tree perfectly classifies the examples. IF-UTAMA

21

IF-UTAMA

22

Pembentukan Tree

Entropy

• Spesifikasikan masalah Îmenentukan Atribut dan Target Atribut berdasarkan data yang ada • Hitung nilai Entropy dari setiap kriteria dengan data sample yang ditentukan. • Hitung Information Gain dari setiap kriteria • Node terpilih adalah kriteria dengan Information Gain yang paling tinggi. • Ulangi sampai diperoleh node terakhir yang berisi target atribut

• Entropy(S) adalah jumlah bit yang diperkirakan dibutuhkan untuk dapat mengekstrak suatu kelas (+ atau -) dari sejumlah data acak pada ruang sample S. • Entropy bisa dikatakan sebagai kebutuhan bit untuk menyatakan suatu kelas. Semakin kecil nilai Entropy maka semakin baik untuk digunakan dalam mengekstraksi suatu kelas. • Panjang kode untuk menyatakan informasi secara optimal adalah –log2 p bits untuk messages yang mempunyai probabilitas p.

IF-UTAMA

23

IF-UTAMA

24

4

Entropy (contd)

IF-UTAMA

Entropy (contd)

25

Information Gain

IF-UTAMA

26

Example Training Data Set

IF-UTAMA

27

Example (contd)

IF-UTAMA

IF-UTAMA

28

Example (contd)

29

IF-UTAMA

30

5

Example (contd)

IF-UTAMA

Example (contd)

31

Example (contd)

IF-UTAMA

32

Example (contd)

33

Example (contd)

IF-UTAMA

IF-UTAMA

IF-UTAMA

34

Example (contd)

35

IF-UTAMA

36

6

Extracting Classification Rule Form Tree

Data Mentah

Sample

Atribut

Target Atribut

Decision Tree?? IF-UTAMA

37

IF-UTAMA

Entropy Awal

Entropy Usia • Jumlah instance = 8 • Instance Usia

• Jumlah instance = 8 • Jumlah instance positif = 3 • Jumlah instance negatif = 5

– Muda

Entropy (Hipertensi ) = − Pins tan ce _ positif log 2 Pins tan ce _ positif − Pins tan ce _ negatif log 2 Pins tan ce _ negatif

• Instance positif = 1 • Instance negatif = 3

– Tua • Instance positif = 2 • Instance negatif = 2

⎛⎛ 3 ⎞ ⎛ 5 ⎞⎞ ⎛ 3 ⎞⎞ ⎛⎛ 5 ⎞ = −⎜⎜ ⎜ ⎟ × log 2 ⎜ ⎟ ⎟⎟ − ⎜⎜ ⎜ ⎟ × log 2 ⎜ ⎟ ⎟⎟ ⎝ 8 ⎠⎠ ⎝ 8 ⎠⎠ ⎝⎝ 8 ⎠ ⎝⎝ 8 ⎠ = −(0.375 × log 2 0.375) − (0.625 × log 2 0.625)

• Entropy Usia

Entropy(Muda ) = − Pins tan ce _ positif log 2 Pins tan ce _ positif − Pins tan ce _ negatif log 2 Pins tan ce _ negatif

= −(0.375 × -1.415) − (0.625 × - 0.678)

Entropy(Tua ) = − Pins tan ce _ positif log 2 Pins tan ce _ positif − Pins tan ce _ negatif log 2 Pins tan ce _ negatif

= 0,531 + 0,424 = 0,955 IF-UTAMA

39

– Entropy(muda) = 0.906 – Entropy(tua) = 1 IF-UTAMA

Gain Usia Gain(S , Usia ) = Entropy (S ) −

∑

v∈Muda ,Tua

38

40

Entropy Berat Sv S

Entropy (S v )

• Jumlah instance = 8 • Intance Berat – Overweight

S S = Entropy (S ) − Muda Entropy (S Muda ) − Tua Entropy (STua ) S S 4 4 = (0.955) − (0.906) − (1) 8 8 = 0.955 − 0.453 − 0.5 = 0.002


– Average • Instance positif = 0 • Instance negatif = 2

– Underweight • Instance positif = 0 • Instance negatif = 2

Entropy (Overweight ) = − Pins tan ce _ positif log 2 Pins tan ce _ positif − Pins tan ce _ negatif log 2 Pins tan ce _ negatif

Entropy ( Average ) = − Pins tan ce _ positif log 2 Pins tan ce _ positif − Pins tan ce _ negatif log 2 Pins tan ce _ negatif

Entropy (Underweight ) = − Pins tan ce _ positif log 2 Pins tan ce _ positif − Pins tan ce _ negatif log 2 Pins tan ce _ negatif IF-UTAMA

41

– Entropy(Overweight)=0.906 – Entropy(Average)=0.5 IF-UTAMA – Entropy(Underweight)=0.5

42

7

Gain Usia Gain(S , Berat ) = Entropy (S ) −

Sv

∑

v∈Overwight , Average ,Underweight

= Entropy (S ) −

S Overweight

Entropy (S Overweight ) −

S 4 (0.906) − 2 (0.5) − 2 (0.5) 8 8 8 = 0.955 − 0.453 − 0.125 − 0.125

S Average S

S

Entropy Jenis Kelamin

Entropy (S v )

Entropy (S average ) −

SUnderweight S

Entropy (SUnderweight )

= (0.955) −

• Jumlah instance = 8 • Intance Jenis Kelamin – Pria • Instance positif = 2 • Instance negatif = 4

– Wanita

= 0,252


Entropy (Pr ia ) = − Pins tan ce _ positif log 2 Pins tan ce _ positif − Pins tan ce _ negatif log 2 Pins tan ce _ negatif

Entropy (Wanita ) = − Pins tan ce _ positif log 2 Pins tan ce _ positif − Pins tan ce _ negatif log 2 Pins tan ce _ negatif

– Entropy(Pria)=1 – Entropy(Wanita)=0.75 IF-UTAMA

43

IF-UTAMA

44

Gain Usia Gain(S , JenisKela min ) = Entropy (S ) −

∑

v∈Pr ia ,Wanita

Sv S

Entropy (S v )

S Pr ia S Entropy (S Pr ia ) − Wanita Entropy (SWanita ) S S 6 2 = (0.955) − (1) − (0.75) 8 8 = 0.955 − 0.75 − 0.188 = 0,017

• Atribut yang dipilih adalah atribut berat karena nilai Information Gainnya paling tinggi

= Entropy (S ) −

Berat Underweight

Overweight Average

• Jumlah Instance untuk Overweight = 4 • Jumlah Instance untuk Average = 2 • Jumlah Instance untuk Underweight = 2 • Hitung Gain paling tinggi untuk dijadikan cabang berikutnya

IF-UTAMA

45

Node untuk cabang Overweight

IF-UTAMA

46

Decision Tree yang dihasilkan

• Jumlah instance = 4 • Instance (Berat = Overwight ) & Usia = – Muda • Instance positif = 1 • Instance negatif = 0

– Tua • Instance positif = 2 • Instance negatif = 1

Clasification Rule ????

• Instance (Berat = Overwight ) & Jenis Kelamin = – Pria • Instance positif = 2 • Instance negatif = 1

– Wanita • Instance positif = 1 • Instance negatif = 0

IF-UTAMA

47

IF-UTAMA

48

8

Strength of Decision Tree

IF-UTAMA

Weakness of Decision Tree

49

Studi Kasus

IF-UTAMA

IF-UTAMA

50

Studi Kasus

51

IF-UTAMA

52

Referensi 1. Dr. Mourad YKHLEF.2009. Decision Tree Induction System.King Saud University 2. Achmad Basuki, Iwan Syarif. 2003. Decision Tree. Politeknik Elektronika Negeri Surabaya (PENS) – ITS 3. Simon Colton.2004. Decision Tree Learning.4. Tom M. Mitchell. 1997. Machine Learning. Mc-Graw Hill

IF-UTAMA

53

9

Classification. Decision Tree. Decision Tree. Konsep Decision Tree. Penggunaan Decision Tree. When To Consider Decision Tree?

Recommend Documents