COMPARATIVE ANALYSIS OF THE J48 DECISION TREE AND NAÏVE BAYES ALGORITHMS FOR CLASSIFYING DISEASE PATTERNS
UNDERGRADUATE THESIS
By:
Frista Yulianora
1401128832
Muchammad Hasbi Latif
1401136065
Rika Jubel Febriana
1401136790
Class: 07 PAM
Universitas Bina Nusantara Jakarta 2014
TABLE OF CONTENTS
Outer Title Page
Inner Title Page
Supervisor Approval Page
Board of Examiners Statement Page
Statement and Approval Page for Publication of the LTA
Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Appendices
CHAPTER 1 INTRODUCTION
1.1 Background
1.2 Scope
1.3 Objectives and Benefits
1.4 Methodology
1.4.1 Data Collection Methods
1.4.2 Research Instruments
1.4.3 Conceptual Framework
1.4.4 Design Stages
1.5 Organization of the Thesis
CHAPTER 2 THEORETICAL FOUNDATION
2.1 Definition of Data Mining
2.2 Functions of Data Mining
2.3 Data Preprocessing
2.3.1 Data Preprocessing: An Overview
2.3.2 Data Cleaning
2.3.3 Data Integration
2.3.4 Data Reduction
2.3.5 Data Transformation and Data Discretization
2.4 Data Mining Techniques
2.4.1 Association Rule Mining
2.4.2 Classification
2.4.3 Clustering
2.4.4 Regression
2.5 Data Mining Methods
2.5.1 Naïve Bayes
2.5.2 Decision Tree
2.5.3 K-Means
2.6 Data Mining Framework
2.7 WEKA
2.7.1 WEKA Input Format
2.7.2 The J48 Decision Tree Algorithm in WEKA
2.7.3 The Naïve Bayes Algorithm in WEKA
2.7.4 WEKA Test Options
2.8 Structured Query Language (SQL)
2.9 Data Warehouse
2.10 OLTP (Online Transaction Processing)
2.11 ETL (Extraction, Transformation, Loading)
2.12 Star Schema
2.13 Hospitals
2.14 Medical Records
2.15 ICD-10
CHAPTER 3 ANALYSIS OF THE CURRENT SYSTEM
3.1 Hospital Background
3.1.1 Hospital History
3.1.2 Vision
3.1.3 Mission
3.1.4 Hospital Objectives
3.2 Organizational Structure
3.3 Data Sources
CHAPTER 4 RESULTS AND DISCUSSION
4.1 Data Mining Architecture
4.2 Functional Data Warehouse
4.1.1 Data Selection
4.1.2 Data Cleaning
4.1.3 Data Transformation
4.3 Data Mining
4.3.1 Decision Tree J48
4.3.1.1 Confidence
4.3.1.2 Cross-Validation and Confusion Matrix
4.3.2 Naïve Bayes
4.4 Comparison of the J48 Decision Tree and Naïve Bayes Algorithms
4.4.1 ROC Area
CHAPTER 5 CONCLUSIONS AND RECOMMENDATIONS
5.1 Conclusions
5.2 Recommendations
REFERENCES
APPENDICES
CURRICULUM VITAE
SURVEY LETTER
LIST OF TABLES
Table 3.1 Medical Records Table
Table 4.1 Age Categories
Table 4.2 ICDX Grouping
Table 4.3 Region Grouping
Table 4.4 J48 Decision Tree Confidence Results
Table 4.5 J48 Decision Tree Using K-Fold Cross-Validation
Table 4.6 Precision Results for the J48 Decision Tree
Table 4.7 Recall Results for the J48 Decision Tree
Table 4.8 F-Measure Results for the J48 Decision Tree
Table 4.9 Naïve Bayes Using K-Fold Cross-Validation
Table 4.10 Precision Results for Naïve Bayes
Table 4.11 Recall Results for Naïve Bayes
Table 4.12 F-Measure Results for Naïve Bayes
Table 4.13 Precision Comparison of the J48 Decision Tree and Naïve Bayes
Table 4.14 Recall Comparison of the J48 Decision Tree and Naïve Bayes
Table 4.15 F-Measure Comparison of the J48 Decision Tree and Naïve Bayes
Table 4.16 Comparison of the J48 Decision Tree and Naïve Bayes Algorithms
LIST OF FIGURES
Figure 2.1 Classification-Decision Tree
Figure 2.2 Clustering
Figure 2.3 Product Association
Figure 2.4 Time Series
Figure 2.5 Web Navigation Sequence
Figure 2.6 Forms of Data Preprocessing
Figure 2.7 Precision and Recall Formulas
Figure 2.8 F-Measure Formula
Figure 2.9 F-Measure Formula for the Dataset
Figure 2.10 Accuracy Formula
Figure 2.11 Naïve Bayesian Classifier Formula (1)
Figure 2.12 Naïve Bayesian Classifier Formula (2)
Figure 2.13 Naïve Bayesian Classifier Formula (3)
Figure 2.14 Naïve Bayesian Classifier Formula (4)
Figure 2.15 Naïve Bayesian Classifier Formula (5)
Figure 2.16 Naïve Bayesian Classifier Formula (6)
Figure 2.17 Entropy Formula
Figure 2.18 Information Gain Formula
Figure 2.19 WEKA GUI Start Screen
Figure 3.1 Organizational Structure of RSAL Dr. Mintohardjo
Figure 3.2 Data Source (1)
Figure 3.3 Data Source (2)
Figure 4.1 Data Mining Architecture
Figure 4.2 Data after the Data Selection Stage
Figure 4.3 Data Containing Missing Values
Figure 4.4 SQL Query for Age Grouping
Figure 4.5 SQL Query for Region Grouping
Figure 4.6 Data Transformation Results
Figure 4.7 Star Schema
Figure 4.8 J48 Decision Tree Confusion Matrix Results in WEKA
Figure 4.9 Naïve Bayes Confusion Matrix Results in WEKA
Figure 4.10 Precision Comparison Chart for DT J48 and Naïve Bayes
Figure 4.11 Recall Comparison Chart for DT J48 and Naïve Bayes
Figure 4.12 F-Measure Comparison Chart for DT J48 and Naïve Bayes
Figure 4.13 ROC Curve U00-U99
Figure 4.14 ROC Curve A00-B99
Figure 4.15 ROC Curve C00-D48
Figure 4.16 ROC Curve E00-E90
Figure 4.17 ROC Curve G00-G99
Figure 4.18 ROC Curve I00-I99
Figure 4.19 ROC Curve I00-I99
Figure 4.20 ROC Curve K00-K93
Figure 4.21 ROC Curve L00-L99
Figure 4.22 ROC Curve M00-M99
Figure 4.23 ROC Curve N00-N99
Figure 4.24 ROC Curve R00-R99
Figure 4.25 ROC Curve S00-T98
Figure 4.26 ROC Curve Z00-Z99
LIST OF APPENDICES
Appendices
Interview Results
PREFACE
The authors offer praise and gratitude to God Almighty, by whose grace this thesis was completed as hoped and on time. This thesis was written to fulfill one of the graduation requirements of the undergraduate (S1) program in the Department of Information Systems. The authors recognize that the thesis was completed not through their own work alone, but through the guidance, direction, help, and support of many parties. The authors therefore wish to thank everyone involved, in particular:
1. Prof. Dr. Ir. Harjanto Prabowo, MM, Rector of Universitas Bina Nusantara.
2. Mr. Johan, S.Kom, MM, Head of the Department of Information Systems, Universitas Bina Nusantara.
3. Ms. Mediana Aryuni, S.Kom, M.Kom, thesis supervisor, who gave much of her time to provide advice and guidance during the writing of this thesis.
4. Our parents, families, and friends, who always gave moral and material support and encouragement so that this thesis could be completed on time.
5. RSAL Dr. Mintohardjo, which granted permission to conduct the survey and provided the data needed to complete this thesis.
6. All lecturers at Universitas Bina Nusantara, who taught us and shared their knowledge throughout our studies.
7. All other parties whom the authors cannot name one by one, for the help and support they gave, directly or indirectly, toward the writing of this thesis.
The authors recognize that this thesis has shortcomings and therefore welcome constructive criticism and suggestions so that it can be improved in the future.
Finally, the authors express their sincere thanks and hope that this thesis will be useful to its readers.