CLASSIFICATION APPLICATION FOR INFO DOCUMENTS ON TWITTER USING THE NAIVE BAYES ALGORITHM
FINAL PROJECT
Submitted in Partial Fulfillment of the Requirements for the Bachelor's Degree (Strata 1) in Informatics Engineering, Universitas Muhammadiyah Malang
By: ARIF DARMAWAN, NIM 09560385
DEPARTMENT OF INFORMATICS ENGINEERING, FACULTY OF ENGINEERING, UNIVERSITAS MUHAMMADIYAH MALANG, 2014/2015
APPROVAL PAGE
CLASSIFICATION APPLICATION FOR INFO DOCUMENTS ON TWITTER USING THE NAIVE BAYES ALGORITHM
FINAL PROJECT
Submitted in Partial Fulfillment of the Requirements for the Bachelor's Degree (Strata 1) in Informatics Engineering, Universitas Muhammadiyah Malang
Prepared by: ARIF DARMAWAN, NIM 09560385
This final project was examined and declared passed by the board of examiners on 10 October 2014.
Approved by,
Examiner I: Agus Eko Minarno, S.Kom, M.Kom., NIDN 0729048801
Examiner II: Wahyu Andhyka, S.Kom, M.Kom, NIDN 0720068701
Acknowledged by,
Head of the Department of Informatics Engineering
Yuda Munarko, S.Kom, M.Sc., NIP 108.0611.0443
FOREWORD
Praise and gratitude are due to Allah SWT for His abundant grace and guidance, which enabled the author to complete the final project entitled “CLASSIFICATION APPLICATION FOR INFO DOCUMENTS ON TWITTER USING THE NAIVE BAYES ALGORITHM”.
This report covers the research background, the theoretical foundations, the analysis and design of the application, its implementation and testing, the conclusions and suggestions, and the bibliography. The author is fully aware that this final project still has many shortcomings and limitations, and therefore welcomes constructive suggestions so that this work can contribute to the future development of knowledge.
Malang, 1 October 2014
Arif Darmawan
TABLE OF CONTENTS
Abstrak (Indonesian abstract)
Abstract
Dedication Page
Foreword
Table of Contents
List of Figures
List of Tables
CHAPTER I INTRODUCTION
  1.1. Background
  1.2. Problem Statement
  1.3. Objectives
  1.4. Scope and Limitations
  1.5. Methodology
    1.5.2. Data Collection
    1.5.3. Data Analysis
    1.5.4. Implementation
    1.5.5. Testing and Evaluation
  1.6. Organization of the Report
CHAPTER II THEORETICAL BACKGROUND
  2.1. The Java Programming Language
  2.2. MySQL
  2.3. Twitter
  2.4. Data Mining
    2.4.1. Text Mining
    2.4.2. Preprocessing
      1. Case Folding
      2. Tokenizing
      3. Filtering / Stopword Removal
      4. Stemming
  2.5. Classification
  2.6. Naïve Bayes Classifier
  2.7. Confusion Matrix
CHAPTER III SYSTEM ANALYSIS AND DESIGN
  3.1. Data Analysis
  3.2. System Diagram
    3.2.1. Preprocessing
      3.2.1.1. Case Folding
      3.2.1.2. Tokenizing
      3.2.1.3. Filtering
      3.2.1.4. Stemming
  3.3. System Design
    3.3.1. Use Case Diagram
    3.3.2. Activity Diagram
      3.3.2.1. Activity Diagram: Add Data
      3.3.2.2. Activity Diagram: Classification
      3.3.2.3. Activity Diagram: View Classified Data
    3.3.3. Class Diagram
    3.3.4. Sequence Diagram
      3.3.4.1. Sequence Diagram: Add Data
      3.3.4.2. Sequence Diagram: Classification
      3.3.4.3. Sequence Diagram: View Transaction Data
    3.3.5. ERD (Entity Relationship Diagram)
    3.3.6. User Interface Design
CHAPTER IV IMPLEMENTATION AND TESTING
  4.1. Hardware Requirements
  4.2. Software Requirements
  4.3. System Implementation
  4.4. Database Creation
  4.5. Database Connection Setup
  4.6. Design and Testing
    4.6.1. Add Data Screen
    4.6.2. Select Tweet Screen
    4.6.3. Classification Screen
    4.6.4. Save Screen
    4.6.5. Check Table Screen
    4.6.6. Check Table + Reply Screen
  4.7. Test Data
  4.8. Confusion Matrix Calculation
CHAPTER V CLOSING
  5.1. Conclusions
  5.2. Suggestions
BIBLIOGRAPHY
AUTHOR BIOGRAPHY
LIST OF FIGURES
Figure 3.1. System Diagram
Figure 3.2. Case Folding Flow
Figure 3.3. Tokenizing Flow
Figure 3.4. Filtering Flow
Figure 3.5. Stemming Flow
Figure 3.6. Use Case Diagram
Figure 3.7. Activity Diagram: Read Timeline
Figure 3.8. Activity Diagram: Classification
Figure 3.9. Activity Diagram: View Classified Data
Figure 3.10. Class Diagram
Figure 3.11. Sequence Diagram: Add Data
Figure 3.12. Sequence Diagram: Classification
Figure 3.13. Sequence Diagram: View Classified Data
Figure 3.14. Database Relationship Diagram
Figure 3.15. User Interface Design
Figure 4.1. Database Tables
Figure 4.2. Case Folding Table Script
Figure 4.3. Case Folding Table
Figure 4.4. Katadasar (Root Word) Table Script
Figure 4.5. Katadasar Table
Figure 4.6. Stopword Table Script
Figure 4.7. Stopword Table
Figure 4.8. Kategori (Category) Table Script
Figure 4.9. Kategori Table
Figure 4.10. Tweet Table Script
Figure 4.11. Tweet Table
Figure 4.12. Kategori Tweet Table Script
Figure 4.13. Kategori Tweet Table
Figure 4.14. Kategori Tweet Reply Table Script
Figure 4.15. Kategori Tweet Reply Table
Figure 4.16. Database Connection
Figure 4.17. Add Data Script
Figure 4.18. Add Data Screen
Figure 4.19. Click Event Script
Figure 4.20. Select Tweet Screen
Figure 4.21. Case Folding Script
Figure 4.22. Tokenizing Script
Figure 4.23. Filtering (Stopword) Script
Figure 4.24. Stemming Script
Figure 4.25. Classification Script
Figure 4.26. Classification Result Screen
Figure 4.27. Save Script
Figure 4.28. Save Screen
Figure 4.29. Check Table Script
Figure 4.30. Check Table Screen
Figure 4.31. Check Table + Reply Script
Figure 4.32. Check Table + Reply Screen
Figure 4.33. Accuracy Chart
Figure 4.34. Precision Chart
Figure 4.35. Recall Chart
Figure 4.36. F-Measure Chart
LIST OF TABLES
Table 3.1. Sample Training Data
Table 3.2. Training Data Vocabulary
Table 3.3. P(Vj) Value for Each Category
Table 3.4. Probability Values
Table 3.5. Use Case Diagram Description
Table 3.6. Class Diagram Description
Table 4.1. Sample Test Data
Table 4.2. Confusion Matrix
Table 4.3. Confusion Matrix Calculation