IMPLEMENTATION OF THE NAÏVE BAYES ALGORITHM FOR EXTRACTING OPINION SENTENCES FROM INDONESIAN-LANGUAGE ARTICLES
A Final Project Submitted in Partial Fulfillment of the Requirements for the Bachelor's Degree (Strata 1) in Informatics Engineering, Universitas Muhammadiyah Malang
By:
Niken Kusuma Dewi 201010370311007
DEPARTMENT OF INFORMATICS, FACULTY OF ENGINEERING, UNIVERSITAS MUHAMMADIYAH MALANG
2015
FOREWORD
Praise be to Allah SWT for His abundant grace and guidance, and peace and blessings be upon the Prophet Muhammad (Rasulullah SAW), which enabled the author to complete the final project entitled "IMPLEMENTATION OF THE NAÏVE BAYES ALGORITHM FOR EXTRACTING OPINION SENTENCES FROM INDONESIAN-LANGUAGE ARTICLES". This work covers text mining, opinion mining, and the application of the Naive Bayes algorithm to extract opinion sentences from Indonesian-language articles. The final part also describes the system testing scenarios used to measure the accuracy of the system. This final project is written as one of the graduation requirements for the undergraduate (S1) program in Informatics Engineering at Universitas Muhammadiyah Malang. The author fully realizes that this work still has many shortcomings and limitations, and therefore welcomes constructive suggestions so that it may contribute to the future development of knowledge.
Malang, January 2015
The Author
TABLE OF CONTENTS
Title Page
Approval Sheet
Endorsement Sheet
Statement of Originality
Abstract (Indonesian)
Abstract (English)
Dedication
Foreword
Table of Contents
List of Figures
List of Tables
CHAPTER I
INTRODUCTION
1.1. Research Background
1.2. Problem Statement
1.3. Scope and Limitations
1.4. Research Objectives
1.5. Research Methodology
1.5.1. Literature Study
1.5.2. Data Collection
1.5.3. Data and System Analysis
1.5.4. Implementation
1.5.5. Testing
1.5.6. Report Writing
1.6. Thesis Outline
CHAPTER II
THEORETICAL BACKGROUND
2.1. Literature Review
2.2. Text Mining
2.2.1. Text Processing
2.2.2. Term Weighting
2.3. Definition of Classification
2.4. Sentiment Analysis
2.5. The Naive Bayes Algorithm
2.5.1. The Naïve Bayes Probabilistic Model
2.5.2. Building a Classifier from the Probabilistic Model
2.5.3. Multiclass Document Classification
CHAPTER III
SYSTEM ANALYSIS AND DESIGN
3.1. System Analysis
3.1.1. Data Preparation
3.1.2. Analysis of the Naïve Bayes Computation
3.2. System Design
3.3. Classification System Design
3.3.1. System Flowchart
3.3.2. Use Case Diagram Design
3.3.3. Activity Diagram
3.3.4. Sequence Diagram
3.3.5. Class Diagram
3.4. Data Analysis
3.5. User Interface Design
CHAPTER IV
IMPLEMENTATION AND TESTING
4.1. System Implementation
4.1.1. Preprocessing Implementation
4.1.2. Stemming Implementation
4.1.3. Naive Bayes Algorithm Implementation
4.1.4. Implementation of Opinion and Fact Training Data Management
4.2. System Testing
4.2.1. System Functionality Testing
4.2.1.1. Home Page
4.2.1.2. Article Sentence Extraction Page
4.2.1.3. Opinion and Fact Training Data Management View
4.2.1.4. Create Data Page
4.2.1.5. Update Data Page
4.2.1.6. Delete Data Page
4.2.2. System Effectiveness Testing
CHAPTER V
CONCLUSIONS AND SUGGESTIONS
5.1. Conclusions
5.2. Suggestions
REFERENCES
LIST OF FIGURES
Figure 2.1 Text Mining Stages
Figure 2.2 Example of Tokenization
Figure 2.3 Example of Stopwords
Figure 2.4 Example of Stemming
Figure 3.1 System Architecture
Figure 3.2 System Flowchart
Figure 3.3 Use Case Diagram of the System
Figure 3.4 Activity Diagram for Training Data Management
Figure 3.5 Activity Diagram for Sentence Classification
Figure 3.6 Sequence Diagram for Opinion Data Training
Figure 3.7 Sequence Diagram for Test Data Classification
Figure 3.8 Class Diagram
Figure 3.9 Conceptual Data Model of the System
Figure 3.10 Physical Data Model of the System
Figure 3.11 User Interface Design
Figure 4.1 Case Folding Implementation
Figure 4.2 Implementation of URL Removal and Tokenizing
Figure 4.3 Stopword Removal Implementation
Figure 4.4 Implementation of the del_inflection_suffixes() Function
Figure 4.5 Implementation of the del_derrivation_suffixes() Function
Figure 4.6 Implementation of the del_derrivation_prefix() Function
Figure 4.7 Implementation of the Article Sentence Computation
Figure 4.8 Create Data Implementation
Figure 4.9 Delete Data Implementation
Figure 4.10 Update Data Implementation
Figure 4.11 Home Page
Figure 4.12 Article Input View
Figure 4.13 Extraction Computation Process
Figure 4.14 Extraction Results
Figure 4.15 Opinion Training Data Management View
Figure 4.16 Fact Training Data Management View
Figure 4.17 Create Data View
Figure 4.18 Update Data View
Figure 4.19 Delete Data View
LIST OF TABLES
Table 3.1 Use Case Diagram Description
Table 4.1 Test Results of Opinion and Fact Sentence Extraction from Articles