PENELITIAN IPTEKS
LAPORAN PENELITIAN IPTEKS
KLASIFIKASI KALIMAT SOAL MENGGUNAKAN ALGORITMA NAIVE BAYES BERBASIS KERNEL DENSITY ESTIMATION DENGAN SELEKSI FITUR
OLEH:
Bowo Nurhadiyono, S.Si., M.Kom (NPP: 0686.11.1996.102)
T Sutojo, S.Si., M.Kom (NPP: 0686.11.1996.094)
Catur Supriyanto, S.Kom, M.CS (NPP: 0686.11.2011.415)
FAKULTAS ILMU KOMPUTER UNIVERSITAS DIAN NUSWANTORO SEMARANG MEI 2013
RINGKASAN DAN SUMMARY
Proses otomatisasi penentuan kalimat soal berbasis Taksonomi Bloom perlu dilakukan untuk membantu dosen atau guru dalam proses penyusunan soal ujian. Penelitian ini mencoba melakukan otomatisasi penentuan kalimat soal berbasis Taksonomi Bloom dengan menggunakan algoritma Naïve Bayes berbasis kernel density estimation dengan seleksi fitur. Hasil penelitian ini lebih baik daripada hasil peneliti sebelumnya: dengan dataset yang sama, penelitian ini mampu mencapai tingkat akurasi sebesar 77,8%, melebihi akurasi peneliti sebelumnya sebesar 65,26%. Untuk penelitian mendatang, kalimat soal berbahasa Indonesia dapat digunakan sebagai dataset sehingga ke depannya perangkat lunak tersebut dapat diimplementasikan. Pemilihan algoritma klasifikasi yang lain juga perlu dilakukan agar tingkat akurasi penentuan kalimat soal berdasar Taksonomi Bloom dapat ditingkatkan.
Kata Kunci: Klasifikasi, Naïve Bayes, Kernel Density Estimation, Seleksi Fitur, Taksonomi Bloom
PRAKATA
Alhamdulillah, kami ucapkan kepada Allah SWT atas kehendak-Nya kami dapat menyelesaikan penelitian ini tepat pada waktu yang telah ditentukan. Selain itu kami juga mengucapkan banyak terima kasih kepada:
1. Bapak DR. Ir. Edi Noersasongko, M.Kom, selaku Rektor Universitas Dian Nuswantoro
2. Bapak DR. Abdul Syukur, selaku Dekan Fakultas Ilmu Komputer Universitas Dian Nuswantoro
3. Bapak Tyas Catur Pramudi, S.Si., M.Kom, selaku Ketua LP2M Universitas Dian Nuswantoro
4. Bapak DR. Heru Agus Santoso, M.Kom, selaku Ketua Program Studi Teknik Informatika Fakultas Ilmu Komputer Universitas Dian Nuswantoro
5. Teman-teman Dosen Fakultas Ilmu Komputer
yang telah membantu kami baik langsung maupun tidak langsung dalam penelitian yang kami lakukan. Semoga kebaikan Bapak-Bapak mendapat ganjaran dari Allah SWT, amin.
Semarang, 25 Mei 2013 Tim Peneliti
DAFTAR ISI
HALAMAN PENGESAHAN
RINGKASAN
PRAKATA
DAFTAR ISI
DAFTAR TABEL
DAFTAR GAMBAR
DAFTAR LAMPIRAN
BAB I : PENDAHULUAN
  1.1. Latar Belakang
  1.2. Perumusan Masalah
  1.3. Batasan Penelitian
BAB II : TINJAUAN PUSTAKA
  2.1. Pendahuluan
  2.2. Taksonomi Bloom
  2.3. Text Processing
  2.4. Term Weighting
  2.5. Seleksi Fitur
    2.5.1. Seleksi Fitur Berbasis Ranking
    2.5.2. Seleksi Fitur Berbasis Wrapper
  2.6. Algoritma Naive Bayes
    2.6.1. Algoritma Naive Bayes Berbasis Kernel Density Estimation
BAB III : TUJUAN DAN MANFAAT PENELITIAN
  3.1. Tujuan
  3.2. Manfaat Penelitian
BAB IV : METODE PENELITIAN
  4.1. Pendahuluan
  4.2. Metode Pengumpulan Data
  4.3. Pengolahan Awal Data
  4.4. Metode yang Diusulkan
  4.5. Evaluasi
BAB V : HASIL DAN PEMBAHASAN
  5.1. Pendahuluan
  5.2. Eksperimen Menggunakan Rapidminer
  5.3. Hasil dan Pembahasan
BAB VI : KESIMPULAN DAN SARAN
  6.1. Pendahuluan
  6.2. Kesimpulan
  6.3. Saran Penelitian di Masa Mendatang
DAFTAR PUSTAKA
LAMPIRAN
DAFTAR TABEL
Tabel 2.1 Kata Kunci dan Level Kognitif Taksonomi Bloom
Tabel 4.1 Distribusi Dataset
Tabel 4.2 Confusion Matrix
Tabel 5.1 Kinerja Naïve Bayes dengan Forward Selection dan Backward Elimination
Tabel 5.2 Confusion Matrix Forward Selection
Tabel 5.3 Confusion Matrix Backward Elimination
Tabel 5.4 Akurasi Naïve Bayes dengan Beberapa Seleksi Fitur
Tabel 5.5 Kappa Naïve Bayes dengan Beberapa Seleksi Fitur
Tabel 5.6 Waktu Komputasi Naïve Bayes dengan Beberapa Seleksi Fitur
DAFTAR GAMBAR
Gambar 2.1 Term Document Matrix
Gambar 2.2 Algoritma Forward Selection
Gambar 2.3 Algoritma Backward Elimination
Gambar 4.1 Model Klasifikasi yang Diusulkan
Gambar 5.1 Urutan Proses Klasifikasi di Rapidminer
Gambar 5.2 Penentuan Label Dataset di Rapidminer
Gambar 5.3 Urutan Preprocessing Dokumen di Rapidminer
Gambar 5.4 Proses Validasi Menggunakan 10-Fold Validation
Gambar 5.5 Proses Training dan Testing di Rapidminer
Gambar 5.6 Akurasi
Gambar 5.7 Kappa
DAFTAR LAMPIRAN
Lampiran I : Riwayat Penelitian Ketua Peneliti dan Anggota Peneliti
Lampiran II : Dataset Penelitian
Lampiran III : Draft Artikel Ilmiah
BAB I
PENDAHULUAN

1.1. Latar Belakang
Taksonomi Bloom banyak dipakai pada bidang pendidikan. Taksonomi Bloom digunakan sebagai dasar perencanaan dan desain tujuan suatu pembelajaran. Untuk mengukur keberhasilan pelajar pada pelajaran tertentu, dibutuhkan pengujian seperti kuis atau tes yang memainkan peranan penting. Pada umumnya, pengajar menyediakan soal tes untuk beberapa tingkat pengujian yang sesuai dengan pelajaran yang sudah ditempuh untuk menentukan apakah siswa tersebut mencapai level pengetahuan tertentu [1]. Menurut Benjamin Bloom, yang teorinya dikenal sebagai Taksonomi Kognitif Bloom, dalam menyusun alat evaluasi untuk mengukur hasil belajar hendaknya dicakup beberapa tingkat berpikir, dari menengah hingga tinggi. Aspek kognitif tersebut meliputi pengetahuan (knowledge), pemahaman (comprehension), aplikasi (application), analisis (analysis), sintesis (synthesis), dan evaluasi (evaluation). Akan tetapi, proses mempersiapkan dan mendesain soal tes yang memenuhi kriteria Taksonomi Bloom sangat menghabiskan waktu dan sulit dilakukan [1], mengingat banyaknya jumlah soal yang perlu dikategorikan sesuai Taksonomi Bloom secara manual.

Algoritma machine learning diusulkan untuk mengatasi masalah tersebut. Dengan menggunakan algoritma machine learning, penentuan kategori soal dapat dilakukan secara otomatis tanpa bantuan manusia. Yahya dan Osman mengusulkan penggunaan algoritma Support Vector Machine (SVM) untuk penentuan soal berdasar Taksonomi Bloom [2]. Dalam penelitian tersebut, SVM menghasilkan tingkat akurasi yang baik, namun memiliki kompleksitas yang cukup tinggi [3]. Penelitian lainnya adalah penelitian Chai Jing Hui [1][4] yang menggunakan algoritma jaringan syaraf tiruan untuk penentuan label Taksonomi Bloom secara otomatis. Masalah utama yang diangkat pada penelitian tersebut adalah besarnya ruang vektor yang dihasilkan dari banyaknya kumpulan soal. Penelitian tersebut menggunakan Vector Space Model (VSM) untuk merepresentasikan kumpulan soal atau dokumen. Setiap soal membentuk satu vektor, dimana elemen vektor tersebut dapat berupa jumlah kemunculan tiap kata (word/token) dalam dokumen tersebut. Untuk mengatasi masalah tersebut, Chai Jing Hui mengusulkan penggunaan Document Frequency (DF) dan Category Frequency-Document Frequency Method (CF-DF) sebagai unsupervised feature selection untuk mengurangi kompleksitas algoritma neural network.

Rendahnya tingkat akurasi yang dihasilkan pada penelitian Chai Jing Hui [1] menjadi alasan penelitian ini dilakukan. Penelitian ini mengusulkan penggunaan algoritma naïve bayes berbasis kernel density estimation dengan seleksi fitur. Algoritma naïve bayes dipilih karena sangat cocok untuk teks pendek (short text). Seperti penelitian yang dilakukan oleh Kuruvilla Mathew dan Biju Issac [5], algoritma naïve bayes mampu melakukan prediksi spam mail dengan tingkat akurasi melebihi 98%. Penelitian ini menggunakan algoritma naïve bayes berbasis kernel density estimation untuk mendapatkan akurasi yang lebih baik. Permasalahan kedua yang diangkat pada penelitian ini adalah masalah komputasi algoritma naïve bayes berbasis kernel density estimation. Seperti yang dipaparkan oleh Jingli Lu, et al. [6], penggunaan data yang besar menyebabkan lemahnya komputasi algoritma naïve bayes berbasis kernel density estimation. Oleh karena itu, penelitian ini juga mengusulkan dua tahapan seleksi fitur, yaitu seleksi fitur berbasis filter (ranking) dan seleksi fitur berbasis wrapper. Jenis seleksi fitur berbasis ranking yang dipilih adalah supervised feature selection. Penelitian Yang dan Pedersen [7] menyebutkan bahwa kemampuan supervised feature selection lebih baik daripada unsupervised feature selection dalam menghasilkan tingkat akurasi algoritma klasifikasi yang tinggi. Supervised feature selection yang digunakan pada penelitian ini adalah chi squared statistic dan information gain. Tahapan kedua seleksi fitur yang digunakan adalah Backward Elimination dan Forward Selection. Dalam penelitian Khoerudin [8], kedua seleksi fitur ini mampu meningkatkan akurasi algoritma Naïve Bayes.
1.2. Perumusan Masalah
Dari uraian latar belakang tersebut, masalah yang melatarbelakangi penelitian ini adalah masih rendahnya tingkat akurasi penentuan kalimat soal pada penelitian Chai Jing Hui [1]. Penelitian tersebut menggunakan algoritma backpropagation neural network dengan seleksi fitur Document Frequency (DF) dan Category Frequency-Document Frequency (CF-DF). Dengan dataset yang sama, penelitian ini mengusulkan algoritma naïve bayes berbasis kernel density estimation dengan dua tahapan seleksi fitur.
1.3. Batasan Penelitian
Batasan masalah pada penelitian ini adalah:
1. Proses klasifikasi hanya pada level kognitif Taksonomi Bloom (knowledge, comprehension, application, analysis, synthesis, evaluation).
2. Dataset yang digunakan diambil dari penelitian Chai Jing Hui [1].
BAB II
TINJAUAN PUSTAKA

2.1. Pendahuluan
Bab ini memaparkan beberapa teori dan penelitian terkait yang berkenaan dengan kognitif Taksonomi Bloom, algoritma Naïve Bayes, seleksi fitur, dan metode-metode yang digunakan untuk text mining. Beberapa daftar pustaka dalam penelitian ini diambil dari jurnal ataupun karya ilmiah mahasiswa.
2.2. Taksonomi Bloom
Tujuan pembelajaran dalam Taksonomi Bloom terdiri dari 3 domain, yaitu kognitif, afektif, dan psikomotorik. Domain kognitif berhubungan dengan kemampuan intelektual, domain afektif berhubungan dengan kemampuan emosional, dan domain psikomotorik berhubungan dengan kemampuan fisik [9]. Penelitian ini fokus pada domain kognitif Taksonomi Bloom. Tabel 2.1 menunjukkan kata kunci dari domain kognitif Taksonomi Bloom.
2.3. Text Processing
Preprocessing adalah tahapan mengubah suatu dokumen ke dalam format yang sesuai agar dapat diproses oleh algoritma klasifikasi atau clustering. Terdapat 3 tahapan preprocessing dalam penelitian ini, yaitu: Tokenization, merupakan tahapan penguraian string teks menjadi term atau kata. Stopword Removal, merupakan tahapan penghapusan kata-kata yang tidak relevan dalam penentuan topik sebuah dokumen dan yang sering muncul pada sebuah dokumen, misal "and", "or", "the", "a", "an" pada dokumen berbahasa Inggris.

Tabel 2.1 Kata Kunci dari Level Kognitif Taksonomi Bloom

Category       : Keywords
Knowledge      : Defines, describes, identifies, knows, labels, lists, matches, names, outlines, recalls, recognizes, reproduces, selects, states.
Comprehension  : Comprehends, converts, defends, distinguishes, estimates, explains, extends, generalizes, gives examples, infers, interprets, paraphrases, predicts, rewrites, summarizes, translates.
Application    : Applies, changes, computes, constructs, demonstrates, discovers, manipulates, modifies, operates, predicts, prepares, produces, relates, shows, solves, uses.
Analysis       : Analyzes, breaks down, compares, contrasts, diagrams, deconstructs, differentiates, discriminates, distinguishes, identifies, illustrates, infers, outlines, relates, selects, separates.
Synthesis      : Categorizes, combines, compiles, composes, creates, devises, designs, explains, generates, modifies, organizes, plans, rearranges, reconstructs, relates, reorganizes, revises, rewrites, summarizes, tells, writes.
Evaluation     : Appraises, compares, concludes, contrasts, criticizes, critiques, defends, describes, discriminates, evaluates, explains, interprets, justifies, relates, summarizes, supports.
Stemming, merupakan tahapan pengubahan suatu kata menjadi kata dasarnya dengan menghilangkan imbuhan awal atau akhir pada kata tersebut, misal eating → eat, extraction → extract. Penelitian ini menggunakan algoritma Porter stemmer.

2.4. Term Weighting
Vector Space Model (VSM) banyak digunakan dalam sistem temu kembali dokumen teks [10]. VSM adalah model yang digunakan untuk mengukur kemiripan antar dokumen. VSM mengubah koleksi dokumen ke dalam matriks term-document [11]. Matriks term-document pada Gambar 2.1 memiliki dimensi m x n, dimana m adalah jumlah term dan n adalah jumlah dokumen. Terdapat 3 metode pembobotan atau term weighting dalam VSM, yaitu Term Frequency (TF), Inverse Document Frequency (IDF), dan Term Frequency-Inverse Document Frequency (TFIDF). TF adalah banyaknya kemunculan suatu term dalam suatu dokumen, IDF adalah logaritma dari pembagian jumlah total dokumen dengan cacah dokumen yang mengandung suatu term, dan TFIDF adalah perkalian antara TF dengan IDF. Semakin besar bobot TFIDF suatu term, semakin penting term tersebut untuk digunakan pada tahapan klasifikasi. Penelitian ini menggunakan TFIDF sebagai metode term weighting karena TFIDF lebih sering digunakan. Rumus perhitungan TFIDF ditunjukkan pada rumus (1), dimana N adalah jumlah dokumen dan df adalah jumlah dokumen yang mengandung term t.
Gambar 2.1 Term Document Matrix
$TFIDF = TF \times IDF$   (1)

$IDF = \log(N / df)$   (2)
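Sebagai ilustrasi (bukan bagian dari laporan asli), sketsa Python minimal berikut menghitung bobot TFIDF sesuai rumus (1) dan (2) setelah tokenization dan stopword removal sederhana. Daftar stopword dan dokumen contoh bersifat hipotetis, dan Porter stemming (misalnya melalui pustaka NLTK) sengaja dihilangkan demi keringkasan.

    # Sketsa minimal: tokenization, stopword removal, dan pembobotan TFIDF
    # mengikuti rumus (1)-(2): TFIDF = TF x IDF, IDF = log(N/df).
    import math
    import re

    STOPWORDS = {"the", "a", "an", "and", "or", "of", "in", "to"}  # contoh hipotetis

    def preprocess(text):
        # Tokenisasi sederhana, huruf kecil, lalu buang stopword.
        tokens = re.findall(r"[a-z]+", text.lower())
        return [t for t in tokens if t not in STOPWORDS]

    def tfidf(documents):
        docs = [preprocess(d) for d in documents]
        n = len(docs)
        # df: jumlah dokumen yang mengandung term t
        df = {}
        for tokens in docs:
            for t in set(tokens):
                df[t] = df.get(t, 0) + 1
        # bobot per dokumen: tf(t, d) * log(N / df(t))
        weights = []
        for tokens in docs:
            tf = {}
            for t in tokens:
                tf[t] = tf.get(t, 0) + 1
            weights.append({t: f * math.log(n / df[t]) for t, f in tf.items()})
        return weights

    print(tfidf(["List the levels in Bloom's Taxonomy.",
                 "Define the meaning of the Olympic Motto."]))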
2.5. Seleksi Fitur
Seleksi fitur, atau yang lebih dikenal dengan feature selection, subset selection, attribute selection, atau variable selection, adalah proses memilih fitur yang tepat untuk digunakan dalam proses klasifikasi atau clustering. Tujuan dari seleksi fitur adalah mengurangi tingkat kompleksitas sebuah algoritma klasifikasi, meningkatkan akurasi algoritma klasifikasi tersebut, dan mengetahui fitur-fitur yang paling berpengaruh terhadap tingkat akurasi [12]. Penelitian ini fokus pada penggunaan seleksi fitur berbasis pencarian (search) dan individual feature ranking. Untuk lebih detailnya dijelaskan pada subbab berikut ini.

2.5.1. Seleksi Fitur Berbasis Ranking
Penelitian ini menggunakan 2 metode supervised feature selection, yakni chi square dan information gain. Kedua metode ini digunakan untuk mengurangi penggunaan fitur dalam proses klasifikasi ataupun clustering dokumen teks. Information gain mampu memberikan akurasi yang lebih baik untuk klasifikasi sentiment analysis [13]. Rumus chi square ditunjukkan pada rumus (3) dan (4), sedangkan rumus information gain ditunjukkan pada rumus (5).
$\chi^2(t, c) = \dfrac{N (AD - BC)^2}{(A + B)(C + D)(A + C)(B + D)}$   (3)

$\chi^2_{avg}(t) = \sum_{i=1}^{m} P(C_i)\, \chi^2(t, C_i)$   (4)

$IG(t) = -\sum_{i=1}^{n} P(C_i) \log P(C_i) + P(t) \sum_{i=1}^{n} P(C_i \mid t) \log P(C_i \mid t) + P(\bar{t}) \sum_{i=1}^{n} P(C_i \mid \bar{t}) \log P(C_i \mid \bar{t})$   (5)

Dimana A adalah banyaknya soal di kategori C yang mengandung t, B adalah banyaknya soal bukan di kategori C yang mengandung t, C adalah banyaknya soal di kategori C yang tidak mengandung t, D adalah banyaknya soal bukan di kategori C yang tidak mengandung t, N adalah total jumlah soal, $P(C_i)$ adalah probabilitas kategori $C_i$, $P(t)$ adalah probabilitas term t, $P(C_i \mid t)$ adalah probabilitas kategori $C_i$ untuk dokumen yang mengandung term t, dan $P(C_i \mid \bar{t})$ adalah probabilitas kategori $C_i$ untuk dokumen yang tidak mengandung term t.
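Sebagai ilustrasi tambahan (bukan kode asli penelitian), sketsa Python berikut menghitung skor chi square per rumus (3) dan (4). Variabel docs adalah daftar pasangan (daftar_term, label) yang bersifat hipotetis; skor information gain pada rumus (5) dapat dihitung dengan pola serupa.

    # Sketsa minimal skor chi square per term, mengikuti rumus (3)-(4).
    from collections import Counter

    def chi_square(docs, term, category):
        A = B = C = D = 0
        for tokens, label in docs:
            has_term = term in tokens
            if label == category:
                A += has_term          # soal kategori C yang mengandung t
                C += not has_term      # soal kategori C tanpa t
            else:
                B += has_term          # soal bukan kategori C yang mengandung t
                D += not has_term      # soal bukan kategori C tanpa t
        n = A + B + C + D
        denom = (A + B) * (C + D) * (A + C) * (B + D)
        return 0.0 if denom == 0 else n * (A * D - B * C) ** 2 / denom

    def chi_square_avg(docs, term):
        # rumus (4): rata-rata berbobot P(C_i) atas semua kategori
        counts = Counter(label for _, label in docs)
        n = len(docs)
        return sum((c / n) * chi_square(docs, term, cat) for cat, c in counts.items())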
2.5.2. Seleksi Fitur Berbasis Wrapper
Penelitian ini mengusulkan seleksi fitur berbasis wrapper untuk meningkatkan kinerja Naïve Bayes dengan Kernel Density Estimation. Seleksi fitur berbasis wrapper menggunakan classifier untuk mengevaluasi subset fitur dengan mengukur akurasi cross-validation. Penelitian ini menggunakan forward selection dan backward elimination sebagai metode seleksi fitur berbasis wrapper. Algoritma forward selection dimulai dengan subset fitur kosong. Pada setiap iterasi, satu fitur ditambahkan ke subset sampai sejumlah fitur yang ditentukan tercapai. Pada satu langkah, setiap fitur kandidat secara terpisah ditambahkan ke subset saat ini dan kemudian dievaluasi. Fitur yang memberikan peningkatan kinerja tertinggi dimasukkan ke dalam subset hasil, seperti diilustrasikan pada sketsa berikut dan Gambar 2.2.
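Sketsa Python minimal berikut (ilustrasi, bukan implementasi asli penelitian) menggambarkan forward selection berbasis wrapper. Fungsi evaluate() bersifat hipotetis dan diasumsikan mengembalikan akurasi cross-validation Naïve Bayes pada subset fitur tertentu.

    # Sketsa minimal forward selection berbasis wrapper (pencarian greedy).
    def forward_selection(all_features, evaluate, max_features):
        selected = []                        # F_0 = himpunan kosong
        best_score = 0.0
        while len(selected) < max_features:
            candidates = [f for f in all_features if f not in selected]
            if not candidates:
                break
            # evaluasi setiap kandidat ketika ditambahkan ke subset saat ini
            scored = [(evaluate(selected + [f]), f) for f in candidates]
            score, best = max(scored)
            if score <= best_score:          # berhenti bila tidak ada perbaikan
                break
            selected.append(best)
            best_score = score
        return selected

Backward elimination dapat disketsakan dengan cara yang sama, dimulai dari subset fitur utuh dan menghapus fitur satu per satu.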
1. Berawal dari himpunan fitur kosong, $F_0 = \varnothing$, $k = 0$.
2. Iterasi:
   a. Pilih fitur terbaik untuk dimasukkan ke $F_k$ dengan peningkatan kinerja terbesar, $j^* = \arg\max_j J(F_k \cup \{j\})$.
   b. Perbarui $F_{k+1} = F_k \cup \{j^*\}$, $k = k + 1$.

Gambar 2.2 Algoritma Forward Selection

Backward elimination berbeda dengan forward selection. Backward elimination dimulai dari subset fitur yang utuh. Fitur dalam subset akan dihapus jika fitur tersebut menyebabkan akurasi dari algoritma klasifikasi menurun. Gambar 2.2 dan Gambar 2.3 menunjukkan algoritma dari forward selection dan backward elimination.

1. Berawal dari himpunan fitur utuh $F_k = \{1, \ldots, n\}$, $k = n$.
2. Iterasi:
   a. Hapus fitur yang penghapusannya paling sedikit mengurangi kinerja, $j^* = \arg\max_j J(F_k \setminus \{j\})$.
   b. Perbarui $F_{k-1} = F_k \setminus \{j^*\}$, $k = k - 1$.

Gambar 2.3 Algoritma Backward Elimination

2.6. Algoritma Naive Bayes
Naïve Bayes merupakan algoritma machine learning sederhana yang menerapkan teori probabilitas. Algoritma Naïve Bayes termasuk ke dalam algoritma klasifikasi, yang artinya memerlukan proses training dalam melakukan prediksi. Klasifikasi Bayesian ini dihitung berdasarkan teorema Bayes, seperti ditunjukkan pada rumus (6).

$P(C \mid d) = P(C)\, \dfrac{P(d \mid C)}{P(d)}$   (6)

Dimana $P(C \mid d)$ adalah probabilitas suatu class C terhadap suatu dokumen d; $P(C)$ adalah probabilitas suatu class C, yang dihitung dengan membagi banyaknya dokumen di dalam class C dengan banyaknya dokumen di semua class; $P(d)$ adalah probabilitas suatu dokumen, yang nilainya dapat diabaikan karena selalu sama; dan $P(d \mid C)$ adalah probabilitas suatu dokumen d di dalam class C.
Dokumen dapat direpresentasikan sebagai kumpulan kata, sehingga menghasilkan persamaan (7); dengan demikian persamaan (6) dapat ditulis menjadi persamaan (8).

$P(d \mid C) = \prod_i P(w_i \mid C)$   (7)

$P(C \mid d) \propto P(C) \prod_i P(w_i \mid C)$   (8)

Dimana $P(w_i \mid C)$ adalah probabilitas term (word) ke-i dari suatu dokumen di dalam class C, yang dapat dihitung dengan persamaan (9).

$P(w_i \mid C) = \dfrac{T_c + \lambda}{M + \lambda N}$   (9)

Dimana $T_c$ adalah banyaknya kata $w_i$ yang muncul di class C, M adalah banyaknya kata yang muncul di class C, N adalah banyaknya term unik pada seluruh dokumen, dan $\lambda$ adalah konstanta positif yang biasanya bernilai 1 atau 0.5; konstanta ini mencegah hasil 0 pada perhitungan probabilitas.
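Sebagai gambaran (bukan kode asli penelitian), sketsa Python berikut mengimplementasikan Naïve Bayes multinomial sederhana mengikuti rumus (8)-(9) dengan konstanta pemulusan lambda = 1; struktur data dan nama fungsi hanyalah contoh.

    # Sketsa minimal Naive Bayes multinomial untuk teks (rumus (8)-(9)).
    import math
    from collections import Counter, defaultdict

    def train(docs):
        # docs: daftar pasangan (daftar_term, label)
        prior = Counter(label for _, label in docs)
        word_count = defaultdict(Counter)      # frekuensi kata per kelas
        vocab = set()
        for tokens, label in docs:
            word_count[label].update(tokens)
            vocab.update(tokens)
        return prior, word_count, vocab, len(docs)

    def predict(tokens, prior, word_count, vocab, n_docs, lam=1.0):
        best_label, best_logp = None, float("-inf")
        for label, cnt in prior.items():
            m = sum(word_count[label].values())
            # log P(C) + sum log P(w_i|C), dengan P(w_i|C) = (Tc + lam) / (M + lam*N)
            logp = math.log(cnt / n_docs)
            for t in tokens:
                logp += math.log((word_count[label][t] + lam) / (m + lam * len(vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label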
2.6.1. Algoritma Naive Bayes Berbasis Kernel Density Estimation
Naïve Bayes berbasis Kernel Density Estimation (KDE), atau biasa dikenal sebagai Naïve Bayes fleksibel, digunakan untuk mengatasi masalah data bertipe kuantitatif [11]. Untuk menangani data kuantitatif tersebut, Naïve Bayes menggunakan pendekatan distribusi normal (Gaussian) sebagai fungsi kernel [14], seperti ditunjukkan pada rumus (10) dan (11).

$f_g(x, \mu_i, \sigma_c) = \dfrac{1}{\sqrt{2\pi}\,\sigma_c}\, e^{-\frac{(x - \mu_i)^2}{2\sigma_c^2}}$   (10)

$p(D = d \mid C = c) = \dfrac{1}{n_c} \sum_i f_g(x, \mu_i, \sigma_c)$   (11)

Dimana $\mu_i$ diambil dari nilai data training $x_i$ untuk atribut X di class C ($\mu_i = x_i$), dan $\sigma_c = \dfrac{1}{\sqrt{n_c}}$, dimana $n_c$ adalah banyaknya dokumen training di dalam class C.
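Sketsa Python minimal berikut (ilustrasi berdasarkan rumus (10)-(11), bukan kode asli penelitian) menghitung estimasi kepadatan kernel Gaussian untuk sebuah nilai atribut numerik pada satu kelas; nilai-nilai contoh pada pemanggilan bersifat hipotetis.

    # Sketsa minimal KDE Gaussian: setiap nilai training x_i menjadi pusat kernel,
    # dengan sigma_c = 1/sqrt(n_c), sesuai rumus (10)-(11).
    import math

    def kernel_density(x, train_values):
        n_c = len(train_values)
        sigma = 1.0 / math.sqrt(n_c)
        total = 0.0
        for x_i in train_values:
            total += math.exp(-((x - x_i) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
        return total / n_c                     # p(D = d | C = c)

    # contoh: kepadatan nilai bobot 0.4 untuk kelas dengan nilai training berikut
    print(kernel_density(0.4, [0.1, 0.35, 0.5, 0.42]))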
BAB III
TUJUAN DAN MANFAAT PENELITIAN

3.1. Tujuan
Berdasarkan uraian latar belakang dan rumusan masalah di atas, maka tujuan dari penelitian ini adalah sebagai berikut:
1. Mengevaluasi hasil penelitian Chai Jing Hui dan meningkatkan akurasinya dengan menggunakan algoritma Naive Bayes.
3.2. Manfaat Penelitian
Adapun manfaat dari penelitian ini yaitu menghasilkan model penentuan kalimat soal berbasis level kognitif Taksonomi Bloom yang lebih akurat sehingga dapat digunakan dalam membangun aplikasi pembuatan soal yang memenuhi standar Taksonomi Bloom. Aplikasi tersebut nantinya dapat membantu guru dan dosen dalam pembuatan soal ujian.
BAB IV
METODE PENELITIAN

4.1. Pendahuluan
Banyak algoritma kecerdasan buatan telah dikembangkan, salah satunya algoritma Naïve Bayes. Dalam domain information retrieval, algoritma ini sangat baik untuk dataset yang terdiri dari kalimat-kalimat pendek. Permasalahan algoritma Naïve Bayes terletak pada data kuantitatif, sehingga beberapa peneliti mengembangkan Naïve Bayes berbasis Kernel Density Estimation untuk mengatasi data kuantitatif. Penelitian ini mengusulkan algoritma tersebut untuk meningkatkan akurasi pada penelitian sebelumnya yang dilakukan oleh Chai Jing Hui [1]. Dengan dataset yang sama, penelitian ini tidak hanya mengusulkan algoritma klasifikasi yang berbeda, namun juga mengusulkan dua tahapan seleksi fitur.
4.2. Metode Pengumpulan Data
Dataset yang digunakan pada penelitian ini mengacu pada penelitian yang dilakukan oleh Chai Jing Hui [1]. Dataset kalimat soal berjumlah 274, yang terbagi menjadi 192 kalimat soal untuk data training dan 82 kalimat soal untuk data testing, seperti ditunjukkan pada Tabel 4.1. Detail kalimat soal yang diujikan dapat dilihat di lampiran. Dataset menggunakan bahasa Inggris, sehingga stopword yang digunakan untuk preprocessing data juga berbahasa Inggris.

Tabel 4.1 Distribusi Dataset

Label / Kategori    Training    Testing
Knowledge           17          11
Comprehension       31          12
Application         29          13
Analysis            32          16
Synthesis           45          14
Evaluation          38          16
Total               192         82
4.3. Pengolahan Awal Data
Pada bagian ini dijelaskan tahap awal data mining. Pengolahan awal data meliputi proses input data ke format yang dibutuhkan, pengelompokan dan penentuan atribut data, serta pemecahan data (split) untuk digunakan dalam proses pembelajaran (training) dan pengujian (testing). 10-fold validation digunakan untuk split data, dimana 90% data digunakan untuk training dan 10% data digunakan untuk testing [15], seperti diilustrasikan pada sketsa berikut.
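Sebagai ilustrasi (eksperimen asli menggunakan RapidMiner, bukan kode berikut), sketsa Python di bawah ini menunjukkan pembagian data 10-fold berstratifikasi memakai scikit-learn; penggunaan pustaka tersebut dan nama fungsinya hanyalah asumsi untuk keperluan ilustrasi.

    # Sketsa minimal pembagian data 10-fold (90% training, 10% testing per fold).
    from sklearn.model_selection import StratifiedKFold

    def ten_fold_indices(X, y):
        skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
        for fold, (train_idx, test_idx) in enumerate(skf.split(X, y), start=1):
            # 9 bagian digunakan untuk training, 1 bagian untuk testing
            yield fold, train_idx, test_idx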
4.4. Metode yang Diusulkan
Pada bagian ini dijelaskan metode yang diusulkan untuk digunakan pada penentuan tingkat kesulitan soal berdasarkan kognitif Bloom. Penjelasan meliputi pengaturan dan pemilihan nilai dari parameter-parameter serta arsitektur melalui uji coba. Gambar 4.1 menunjukkan usulan model klasifikasi soal menggunakan Naïve Bayes berbasis kernel density estimation dan seleksi fitur.
[Gambar 4.1 memperlihatkan alur model yang diusulkan: Kumpulan Soal, Text Processing, Term Weighting, Seleksi Fitur Supervised, Backward/Forward Selection, Naïve Bayes berbasis kernel density estimation, serta 10-Fold Validation dan Evaluasi.]

Gambar 4.1. Model Klasifikasi yang Diusulkan

4.5. Evaluasi
Pengukuran kinerja algoritma pada penelitian ini menggunakan confusion matrix (Tabel 4.2). Confusion matrix dapat menghasilkan dua alat ukur, yaitu accuracy dan kappa coefficient. Kappa adalah koefisien untuk mengevaluasi perbedaan antara dua penilai. Kappa dapat diaplikasikan untuk mengetahui kinerja dari algoritma [16].
Tabel 4.2 Confusion Matrix

                      Actual Positive    Actual Negative
Predicted Positive    TP                 FP
Predicted Negative    FN                 TN

$Accuracy = \dfrac{TP + TN}{TP + TN + FP + FN}$   (12)

$Kappa = \dfrac{Accuracy - P_c}{1 - P_c}$   (13)

$P_c = \dfrac{(TP + FN)(TP + FP) + (FP + TN)(FN + TN)}{N \times N}$   (14)
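Sketsa Python berikut (ilustrasi, bukan bagian dari laporan asli) menghitung accuracy dan kappa dari sebuah confusion matrix 2x2 sesuai rumus (12)-(14); angka pada contoh pemanggilan bersifat hipotetis.

    # Sketsa minimal evaluasi accuracy dan kappa dari confusion matrix 2x2.
    def evaluate(tp, fp, fn, tn):
        n = tp + fp + fn + tn
        accuracy = (tp + tn) / n                                        # rumus (12)
        pc = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)  # rumus (14)
        kappa = (accuracy - pc) / (1 - pc)                              # rumus (13)
        return accuracy, kappa

    print(evaluate(tp=40, fp=10, fn=8, tn=42))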
BAB V
HASIL DAN PEMBAHASAN

5.1. Pendahuluan
Bab ini memaparkan proses dan hasil eksperimen yang sudah dilakukan. Eksperimen dilakukan dengan menggunakan tool RapidMiner. Pengaturan eksperimen perlu dilakukan untuk menghasilkan akurasi yang paling tinggi untuk metode yang diusulkan. Pengaturan eksperimen dapat dilakukan dengan kombinasi metode atau dengan penentuan jumlah fitur pada proses seleksi fitur. Harapan dari proses seleksi fitur ini adalah semakin sedikit jumlah fitur yang digunakan, semakin rendah waktu komputasi dan semakin tinggi akurasi yang dicapai.
5.2. Eksperimen Menggunakan Rapidminer
Eksperimen pada penelitian ini menggunakan RapidMiner versi 5.3.005. Eksperimen terdiri dari beberapa tahapan sebagai berikut:
1. Penentuan dataset kalimat soal
2. Preprocessing kalimat soal: tokenization, stopword removal, dan stemming
3. Seleksi fitur menggunakan metode chi square atau metode information gain
4. Seleksi fitur menggunakan algoritma backward elimination atau algoritma forward selection
5. Training dataset menggunakan algoritma naïve bayes berbasis kernel density estimation
6. Testing dataset menggunakan algoritma naïve bayes berbasis kernel density estimation
7. Analisis tingkat akurasi metode yang diusulkan

Seperti yang dijelaskan pada subbab sebelumnya, kombinasi metode seleksi fitur dilakukan untuk mendapatkan akurasi yang tinggi. Kombinasi metode yang dilakukan pada penelitian ini yaitu:
1. Klasifikasi kalimat soal menggunakan seleksi fitur chi square (CHI)
2. Klasifikasi kalimat soal menggunakan seleksi fitur chi square dan backward elimination (CHI+BFS)
3. Klasifikasi kalimat soal menggunakan seleksi fitur chi square dan forward selection (CHI+FFS)
4. Klasifikasi kalimat soal menggunakan seleksi fitur information gain (IG)
5. Klasifikasi kalimat soal menggunakan seleksi fitur information gain dan backward elimination (IG+BFS)
6. Klasifikasi kalimat soal menggunakan seleksi fitur information gain dan forward selection (IG+FFS)

Sketsa berikut mengilustrasikan bagaimana kombinasi seleksi fitur berbasis ranking tersebut dapat diulang pada berbagai persentase term.
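Sketsa Python berikut (ilustrasi, bukan proses RapidMiner yang sebenarnya) menunjukkan pengulangan seleksi fitur berbasis ranking (CHI dan IG) pada persentase term 10% sampai 90%; matriks fitur X, label y, dan fungsi cross_val_accuracy bersifat hipotetis, sedangkan pemanggilan scikit-learn hanyalah asumsi untuk ilustrasi.

    # Sketsa minimal: mencoba kombinasi seleksi fitur berbasis ranking
    # (chi square dan information gain) pada berbagai persentase term.
    from sklearn.feature_selection import SelectPercentile, chi2, mutual_info_classif

    def ranking_combinations(X, y, cross_val_accuracy):
        results = {}
        for name, score_func in [("CHI", chi2), ("IG", mutual_info_classif)]:
            for pct in range(10, 100, 10):        # 10%, 20%, ..., 90% term
                selector = SelectPercentile(score_func, percentile=pct)
                X_sel = selector.fit_transform(X, y)
                results[(name, pct)] = cross_val_accuracy(X_sel, y)
        return results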
Gambar 5.1 Urutan Proses Klasifikasi di Rapidminer
Langkah klasifikasi kalimat soal yang diusulkan terlihat pada Gambar 5.1. Proses klasifikasi diawali dengan penentuan dataset yang disimpan di dalam file Excel. Dalam penentuan dataset oleh RapidMiner, beberapa pengaturan perlu dilakukan, yaitu mengubah tipe data soal menjadi text dan tipe atribut menjadi label (Gambar 5.2). Text preprocessing yang digunakan ditunjukkan pada Gambar 5.3, yaitu tokenization, stopword removal, dan stemming.
Gambar 5.2 Penentuan Label Dataset di Rapidminer
Gambar 5.3 Urutan Preprocessing Dokumen di Rapidminer
Gambar 5.4. Proses Validasi Menggunakan 10-Fold Validation
10-fold validation dilakukan agar estimasi akurasi klasifikasi lebih andal. Proses 10-fold validation ditunjukkan pada Gambar 5.4. Proses ini diletakkan di dalam proses optimize selection. Dengan menggunakan 10-fold validation, dataset tidak perlu dibagi menjadi dataset training dan testing secara manual. Penelitian ini menggunakan 10-fold validation, yang artinya pada setiap fold 90% dataset dijadikan data training dan 10% dataset dijadikan data testing. Di dalam proses 10-fold validation terdapat proses klasifikasi yang dilakukan oleh algoritma naïve bayes berbasis kernel density estimation, seperti yang terlihat pada Gambar 5.4.
5.3. Hasil dan Pembahasan
Percobaan pertama dilakukan untuk mengetahui tingkat akurasi dari penggunaan seleksi fitur berbasis wrapper. Tabel 5.1 menunjukkan bahwa penggunaan forward selection mampu meningkatkan akurasi algoritma Naïve Bayes hingga mencapai 67.18%. Algoritma forward selection mampu menghasilkan tingkat akurasi dan waktu komputasi yang lebih baik dibandingkan dengan algoritma backward elimination yang hanya mencapai tingkat akurasi sebesar 60.9%. Tabel 5.2 dan Tabel 5.3 menunjukkan confusion matrix dari penggunaan forward selection dan backward elimination pada algoritma Naïve Bayes.
Gambar 5.5 Proses Training dan Testing di RapidMiner
Tabel 5.1 Kinerja Naïve Bayes dengan forward selection dan backward elimination

Metode    Accuracy              Kappa               Computation Time (seconds)
NBK       57.61% +/- 5.64%      0.484 +/- 0.072     2
NBK+FS    67.18% +/- 7.77%      0.598 +/- 0.096     748.8
NBK+BE    60.9% +/- 11.95%      0.525 +/- 0.142     1861.8
Tabel 5.2 Confusion Matrix Forward Selection

                     true Knowledge  true Comprehension  true Application  true Analysis  true Synthesis  true Evaluation  class precision
pred. Knowledge      22              0                   0                 1              2               1                84.62%
pred. Comprehension  0               30                  0                 3              3               1                81.08%
pred. Application    0               1                   26                0              0               0                96.30%
pred. Analysis       0               2                   1                 27             4               4                71.05%
pred. Synthesis      6               11                  12                17             49              18               43.36%
pred. Evaluation     0               0                   2                 0              1               30               90.91%
class recall         78.57%          68.18%              63.41%            56.25%         83.05%          55.56%
Tabel 5.3 Confusion Matrix Backward Elimination

                     true Knowledge  true Comprehension  true Application  true Analysis  true Synthesis  true Evaluation  class precision
pred. Knowledge      18              2                   3                 1              0               1                72.00%
pred. Comprehension  4               27                  4                 6              4               2                57.45%
pred. Application    1               1                   20                6              1               4                60.61%
pred. Analysis       2               8                   3                 25             4               7                51.02%
pred. Synthesis      1               6                   5                 3              40              3                68.97%
pred. Evaluation     2               0                   6                 7              10              37               59.68%
class recall         64.29%          61.36%              48.78%            52.08%         67.80%          68.52%
Pada percobaan kedua, seleksi fitur berbasis filter digunakan untuk mengurangi waktu komputasi seleksi fitur berbasis wrapper. Tujuan penggunaan dua seleksi fitur ini adalah untuk lebih meningkatkan kinerja algoritma Naïve Bayes. Gambar 5.6 dan Gambar 5.7 menunjukkan grafik tingkat akurasi dan kappa untuk mengetahui metode yang paling optimal. Detail angka dari grafik tersebut dipaparkan pada Tabel 5.4 untuk hasil akurasi dan Tabel 5.5 untuk hasil kappa. Dengan melihat hasil perbandingan tersebut, hasil yang paling optimal dicapai oleh kombinasi seleksi fitur antara chi square dan backward elimination dengan tingkat akurasi sebesar 77,8% dan kappa sebesar 0,729. Kombinasi seleksi fitur tersebut juga menunjukkan tingkat akurasi yang stabil terhadap perubahan jumlah fitur yang digunakan.

Gambar 5.6 Akurasi (grafik akurasi terhadap % term untuk CHI, CHI+FFS, CHI+BFS, IG, IG+FFS, dan IG+BFS)

Gambar 5.7 Kappa (grafik kappa terhadap % term untuk CHI, CHI+FFS, CHI+BFS, IG, IG+FFS, dan IG+BFS)
Tabel 5.4 Akurasi Naïve Bayes dengan beberapa seleksi fitur

% Term    CHI      CHI+FFS    CHI+BFS    IG       IG+FFS    IG+BFS
10        73,28    67,61      75,17      73,69    65,71     75,94
20        76,59    72,29      77,8       73,33    73,32     76,28
30        71,49    62,87      74,48      68,28    68,69     72,59
40        71,1     71,26      74,5       60,19    73,43     63,8
50        71,48    66,43      73,07      60,52    68,32     63,1
60        67,46    74,11      70,09      60,86    70,11     63,48
70        66,4     54,42      70,11      57,92    72,63     59,81
80        64,58    70,79      66,79      56,85    67,12     59,22
90        65,3     72,69      68,23      56,11    67,53     58,78
Tabel 5.5 Kappa Naïve Bayes dengan beberapa seleksi fitur

% Term    CHI      CHI+FFS    CHI+BFS    IG       IG+FFS    IG+BFS
10        0,674    0,602      0,697      0,681    0,577     0,705
20        0,714    0,659      0,729      0,676    0,674     0,712
30        0,653    0,541      0,688      0,614    0,615     0,666
40        0,647    0,646      0,688      0,515    0,675     0,559
50        0,652    0,585      0,671      0,518    0,61      0,55
60        0,602    0,683      0,633      0,522    0,632     0,554
70        0,59     0,434      0,634      0,486    0,664     0,51
80        0,568    0,641      0,594      0,473    0,595     0,502
90        0,577    0,665      0,613      0,464    0,602     0,498
Tabel 5.6 Waktu Komputasi Naïve Bayes dengan beberapa seleksi fitur (detik)

% Term    CHI    CHI+FFS    CHI+BFS    IG    IG+FFS    IG+BFS
10        1      64,2       15         1     56        29
20        1      206,4      60,6       1     210       78,6
30        1      236,2      137,4      1     262,2     136,2
40        1      784        488,4      2     486,6     275,4
50        1      618,6      499,2      2     454,8     496,8
60        1      1690       1095,6     2     672       622,2
70        1      1943,6     1506,6     2     1200,4    1115,4
80        1      1926,6     1654,2     2     1850,8    1510,2
90        1      2026,6     1789,2     2     1969,6    1652,4
BAB VI KESIMPULAN DAN SARAN
6.1. Pendahuluan
Proses otomatisasi penentuan kalimat soal berdasarkan level kognitif Taksonomi Bloom diperlukan untuk mempermudah dosen atau guru dalam mendesain atau menyusun soal ujian. Proses otomatisasi tersebut juga diperlukan oleh lembaga pendidikan yang memiliki bank soal yang cukup besar. Level kognitif Taksonomi Bloom terdiri dari 6 tipe, yaitu knowledge, comprehension, application, analysis, synthesis, dan evaluation. Beberapa penelitian untuk proses otomatisasi tersebut dilakukan dengan menggunakan algoritma machine learning. Penelitian ini mengusulkan algoritma naïve bayes berbasis kernel density estimation dengan menambahkan proses seleksi fitur. Seleksi fitur mampu menurunkan waktu komputasi dan meningkatkan akurasi proses penentuan soal berbasis Taksonomi Bloom. Pada bab ini, pencapaian hasil penelitian dan saran untuk penelitian di masa mendatang dipaparkan.
6.2. Kesimpulan
Dengan dataset yang sama dengan penelitian sebelumnya, metode yang diusulkan pada penelitian ini mampu meningkatkan akurasi proses klasifikasi soal berbasis Taksonomi Bloom. Dari eksperimen yang dilakukan, gabungan seleksi fitur antara chi square dan backward elimination menghasilkan tingkat akurasi yang paling baik, yaitu sebesar 77,8%. Hasil akurasi ini lebih baik dari hasil penelitian sebelumnya yang hanya mencapai 65,26%.
6.3. Saran Penelitian di Masa Mendatang
Menjadi tantangan kami untuk mengimplementasikan metode yang kami usulkan ini menjadi perangkat lunak yang dapat digunakan oleh para dosen atau guru dalam mendesain soal. Untuk penelitian mendatang, kalimat soal berbahasa Indonesia dapat digunakan sebagai dataset sehingga ke depannya perangkat lunak tersebut dapat diimplementasikan. Pemilihan algoritma klasifikasi yang lain perlu dilakukan sehingga tingkat akurasi penentuan kalimat soal berdasar Taksonomi Bloom ini dapat ditingkatkan.
DAFTAR PUSTAKA
[1] Chai Jing Hui, "Feature Reduction for Neural Network in Determining the Bloom's Cognitive Level of Question Items," 2009.
[2] Anwar Ali Yahya and Addin Osman, "Automatic Classification of Questions into Bloom's Cognitive Levels using Support Vector Machines," in The International Arab Conference on Information Technology, 2011.
[3] M. Janaki Meena and K. R. Chandran, "Naïve Bayes Text Classification with Positive Features Selected by Statistical Method," in First International Conference on Advanced Computing, 2009.
[4] Norazah Yusof and Chai Jing Hui, "Determination of Bloom's Cognitive Level of Question Items using Artificial Neural Network," in International Conference on Intelligent System Design and Applications, 2010, pp. 866-870.
[5] Kuruvilla Mathew and Biju Issac, "Intelligent Spam Classification for Mobile Text Message," in International Conference on Computer Science and Network Technology, 2011.
[6] Jingli Lu, Ying Yang and Geoffrey I. Webb, "Incremental Discretization for Naive Bayes Classifier," Advanced Data Mining and Applications, vol. 4093, pp. 223-238, 2006.
[7] Y. Yang and J. O. Pedersen, "A Comparative Study on Feature Selection in Text Categorization," in Proc. 14th Int'l Conf. Machine Learning, 1997.
[8] Asep Khoerudin, "Analisis Tingkat Kesukaan Konsumen dengan Metode Bayesian Network (Studi Kasus Produk Biskuit)," Bogor Agricultural University, Undergraduate Thesis, 2011.
[9] David Andrich, "Framework Relating Outcomes Based Education and the Taxonomy of Educational Objectives," Studies in Educational Evaluation, vol. 28, pp. 35-59, 2002.
[10] L.S. Wang, "Relevance Weighting of Multi-Term Queries for Vector Space Model," in Proc. of Computational Intelligence and Data Mining, 2009, pp. 0-6.
[11] S. G, D. G, S. Kp, and R. Peter, "Evaluation of SVD and NMF Methods for Latent Semantic Analysis," International Journal of Recent Trends in Engineering, vol. 1, pp. 308-310, 2009.
[12] Amir Navot, "On the Role of Feature Selection in Machine Learning," Doctor of Philosophy Thesis, 2006.
[13] P. Koncz and J. Paralic, "An approach to feature selection for sentiment analysis," in International Conference on Intelligent Engineering Systems, 2011, pp. 357-362.
[14] James N.K. Liu, Yu-Lin He, Xi-Zhao Wang, and Yan-Xing Hu, "A Comparative Study among Different Kernel Functions in Flexible Naïve Bayesian Classification," in International Conference on Machine Learning and Cybernetics, 2011.
[15] H. Drucker, D. Wu, and V. N. Vapnik, "Support vector machines for spam categorization," IEEE Transactions on Neural Networks, vol. 10, no. 5, 1999.
[16] R. Parimala and R. Nallaswamy, "A Study of Spam E-mail Classification Using Feature Selection Package," Global Journal of Computer Science and Technology, vol. 11, no. 7, 2011.
LAMPIRAN I
RIWAYAT HIDUP PERSONALIA PENELITI

KETUA PENELITI:
a. Nama Lengkap             : Bowo Nurhadiyono, S.Si., M.Kom
b. Tempat / Tanggal Lahir   : Tegal / 21 Desember 1967
c. Jenis Kelamin            : Laki-Laki
d. NPP                      : 0686.11.1996.102
e. Disiplin Ilmu            : Teknik Informatika
f. Pangkat/Golongan         : Penata Tk I / III D
g. Jabatan                  : Lektor
h. Fakultas / Program Studi : Fakultas Ilmu Komputer / Teknik Informatika
i. Riwayat Pendidikan       :
   - S1 FMIPA Universitas Padjadjaran Bandung (Lulus 1996)
   - S2 Teknik Informatika Universitas Dian Nuswantoro (Lulus 2004)
j. Publikasi Ilmiah         :
   1. Implementasi Bilangan Bulat pada Algoritma RSA Secara Aljabar, Jurnal Teknologi Informasi Techno.COM, Vol 9, No 1, Februari 2010, ISSN: 1412-2693
   2. Implementasi Matriks pada Matematika Bisnis dan Ekonomi, Jurnal Teknologi Informasi Techno.COM, Vol II, No 2, Mei 2012, ISSN: 1412-2693
   3. Penerapan Integrasi Numerik Menggunakan Metode Segi Empat (Rectangle Rule) untuk Menghitung Luas Daerah Tidak Beraturan, Jurnal Teknologi Informasi Techno.COM, Vol 11, No 3, November 2012, ISSN: 1412-2693

Semarang, 25 Mei 2013
Ketua Peneliti,

Bowo Nurhadiyono, S.Si., M.Kom
NPP: 0686.11.1996.102
ANGGOTA PENELITI 1:
a. Nama Lengkap             : T Sutojo, S.Si., M.Kom
b. Tempat / Tanggal Lahir   : Surabaya / 27 Juli 1968
c. Jenis Kelamin            : Laki-Laki
d. NPP                      : 0686.11.1996.094
e. Disiplin Ilmu            : Teknik Informatika
f. Pangkat/Golongan         : Penata / III C
g. Jabatan                  : Lektor
h. Fakultas / Program Studi : Fakultas Ilmu Komputer / Teknik Informatika
i. Riwayat Pendidikan       :
   1. Jurusan Fisika FMIPA Universitas Airlangga Surabaya, 1995
   2. Teknik Informatika Universitas Dian Nuswantoro Semarang, 2006
j. Publikasi Ilmiah         :
   1. Aplikasi Algoritma Pewarnaan Graf Welch Powell untuk Optimasi Penjadwalan pada Ujian Semester Fakultas Ilmu Komputer Universitas Dian Nuswantoro Semarang, 2011
   2. Perbandingan Sensitifitas Filter Deteksi Tepi Sobel dengan Filter Deteksi Tepi Prewitt untuk Citra yang Mengandung Noise Gaussian, "Techno-Com", vol. 10, no. 1, edisi Mei 2009, ISSN 1412-2693

Semarang, 25 Mei 2013
Anggota Peneliti 1,

T Sutojo, S.Si., M.Kom
NPP: 0686.11.1996.094
ANGGOTA PENELITI 2:
a. Nama Lengkap             : Catur Supriyanto, S.Kom., M.Cs
b. Tempat / Tanggal Lahir   : Semarang / 21 Oktober 1984
c. Jenis Kelamin            : Laki-Laki
d. NPP                      : 0686.11.2011.415
e. Disiplin Ilmu            : Machine Learning, Information Retrieval
f. Pangkat/Golongan         : Penata / III B
g. Jabatan                  : Asisten Ahli
h. Fakultas / Program Studi : Fakultas Ilmu Komputer / Teknik Informatika
i. Riwayat Pendidikan       :
   - S1 Universitas Dian Nuswantoro Semarang (Lulus 2009)
   - S2 Universiti Teknikal Malaysia Melaka (Lulus 2011)
j. Publikasi Ilmiah         :
   - A Comparison of Rabin Karp and Semantic-Based Plagiarism Detection, Tahun 2012, ICSIIT, ISBN/ISSN 978-602-97124-1-4
   - Integrating Feature-Based Document Summarization as Feature Reduction in Document Clustering, Tahun 2012, CITEE, ISBN/ISSN 2088-6578
   - Performance Enhancement of Image Clustering using Singular Value Decomposition in Color Histogram Content-Based Image Retrieval, IJCCE Vol. 1 No. 4 Tahun 2012, ISBN/ISSN 2010-3743

Semarang, 25 Mei 2013
Anggota Peneliti 2,

Catur Supriyanto, S.Kom., M.Cs
NPP: 0686.11.2011.415
LAMPIRAN II DATASET PENELITIAN
No
Kalimat Soal
Kategori
1
Define the meaning of the Olympic Motto.
Knowledge
2
Identify the correct definition of osmosis.
Knowledge
3
Identify the standard peripheral components of a computer.
Knowledge
4
List the characteristics peculiar to the Cubist movement.
Knowledge
5
List the levels in Bloom's Taxonomy.
Knowledge
6
List the steps involved in titration.
Knowledge
7
List three characteristics that are unique to the Cubist movement.
Knowledge
8
Make a list of facts you learned from the story.
Knowledge
Name the food groups and at least two items of food in each 9
group.
Knowledge
10
Name the four major food groups.
Knowledge
11
Name the main characters in the story.
Knowledge
12
Quote what ICT stands for.
Knowledge
13
Recite a policy.
Knowledge
14
State the pattern for expressing time in French.
Knowledge
15
State the rule for balls and strikes in baseball.
Knowledge
16
State the rule for using a semicolon in a sentence.
Knowledge
17
Write the equation for the Ideal Gas Law.
Knowledge
18
Define compound interest.
Knowledge
19
Define mercantilism.
Knowledge
20
Define stream bank, floodplain and substrate.
Knowledge
21
Draw and label a diagram of a typical stream.
Knowledge
22
Identify the five major prophets of the Old Testament.
Knowledge
23
Identify the part of a eukaryotic cell.
Knowledge
24
Label any Olympic sporting apparatus with design features.
Knowledge
25
Name the artist who painted the Mona Lisa.
Knowledge
26
Recite the poem "Auto Wreck".
Knowledge
27
Recite the principles of the Data Protection Act.
Knowledge
28
State the definition of application in Bloom's Taxonomy.
Knowledge
29
Compare Calliope with Howie. Use the word bank.
Comprehension
30
Describe how Phillip and Timothy survived on the Cay
Comprehension
31
Describe in your own words how to copy text from one program Comprehension into another
32
Describe in your own words what happens when a stream's Comprehension velocity slows
33
Describe in your own words what is meant by a sprained ankle
Comprehension
34
Describe nuclear transport to a lay person
Comprehension
35
Describe what took place as the Hato was sinking
Comprehension
36
Explain how Timothy saved Phillip's life
Comprehension
37
Explain in your own words what a recessive gene is
Comprehension
Explain in your own words what do you mean by the term 38
economics?
Comprehension
39
Explain in your own words what is meant by mercantilism.
Comprehension
40
Explain what a poem means.
Comprehension
41
Illustrate this caption: "Olympic as a Media Event in the Comprehension Information Society"
42
Illustrate what you think the main idea was.
Comprehension
43
In one sentence explain the main idea of a written passage.
Comprehension
44
In one sentence give the point of a written passage.
Comprehension
45
Interpret the pictures.
Comprehension
46
Outline the most important insight of the tale.
Comprehension
47
Paraphrase what Hamlet is saying in his soliloquy.
Comprehension
48
Restate the Olympic motto in your own words.
Comprehension
49
Retell the story in your words.
Comprehension
50
Rewrite the principles of test writing.
Comprehension
51
State in your own words the rule for balls and strikes in baseball.
Comprehension
52
Summarize a story in own words.
Comprehension
53
Summarize the basic tenets of collaborative conservation.
Comprehension
54
Summarize the basic tenets of deconstructionism.
Comprehension
55
Summarize this magazine article.
Comprehension
56
Tell in your own words the beginning of the book.
Comprehension
57
Translate a written text aloud from L2 to English.
Comprehension
58
Translate an equation into a computer spreadsheet.
Comprehension
59
Translate an equation into a spreadsheet formula.
Comprehension
60
Translate the following passage from The Iliad into English
Comprehension
61
Describe in prose what is shown in graph form.
Comprehension
Describe in your own words how to borrow a book from the 62
library.
Comprehension
63
Explain in one's own words how to create a query in a database.
Comprehension
64
Express your opinion of 'Drugs in Sport' through poetry.
Comprehension
65
From a blueprint describe the article depicted.
Comprehension
66
Given a graph of production trends in automobiles, describe what Comprehension the graph represents in a memo to your boss.
67
Outline in your own words how the Leggo's Tomato Paste Comprehension advertisement sells their product.
68
Restate main idea of story in own words.
Comprehension
69
State in your own words where in the library to find previous Comprehension issues of journals that are no longer in the display racks.
70
Tell how Phillip kept himself alive after Timothy died.
Comprehension
71
Tell in your own words how the setting of the story made it more Comprehension interesting.
72
Tell what is meant by the definition of an isosceles triangle.
Comprehension
73
Apply shading to produce depth in drawing
Application
74
Apply the storytelling technique here to a little story of your own
Application
75
Apply your understanding the Olympic spirit to develop a new Application motto or slogan
76
Calculate the deflection of a beam under uniform loading
Application
77
Calculate the rate of habitat fragmentation within the Colorado Application Front Range in the last decade
78
Categorise the pictures and add them to the wall display
Application
79
Choose a country that does not compete at the Olympics and Application explain why that country is not an Olympic member
80
Classify celebrations into family and community categories
Application
81
Compute the area of actual circles
Application
82
Demonstrate using the OPAC and your knowledge of library Application organization to find a book about turtles
83
Design a market strategy for your product using a known strategy Application as a model.
84
Dramatize being their mothers.
Application
85
Dramatize some of the problems a homeowner might encounter Application by building in a floodplain. Draw 3 pictures showing the beginning, middle and ending of the
86
story.
Application
87
Draw a picture of the gown Cinderella wore to the ball.
Application
88
In a teaching simulation with your peers role-playing 6th grade Application students, demonstrate the principle of reinforcement in classroom interactions and prepare a half page description of what happened during the simulation that validated the principle.
89
Make a diorama to illustrate an important event.
Application
90
Make a paper-mache map to include relevant information about Application an event.
91
Make a scrapbook about the areas of study.
Application
92
Make up a puzzle game using the ideas from the study area.
Application
93
Model an Olympic Village for the new Millenium.
Application
94
Predict what happens to X if Y increases.
Application
95
Pretend you are one of the characters in the book. Write a diary
Application
about the happenings in your life for two consecutive days.
96
Solve for the ten following fraction multiplication problems. Application Please make sure to show all your work. Take a collection of photographs to demonstrate a particular
97
point.
Application
98
Use a manual to calculate an employee's vacation time.
Application
99
Use the distance travelled and cost of petrol to calculate the cost Application of a coach trip using a spreadsheet.
100 Use the rule for a semicolon in a sentence.
Application
101 Apply laws of statistics to evaluate the reliability of a written test. Application 102 Choose any U.S. president and explain how he exercised his Application power as Commander in Chief of the Armed Forces. 103 Classify frogs toads and other amphibians.
Application
104 Classify animals into two groups.
Application
105 Construct a model to demonstrate how it will work.
Application
106 Demonstrate how this could work in an industry setting?
Application
107 Derive a kinetic model from experimental data.
Application
108 Describe an experiment to answer the question of the effects of Application weight on the fall of an object. 109 Draw a picture of the bears' house.
Application
110 Have you ever met a person who had the courage Timothy had in Application the story? Explain. 111 Relate the principle of reinforcement to classroom interactions.
Application
112 Use a costing model to vary prices on goods to maximise profits Application and minimise costs. 113 Who can use what we know about sonnets and finish this poem?
Application
114 Analyse safe and dangerous aspects of these features
Analysis
115 Analyse the characteristics of frogs
Analysis
116 Analyse the movements and sounds of a frog
Analysis
117 Break down the main actions of the story
Analysis
118 Categorise all Olympic sports in order of difficulty. Describe the Analysis reasons behind your selection 119 Compare and contrast animals that the class has made
Analysis
120 Compare and contrast our school to other communities
Analysis
121 Compare and contrast two characters in the book
Analysis
122 Compare how different children come to school
Analysis
123 Compare the place where the story happened with where you live
Analysis
124 Compare this book to the last book you read
Analysis
125 Compare three celebrations using a Venn diagram
Analysis
126 Compare two of the characters in this book.
Analysis
127 Conduct an investigation to produce information to support a
Analysis
view. 128 Construct a graph to illustrate selected information.
Analysis
Contrast building in the coastal zone with building in a river 129 floodplain.
Analysis
130 Contrast Olympic athletes of today with athletes of past Olympic Analysis games. 131 Decide which parts of the book include the five W's (who, what, when, where, why) and the H (how). Then write a good paragraph for a newspaper article including these facts.
Analysis
132 Explain the patrilocal society in terms of lineage and dominance Analysis of the sexes. 133
Explain the term conjugal families, by making reference to the Analysis different types of societies to which they could belong.
134
From the argument given below, analyze the positive and Analysis negative points presented concerning the abolition of guns and write a brief (2-3 page) narrative of your analysis.
135
From the short presidential debate transcribed below:
Analysis
Differentiate the passages that attacked a political opponent personally, and those that attacked an opponent's political programs. 136
How does your pet differ from other animals?
Analysis
137
How was life different in your town 100 years ago?
Analysis
138
How would you describe Timothy?
Analysis
139
How would you distinguish between polymyositis and viral Analysis myositis in a 42-year-old man with weakness and a rash?
140
If your story happened in a foreign land, compare that land to the Analysis United States.
141
In a good paragraph, state the main idea of the book.
142 Investigate innovations that can enhance future Olympics.
Analysis Analysis
143 Point Out a sport that should be included in the Olympics and Analysis give reasons for its inclusion. 144 Recognise reliable sources of information.
Analysis
145 With your group investigate three different kinds of celebrations Analysis eg religious, cultural, national, community or family. 146 Break down the components of a standard film camera and Analysis explain how they interact to make the machine work. 147 By comparing the map of the tectonic plates to the earthquake Analysis map, what inferences can you make? 148 Can you find four different feelings Pa Lia had during the story?
Analysis
149 Compare herbatious and carnivorous animals on a Venn diagram.
Analysis
150 Compare two dog food commercials. What is the difference Analysis between them and how do they both sell their products? 151 Distinguish between micro and macro economics.
Analysis
152 Examine what helps to make a good Olympics? Think about Analysis money, schedules, sports and people. 153 How would this story be different if it had happened in a different Analysis country? 154 If your story occurred long ago, compare that time with today in a good paragraph. If it was a modern story, compare it with a long time ago and tell what would be different.
Analysis
155 Infer what would happen if the media were banned from Olympic Analysis sport. Look at the words in the word bank that describe people. Write 156 the words that describe Pa Lia, Calliope, and Howie in the correct Analysis column 157 Make a diagnosis or analyze a case study.
Analysis
Pick one of the main characters. Think of a shape that fits that 158 person's traits. Draw the shape. Then describe the character inside the shape. 159 Recognize logical fallacies in reasoning.
Analysis Analysis
Select the athlete of the century and analyse why you chose this Analysis person. 161 Using the previous example, if interest were compounded Analysis monthly instead of daily, what would the difference in interest be? 162 Can you invent another character for the story? Synthesis 160
163 Choose a character. Rewrite a scene from the story from this character's point of view. 164 Combine any two sports to develop a new Olympic sport
Synthesis
165 Combine elements of drama, music, and dance into a stage presentation. 166 Compose a simple rap or rhyme about zoo animals.
Synthesis
167 Compose music for a frog play
Synthesis
168 Construct a device that would assist an athlete in their training.
Synthesis
169 Construct an original work which incorporates five common materials in sculpture. Create a new product. Give it a name and plan a marketing 170 campaign. Create a storyboard for a sequel to your book. Use the same 171 characters 172 Create and perform a play about frogs
Synthesis
173 Create plan of local environment by drawing around boxes
Synthesis
Synthesis
Synthesis
Synthesis Synthesis Synthesis
174 Create a chart that compares things that use electricity and things Synthesis that do NOT use electricity 175 Design a new animal to live in the jungle.
Synthesis
176 Design a poster for this book
Synthesis
177 Design and make an animal that moves.
Synthesis
178 Design costumes for the characters.
Synthesis
179 Develop a plan for a new Olympic Bid System.
Synthesis
180 Develop a way to teach the concept of "adjectives".
Synthesis
181 Explain how the biological concept of symbiotic relationships Synthesis could be used to help solve socially created problems like water pollution, overflowing garbage landfills, or homelessness. 182 Explain why it is likely that a matriarchal family system would Synthesis be found in a matrilocal or matrilineal society. 183 Given a set of data derive an hypothesis to explain them.
Synthesis
Given two opposing theories design an experiment to compare 184 them.
Synthesis
185 How could we determine the number of pennies in a jar without Synthesis counting them? 186 How would you change the story to create a different ending?
Synthesis
187 How would you restructure the school day to reflect children's Synthesis developmental needs? 188 Identify one problem in the book and give an alternate solution Synthesis one not given by the author. 189 Integrate training from several sources to solve a problem.
Synthesis
190 List the events of the story in sequence.
Synthesis
191 Make a radio announcement that advertise the book. Write it out.
Synthesis
192 Name one character. Rewrite the story from this character's point Synthesis of view. 193 Organize this book into three or more sections and give your own Synthesis subtitle for each section. 194 Prepare a book jacket that illustrates the kind of book as well as Synthesis the story. 195 Pretend you are a librarian recommending this book to someone. Synthesis Write a paragraph telling what you would say.
No
Kalimat Soal
196 Revise and process to improve the outcome.
Kategori Synthesis
197 Revise how to complete a complex task in order to improve the Synthesis outcome. 198 Suppose Phillip wasn't rescued shortly after Timothy's death. Synthesis How long could he have survived? 199 Using information from the book about one of the main Synthesis characters, rewrite the ending of the book. 200 Write a letter to the editor on a social issue of concern to you.
Synthesis
201 Write a logically organized argument in favor of a given position.
Synthesis
202 Write a poem about this book.
Synthesis
203 Write a set of rules to prevent what happened in the story.
Synthesis
204 Write a short story relating a personal experience in the style of a Synthesis picaresque novel. 205 Write a song about 'Old MacDonald' who had a bulldozer instead Synthesis of a farm. Write an essay in not more than 250 words about India and 206 Technological Advancement. Use active voice as much as Synthesis possible. 207 Compose a class story.
Synthesis
208 Compose a rhythm or put new words to a known melody.
Synthesis
Create a new song for the opening line of "Mary had a little 209 lamb".
Synthesis
210 Describe what it must have been like to be blind and all alone on Synthesis the Cay. 211 Design a machine to perform a specific task.
Synthesis
212 Develop a hypothesis.
Synthesis
213 Develop one plausible ending for all three short stories below.
Synthesis
214 Devise plans to market or make artwork more valuable.
Synthesis
215 Draw a painting that uses various principles of perspective to Synthesis achieve its effect. 216 How could you re-write this story with a city setting?
Synthesis
No
Kalimat Soal
Kategori
How would the U.S.A. be different if the South had won the Civil 217 War?
Synthesis
218 Invent a machine to do a specific task.
Synthesis
219 Make up a new language code and write material using it.
Synthesis
220 Revises and process to improve the outcome.
Synthesis
After examining the videotape of a play in a football game, 221 determine the degree to which the defensive team performed Evaluation effectively and suggest ways in which it could have responded more effectively 222 Appraise data in support of a hypothesis
Evaluation
223 Appraise the speech's effectiveness based upon the class' criteria.
Evaluation
Assess the appropriateness of an author's conclusions based on Evaluation 224 the evidence given 225 Assess
the
relative
effectiveness
of
different
graphical Evaluation
representations of the same data or biological concept 226 Assess the strengths and weaknesses of the current Olympics and Evaluation recommend action that should be taken in future Olympics: What can be improved, reformed or rejected? 227 Award the contract to the best proposal. Rank the principles of Evaluation "good sportsmanship" in order of importance to you 228 Can you defend the idea that Simon's incident with the pig's head Evaluation is the most mystical in the story? 229 Choose and illustrate the two most important events in the story.
Evaluation
230 Critique the other student's (or your own) speech, based on the Evaluation criteria we have studied this semester. 231 Decide whether you could have survived on the island blind and Evaluation alone. Write about things that would have been challenging making sure to judge the difficulty level for yourself. Decide whether you learned enough about electricity from this 232 book.
Evaluation
233 Decide which candidate would best fill the position of principal.
Evaluation
No
Kalimat Soal
Kategori
Describe the economic consequence of a neolocal society. 234 Support your description with information you have learned from this course.
Evaluation
235 Design a healthy menu that you think most people would enjoy Evaluation using the healthy eating guide. 236 Evaluate a work of art, giving the reasons for your evaluation.
Evaluation
237 Evaluate appropriate and inappropriate actions of characters.
Evaluation
238 Evaluate two Internet sources of information about the Egyptians. Evaluation Which would be a better choice for your purpose and why? 239 Evaluate whether their model is a true representation of the local Evaluation environment. 240 Evaluate your own or a peer's essay in terms of the principles of Evaluation composition discussed during the semester. Explain choices made in making recommendations to an end 241 user.
Evaluation
242 Given an argument on any position, enumerate the logical Evaluation fallacies in that argument. 243 Given the data available on a research question, take a position Evaluation and defend it. 244 Given the data we've looked at on this topic, evaluate how Evaluation appropriate this conclusion is and defend your answer. 245 In a given clinical situation, select the most reasonable Evaluation intervention and predict the main effects and possible side effects. 246 Judge aesthetic qualities and relationship to future values. Evaluation 247 Justify and nominate ways to prevent animal extinction.
Evaluation
248 Predict what will happen next in.
Evaluation
Recommend how our classroom or playground could be 249 improved.
Evaluation
250 Select the best proposal for a proposed water treatment plant.
Evaluation
251 Solve terrorism. What strategies should be put in place at future
Evaluation
Olympic Games.
No
Kalimat Soal
Kategori
252 Tell about the most exciting part of the book. being sure to give Evaluation at least three reasons why? 253 Two pieces of sculpture from different eras and artists are Evaluation displayed. Study thse two pieces, use the compare-contrast method to determine which piece you prefer and write a 2-3 page report that describes your thinking process as you studied these pieces. Utilize the skills you have learned as we have studied various pieces of sculpture ver the past two weeks. 254 Using the basic principles of socialism discussed in this course, Evaluation evaluate the US economic system by providing key arguments to support your judgment. 255 Was Hemingway a great American writer? First you will need to Evaluation define greatness. 256 Would you have liked to have had Cinderella for a sister? Evaluation Explain why or why not. 257 Write a list of criteria to judge the Willy raps.
Evaluation
258 Write a review for the story and specify the type of audience that Evaluation would enjoy this book. 259 After designing an experiment, examining the results, and Evaluation drawing conclusions, determine in what ways the experiment could be conducted more effectively in order to draw more productive conclusions in the future. Construct a poster that will advertise your new food product in an Evaluation 260 exciting and irresistible way. 261 Critique an experimental design or a research proposal.
Evaluation
Decide whether you are in favor of building on a floodplain; Evaluation 262 defend your position in a debate. Establish criteria for making this choice and defend your final 263 selection.
Evaluation
264 Evaluate board games and justify why rules are important.
Evaluation
No
Kalimat Soal
Kategori
265 Examine the stated positions of both major political candidates Evaluation with regard to a particular issue and state good reasons (based on principles discussed in class) for why one candidates position is more likely to be effective than the other's. 266 Explain and justify a new budget.
Evaluation
267 Judge whether it would be possible to survive on an island alone Evaluation and blinded. Write about it. 268 Judge whether Olympic Ideals are realistic or unrealistic for the Evaluation contemporary elite athlete. 269 Listen to two classmates conversing on tape and critique their Evaluation performance on the basis of the skills covered this semester. 270 Predict whether Phillip will ever go back to visit the Cay after Evaluation several years of recovery. Assuming he gets his sight back. 271 Select the most effective solution.
Evaluation
272 Use a judge and jury to discuss the statement 'Children enjoy Evaluation Anthony Browne's books because of the illustrations'. 273 What criteria would you use to assess the validity of a business Evaluation contract? 274 What is part of this book did you like best. Tell why you like it?
Evaluation
LAMPIRAN III DRAFT ARTIKEL ILMIAH
Wrapper based Feature Selection for Naive Bayes with Kernel Density Estimation on Question Classification According to Bloom's Cognitive Level

Catur Supriyanto, Bowo Nurhadiono
Faculty of Computer Science
Dian Nuswantoro University
Semarang, Indonesia
[email protected], [email protected]

Sukardi
Program of Informatics Engineering
Adhi Guna Institute of Informatics and Computing
Palu, Indonesia
[email protected]
Abstract—This study investigates wrapper based feature selection to improve Naïve Bayes with Kernel Density Estimation. The performance of the wrapper based feature selection was evaluated on question classification according to the cognitive levels of Bloom's taxonomy. Compared to other classifiers, Naïve Bayes offers good accuracy and speed on large training datasets, but performs poorly on small ones. The best features of the dataset therefore need to be selected to improve the accuracy of Naïve Bayes. This paper used forward selection and backward elimination as wrapper based feature selection methods to improve the accuracy of Naïve Bayes. The results show that wrapper based feature selection improved the performance of Naïve Bayes with Kernel Density Estimation. Forward selection outperforms backward elimination with an accuracy of 67.18%, while backward elimination achieves only 60.9%.

Keywords—bloom's cognitive level; naïve bayes; kernel density estimation; wrapper based feature selection
Introduction
Bloom's taxonomy was developed by Benjamin Samuel Bloom [1] and is widely used to categorize question item sets according to the depth of student understanding. The six levels of Bloom's taxonomy are Knowledge, Comprehension, Application, Analysis, Synthesis, and Evaluation. A higher level of Bloom's taxonomy indicates a deeper level of student knowledge [2]. Bloom's taxonomy is used for designing and assessing learning objectives: a teacher or lecturer needs to design the learning objectives and the question item set to gauge student understanding. In practice, it is difficult to design quality test items based on Bloom's taxonomy, and intelligent systems have therefore been used to support teachers in designing test items.

Automatic classification of questions into Bloom's cognitive levels using Support Vector Machines (SVM) has been studied by Yahya and Osman [3]. In their research, SVM produced good performance, but the complexity of SVM is very high [4]. Other research on Bloom's cognitive levels using machine learning was done by Norazah Yusof and Chai Jing Hui [5][6]. They used a Backpropagation Neural Network to assign question item sets to the six levels of Bloom's taxonomy, and proposed Document Frequency (DF) and the Category Frequency-Document Frequency method (CF-DF) as feature selection to reduce the complexity of the Backpropagation algorithm. Their results showed that the DF feature reduction method can be considered more effective than using the whole feature set or the CF-DF method. However, training the Backpropagation algorithm is very slow [7], and the accuracy of their question classification can still be improved.

In our work, Naïve Bayes with Kernel Density Estimation using wrapper based feature selection is proposed. Naïve Bayes has good accuracy and speed on large training datasets, but it cannot cope well with small training datasets [8]. This paper applies wrapper based feature selection to improve the performance of Naïve Bayes on a small dataset. Based on the classification criterion, feature selection can be divided into filter based and wrapper based feature selection [9]. Filter based feature selection selects informative features by ranking them according to a criterion function [10]. The wrapper method takes feature selection and pattern classification as a whole and evaluates feature subsets directly on the classification results [11]. Wrapper methods are widely recognized as a superior alternative in supervised learning problems [12], since they select the best features based on classification results. This paper used forward selection and backward elimination as wrapper based methods; forward selection starts with an empty feature set and adds features, whereas backward elimination starts with the full feature set and removes them.

The remainder of this paper is organized as follows. Section 2 introduces the theoretical background. Section 3 presents the conducted experiment and its results. Section 4 is devoted to the conclusion.
Theoretical Background

Cognitive Level of Bloom's Taxonomy
Educational objectives in Bloom's Taxonomy consist of three domains: cognitive, affective and psychomotor. The cognitive domain involves intellectual skills, the affective domain acts as the emotional and attitudinal component, and the psychomotor domain involves physical skills [13]. This paper focuses on the cognitive domain of Bloom's taxonomy. Table I shows the keywords related to the cognitive domain of Bloom's taxonomy.

TABLE I. KEYWORDS USED IN THE COGNITIVE DOMAIN [14]

Category | Keywords
Knowledge | Defines, describes, identifies, knows, labels, lists, matches, names, outlines, recalls, recognizes, reproduces, selects, states.
Comprehension | Comprehends, converts, defends, distinguishes, estimates, explains, extends, generalizes, gives examples, infers, interprets, paraphrases, predicts, rewrites, summarizes, translates.
Application | Applies, changes, computes, constructs, demonstrates, discovers, manipulates, modifies, operates, predicts, prepares, produces, relates, shows, solves, uses.
Analysis | Analyzes, breaks down, compares, contrasts, diagrams, deconstructs, differentiates, discriminates, distinguishes, identifies, illustrates, infers, outlines, relates, selects, separates.
Synthesis | Categorizes, combines, compiles, composes, creates, devises, designs, explains, generates, modifies, organizes, plans, rearranges, reconstructs, relates, reorganizes, revises, rewrites, summarizes, tells, writes.
Evaluation | Appraises, compares, concludes, contrasts, criticizes, critiques, defends, describes, discriminates, evaluates, explains, interprets, justifies, relates, summarizes, supports.

Naïve Bayes Classifier
Naïve Bayes is a simple probabilistic machine learning algorithm [15]. It is a classification algorithm that needs training data to predict unknown data, and its classification is computed from Bayesian theory:

$$P(C \mid d) = \frac{P(C)\,P(d \mid C)}{P(d)} \quad (1)$$

$P(C \mid d)$ is the probability of a class given a document, i.e. the probability that a given document $d$ belongs to a given class $C$. $P(d)$ is the probability of a document; since it is a constant divisor in every calculation, it can be ignored. $P(C)$ is the probability of a class, computed as the number of documents in the category divided by the number of documents in all categories. $P(d \mid C)$ represents the probability of a document given a class; documents can be modeled as sets of words, so $P(d \mid C)$ can be written as:

$$P(d \mid C) = \prod_{i} P(w_i \mid C) \quad (2)$$

$$P(C \mid d) \propto P(C) \prod_{i} P(w_i \mid C) \quad (3)$$

$P(w_i \mid C)$ is the probability that the $i$-th word of a given document occurs in a document from class $C$, and it can be computed as:

$$P(w_i \mid C) = \frac{T_c + \alpha}{M + N} \quad (4)$$

where $T_c$ is the number of times the word $w_i$ occurs in class $C$, $M$ is the number of words in category $C$, $N$ is the size of the vocabulary table, and $\alpha$ is a positive constant, usually 1 or 0.5, used to avoid zero probabilities.
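For illustration, the computation in Eqs. (1)-(4) can be sketched in a few lines of Python. This is only a minimal sketch with an invented toy corpus, not the authors' RapidMiner implementation; all function names, variable names, and example questions below are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, alpha=1.0):
    """docs: list of (tokens, label). Returns priors, word counts, and totals."""
    vocab = {w for tokens, _ in docs for w in tokens}
    class_docs = defaultdict(list)
    for tokens, label in docs:
        class_docs[label].append(tokens)

    priors, word_counts, class_totals = {}, {}, {}
    for label, token_lists in class_docs.items():
        priors[label] = len(token_lists) / len(docs)          # P(C)
        counts = Counter(w for tokens in token_lists for w in tokens)
        word_counts[label] = counts                           # T_c per word
        class_totals[label] = sum(counts.values())            # M
    return priors, word_counts, class_totals, len(vocab), alpha

def predict(question_tokens, model):
    priors, word_counts, class_totals, vocab_size, alpha = model
    scores = {}
    for label in priors:
        # log P(C) + sum_i log P(w_i | C), smoothed as in Eq. (4)
        log_p = math.log(priors[label])
        for w in question_tokens:
            tc = word_counts[label].get(w, 0)
            log_p += math.log((tc + alpha) / (class_totals[label] + vocab_size))
        scores[label] = log_p
    return max(scores, key=scores.get)

# Tiny illustrative corpus; labels follow Bloom's levels
docs = [(["define", "the", "term"], "Knowledge"),
        (["compare", "two", "commercials"], "Analysis"),
        (["design", "a", "poster"], "Synthesis")]
model = train_naive_bayes(docs)
print(predict(["compare", "micro", "and", "macro"], model))  # -> Analysis
```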
Naïve Bayes with Kernel Density Estimation
Kernel Density Estimation (KDE) allows naïve Bayes to handle quantitative attributes [16]. To deal with quantitative data, naïve Bayes uses the normal (Gaussian) distribution:

$$g(x, \mu_i, \sigma_c) = \frac{1}{\sqrt{2\pi}\,\sigma_c}\, e^{-\frac{(x - \mu_i)^2}{2\sigma_c^2}} \quad (5)$$

$$P(D = d \mid C = c) = \frac{1}{n}\sum_{i} g(x, \mu_i, \sigma_c) \quad (6)$$

where $\mu_i$ ranges over the training data for the attribute $x$ in class $C$, with $\mu_i = x_i$ and $\sigma_c = \frac{1}{\sqrt{n_c}}$, and $n_c$ is the number of documents in class $C$.
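A minimal sketch of Eqs. (5)-(6), assuming the bandwidth $\sigma_c = 1/\sqrt{n_c}$ described above; the function name and the example feature values are illustrative, not taken from the original experiment.

```python
import math

def kernel_density(x, training_values):
    """Estimate P(X = x | C) as in Eqs. (5)-(6): an average of Gaussian
    kernels centred at each training value x_i of the class, with
    bandwidth sigma_c = 1 / sqrt(n_c)."""
    n_c = len(training_values)
    sigma_c = 1.0 / math.sqrt(n_c)
    total = 0.0
    for x_i in training_values:
        total += (1.0 / (math.sqrt(2 * math.pi) * sigma_c)) * \
                 math.exp(-((x - x_i) ** 2) / (2 * sigma_c ** 2))
    return total / n_c

# Illustrative: density of a numeric feature value for one class
values_in_class = [0.12, 0.15, 0.10, 0.40]
print(kernel_density(0.13, values_in_class))
```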
Wrapper based Feature Selection
This paper proposes wrapper based feature selection to improve the performance of Naïve Bayes with Kernel Density Estimation. Wrapper based feature selection uses the classification algorithm itself to evaluate feature subsets, measured by cross-validation. The common wrapper based methods are forward selection and backward elimination. The forward selection algorithm starts with an empty feature subset; in each iteration, one feature is added per forward step until a predefined number of features is reached. In each step, every candidate feature is separately added to the current subset and evaluated, and the feature that induces the highest improvement is included in the resulting subset. Backward elimination differs from forward selection: it starts with the full feature subset, and a feature is removed when it decreases the accuracy of the classifier. Fig. 1 and Fig. 2 show the algorithms of forward selection and backward elimination, respectively.

Fig. 1. The algorithm of forward selection:
1. Start with the empty set $F_0 = \emptyset$, $k = 0$.
2. Iterate:
   a. Select the next best feature $j$ to add to $F_k$, i.e. the one with the most significant cost reduction: $j^{+} = \arg\max_{j \notin F_k} J(F_k \cup \{j\})$.
   b. Update $F_{k+1} = F_k \cup \{j^{+}\}$, $k = k + 1$.

Fig. 2. The algorithm of backward elimination:
1. Start with the full set $F_n = \{1, \dots, n\}$, $k = n$.
2. Iterate:
   a. Remove the worst feature: $j^{-} = \arg\max_{j \in F_k} J(F_k \setminus \{j\})$.
   b. Update $F_{k-1} = F_k \setminus \{j^{-}\}$, $k = k - 1$.
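The forward-selection loop of Fig. 1 can be sketched as a generic wrapper. The helper `criterion` below stands in for the evaluation function J (for example, the cross-validated accuracy of Naïve Bayes with KDE trained on the chosen features), and the toy usage at the end is purely illustrative; none of these names come from the original paper.

```python
def forward_selection(all_features, criterion, max_features):
    """Greedy wrapper forward selection (Fig. 1).

    criterion(feature_subset) -> evaluation score J of the classifier
    trained on that subset (e.g. cross-validated accuracy)."""
    selected = []                       # F_0 = empty set
    remaining = list(all_features)
    while remaining and len(selected) < max_features:
        # evaluate every candidate added to the current subset
        scored = [(criterion(selected + [f]), f) for f in remaining]
        best_score, best_feature = max(scored)
        # stop when no candidate improves the current subset
        if selected and best_score <= criterion(selected):
            break
        selected.append(best_feature)   # F_{k+1} = F_k U {j+}
        remaining.remove(best_feature)
    return selected

# Toy usage: pretend each feature's usefulness is known, so J is just their sum.
toy_gain = {"analyze": 0.3, "design": 0.25, "the": 0.01}
chosen = forward_selection(toy_gain.keys(),
                           lambda s: sum(toy_gain[f] for f in s), 2)
print(chosen)  # ['analyze', 'design']
```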
Experiment and Result

Question Dataset
The dataset of this paper was collected from Chai Jing Hui [6]. The question item dataset contains 274 items belonging to the six Bloom taxonomy levels; its distribution is shown in Table II. To obtain a reliable estimate of performance, k-fold cross-validation was used: we applied 10-fold cross-validation with stratified sampling in the experiment.

TABLE II. DISTRIBUTION OF THE QUESTION DATASET

Name of Category | Number of Questions
Knowledge | 28
Comprehension | 44
Application | 41
Analysis | 48
Synthesis | 59
Evaluation | 54
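A rough sketch of the stratified split is given below, assuming a simple per-class round-robin assignment; the authors used RapidMiner's stratified sampling, so this is only an approximation of the idea, and the fold sizes shown are a consequence of this simplified scheme.

```python
from collections import defaultdict

def stratified_folds(labels, k=10):
    """Assign each item index to one of k folds, class by class,
    so every fold roughly preserves the distribution of Table II."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

# 274 questions with the distribution of Table II
labels = (["Knowledge"] * 28 + ["Comprehension"] * 44 + ["Application"] * 41 +
          ["Analysis"] * 48 + ["Synthesis"] * 59 + ["Evaluation"] * 54)
folds = stratified_folds(labels)
print([len(f) for f in folds])  # ten roughly equal folds summing to 274 items
```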
Performance Measures
The performance in this study was measured using a confusion matrix (Table III). A confusion matrix yields two performance measurements: accuracy and the kappa coefficient. Kappa is a coefficient that evaluates the agreement between two different observers and can be employed as a classifier performance measure [17].

TABLE III. CONFUSION MATRIX

 | Actual Positive | Actual Negative
Predicted Positive | TP | FP
Predicted Negative | FN | TN

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (7)$$

$$\text{Kappa} = \frac{\text{Accuracy} - P_c}{1 - P_c} \quad (8)$$

$$P_c = \frac{(TP + FN)(TP + FP) + (FP + TN)(FN + TN)}{N \cdot N} \quad (9)$$

where TP is true positive, TN is true negative, FP is false positive, FN is false negative, and N is the total number of documents.
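As a worked check of Eqs. (7)-(9), the sketch below computes accuracy and kappa from an invented 2x2 confusion matrix; the numbers are illustrative and are not results from this study.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Accuracy and kappa from a 2x2 confusion matrix, Eqs. (7)-(9)."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n                                    # Eq. (7)
    # chance agreement Pc, Eq. (9)
    pc = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    kappa = (accuracy - pc) / (1 - pc)                          # Eq. (8)
    return accuracy, kappa

print(accuracy_and_kappa(tp=40, fp=10, fn=5, tn=45))  # approx. (0.85, 0.7)
```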
Experiment Results
This subsection presents the results of the experiment. The experiment used tokenization, stopword removal, and stemming to preprocess the question item documents, and stratified 10-fold cross-validation to test the classification results. Cross-validation is a standard evaluation technique in pattern classification in which the dataset is split into n parts (folds) of equal size; n-1 folds are used to train the classifier, and the held-out nth fold is then used to test it [18]. Experiments were conducted with RapidMiner 5.3.005. Three conditions of Naïve Bayes performance were observed: Naïve Bayes with Kernel Density Estimation without feature selection (NBK), Naïve Bayes with Kernel Density Estimation using Forward Selection (NBK+FS), and Naïve Bayes with Kernel Density Estimation using Backward Elimination (NBK+BE).

TABLE IV. THE PERFORMANCE OF FORWARD SELECTION AND BACKWARD ELIMINATION ON NAÏVE BAYES WITH KERNEL DENSITY ESTIMATION

Method | Accuracy | Kappa | Computation Time (seconds)
NBK | 57.61% +/- 5.64% | 0.484 +/- 0.072 | 2
NBK+FS | 67.18% +/- 7.77% | 0.598 +/- 0.096 | 748.8
NBK+BE | 60.9% +/- 11.95% | 0.525 +/- 0.142 | 1861.8

From Table IV, it can be seen that the use of forward selection and backward elimination improved the accuracy of Naïve Bayes with Kernel Density Estimation. Forward selection has the best accuracy and a lower computation time compared to backward elimination. Our approach also improves on the performance of Chai Jing Hui's approach [5]: the accuracy of the proposed method reaches 67.18%, whereas the performance of Chai Jing Hui only reached 65.9%. Table V and Table VI show the detailed confusion matrices of Naïve Bayes using Forward Selection and Naïve Bayes using Backward Elimination, respectively.
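The preprocessing chain mentioned above can be sketched as follows. This is a simplified stand-in for the RapidMiner text-processing operators that were actually used; the tiny stopword list and the suffix-stripping rules are assumptions made only for illustration, not the real resources.

```python
import re

STOPWORDS = {"a", "an", "the", "of", "to", "and", "in", "is", "are"}  # illustrative subset

def crude_stem(word):
    # very rough suffix stripping, standing in for a real stemmer (e.g. Porter)
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(question):
    tokens = re.findall(r"[a-z]+", question.lower())          # tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]        # stopword removal
    return [crude_stem(t) for t in tokens]                    # stemming

print(preprocess("Compare two dog food commercials."))
# ['compare', 'two', 'dog', 'food', 'commercial']
```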
Conclusion
This paper studied wrapper based feature selection on Naïve Bayes with Kernel Density Estimation for question classification according to the cognitive levels of Bloom's taxonomy. We compared the performance of forward selection and backward elimination in terms of accuracy, kappa, and computation time. The results show that Naïve Bayes using forward selection has better accuracy and a lower computation time compared to backward elimination. We expect that its performance can still be improved; in future work, more experiments should be conducted to improve the accuracy of Naïve Bayes and to reduce the computation time of wrapper based feature selection.
References
[1] Benjamin Samuel Bloom, Taxonomy of Educational Objectives, Handbook I: The Cognitive Domain. New York: David McKay Co Inc., 1956.
[2] Timothy Highley and Anne E. Edlin, "Discrete Mathematics Assessment Using Learning Objectives Based on Bloom's Taxonomy," in Proceedings of the 39th IEEE International Conference on Frontiers in Education, 2009.
[3] Anwar Ali Yahya and Addin Osman, "Automatic classification of questions into Bloom's cognitive levels using support vector machines," in The International Arab Conference on Information Technology, 2011.
[4] M. Janaki Meena and K. R. Chandran, "Naïve Bayes Text Classification with Positive Features Selected by Statistical Method," in First International Conference on Advanced Computing, 2009.
[5] Norazah Yusof and Chai Jing Hui, "Determination of Bloom's Cognitive Level of Question Items using Artificial Neural Network," in International Conference on Intelligent Systems Design and Applications, 2010, pp. 866-870.
[6] Chai Jing Hui, "Feature Reduction for Neural Network in Determining the Bloom's Cognitive Level of Question Items," Universiti Teknologi Malaysia, Computer Science Master's Project Report, 2009.
[7] Ji Peirong, Wang Peng, Zhao Qin, and Zhao Li, "A new parallel back-propagation algorithm for neural networks," in IEEE International Conference on Grey Systems and Intelligent Services, 2011, pp. 807-810.
[8] Li Lei, Huang Yu-guang, and Liu Zhong-wan, "Chinese text classification for small sample set," The Journal of China University of Posts and Communications, vol. 18, no. 1, pp. 83-89, 2011.
[9] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, no. 1-2, pp. 273-324, 1997.
[10] Yuh-Jye Lee, Chien-Chung Chang, and Chia-Huang Chao, "Incremental Forward Feature Selection with Application to Microarray Gene Expression Data," Journal of Biopharmaceutical Statistics, vol. 18, no. 5, pp. 827-840, 2008.
[11] K. Z. Mao, "Orthogonal Forward Selection and Backward Elimination Algorithms for Feature Subset Selection," IEEE Transactions on Systems, Man, and Cybernetics, vol. 34, no. 1, pp. 629-634, February 2004.
[12] Ranjit Abraham, Jay B. Simha, and S. Sitharama Iyengar, "Effective Discretization and Hybrid feature selection using Naïve Bayesian classifier for Medical datamining," International Journal of Computational Intelligence Research, vol. 5, no. 2, pp. 116-129, 2009.
[13] David Andrich, "Framework relating outcomes based education and the taxonomy of educational objectives," Studies in Educational Evaluation, vol. 28, pp. 35-59, 2002.
[14] C. Clark, "Learning Domains of Bloom's Taxonomy: The Three Types of Learning," 2001.
[15] Fadi Thabtah, Mohammad Ali H. Eljinini, Mannam Zamzeer, and Wa'el Musa Hadi, "Naïve Bayesian Based on Chi Square to Categorize Arabic Data," Communications of the IBIMA, vol. 10, no. 20, pp. 158-163, 2009.
[16] Jingli Lu, Ying Yang, and Geoffrey I. Webb, "Incremental discretization for Naïve-Bayes classifier," in Proceedings of the Second International Conference on Advanced Data Mining and Applications, 2006, pp. 223-238.
[17] R. Parimala and R. Nallaswamy, "A Study of Spam E-mail classification using Feature Selection package," Global Journal of Computer Science and Technology, vol. 11, no. 7, 2011.
[18] D. M. Chandwadkar and M. S. Sutaone, "Role of Features and Classifiers on Accuracy of Identification of Musical Instruments," in CISP 2012, 2012.
TABLE V. CONFUSION MATRIX FOR NAÏVE BAYES USING FORWARD SELECTION

 | true Knowledge | true Comprehension | true Application | true Analysis | true Synthesis | true Evaluation | class precision
pred. Knowledge | 22 | 0 | 0 | 1 | 2 | 1 | 84.62%
pred. Comprehension | 0 | 30 | 0 | 3 | 3 | 1 | 81.08%
pred. Application | 0 | 1 | 26 | 0 | 0 | 0 | 96.30%
pred. Analysis | 0 | 2 | 1 | 27 | 4 | 4 | 71.05%
pred. Synthesis | 6 | 11 | 12 | 17 | 49 | 18 | 43.36%
pred. Evaluation | 0 | 0 | 2 | 0 | 1 | 30 | 90.91%
class recall | 78.57% | 68.18% | 63.41% | 56.25% | 83.05% | 55.56% |
TABLE VI. CONFUSION MATRIX FOR NAÏVE BAYES USING BACKWARD ELIMINATION

 | true Knowledge | true Comprehension | true Application | true Analysis | true Synthesis | true Evaluation | class precision
pred. Knowledge | 18 | 2 | 3 | 1 | 0 | 1 | 72.00%
pred. Comprehension | 4 | 27 | 4 | 6 | 4 | 2 | 57.45%
pred. Application | 1 | 1 | 20 | 6 | 1 | 4 | 60.61%
pred. Analysis | 2 | 8 | 3 | 25 | 4 | 7 | 51.02%
pred. Synthesis | 1 | 6 | 5 | 3 | 40 | 3 | 68.97%
pred. Evaluation | 2 | 0 | 6 | 7 | 10 | 37 | 59.68%
class recall | 64.29% | 61.36% | 48.78% | 52.08% | 67.80% | 68.52% |