PENELITIAN IPTEKS
LAPORAN PENELITIAN IPTEKS
KLASIFIKASI KALIMAT SOAL MENGGUNAKAN ALGORITMA NAIVE BAYES BERBASIS KERNEL DENSITY ESTIMATION DENGAN SELEKSI FITUR
OLEH:
Bowo Nurhadiyono, S.Si., M.Kom (NPP: 0686.11.1996.102)
T Sutojo, S.Si., M.Kom (NPP: 0686.11.1996.094)
Catur Supriyanto, S.Kom, M.CS (NPP: 0686.11.2011.415)
FAKULTAS ILMU KOMPUTER UNIVERSITAS DIAN NUSWANTORO SEMARANG MEI 2013
RINGKASAN DAN SUMMARY
Proses otomatisasi penentuan kalimat soal berbasis Taksonomi Bloom perlu dilakukan untuk membantu dosen atau guru dalam proses penyusunan soal ujian. Penelitian ini mencoba melakukan otomatisasi penentuan kalimat soal berbasis Taksonomi Bloom dengan menggunakan algoritma Naïve Bayes berbasis kernel density estimation dengan seleksi fitur. Hasil penelitian ini lebih baik daripada hasil peneliti sebelumnya: dengan dataset yang sama, penelitian ini mampu mencapai tingkat akurasi sebesar 77,8%, melebihi akurasi peneliti sebelumnya sebesar 65,26%. Untuk penelitian mendatang, kalimat soal berbahasa Indonesia dapat digunakan sebagai dataset sehingga ke depannya perangkat lunak tersebut dapat diimplementasikan. Pemilihan algoritma klasifikasi yang lain juga perlu dilakukan agar tingkat akurasi penentuan kalimat soal berdasar Taksonomi Bloom dapat ditingkatkan.
Kata Kunci: Klasifikasi, Naïve Bayes, Kernel Density Estimation, Seleksi Fitur, Taksonomi Bloom
PRAKATA
Alhamdulillah, kami ucapkan kepada Allah SWT atas kehendak-Nya kami dapat menyelesaikan penelitian ini tepat pada waktu yang telah ditentukan. Selain itu kami juga mengucapkan banyak terima kasih kepada:
1. Bapak DR. Ir. Edi Noersasongko, M.Kom, selaku Rektor Universitas Dian Nuswantoro
2. Bapak DR. Abdul Syukur, selaku Dekan Fakultas Ilmu Komputer Universitas Dian Nuswantoro
3. Bapak Tyas Catur Pramudi, S.Si., M.Kom, selaku Ketua LP2M Universitas Dian Nuswantoro
4. Bapak DR. Heru Agus Santoso, M.Kom, selaku Ketua Program Studi Teknik Informatika Fakultas Ilmu Komputer Universitas Dian Nuswantoro
5. Teman-teman Dosen Fakultas Ilmu Komputer
yang telah membantu kami baik langsung maupun tidak langsung dalam penelitian yang kami lakukan. Semoga kebaikan Bapak-Bapak mendapat ganjaran dari Allah SWT, amin.
Semarang, 25 Mei 2013 Tim Peneliti
DAFTAR ISI
HALAMAN PENGESAHAN
RINGKASAN
PRAKATA
DAFTAR ISI
DAFTAR TABEL
DAFTAR GAMBAR
DAFTAR LAMPIRAN
BAB I : PENDAHULUAN
  1.1. Latar Belakang
  1.2. Perumusan Masalah
  1.3. Batasan Penelitian
BAB II : TINJAUAN PUSTAKA
  2.1. Pendahuluan
  2.2. Taksonomi Bloom
  2.3. Text Processing
  2.4. Term Weighting
  2.5. Seleksi Fitur
    2.5.1. Seleksi Fitur Berbasis Ranking
    2.5.2. Seleksi Fitur Berbasis Wrapper
  2.6. Algoritma Naive Bayes
    2.6.1. Algoritma Naive Bayes Berbasis Kernel Density Estimation
BAB III : TUJUAN DAN MANFAAT PENELITIAN
  3.1. Tujuan
  3.2. Manfaat Penelitian
BAB IV : METODE PENELITIAN
  4.1. Pendahuluan
  4.2. Metode Pengumpulan Data
  4.3. Pengolahan Awal Data
  4.4. Metode yang Diusulkan
  4.5. Evaluasi
BAB V : HASIL DAN PEMBAHASAN
  5.1. Pendahuluan
  5.2. Eksperimen Menggunakan Rapidminer
  5.3. Hasil dan Pembahasan
BAB VI : KESIMPULAN DAN SARAN
  6.1. Pendahuluan
  6.2. Kesimpulan
  6.3. Saran Penelitian di Masa Mendatang
DAFTAR PUSTAKA
LAMPIRAN
DAFTAR TABEL
Tabel 2.1 Kata Kunci dan Level Kognitif Taksonomi Bloom
Tabel 4.1 Distribusi Dataset
Tabel 4.2 Confusion Matrix
Tabel 5.1 Kinerja Naïve Bayes dengan Forward Selection dan Backward Elimination
Tabel 5.2 Confusion Matrix Forward Selection
Tabel 5.3 Confusion Matrix Backward Elimination
Tabel 5.4 Akurasi Naïve Bayes dengan Beberapa Seleksi Fitur
Tabel 5.5 Kappa Naïve Bayes dengan Beberapa Seleksi Fitur
Tabel 5.6 Waktu Komputasi Naïve Bayes dengan Beberapa Seleksi Fitur
DAFTAR GAMBAR
Gambar 2.1 Term Document Matrix
Gambar 2.2 Algoritma Forward Selection
Gambar 2.3 Algoritma Backward Elimination
Gambar 4.1 Model Klasifikasi yang Diusulkan
Gambar 5.1 Urutan Proses Klasifikasi di Rapidminer
Gambar 5.2 Penentuan Label Dataset di Rapidminer
Gambar 5.3 Urutan Preprocessing Dokumen di Rapidminer
Gambar 5.4 Proses Validasi Menggunakan 10-Fold Validation
Gambar 5.5 Proses Training dan Testing di Rapidminer
Gambar 5.6 Akurasi
Gambar 5.7 Kappa
DAFTAR LAMPIRAN
Lampiran I : Riwayat Penelitian Ketua Peneliti dan Anggota Peneliti
Lampiran II : Dataset Penelitian
Lampiran III : Draft Artikel Ilmiah
BAB I
PENDAHULUAN

1.1. Latar Belakang
Taksonomi Bloom banyak dipakai pada bidang pendidikan. Taksonomi Bloom digunakan sebagai dasar perencanaan dan desain tujuan suatu pembelajaran. Untuk mengukur keberhasilan pelajar pada pelajaran tertentu, dibutuhkan pengujian seperti kuis atau tes yang memainkan peranan penting. Pada umumnya, pengajar menyediakan soal tes untuk beberapa tingkat pengujian yang sesuai dengan pelajaran yang sudah ditempuh untuk menentukan apakah siswa tersebut mencapai level pengetahuan tertentu [1]. Menurut Benjamin Bloom, yang teorinya dikenal sebagai Taksonomi Kognitif Bloom, dalam menyusun alat evaluasi untuk mengukur hasil belajar hendaknya dicakup beberapa tingkat berpikir, dari menengah hingga tinggi. Aspek kognitif tersebut meliputi pengetahuan (knowledge), pemahaman (comprehension), aplikasi (application), analisis (analysis), sintesis (synthesis), dan evaluasi (evaluation). Akan tetapi, proses mempersiapkan dan mendesain soal tes yang memenuhi kriteria Taksonomi Bloom sangat menghabiskan waktu dan sulit dilakukan [1], mengingat banyaknya jumlah soal yang perlu dikategorikan sesuai Taksonomi Bloom secara manual.

Algoritma machine learning diusulkan untuk mengatasi masalah tersebut. Dengan menggunakan algoritma machine learning, penentuan kategori soal dapat dilakukan secara otomatis tanpa bantuan manusia. Yahya dan Osman mengusulkan penggunaan algoritma Support Vector Machine (SVM) untuk penentuan soal berdasar Taksonomi Bloom [2]. Dalam penelitian tersebut, SVM menghasilkan tingkat akurasi yang baik, namun memiliki kompleksitas yang cukup tinggi [3]. Penelitian lainnya adalah penelitian Chai Jing Hui [1][4] yang menggunakan algoritma jaringan syaraf tiruan untuk penentuan label Taksonomi Bloom secara otomatis. Masalah utama yang diangkat pada penelitian tersebut adalah besarnya ruang vektor yang dihasilkan dari banyaknya kumpulan soal. Penelitian tersebut menggunakan Vector Space Model (VSM) untuk merepresentasikan kumpulan soal atau dokumen. Setiap soal membentuk satu vektor, dimana elemen vektor tersebut dapat berupa jumlah kemunculan tiap kata (word/token) dalam dokumen tersebut. Untuk mengatasi masalah tersebut, Chai Jing Hui mengusulkan penggunaan Document Frequency (DF) dan Category Frequency-Document Frequency Method (CF-DF) sebagai unsupervised feature selection untuk mengurangi kompleksitas algoritma neural network.

Rendahnya tingkat akurasi yang dihasilkan pada penelitian Chai Jing Hui [1] menjadi alasan penelitian ini dilakukan. Penelitian ini mengusulkan penggunaan algoritma naïve bayes berbasis kernel density estimation dengan seleksi fitur. Algoritma naïve bayes dipilih karena sangat cocok untuk teks pendek (short text). Seperti penelitian yang dilakukan oleh Kuruvilla Mathew dan Biju Issac [5], algoritma naïve bayes mampu melakukan prediksi spam mail dengan tingkat akurasi melebihi 98%. Penelitian ini menggunakan algoritma naïve bayes berbasis kernel density estimation untuk mendapatkan akurasi yang lebih baik. Permasalahan kedua yang diangkat pada penelitian ini adalah masalah komputasi algoritma naïve bayes berbasis kernel density estimation. Seperti yang dipaparkan oleh Jingli Lu, et al. [6], penggunaan data yang besar menyebabkan lemahnya komputasi algoritma naïve bayes berbasis kernel density estimation. Oleh karena itu, penelitian ini juga mengusulkan dua tahapan seleksi fitur, yaitu seleksi fitur berbasis filter (ranking) dan seleksi fitur berbasis wrapper. Jenis seleksi fitur berbasis ranking yang dipilih adalah supervised feature selection. Penelitian Yang dan Pedersen [7] menyebutkan bahwa kemampuan supervised feature selection lebih baik daripada unsupervised feature selection dalam menghasilkan tingkat akurasi algoritma klasifikasi yang tinggi. Supervised feature selection yang digunakan pada penelitian ini adalah chi squared statistic dan information gain. Tahapan kedua seleksi fitur yang digunakan adalah Backward Elimination dan Forward Selection. Dalam penelitian Khoerudin [8], kedua seleksi fitur ini mampu meningkatkan akurasi algoritma Naïve Bayes.
1.2. Perumusan Masalah
Dari uraian latar belakang tersebut, masalah yang melatarbelakangi penelitian ini adalah masih rendahnya tingkat akurasi penentuan kalimat soal pada penelitian Chai Jing Hui [1]. Penelitian tersebut menggunakan algoritma backpropagation neural network dengan seleksi fitur Document Frequency (DF) dan Category Frequency-Document Frequency (CF-DF). Dengan dataset yang sama, penelitian ini mengusulkan algoritma naïve bayes berbasis kernel density estimation dengan dua tahapan seleksi fitur.
1.3. Batasan Penelitian
Batasan masalah pada penelitian ini adalah:
1. Proses klasifikasi hanya pada level kognitif Taksonomi Bloom (knowledge, comprehension, application, analysis, synthesis, evaluation).
2. Dataset yang digunakan diambil dari penelitian Chai Jing Hui [1].
BAB II
TINJAUAN PUSTAKA

2.1. Pendahuluan
Bab ini memaparkan beberapa teori dan penelitian terkait yang berkenaan dengan kognitif Taksonomi Bloom, algoritma Naïve Bayes, seleksi fitur, dan metode-metode yang digunakan untuk text mining. Beberapa daftar pustaka dalam penelitian ini diambil dari jurnal ataupun karya ilmiah mahasiswa.
2.2. Taksonomi Bloom
Tujuan pembelajaran dalam Taksonomi Bloom terdiri dari 3 domain, yaitu kognitif, afektif, dan psikomotorik. Domain kognitif berhubungan dengan kemampuan intelektual, domain afektif berhubungan dengan kemampuan emosional, dan domain psikomotorik berhubungan dengan kemampuan fisik [9]. Penelitian ini fokus pada domain kognitif Taksonomi Bloom. Tabel 2.1 menunjukkan kata kunci dari domain kognitif Taksonomi Bloom.
2.3. Text Processing
Preprocessing adalah tahapan mengubah suatu dokumen ke dalam format yang sesuai agar dapat diproses oleh algoritma klasifikasi atau clustering. Terdapat 3 tahapan preprocessing dalam penelitian ini, yaitu: Tokenization, merupakan tahapan penguraian string teks menjadi term atau kata. Stopword Removal, merupakan tahapan penghapusan kata-kata yang tidak relevan dalam penentuan topik sebuah dokumen dan yang sering muncul pada sebuah dokumen, misal "and", "or", "the", "a", "an" pada dokumen berbahasa Inggris.

Tabel 2.1 Kata Kunci dari Level Kognitif Taksonomi Bloom

Category       : Keywords
Knowledge      : Defines, describes, identifies, knows, labels, lists, matches, names, outlines, recalls, recognizes, reproduces, selects, states.
Comprehension  : Comprehends, converts, defends, distinguishes, estimates, explains, extends, generalizes, gives examples, infers, interprets, paraphrases, predicts, rewrites, summarizes, translates.
Application    : Applies, changes, computes, constructs, demonstrates, discovers, manipulates, modifies, operates, predicts, prepares, produces, relates, shows, solves, uses.
Analysis       : Analyzes, breaks down, compares, contrasts, diagrams, deconstructs, differentiates, discriminates, distinguishes, identifies, illustrates, infers, outlines, relates, selects, separates.
Synthesis      : Categorizes, combines, compiles, composes, creates, devises, designs, explains, generates, modifies, organizes, plans, rearranges, reconstructs, relates, reorganizes, revises, rewrites, summarizes, tells, writes.
Evaluation     : Appraises, compares, concludes, contrasts, criticizes, critiques, defends, describes, discriminates, evaluates, explains, interprets, justifies, relates, summarizes, supports.
Stemming, merupakan tahapan pengubahan suatu kata menjadi kata dasarnya dengan menghilangkan imbuhan awal atau akhir pada kata tersebut, misal eating → eat, extraction → extract. Penelitian ini menggunakan algoritma Porter stemmer.

2.4. Term Weighting
Vector Space Model (VSM) banyak digunakan dalam sistem temu kembali dokumen teks [10]. VSM adalah model yang digunakan untuk mengukur kemiripan antar dokumen. VSM mengubah koleksi dokumen ke dalam matriks term-document [11]. Matriks term-document pada Gambar 2.1 memiliki dimensi m x n, dimana m adalah jumlah term dan n adalah jumlah dokumen. Terdapat 3 metode pembobotan atau term weighting dalam VSM, yaitu Term Frequency (TF), Inverse Document Frequency (IDF), dan Term Frequency-Inverse Document Frequency (TFIDF). TF adalah banyaknya kemunculan suatu term dalam suatu dokumen, IDF adalah logaritma dari pembagian jumlah total dokumen dengan cacah dokumen yang mengandung suatu term, dan TFIDF adalah perkalian antara TF dengan IDF. Semakin besar bobot TFIDF suatu term, semakin penting term tersebut untuk digunakan pada tahapan klasifikasi. Penelitian ini menggunakan TFIDF sebagai metode term weighting karena TFIDF lebih sering digunakan. Rumus perhitungan TFIDF ditunjukkan pada rumus (1), dimana N adalah jumlah dokumen dan df adalah jumlah dokumen yang mengandung term t.
Gambar 2.1 Term Document Matrix
$TFIDF = TF \times IDF$   (1)

$IDF = \log(N / df)$   (2)
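Sebagai ilustrasi (bukan bagian dari laporan asli), sketsa Python minimal berikut menghitung bobot TFIDF sesuai rumus (1) dan (2) setelah tokenization dan stopword removal sederhana. Daftar stopword dan dokumen contoh bersifat hipotetis, dan Porter stemming (misalnya melalui pustaka NLTK) sengaja dihilangkan demi keringkasan.

    # Sketsa minimal: tokenization, stopword removal, dan pembobotan TFIDF
    # mengikuti rumus (1)-(2): TFIDF = TF x IDF, IDF = log(N/df).
    import math
    import re

    STOPWORDS = {"the", "a", "an", "and", "or", "of", "in", "to"}  # contoh hipotetis

    def preprocess(text):
        # Tokenisasi sederhana, huruf kecil, lalu buang stopword.
        tokens = re.findall(r"[a-z]+", text.lower())
        return [t for t in tokens if t not in STOPWORDS]

    def tfidf(documents):
        docs = [preprocess(d) for d in documents]
        n = len(docs)
        # df: jumlah dokumen yang mengandung term t
        df = {}
        for tokens in docs:
            for t in set(tokens):
                df[t] = df.get(t, 0) + 1
        # bobot per dokumen: tf(t, d) * log(N / df(t))
        weights = []
        for tokens in docs:
            tf = {}
            for t in tokens:
                tf[t] = tf.get(t, 0) + 1
            weights.append({t: f * math.log(n / df[t]) for t, f in tf.items()})
        return weights

    print(tfidf(["List the levels in Bloom's Taxonomy.",
                 "Define the meaning of the Olympic Motto."]))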
2.5. Seleksi Fitur
Seleksi fitur, atau yang lebih dikenal dengan feature selection, subset selection, attribute selection, atau variable selection, adalah proses memilih fitur yang tepat untuk digunakan dalam proses klasifikasi atau clustering. Tujuan dari seleksi fitur adalah mengurangi tingkat kompleksitas sebuah algoritma klasifikasi, meningkatkan akurasi algoritma klasifikasi tersebut, dan mengetahui fitur-fitur yang paling berpengaruh terhadap tingkat akurasi [12]. Penelitian ini fokus pada penggunaan seleksi fitur berbasis pencarian (search) dan individual feature ranking. Untuk lebih detailnya dijelaskan pada subbab berikut ini.

2.5.1. Seleksi Fitur Berbasis Ranking
Penelitian ini menggunakan 2 metode supervised feature selection, yakni chi square dan information gain. Kedua metode ini digunakan untuk mengurangi penggunaan fitur dalam proses klasifikasi ataupun clustering dokumen teks. Information gain mampu memberikan akurasi yang lebih baik untuk klasifikasi sentiment analysis [13]. Rumus chi square ditunjukkan pada rumus (3) dan (4), sedangkan rumus information gain ditunjukkan pada rumus (5).
$\chi^2(t, c) = \dfrac{N (AD - BC)^2}{(A + B)(C + D)(A + C)(B + D)}$   (3)

$\chi^2_{avg}(t) = \sum_{i=1}^{m} P(C_i)\, \chi^2(t, C_i)$   (4)

$IG(t) = -\sum_{i=1}^{n} P(C_i) \log P(C_i) + P(t) \sum_{i=1}^{n} P(C_i \mid t) \log P(C_i \mid t) + P(\bar{t}) \sum_{i=1}^{n} P(C_i \mid \bar{t}) \log P(C_i \mid \bar{t})$   (5)

Dimana A adalah banyaknya soal di kategori C yang mengandung t, B adalah banyaknya soal bukan di kategori C yang mengandung t, C adalah banyaknya soal di kategori C yang tidak mengandung t, D adalah banyaknya soal bukan di kategori C yang tidak mengandung t, N adalah total jumlah soal, $P(C_i)$ adalah probabilitas kategori $C_i$, $P(t)$ adalah probabilitas term t, $P(C_i \mid t)$ adalah probabilitas kategori $C_i$ untuk dokumen yang mengandung term t, dan $P(C_i \mid \bar{t})$ adalah probabilitas kategori $C_i$ untuk dokumen yang tidak mengandung term t.
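Sebagai ilustrasi tambahan (bukan kode asli penelitian), sketsa Python berikut menghitung skor chi square per rumus (3) dan (4). Variabel docs adalah daftar pasangan (daftar_term, label) yang bersifat hipotetis; skor information gain pada rumus (5) dapat dihitung dengan pola serupa.

    # Sketsa minimal skor chi square per term, mengikuti rumus (3)-(4).
    from collections import Counter

    def chi_square(docs, term, category):
        A = B = C = D = 0
        for tokens, label in docs:
            has_term = term in tokens
            if label == category:
                A += has_term          # soal kategori C yang mengandung t
                C += not has_term      # soal kategori C tanpa t
            else:
                B += has_term          # soal bukan kategori C yang mengandung t
                D += not has_term      # soal bukan kategori C tanpa t
        n = A + B + C + D
        denom = (A + B) * (C + D) * (A + C) * (B + D)
        return 0.0 if denom == 0 else n * (A * D - B * C) ** 2 / denom

    def chi_square_avg(docs, term):
        # rumus (4): rata-rata berbobot P(C_i) atas semua kategori
        counts = Counter(label for _, label in docs)
        n = len(docs)
        return sum((c / n) * chi_square(docs, term, cat) for cat, c in counts.items())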
2.5.2. Seleksi Fitur Berbasis Wrapper
Penelitian ini mengusulkan seleksi fitur berbasis wrapper untuk meningkatkan kinerja Naïve Bayes dengan Kernel Density Estimation. Seleksi fitur berbasis wrapper menggunakan classifier untuk mengevaluasi subset fitur dengan mengukur akurasi cross-validation. Penelitian ini menggunakan forward selection dan backward elimination sebagai metode seleksi fitur berbasis wrapper. Algoritma forward selection dimulai dengan subset fitur kosong. Pada setiap iterasi, satu fitur ditambahkan ke subset sampai sejumlah fitur yang ditentukan tercapai. Pada satu langkah, setiap fitur kandidat secara terpisah ditambahkan ke subset saat ini dan kemudian dievaluasi. Fitur yang memberikan peningkatan kinerja tertinggi dimasukkan ke dalam subset hasil, seperti diilustrasikan pada sketsa berikut dan Gambar 2.2.
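Sketsa Python minimal berikut (ilustrasi, bukan implementasi asli penelitian) menggambarkan forward selection berbasis wrapper. Fungsi evaluate() bersifat hipotetis dan diasumsikan mengembalikan akurasi cross-validation Naïve Bayes pada subset fitur tertentu.

    # Sketsa minimal forward selection berbasis wrapper (pencarian greedy).
    def forward_selection(all_features, evaluate, max_features):
        selected = []                        # F_0 = himpunan kosong
        best_score = 0.0
        while len(selected) < max_features:
            candidates = [f for f in all_features if f not in selected]
            if not candidates:
                break
            # evaluasi setiap kandidat ketika ditambahkan ke subset saat ini
            scored = [(evaluate(selected + [f]), f) for f in candidates]
            score, best = max(scored)
            if score <= best_score:          # berhenti bila tidak ada perbaikan
                break
            selected.append(best)
            best_score = score
        return selected

Backward elimination dapat disketsakan dengan cara yang sama, dimulai dari subset fitur utuh dan menghapus fitur satu per satu.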
1. Berawal dari himpunan fitur kosong, $F_0 = \varnothing$, $k = 0$.
2. Iterasi:
   a. Pilih fitur terbaik untuk dimasukkan ke $F_k$ dengan peningkatan kinerja terbesar, $j^* = \arg\max_j J(F_k \cup \{j\})$.
   b. Perbarui $F_{k+1} = F_k \cup \{j^*\}$, $k = k + 1$.

Gambar 2.2 Algoritma Forward Selection

Backward elimination berbeda dengan forward selection. Backward elimination dimulai dari subset fitur yang utuh. Fitur dalam subset akan dihapus jika fitur tersebut menyebabkan akurasi dari algoritma klasifikasi menurun. Gambar 2.2 dan Gambar 2.3 menunjukkan algoritma dari forward selection dan backward elimination.

1. Berawal dari himpunan fitur utuh $F_k = \{1, \ldots, n\}$, $k = n$.
2. Iterasi:
   a. Hapus fitur yang penghapusannya paling sedikit mengurangi kinerja, $j^* = \arg\max_j J(F_k \setminus \{j\})$.
   b. Perbarui $F_{k-1} = F_k \setminus \{j^*\}$, $k = k - 1$.

Gambar 2.3 Algoritma Backward Elimination

2.6. Algoritma Naive Bayes
Naïve Bayes merupakan algoritma machine learning sederhana yang menerapkan teori probabilitas. Algoritma Naïve Bayes termasuk ke dalam algoritma klasifikasi, yang artinya memerlukan proses training dalam melakukan prediksi. Klasifikasi Bayesian ini dihitung berdasarkan teorema Bayes, seperti ditunjukkan pada rumus (6).

$P(C \mid d) = P(C)\, \dfrac{P(d \mid C)}{P(d)}$   (6)

Dimana $P(C \mid d)$ adalah probabilitas suatu class C terhadap suatu dokumen d; $P(C)$ adalah probabilitas suatu class C, yang dihitung dengan membagi banyaknya dokumen di dalam class C dengan banyaknya dokumen di semua class; $P(d)$ adalah probabilitas suatu dokumen, yang nilainya dapat diabaikan karena selalu sama; dan $P(d \mid C)$ adalah probabilitas suatu dokumen d di dalam class C.
Dokumen dapat direpresentasikan sebagai kumpulan kata, sehingga menghasilkan persamaan (7); dengan demikian persamaan (6) dapat ditulis menjadi persamaan (8).

$P(d \mid C) = \prod_i P(w_i \mid C)$   (7)

$P(C \mid d) \propto P(C) \prod_i P(w_i \mid C)$   (8)

Dimana $P(w_i \mid C)$ adalah probabilitas term (word) ke-i dari suatu dokumen di dalam class C, yang dapat dihitung dengan persamaan (9).

$P(w_i \mid C) = \dfrac{T_c + \lambda}{M + \lambda N}$   (9)

Dimana $T_c$ adalah banyaknya kata $w_i$ yang muncul di class C, M adalah banyaknya kata yang muncul di class C, N adalah banyaknya term unik pada seluruh dokumen, dan $\lambda$ adalah konstanta positif yang biasanya bernilai 1 atau 0.5; konstanta ini mencegah hasil 0 pada perhitungan probabilitas.
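Sebagai gambaran (bukan kode asli penelitian), sketsa Python berikut mengimplementasikan Naïve Bayes multinomial sederhana mengikuti rumus (8)-(9) dengan konstanta pemulusan lambda = 1; struktur data dan nama fungsi hanyalah contoh.

    # Sketsa minimal Naive Bayes multinomial untuk teks (rumus (8)-(9)).
    import math
    from collections import Counter, defaultdict

    def train(docs):
        # docs: daftar pasangan (daftar_term, label)
        prior = Counter(label for _, label in docs)
        word_count = defaultdict(Counter)      # frekuensi kata per kelas
        vocab = set()
        for tokens, label in docs:
            word_count[label].update(tokens)
            vocab.update(tokens)
        return prior, word_count, vocab, len(docs)

    def predict(tokens, prior, word_count, vocab, n_docs, lam=1.0):
        best_label, best_logp = None, float("-inf")
        for label, cnt in prior.items():
            m = sum(word_count[label].values())
            # log P(C) + sum log P(w_i|C), dengan P(w_i|C) = (Tc + lam) / (M + lam*N)
            logp = math.log(cnt / n_docs)
            for t in tokens:
                logp += math.log((word_count[label][t] + lam) / (m + lam * len(vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label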
2.6.1. Algoritma Naive Bayes Berbasis Kernel Density Estimation
Naïve Bayes berbasis Kernel Density Estimation (KDE), atau biasa dikenal sebagai Naïve Bayes fleksibel, digunakan untuk mengatasi masalah data bertipe kuantitatif [11]. Untuk menangani data kuantitatif tersebut, Naïve Bayes menggunakan pendekatan distribusi normal (Gaussian) sebagai fungsi kernel [14], seperti ditunjukkan pada rumus (10) dan (11).

$f_g(x, \mu_i, \sigma_c) = \dfrac{1}{\sqrt{2\pi}\,\sigma_c}\, e^{-\frac{(x - \mu_i)^2}{2\sigma_c^2}}$   (10)

$p(D = d \mid C = c) = \dfrac{1}{n_c} \sum_i f_g(x, \mu_i, \sigma_c)$   (11)

Dimana $\mu_i$ diambil dari nilai data training $x_i$ untuk atribut X di class C ($\mu_i = x_i$), dan $\sigma_c = \dfrac{1}{\sqrt{n_c}}$, dimana $n_c$ adalah banyaknya dokumen training di dalam class C.
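Sketsa Python minimal berikut (ilustrasi berdasarkan rumus (10)-(11), bukan kode asli penelitian) menghitung estimasi kepadatan kernel Gaussian untuk sebuah nilai atribut numerik pada satu kelas; nilai-nilai contoh pada pemanggilan bersifat hipotetis.

    # Sketsa minimal KDE Gaussian: setiap nilai training x_i menjadi pusat kernel,
    # dengan sigma_c = 1/sqrt(n_c), sesuai rumus (10)-(11).
    import math

    def kernel_density(x, train_values):
        n_c = len(train_values)
        sigma = 1.0 / math.sqrt(n_c)
        total = 0.0
        for x_i in train_values:
            total += math.exp(-((x - x_i) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
        return total / n_c                     # p(D = d | C = c)

    # contoh: kepadatan nilai bobot 0.4 untuk kelas dengan nilai training berikut
    print(kernel_density(0.4, [0.1, 0.35, 0.5, 0.42]))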
BAB III
TUJUAN DAN MANFAAT PENELITIAN

3.1. Tujuan
Berdasarkan uraian latar belakang dan rumusan masalah di atas, maka tujuan dari penelitian ini adalah sebagai berikut:
1. Mengevaluasi hasil penelitian Chai Jing Hui dan meningkatkan akurasinya dengan menggunakan algoritma Naive Bayes.
3.2. Manfaat Penelitian
Adapun manfaat dari penelitian ini yaitu menghasilkan model penentuan kalimat soal berbasis level kognitif Taksonomi Bloom yang lebih akurat sehingga dapat digunakan dalam membangun aplikasi pembuatan soal yang memenuhi standar Taksonomi Bloom. Aplikasi tersebut nantinya dapat membantu guru dan dosen dalam pembuatan soal ujian.
BAB IV
METODE PENELITIAN

4.1. Pendahuluan
Banyak algoritma kecerdasan buatan telah dikembangkan, salah satunya algoritma Naïve Bayes. Dalam domain information retrieval, algoritma ini sangat baik untuk dataset yang terdiri dari kalimat-kalimat pendek. Permasalahan algoritma Naïve Bayes terletak pada data kuantitatif, sehingga beberapa peneliti mengembangkan Naïve Bayes berbasis Kernel Density Estimation untuk mengatasi data kuantitatif. Penelitian ini mengusulkan algoritma tersebut untuk meningkatkan akurasi pada penelitian sebelumnya yang dilakukan oleh Chai Jing Hui [1]. Dengan dataset yang sama, penelitian ini tidak hanya mengusulkan algoritma klasifikasi yang berbeda, namun juga mengusulkan dua tahapan seleksi fitur.
4.2. Metode Pengumpulan Data
Dataset yang digunakan pada penelitian ini mengacu pada penelitian yang dilakukan oleh Chai Jing Hui [1]. Dataset kalimat soal berjumlah 274, yang terbagi menjadi 192 kalimat soal untuk data training dan 82 kalimat soal untuk data testing, seperti ditunjukkan pada Tabel 4.1. Detail kalimat soal yang diujikan dapat dilihat di lampiran. Dataset menggunakan bahasa Inggris, sehingga stopword yang digunakan untuk preprocessing data juga berbahasa Inggris.

Tabel 4.1 Distribusi Dataset

Label / Kategori    Training    Testing
Knowledge           17          11
Comprehension       31          12
Application         29          13
Analysis            32          16
Synthesis           45          14
Evaluation          38          16
Total               192         82
4.3. Pengolahan Awal Data
Pada bagian ini dijelaskan tahap awal data mining. Pengolahan awal data meliputi proses input data ke format yang dibutuhkan, pengelompokan dan penentuan atribut data, serta pemecahan data (split) untuk digunakan dalam proses pembelajaran (training) dan pengujian (testing). 10-fold validation digunakan untuk split data, dimana 90% data digunakan untuk training dan 10% data digunakan untuk testing [15], seperti diilustrasikan pada sketsa berikut.
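Sebagai ilustrasi (eksperimen asli menggunakan RapidMiner, bukan kode berikut), sketsa Python di bawah ini menunjukkan pembagian data 10-fold berstratifikasi memakai scikit-learn; penggunaan pustaka tersebut dan nama fungsinya hanyalah asumsi untuk keperluan ilustrasi.

    # Sketsa minimal pembagian data 10-fold (90% training, 10% testing per fold).
    from sklearn.model_selection import StratifiedKFold

    def ten_fold_indices(X, y):
        skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
        for fold, (train_idx, test_idx) in enumerate(skf.split(X, y), start=1):
            # 9 bagian digunakan untuk training, 1 bagian untuk testing
            yield fold, train_idx, test_idx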
4.4. Metode yang Diusulkan
Pada bagian ini dijelaskan metode yang diusulkan untuk digunakan pada penentuan tingkat kesulitan soal berdasarkan kognitif Bloom. Penjelasan meliputi pengaturan dan pemilihan nilai dari parameter-parameter serta arsitektur melalui uji coba. Gambar 4.1 menunjukkan usulan model klasifikasi soal menggunakan Naïve Bayes berbasis kernel density estimation dan seleksi fitur.
[Gambar 4.1 memperlihatkan alur model yang diusulkan: Kumpulan Soal, Text Processing, Term Weighting, Seleksi Fitur Supervised, Backward/Forward Selection, Naïve Bayes berbasis kernel density estimation, serta 10-Fold Validation dan Evaluasi.]

Gambar 4.1. Model Klasifikasi yang Diusulkan

4.5. Evaluasi
Pengukuran kinerja algoritma pada penelitian ini menggunakan confusion matrix (Tabel 4.2). Confusion matrix dapat menghasilkan dua alat ukur, yaitu accuracy dan kappa coefficient. Kappa adalah koefisien untuk mengevaluasi perbedaan antara dua penilai. Kappa dapat diaplikasikan untuk mengetahui kinerja dari algoritma [16].
Tabel 4.2 Confusion Matrix

                      Actual Positive    Actual Negative
Predicted Positive    TP                 FP
Predicted Negative    FN                 TN

$Accuracy = \dfrac{TP + TN}{TP + TN + FP + FN}$   (12)

$Kappa = \dfrac{Accuracy - P_c}{1 - P_c}$   (13)

$P_c = \dfrac{(TP + FN)(TP + FP) + (FP + TN)(FN + TN)}{N \times N}$   (14)
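Sketsa Python berikut (ilustrasi, bukan bagian dari laporan asli) menghitung accuracy dan kappa dari sebuah confusion matrix 2x2 sesuai rumus (12)-(14); angka pada contoh pemanggilan bersifat hipotetis.

    # Sketsa minimal evaluasi accuracy dan kappa dari confusion matrix 2x2.
    def evaluate(tp, fp, fn, tn):
        n = tp + fp + fn + tn
        accuracy = (tp + tn) / n                                        # rumus (12)
        pc = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)  # rumus (14)
        kappa = (accuracy - pc) / (1 - pc)                              # rumus (13)
        return accuracy, kappa

    print(evaluate(tp=40, fp=10, fn=8, tn=42))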
BAB V
HASIL DAN PEMBAHASAN

5.1. Pendahuluan
Bab ini memaparkan proses dan hasil eksperimen yang sudah dilakukan. Eksperimen dilakukan dengan menggunakan tool RapidMiner. Pengaturan eksperimen perlu dilakukan untuk menghasilkan akurasi yang paling tinggi untuk metode yang diusulkan. Pengaturan eksperimen dapat dilakukan dengan kombinasi metode atau dengan penentuan jumlah fitur pada proses seleksi fitur. Harapan dari proses seleksi fitur ini adalah semakin sedikit jumlah fitur yang digunakan, semakin rendah waktu komputasi dan semakin tinggi akurasi yang dicapai.
5.2. Eksperimen Menggunakan Rapidminer
Eksperimen pada penelitian ini menggunakan RapidMiner versi 5.3.005. Eksperimen terdiri dari beberapa tahapan sebagai berikut:
1. Penentuan dataset kalimat soal
2. Preprocessing kalimat soal: tokenization, stopword removal, dan stemming
3. Seleksi fitur menggunakan metode chi square atau metode information gain
4. Seleksi fitur menggunakan algoritma backward elimination atau algoritma forward selection
5. Training dataset menggunakan algoritma naïve bayes berbasis kernel density estimation
6. Testing dataset menggunakan algoritma naïve bayes berbasis kernel density estimation
7. Analisis tingkat akurasi metode yang diusulkan

Seperti yang dijelaskan pada subbab sebelumnya, kombinasi metode seleksi fitur dilakukan untuk mendapatkan akurasi yang tinggi. Kombinasi metode yang dilakukan pada penelitian ini yaitu:
1. Klasifikasi kalimat soal menggunakan seleksi fitur chi square (CHI)
2. Klasifikasi kalimat soal menggunakan seleksi fitur chi square dan backward elimination (CHI+BFS)
3. Klasifikasi kalimat soal menggunakan seleksi fitur chi square dan forward selection (CHI+FFS)
4. Klasifikasi kalimat soal menggunakan seleksi fitur information gain (IG)
5. Klasifikasi kalimat soal menggunakan seleksi fitur information gain dan backward elimination (IG+BFS)
6. Klasifikasi kalimat soal menggunakan seleksi fitur information gain dan forward selection (IG+FFS)

Sketsa berikut mengilustrasikan bagaimana kombinasi seleksi fitur berbasis ranking tersebut dapat diulang pada berbagai persentase term.
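Sketsa Python berikut (ilustrasi, bukan proses RapidMiner yang sebenarnya) menunjukkan pengulangan seleksi fitur berbasis ranking (CHI dan IG) pada persentase term 10% sampai 90%; matriks fitur X, label y, dan fungsi cross_val_accuracy bersifat hipotetis, sedangkan pemanggilan scikit-learn hanyalah asumsi untuk ilustrasi.

    # Sketsa minimal: mencoba kombinasi seleksi fitur berbasis ranking
    # (chi square dan information gain) pada berbagai persentase term.
    from sklearn.feature_selection import SelectPercentile, chi2, mutual_info_classif

    def ranking_combinations(X, y, cross_val_accuracy):
        results = {}
        for name, score_func in [("CHI", chi2), ("IG", mutual_info_classif)]:
            for pct in range(10, 100, 10):        # 10%, 20%, ..., 90% term
                selector = SelectPercentile(score_func, percentile=pct)
                X_sel = selector.fit_transform(X, y)
                results[(name, pct)] = cross_val_accuracy(X_sel, y)
        return results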
Gambar 5.1 Urutan Proses Klasifikasi di Rapidminer
Langkah klasifikasi kalimat soal yang diusulkan terlihat pada Gambar 5.1. Proses klasifikasi diawali dengan penentuan dataset yang disimpan di dalam file Excel. Dalam penentuan dataset oleh RapidMiner, beberapa pengaturan perlu dilakukan, yaitu mengubah tipe data soal menjadi text dan tipe atribut menjadi label (Gambar 5.2). Text preprocessing yang digunakan ditunjukkan pada Gambar 5.3, yaitu tokenization, stopword removal, dan stemming.
Gambar 5.2 Penentuan Label Dataset di Rapidminer
Gambar 5.3 Urutan Preprocessing Dokumen di Rapidminer
Gambar 5.4. Proses Validasi Menggunakan 10-Fold Validation
10-fold validation dilakukan agar estimasi akurasi klasifikasi lebih andal. Proses 10-fold validation ditunjukkan pada Gambar 5.4. Proses ini diletakkan di dalam proses optimize selection. Dengan menggunakan 10-fold validation, dataset tidak perlu dibagi menjadi dataset training dan testing secara manual. Penelitian ini menggunakan 10-fold validation, yang artinya pada setiap fold 90% dataset dijadikan data training dan 10% dataset dijadikan data testing. Di dalam proses 10-fold validation terdapat proses klasifikasi yang dilakukan oleh algoritma naïve bayes berbasis kernel density estimation, seperti yang terlihat pada Gambar 5.4.
5.3. Hasil dan Pembahasan
Percobaan pertama dilakukan untuk mengetahui tingkat akurasi dari penggunaan seleksi fitur berbasis wrapper. Tabel 5.1 menunjukkan bahwa penggunaan forward selection mampu meningkatkan akurasi algoritma Naïve Bayes hingga mencapai 67.18%. Algoritma forward selection mampu menghasilkan tingkat akurasi dan waktu komputasi yang lebih baik dibandingkan dengan algoritma backward elimination yang hanya mencapai tingkat akurasi sebesar 60.9%. Tabel 5.2 dan Tabel 5.3 menunjukkan confusion matrix dari penggunaan forward selection dan backward elimination pada algoritma Naïve Bayes.
Gambar 5.5 Proses Training dan Testing di RapidMiner
Tabel 5.1 Kinerja Naïve Bayes dengan forward selection dan backward elimination

Metode    Accuracy              Kappa               Computation Time (seconds)
NBK       57.61% +/- 5.64%      0.484 +/- 0.072     2
NBK+FS    67.18% +/- 7.77%      0.598 +/- 0.096     748.8
NBK+BE    60.9% +/- 11.95%      0.525 +/- 0.142     1861.8
Tabel 5.2 Confusion Matrix Forward Selection

                     true Knowledge  true Comprehension  true Application  true Analysis  true Synthesis  true Evaluation  class precision
pred. Knowledge      22              0                   0                 1              2               1                84.62%
pred. Comprehension  0               30                  0                 3              3               1                81.08%
pred. Application    0               1                   26                0              0               0                96.30%
pred. Analysis       0               2                   1                 27             4               4                71.05%
pred. Synthesis      6               11                  12                17             49              18               43.36%
pred. Evaluation     0               0                   2                 0              1               30               90.91%
class recall         78.57%          68.18%              63.41%            56.25%         83.05%          55.56%
Tabel 5.3 Confusion Matrix Backward Elimination

                     true Knowledge  true Comprehension  true Application  true Analysis  true Synthesis  true Evaluation  class precision
pred. Knowledge      18              2                   3                 1              0               1                72.00%
pred. Comprehension  4               27                  4                 6              4               2                57.45%
pred. Application    1               1                   20                6              1               4                60.61%
pred. Analysis       2               8                   3                 25             4               7                51.02%
pred. Synthesis      1               6                   5                 3              40              3                68.97%
pred. Evaluation     2               0                   6                 7              10              37               59.68%
class recall         64.29%          61.36%              48.78%            52.08%         67.80%          68.52%
Pada percobaan kedua, seleksi fitur berbasis filter digunakan untuk mengurangi waktu komputasi seleksi fitur berbasis wrapper. Tujuan penggunaan dua seleksi fitur ini adalah untuk lebih meningkatkan kinerja algoritma Naïve Bayes. Gambar 5.6 dan Gambar 5.7 menunjukkan grafik tingkat akurasi dan kappa untuk mengetahui metode yang paling optimal. Detail angka dari grafik tersebut dipaparkan pada Tabel 5.4 untuk hasil akurasi dan Tabel 5.5 untuk hasil kappa. Dengan melihat hasil perbandingan tersebut, hasil yang paling optimal dicapai oleh kombinasi seleksi fitur antara chi square dan backward elimination dengan tingkat akurasi sebesar 77,8% dan kappa sebesar 0,729. Kombinasi seleksi fitur tersebut juga menunjukkan tingkat akurasi yang stabil terhadap perubahan jumlah fitur yang digunakan.

Gambar 5.6 Akurasi (grafik akurasi terhadap % term untuk CHI, CHI+FFS, CHI+BFS, IG, IG+FFS, dan IG+BFS)

Gambar 5.7 Kappa (grafik kappa terhadap % term untuk CHI, CHI+FFS, CHI+BFS, IG, IG+FFS, dan IG+BFS)
Tabel 5.4 Akurasi Naïve Bayes dengan beberapa seleksi fitur

% Term    CHI      CHI+FFS    CHI+BFS    IG       IG+FFS    IG+BFS
10        73,28    67,61      75,17      73,69    65,71     75,94
20        76,59    72,29      77,8       73,33    73,32     76,28
30        71,49    62,87      74,48      68,28    68,69     72,59
40        71,1     71,26      74,5       60,19    73,43     63,8
50        71,48    66,43      73,07      60,52    68,32     63,1
60        67,46    74,11      70,09      60,86    70,11     63,48
70        66,4     54,42      70,11      57,92    72,63     59,81
80        64,58    70,79      66,79      56,85    67,12     59,22
90        65,3     72,69      68,23      56,11    67,53     58,78
Tabel 5.5 Kappa Naïve Bayes dengan beberapa seleksi fitur

% Term    CHI      CHI+FFS    CHI+BFS    IG       IG+FFS    IG+BFS
10        0,674    0,602      0,697      0,681    0,577     0,705
20        0,714    0,659      0,729      0,676    0,674     0,712
30        0,653    0,541      0,688      0,614    0,615     0,666
40        0,647    0,646      0,688      0,515    0,675     0,559
50        0,652    0,585      0,671      0,518    0,61      0,55
60        0,602    0,683      0,633      0,522    0,632     0,554
70        0,59     0,434      0,634      0,486    0,664     0,51
80        0,568    0,641      0,594      0,473    0,595     0,502
90        0,577    0,665      0,613      0,464    0,602     0,498
Tabel 5.6 Waktu Komputasi Naïve Bayes dengan beberapa seleksi fitur (detik)

% Term    CHI    CHI+FFS    CHI+BFS    IG    IG+FFS    IG+BFS
10        1      64,2       15         1     56        29
20        1      206,4      60,6       1     210       78,6
30        1      236,2      137,4      1     262,2     136,2
40        1      784        488,4      2     486,6     275,4
50        1      618,6      499,2      2     454,8     496,8
60        1      1690       1095,6     2     672       622,2
70        1      1943,6     1506,6     2     1200,4    1115,4
80        1      1926,6     1654,2     2     1850,8    1510,2
90        1      2026,6     1789,2     2     1969,6    1652,4
BAB VI KESIMPULAN DAN SARAN
6.1. Pendahuluan
Proses otomatisasi penentuan kalimat soal berdasarkan level kognitif Taksonomi Bloom diperlukan untuk mempermudah dosen atau guru dalam mendesain atau menyusun soal ujian. Proses otomatisasi tersebut juga diperlukan oleh lembaga pendidikan yang memiliki bank soal yang cukup besar. Level kognitif Taksonomi Bloom terdiri dari 6 tipe, yaitu knowledge, comprehension, application, analysis, synthesis, dan evaluation. Beberapa penelitian untuk proses otomatisasi tersebut dilakukan dengan menggunakan algoritma machine learning. Penelitian ini mengusulkan algoritma naïve bayes berbasis kernel density estimation dengan menambahkan proses seleksi fitur. Seleksi fitur mampu menurunkan waktu komputasi dan meningkatkan akurasi proses penentuan soal berbasis Taksonomi Bloom. Pada bab ini, pencapaian hasil penelitian dan saran untuk penelitian di masa mendatang dipaparkan.
6.2. Kesimpulan
Dengan dataset yang sama dengan penelitian sebelumnya, metode yang diusulkan pada penelitian ini mampu meningkatkan akurasi proses klasifikasi soal berbasis Taksonomi Bloom. Dari eksperimen yang dilakukan, gabungan seleksi fitur antara chi square dan backward elimination menghasilkan tingkat akurasi yang paling baik, yaitu sebesar 77,8%. Hasil akurasi ini lebih baik dari hasil penelitian sebelumnya yang hanya mencapai 65,26%.
6.3. Saran Penelitian di Masa Mendatang
Menjadi tantangan kami untuk mengimplementasikan metode yang kami usulkan ini menjadi perangkat lunak yang dapat digunakan oleh para dosen atau guru dalam mendesain soal. Untuk penelitian mendatang, kalimat soal berbahasa Indonesia dapat digunakan sebagai dataset sehingga ke depannya perangkat lunak tersebut dapat diimplementasikan. Pemilihan algoritma klasifikasi yang lain perlu dilakukan sehingga tingkat akurasi penentuan kalimat soal berdasar Taksonomi Bloom ini dapat ditingkatkan.
DAFTAR PUSTAKA
[1] Chai Jing Hui, "Feature Reduction for Neural Network in Determining the Bloom's Cognitive Level of Question Items," 2009.
[2] Anwar Ali Yahya and Addin Osman, "Automatic Classification of Questions into Bloom's Cognitive Levels using Support Vector Machines," in The International Arab Conference on Information Technology, 2011.
[3] M. Janaki Meena and K. R. Chandran, "Naïve Bayes Text Classification with Positive Features Selected by Statistical Method," in First International Conference on Advanced Computing, 2009.
[4] Norazah Yusof and Chai Jing Hui, "Determination of Bloom's Cognitive Level of Question Items using Artificial Neural Network," in International Conference on Intelligent System Design and Applications, 2010, pp. 866-870.
[5] Kuruvilla Mathew and Biju Issac, "Intelligent Spam Classification for Mobile Text Message," in International Conference on Computer Science and Network Technology, 2011.
[6] Jingli Lu, Ying Yang and Geoffrey I. Webb, "Incremental Discretization for Naive Bayes Classifier," Advanced Data Mining and Applications, vol. 4093, pp. 223-238, 2006.
[7] Y. Yang and J. O. Pedersen, "A Comparative Study on Feature Selection in Text Categorization," in Proc. 14th Int'l Conf. Machine Learning, 1997.
[8] Asep Khoerudin, "Analisis Tingkat Kesukaan Konsumen dengan Metode Bayesian Network (Studi Kasus Produk Biskuit)," Bogor Agricultural University, Undergraduate Thesis, 2011.
[9] David Andrich, "Framework Relating Outcomes Based Education and the Taxonomy of Educational Objectives," Studies in Educational Evaluation, vol. 28, pp. 35-59, 2002.
[10] L.S. Wang, "Relevance Weighting of Multi-Term Queries for Vector Space Model," in Proc. of Computational Intelligence and Data Mining, 2009, pp. 0-6.
[11] S. G, D. G, S. Kp, and R. Peter, "Evaluation of SVD and NMF Methods for Latent Semantic Analysis," International Journal of Recent Trends in Engineering, vol. 1, pp. 308-310, 2009.
[12] Amir Navot, "On the Role of Feature Selection in Machine Learning," Doctor of Philosophy Thesis, 2006.
[13] P. Koncz and J. Paralic, "An approach to feature selection for sentiment analysis," in International Conference on Intelligent Engineering Systems, 2011, pp. 357-362.
[14] James N.K. Liu, Yu-Lin He, Xi-Zhao Wang, and Yan-Xing Hu, "A Comparative Study among Different Kernel Functions in Flexible Naïve Bayesian Classification," in International Conference on Machine Learning and Cybernetics, 2011.
[15] H. Drucker, D. Wu, and V. N. Vapnik, "Support vector machines for spam categorization," IEEE Transactions on Neural Networks, vol. 10, no. 5, 1999.
[16] R. Parimala and R. Nallaswamy, "A Study of Spam E-mail Classification Using Feature Selection Package," Global Journal of Computer Science and Technology, vol. 11, no. 7, 2011.
LAMPIRAN I
RIWAYAT HIDUP PERSONALIA PENELITI

KETUA PENELITI:
a. Nama Lengkap             : Bowo Nurhadiyono, S.Si., M.Kom
b. Tempat / Tanggal Lahir   : Tegal / 21 Desember 1967
c. Jenis Kelamin            : Laki-Laki
d. NPP                      : 0686.11.1996.102
e. Disiplin Ilmu            : Teknik Informatika
f. Pangkat/Golongan         : Penata Tk I / III D
g. Jabatan                  : Lektor
h. Fakultas / Program Studi : Fakultas Ilmu Komputer / Teknik Informatika
i. Riwayat Pendidikan       :
   - S1 FMIPA Universitas Padjadjaran Bandung (Lulus 1996)
   - S2 Teknik Informatika Universitas Dian Nuswantoro (Lulus 2004)
j. Publikasi Ilmiah         :
   1. Implementasi Bilangan Bulat pada Algoritma RSA Secara Aljabar, Jurnal Teknologi Informasi Techno.COM, Vol 9, No 1, Februari 2010, ISSN: 1412-2693
   2. Implementasi Matriks pada Matematika Bisnis dan Ekonomi, Jurnal Teknologi Informasi Techno.COM, Vol II, No 2, Mei 2012, ISSN: 1412-2693
   3. Penerapan Integrasi Numerik Menggunakan Metode Segi Empat (Rectangle Rule) untuk Menghitung Luas Daerah Tidak Beraturan, Jurnal Teknologi Informasi Techno.COM, Vol 11, No 3, November 2012, ISSN: 1412-2693

Semarang, 25 Mei 2013
Ketua Peneliti,

Bowo Nurhadiyono, S.Si., M.Kom
NPP: 0686.11.1996.102
ANGGOTA PENELITI 1:
a. Nama Lengkap             : T Sutojo, S.Si., M.Kom
b. Tempat / Tanggal Lahir   : Surabaya / 27 Juli 1968
c. Jenis Kelamin            : Laki-Laki
d. NPP                      : 0686.11.1996.094
e. Disiplin Ilmu            : Teknik Informatika
f. Pangkat/Golongan         : Penata / III C
g. Jabatan                  : Lektor
h. Fakultas / Program Studi : Fakultas Ilmu Komputer / Teknik Informatika
i. Riwayat Pendidikan       :
   1. Jurusan Fisika FMIPA Universitas Airlangga Surabaya, 1995
   2. Teknik Informatika Universitas Dian Nuswantoro Semarang, 2006
j. Publikasi Ilmiah         :
   1. Aplikasi Algoritma Pewarnaan Graf Welch Powell untuk Optimasi Penjadwalan pada Ujian Semester Fakultas Ilmu Komputer Universitas Dian Nuswantoro Semarang, 2011
   2. Perbandingan Sensitifitas Filter Deteksi Tepi Sobel dengan Filter Deteksi Tepi Prewitt untuk Citra yang Mengandung Noise Gaussian, "Techno-Com", vol. 10, no. 1, edisi Mei 2009, ISSN 1412-2693

Semarang, 25 Mei 2013
Anggota Peneliti 1,

T Sutojo, S.Si., M.Kom
NPP: 0686.11.1996.094
ANGGOTA PENELITI 2:
a. Nama Lengkap             : Catur Supriyanto, S.Kom., M.Cs
b. Tempat / Tanggal Lahir   : Semarang / 21 Oktober 1984
c. Jenis Kelamin            : Laki-Laki
d. NPP                      : 0686.11.2011.415
e. Disiplin Ilmu            : Machine Learning, Information Retrieval
f. Pangkat/Golongan         : Penata / III B
g. Jabatan                  : Asisten Ahli
h. Fakultas / Program Studi : Fakultas Ilmu Komputer / Teknik Informatika
i. Riwayat Pendidikan       :
   - S1 Universitas Dian Nuswantoro Semarang (Lulus 2009)
   - S2 Universiti Teknikal Malaysia Melaka (Lulus 2011)
j. Publikasi Ilmiah         :
   - A Comparison of Rabin Karp and Semantic-Based Plagiarism Detection, Tahun 2012, ICSIIT, ISBN/ISSN 978-602-97124-1-4
   - Integrating Feature-Based Document Summarization as Feature Reduction in Document Clustering, Tahun 2012, CITEE, ISBN/ISSN 2088-6578
   - Performance Enhancement of Image Clustering using Singular Value Decomposition in Color Histogram Content-Based Image Retrieval, IJCCE Vol. 1 No. 4 Tahun 2012, ISBN/ISSN 2010-3743

Semarang, 25 Mei 2013
Anggota Peneliti 2,

Catur Supriyanto, S.Kom., M.Cs
NPP: 0686.11.2011.415
LAMPIRAN II DATASET PENELITIAN
No
Kalimat Soal
Kategori
1
Define the meaning of the Olympic Motto.
Knowledge
2
Identify the correct definition of osmosis.
Knowledge
3
Identify the standard peripheral components of a computer.
Knowledge
4
List the characteristics peculiar to the Cubist movement.
Knowledge
5
List the levels in Bloom's Taxonomy.
Knowledge
6
List the steps involved in titration.
Knowledge
7
List three characteristics that are unique to the Cubist movement.
Knowledge
8
Make a list of facts you learned from the story.
Knowledge
Name the food groups and at least two items of food in each 9
group.
Knowledge
10
Name the four major food groups.
Knowledge
11
Name the main characters in the story.
Knowledge
12
Quote what ICT stands for.
Knowledge
13
Recite a policy.
Knowledge
14
State the pattern for expressing time in French.
Knowledge
15
State the rule for balls and strikes in baseball.
Knowledge
16
State the rule for using a semicolon in a sentence.
Knowledge
17
Write the equation for the Ideal Gas Law.
Knowledge
18
Define compound interest.
Knowledge
19
Define mercantilism.
Knowledge
20
Define stream bank, floodplain and substrate.
Knowledge
21
Draw and label a diagram of a typical stream.
Knowledge
22
Identify the five major prophets of the Old Testament.
Knowledge
23
Identify the part of a eukaryotic cell.
Knowledge
24
Label any Olympic sporting apparatus with design features.
Knowledge
25
Name the artist who painted the Mona Lisa.
Knowledge
26
Recite the poem "Auto Wreck".
Knowledge
27
Recite the principles of the Data Protection Act.
Knowledge
28
State the definition of application in Bloom's Taxonomy.
Knowledge
29
Compare Calliope with Howie. Use the word bank.
Comprehension
30
Describe how Phillip and Timothy survived on the Cay
Comprehension
31
Describe in your own words how to copy text from one program Comprehension into another
32
Describe in your own words what happens when a stream's Comprehension velocity slows
33
Describe in your own words what is meant by a sprained ankle
Comprehension
34
Describe nuclear transport to a lay person
Comprehension
35
Describe what took place as the Hato was sinking
Comprehension
36
Explain how Timothy saved Phillip's life
Comprehension
37
Explain in your own words what a recessive gene is
Comprehension
Explain in your own words what do you mean by the term 38
economics?
Comprehension
39
Explain in your own words what is meant by mercantilism.
Comprehension
40
Explain what a poem means.
Comprehension
41
Illustrate this caption: "Olympic as a Media Event in the Comprehension Information Society"
42
Illustrate what you think the main idea was.
Comprehension
43
In one sentence explain the main idea of a written passage.
Comprehension
44
In one sentence give the point of a written passage.
Comprehension
45
Interpret the pictures.
Comprehension
46
Outline the most important insight of the tale.
Comprehension
47
Paraphrase what Hamlet is saying in his soliloquy.
Comprehension
48
Restate the Olympic motto in your own words.
Comprehension
49
Retell the story in your words.
Comprehension
50
Rewrite the principles of test writing.
Comprehension
51
State in your own words the rule for balls and strikes in baseball.
Comprehension
52
Summarize a story in own words.
Comprehension
53
Summarize the basic tenets of collaborative conservation.
Comprehension
54
Summarize the basic tenets of deconstructionism.
Comprehension
55
Summarize this magazine article.
Comprehension
56
Tell in your own words the beginning of the book.
Comprehension
57
Translate a written text aloud from L2 to English.
Comprehension
58
Translate an equation into a computer spreadsheet.
Comprehension
59
Translate an equation into a spreadsheet formula.
Comprehension
60
Translate the following passage from The Iliad into English
Comprehension
61
Describe in prose what is shown in graph form.
Comprehension
Describe in your own words how to borrow a book from the 62
library.
Comprehension
63
Explain in one's own words how to create a query in a database.
Comprehension
64
Express your opinion of 'Drugs in Sport' through poetry.
Comprehension
65
From a blueprint describe the article depicted.
Comprehension
66
Given a graph of production trends in automobiles, describe what Comprehension the graph represents in a memo to your boss.
67
Outline in your own words how the Leggo's Tomato Paste Comprehension advertisement sells their product.
68
Restate main idea of story in own words.
Comprehension
69
State in your own words where in the library to find previous Comprehension issues of journals that are no longer in the display racks.
70
Tell how Phillip kept himself alive after Timothy died.
Comprehension
71
Tell in your own words how the setting of the story made it more Comprehension interesting.
72
Tell what is meant by the definition of an isosceles triangle.
Comprehension
73
Apply shading to produce depth in drawing
Application
74
Apply the storytelling technique here to a little story of your own
Application
75
Apply your understanding the Olympic spirit to develop a new Application motto or slogan
76
Calculate the deflection of a beam under uniform loading
Application
77
Calculate the rate of habitat fragmentation within the Colorado Application Front Range in the last decade
78
Categorise the pictures and add them to the wall display
Application
79
Choose a country that does not compete at the Olympics and Application explain why that country is not an Olympic member
80
Classify celebrations into family and community categories
Application
81
Compute the area of actual circles
Application
82
Demonstrate using the OPAC and your knowledge of library Application organization to find a book about turtles
83
Design a market strategy for your product using a known strategy Application as a model.
84
Dramatize being their mothers.
Application
85
Dramatize some of the problems a homeowner might encounter Application by building in a floodplain. Draw 3 pictures showing the beginning, middle and ending of the
86
story.
Application
87
Draw a picture of the gown Cinderella wore to the ball.
Application
88
In a teaching simulation with your peers role-playing 6th grade Application students, demonstrate the principle of reinforcement in classroom interactions and prepare a half page description of what happened during the simulation that validated the principle.
89
Make a diorama to illustrate an important event.
Application
90
Make a paper-mache map to include relevant information about Application an event.
91
Make a scrapbook about the areas of study.
Application
92
Make up a puzzle game using the ideas from the study area.
Application
93
Model an Olympic Village for the new Millenium.
Application
94
Predict what happens to X if Y increases.
Application
95
Pretend you are one of the characters in the book. Write a diary
Application
about the happenings in your life for two consecutive days.
96
Solve for the ten following fraction multiplication problems. Application Please make sure to show all your work. Take a collection of photographs to demonstrate a particular
97
point.
Application
98
Use a manual to calculate an employee's vacation time.
Application
99
Use the distance travelled and cost of petrol to calculate the cost Application of a coach trip using a spreadsheet.
100 Use the rule for a semicolon in a sentence.
Application
101 Apply laws of statistics to evaluate the reliability of a written test. Application 102 Choose any U.S. president and explain how he exercised his Application power as Commander in Chief of the Armed Forces. 103 Classify frogs toads and other amphibians.
Application
104 Classify animals into two groups.
Application
105 Construct a model to demonstrate how it will work.
Application
106 Demonstrate how this could work in an industry setting?
Application
107 Derive a kinetic model from experimental data.
Application
108 Describe an experiment to answer the question of the effects of Application weight on the fall of an object. 109 Draw a picture of the bears' house.
Application
110 Have you ever met a person who had the courage Timothy had in Application the story? Explain. 111 Relate the principle of reinforcement to classroom interactions.
Application
112 Use a costing model to vary prices on goods to maximise profits Application and minimise costs. 113 Who can use what we know about sonnets and finish this poem?
Application
114 Analyse safe and dangerous aspects of these features
Analysis
115 Analyse the characteristics of frogs
Analysis
116 Analyse the movements and sounds of a frog
Analysis
117 Break down the main actions of the story
Analysis
118 Categorise all Olympic sports in order of difficulty. Describe the Analysis reasons behind your selection 119 Compare and contrast animals that the class has made
Analysis
120 Compare and contrast our school to other communities
Analysis
121 Compare and contrast two characters in the book
Analysis
122 Compare how different children come to school
Analysis
123 Compare the place where the story happened with where you live
Analysis
124 Compare this book to the last book you read
Analysis
125 Compare three celebrations using a Venn diagram
Analysis
126 Compare two of the characters in this book.
Analysis
127 Conduct an investigation to produce information to support a
Analysis
view. 128 Construct a graph to illustrate selected information.
Analysis
Contrast building in the coastal zone with building in a river 129 floodplain.
Analysis
130 Contrast Olympic athletes of today with athletes of past Olympic Analysis games. 131 Decide which parts of the book include the five W's (who, what, when, where, why) and the H (how). Then write a good paragraph for a newspaper article including these facts.
Analysis
132 Explain the patrilocal society in terms of lineage and dominance Analysis of the sexes. 133
Explain the term conjugal families, by making reference to the Analysis different types of societies to which they could belong.
134
From the argument given below, analyze the positive and Analysis negative points presented concerning the abolition of guns and write a brief (2-3 page) narrative of your analysis.
135
From the short presidential debate transcribed below:
Analysis
Differentiate the passages that attacked a political opponent personally, and those that attacked an opponent's political programs. 136
How does your pet differ from other animals?
Analysis
137
How was life different in your town 100 years ago?
Analysis
138
How would you describe Timothy?
Analysis
139
How would you distinguish between polymyositis and viral Analysis myositis in a 42-year-old man with weakness and a rash?
140
If your story happened in a foreign land, compare that land to the Analysis United States.
141
In a good paragraph, state the main idea of the book.
142 Investigate innovations that can enhance future Olympics.
Analysis Analysis
143 Point Out a sport that should be included in the Olympics and Analysis give reasons for its inclusion. 144 Recognise reliable sources of information.
Analysis
145 With your group investigate three different kinds of celebrations Analysis eg religious, cultural, national, community or family. 146 Break down the components of a standard film camera and Analysis explain how they interact to make the machine work. 147 By comparing the map of the tectonic plates to the earthquake Analysis map, what inferences can you make? 148 Can you find four different feelings Pa Lia had during the story?
Analysis
149 Compare herbatious and carnivorous animals on a Venn diagram.
Analysis
150 Compare two dog food commercials. What is the difference Analysis between them and how do they both sell their products? 151 Distinguish between micro and macro economics.
Analysis
152 Examine what helps to make a good Olympics? Think about Analysis money, schedules, sports and people. 153 How would this story be different if it had happened in a different Analysis country? 154 If your story occurred long ago, compare that time with today in a good paragraph. If it was a modern story, compare it with a long time ago and tell what would be different.
Analysis
155 Infer what would happen if the media were banned from Olympic Analysis sport. Look at the words in the word bank that describe people. Write 156 the words that describe Pa Lia, Calliope, and Howie in the correct Analysis column 157 Make a diagnosis or analyze a case study.
Analysis
Pick one of the main characters. Think of a shape that fits that 158 person's traits. Draw the shape. Then describe the character inside the shape. 159 Recognize logical fallacies in reasoning.
Analysis Analysis
Select the athlete of the century and analyse why you chose this Analysis person. 161 Using the previous example, if interest were compounded Analysis monthly instead of daily, what would the difference in interest be? 162 Can you invent another character for the story? Synthesis 160
163 Choose a character. Rewrite a scene from the story from this character's point of view. 164 Combine any two sports to develop a new Olympic sport
Synthesis
165 Combine elements of drama, music, and dance into a stage presentation. 166 Compose a simple rap or rhyme about zoo animals.
Synthesis
167 Compose music for a frog play
Synthesis
168 Construct a device that would assist an athlete in their training.
Synthesis
169 Construct an original work which incorporates five common materials in sculpture. Create a new product. Give it a name and plan a marketing 170 campaign. Create a storyboard for a sequel to your book. Use the same 171 characters 172 Create and perform a play about frogs
Synthesis
173 Create plan of local environment by drawing around boxes
Synthesis
Synthesis
Synthesis
Synthesis Synthesis Synthesis
174 Create a chart that compares things that use electricity and things Synthesis that do NOT use electricity 175 Design a new animal to live in the jungle.
Synthesis
176 Design a poster for this book
Synthesis
177 Design and make an animal that moves.
Synthesis
178 Design costumes for the characters.
Synthesis
179 Develop a plan for a new Olympic Bid System.
Synthesis
180 Develop a way to teach the concept of "adjectives".
Synthesis
181 Explain how the biological concept of symbiotic relationships Synthesis could be used to help solve socially created problems like water pollution, overflowing garbage landfills, or homelessness. 182 Explain why it is likely that a matriarchal family system would Synthesis be found in a matrilocal or matrilineal society. 183 Given a set of data derive an hypothesis to explain them.
Synthesis
Given two opposing theories design an experiment to compare 184 them.
Synthesis
185 How could we determine the number of pennies in a jar without Synthesis counting them? 186 How would you change the story to create a different ending?
Synthesis
187 How would you restructure the school day to reflect children's Synthesis developmental needs? 188 Identify one problem in the book and give an alternate solution Synthesis one not given by the author. 189 Integrate training from several sources to solve a problem.
Synthesis
190 List the events of the story in sequence.
Synthesis
191 Make a radio announcement that advertise the book. Write it out.
Synthesis
192 Name one character. Rewrite the story from this character's point Synthesis of view. 193 Organize this book into three or more sections and give your own Synthesis subtitle for each section. 194 Prepare a book jacket that illustrates the kind of book as well as Synthesis the story. 195 Pretend you are a librarian recommending this book to someone. Synthesis Write a paragraph telling what you would say.
No
Kalimat Soal
196 Revise and process to improve the outcome.
Kategori Synthesis
197 Revise how to complete a complex task in order to improve the Synthesis outcome. 198 Suppose Phillip wasn't rescued shortly after Timothy's death. Synthesis How long could he have survived? 199 Using information from the book about one of the main Synthesis characters, rewrite the ending of the book. 200 Write a letter to the editor on a social issue of concern to you.
Synthesis
201 Write a logically organized argument in favor of a given position.
Synthesis
202 Write a poem about this book.
Synthesis
203 Write a set of rules to prevent what happened in the story.
Synthesis
204 Write a short story relating a personal experience in the style of a Synthesis picaresque novel. 205 Write a song about 'Old MacDonald' who had a bulldozer instead Synthesis of a farm. Write an essay in not more than 250 words about India and 206 Technological Advancement. Use active voice as much as Synthesis possible. 207 Compose a class story.
Synthesis
208 Compose a rhythm or put new words to a known melody.
Synthesis
Create a new song for the opening line of "Mary had a little 209 lamb".
Synthesis
210 Describe what it must have been like to be blind and all alone on Synthesis the Cay. 211 Design a machine to perform a specific task.
Synthesis
212 Develop a hypothesis.
Synthesis
213 Develop one plausible ending for all three short stories below.
Synthesis
214 Devise plans to market or make artwork more valuable.
Synthesis
215 Draw a painting that uses various principles of perspective to Synthesis achieve its effect. 216 How could you re-write this story with a city setting?
Synthesis
No
Kalimat Soal
Kategori
How would the U.S.A. be different if the South had won the Civil 217 War?
Synthesis
218 Invent a machine to do a specific task.
Synthesis
219 Make up a new language code and write material using it.
Synthesis
220 Revises and process to improve the outcome.
Synthesis
After examining the videotape of a play in a football game, 221 determine the degree to which the defensive team performed Evaluation effectively and suggest ways in which it could have responded more effectively 222 Appraise data in support of a hypothesis
Evaluation
223 Appraise the speech's effectiveness based upon the class' criteria.
Evaluation
Assess the appropriateness of an author's conclusions based on Evaluation 224 the evidence given 225 Assess
the
relative
effectiveness
of
different
graphical Evaluation
representations of the same data or biological concept 226 Assess the strengths and weaknesses of the current Olympics and Evaluation recommend action that should be taken in future Olympics: What can be improved, reformed or rejected? 227 Award the contract to the best proposal. Rank the principles of Evaluation "good sportsmanship" in order of importance to you 228 Can you defend the idea that Simon's incident with the pig's head Evaluation is the most mystical in the story? 229 Choose and illustrate the two most important events in the story.
Evaluation
230 Critique the other student's (or your own) speech, based on the Evaluation criteria we have studied this semester. 231 Decide whether you could have survived on the island blind and Evaluation alone. Write about things that would have been challenging making sure to judge the difficulty level for yourself. Decide whether you learned enough about electricity from this 232 book.
Evaluation
233 Decide which candidate would best fill the position of principal.
Evaluation
No
Kalimat Soal
Kategori
Describe the economic consequence of a neolocal society. 234 Support your description with information you have learned from this course.
Evaluation
235 Design a healthy menu that you think most people would enjoy Evaluation using the healthy eating guide. 236 Evaluate a work of art, giving the reasons for your evaluation.
Evaluation
237 Evaluate appropriate and inappropriate actions of characters.
Evaluation
238 Evaluate two Internet sources of information about the Egyptians. Evaluation Which would be a better choice for your purpose and why? 239 Evaluate whether their model is a true representation of the local Evaluation environment. 240 Evaluate your own or a peer's essay in terms of the principles of Evaluation composition discussed during the semester. Explain choices made in making recommendations to an end 241 user.
Evaluation
242 Given an argument on any position, enumerate the logical Evaluation fallacies in that argument. 243 Given the data available on a research question, take a position Evaluation and defend it. 244 Given the data we've looked at on this topic, evaluate how Evaluation appropriate this conclusion is and defend your answer. 245 In a given clinical situation, select the most reasonable Evaluation intervention and predict the main effects and possible side effects. 246 Judge aesthetic qualities and relationship to future values. Evaluation 247 Justify and nominate ways to prevent animal extinction.
Evaluation
248 Predict what will happen next in.
Evaluation
Recommend how our classroom or playground could be 249 improved.
Evaluation
250 Select the best proposal for a proposed water treatment plant.
Evaluation
251 Solve terrorism. What strategies should be put in place at future
Evaluation
Olympic Games.
No
Kalimat Soal
Kategori
252 Tell about the most exciting part of the book. being sure to give Evaluation at least three reasons why? 253 Two pieces of sculpture from different eras and artists are Evaluation displayed. Study thse two pieces, use the compare-contrast method to determine which piece you prefer and write a 2-3 page report that describes your thinking process as you studied these pieces. Utilize the skills you have learned as we have studied various pieces of sculpture ver the past two weeks. 254 Using the basic principles of socialism discussed in this course, Evaluation evaluate the US economic system by providing key arguments to support your judgment. 255 Was Hemingway a great American writer? First you will need to Evaluation define greatness. 256 Would you have liked to have had Cinderella for a sister? Evaluation Explain why or why not. 257 Write a list of criteria to judge the Willy raps.
Evaluation
258 Write a review for the story and specify the type of audience that Evaluation would enjoy this book. 259 After designing an experiment, examining the results, and Evaluation drawing conclusions, determine in what ways the experiment could be conducted more effectively in order to draw more productive conclusions in the future. Construct a poster that will advertise your new food product in an Evaluation 260 exciting and irresistible way. 261 Critique an experimental design or a research proposal.
Evaluation
Decide whether you are in favor of building on a floodplain; Evaluation 262 defend your position in a debate. Establish criteria for making this choice and defend your final 263 selection.
Evaluation
264 Evaluate board games and justify why rules are important.
Evaluation
No
Kalimat Soal
Kategori
265 Examine the stated positions of both major political candidates Evaluation with regard to a particular issue and state good reasons (based on principles discussed in class) for why one candidates position is more likely to be effective than the other's. 266 Explain and justify a new budget.
Evaluation
267 Judge whether it would be possible to survive on an island alone Evaluation and blinded. Write about it. 268 Judge whether Olympic Ideals are realistic or unrealistic for the Evaluation contemporary elite athlete. 269 Listen to two classmates conversing on tape and critique their Evaluation performance on the basis of the skills covered this semester. 270 Predict whether Phillip will ever go back to visit the Cay after Evaluation several years of recovery. Assuming he gets his sight back. 271 Select the most effective solution.
Evaluation
272 Use a judge and jury to discuss the statement 'Children enjoy Evaluation Anthony Browne's books because of the illustrations'. 273 What criteria would you use to assess the validity of a business Evaluation contract? 274 What is part of this book did you like best. Tell why you like it?
Evaluation
LAMPIRAN III DRAFT ARTIKEL ILMIAH
Wrapper based Feature Selection for Naive Bayes with Kernel Density Estimation on Question Classification According to Bloom's Cognitive Level

Catur Supriyanto, Bowo Nurhadiono
Faculty of Computer Science
Dian Nuswantoro University
Semarang, Indonesia
[email protected], [email protected]

Sukardi
Program of Informatics Engineering
Adhi Guna Institute of Informatics and Computing
Palu, Indonesia
[email protected]
Abstract—This study investigates wrapper based feature selection to improve Naïve Bayes with Kernel Density Estimation. The performance of the wrapper based feature selection was evaluated on question classification according to the cognitive levels of Bloom's taxonomy. Compared to other classifiers, Naïve Bayes offers good accuracy and speed on large training datasets, but performs poorly on small ones. The best features of the dataset therefore need to be selected to improve the accuracy of Naïve Bayes. This paper used forward selection and backward elimination as wrapper based feature selection methods to improve the accuracy of Naïve Bayes. The results show that wrapper based feature selection improved the performance of Naïve Bayes with Kernel Density Estimation. Forward selection outperforms backward elimination with an accuracy of 67.18%, while backward elimination achieves only 60.9%.

Keywords—bloom's cognitive level; naïve bayes; kernel density estimation; wrapper based feature selection
Introduction
Bloom's taxonomy was developed by Benjamin Samuel Bloom [1] and is widely used to categorize question item sets according to the depth of student understanding. The six levels of Bloom's taxonomy are Knowledge, Comprehension, Application, Analysis, Synthesis, and Evaluation. A higher level of Bloom's taxonomy indicates a deeper level of student knowledge [2]. Bloom's taxonomy is used for designing and assessing learning objectives: a teacher or lecturer needs to design the learning objectives and the question item set to gauge student understanding. In practice, it is difficult to design quality test items based on Bloom's taxonomy, and intelligent systems have therefore been used to support teachers in designing test items.

Automatic classification of questions into Bloom's cognitive levels using Support Vector Machines (SVM) has been studied by Yahya and Osman [3]. In their research, SVM produced good performance, but the complexity of SVM is very high [4]. Other research on Bloom's cognitive levels using machine learning was done by Norazah Yusof and Chai Jing Hui [5][6]. They used a Backpropagation Neural Network to assign question item sets to the six levels of Bloom's taxonomy, and proposed Document Frequency (DF) and the Category Frequency-Document Frequency method (CF-DF) as feature selection to reduce the complexity of the Backpropagation algorithm. Their results showed that the DF feature reduction method can be considered more effective than using the whole feature set or the CF-DF method. However, training the Backpropagation algorithm is very slow [7], and the accuracy of their question classification can still be improved.

In our work, Naïve Bayes with Kernel Density Estimation using wrapper based feature selection is proposed. Naïve Bayes has good accuracy and speed on large training datasets, but it cannot cope well with small training datasets [8]. This paper applies wrapper based feature selection to improve the performance of Naïve Bayes on a small dataset. Based on the classification criterion, feature selection can be divided into filter based and wrapper based feature selection [9]. Filter based feature selection selects informative features by ranking them according to a criterion function [10]. The wrapper method takes feature selection and pattern classification as a whole and evaluates feature subsets directly on the classification results [11]. Wrapper methods are widely recognized as a superior alternative in supervised learning problems [12], since they select the best features based on classification results. This paper used forward selection and backward elimination as wrapper based methods; forward selection starts with an empty feature set and adds features, whereas backward elimination starts with the full feature set and removes them.

The remainder of this paper is organized as follows. Section 2 introduces the theoretical background. Section 3 presents the conducted experiment and its results. Section 4 is devoted to the conclusion.
Theoretical Background

Cognitive Level of Bloom's Taxonomy
Educational objectives in Bloom's Taxonomy consist of three domains: cognitive, affective and psychomotor. The cognitive domain involves intellectual skills, the affective domain acts as the emotional and attitudinal component, and the psychomotor domain involves physical skills [13]. This paper focuses on the cognitive domain of Bloom's taxonomy. Table I shows the keywords related to the cognitive domain of Bloom's taxonomy.

TABLE I. KEYWORDS USED IN THE COGNITIVE DOMAIN [14]

Category | Keywords
Knowledge | Defines, describes, identifies, knows, labels, lists, matches, names, outlines, recalls, recognizes, reproduces, selects, states.
Comprehension | Comprehends, converts, defends, distinguishes, estimates, explains, extends, generalizes, gives examples, infers, interprets, paraphrases, predicts, rewrites, summarizes, translates.
Application | Applies, changes, computes, constructs, demonstrates, discovers, manipulates, modifies, operates, predicts, prepares, produces, relates, shows, solves, uses.
Analysis | Analyzes, breaks down, compares, contrasts, diagrams, deconstructs, differentiates, discriminates, distinguishes, identifies, illustrates, infers, outlines, relates, selects, separates.
Synthesis | Categorizes, combines, compiles, composes, creates, devises, designs, explains, generates, modifies, organizes, plans, rearranges, reconstructs, relates, reorganizes, revises, rewrites, summarizes, tells, writes.
Evaluation | Appraises, compares, concludes, contrasts, criticizes, critiques, defends, describes, discriminates, evaluates, explains, interprets, justifies, relates, summarizes, supports.

Naïve Bayes Classifier
Naïve Bayes is a simple probabilistic machine learning algorithm [15]. It is a classification algorithm that needs training data to predict unknown data, and its classification is computed from Bayesian theory:

$$P(C \mid d) = \frac{P(C)\,P(d \mid C)}{P(d)} \quad (1)$$

$P(C \mid d)$ is the probability of a class given a document, i.e. the probability that a given document $d$ belongs to a given class $C$. $P(d)$ is the probability of a document; since it is a constant divisor in every calculation, it can be ignored. $P(C)$ is the probability of a class, computed as the number of documents in the category divided by the number of documents in all categories. $P(d \mid C)$ represents the probability of a document given a class; documents can be modeled as sets of words, so $P(d \mid C)$ can be written as:

$$P(d \mid C) = \prod_{i} P(w_i \mid C) \quad (2)$$

$$P(C \mid d) \propto P(C) \prod_{i} P(w_i \mid C) \quad (3)$$

$P(w_i \mid C)$ is the probability that the $i$-th word of a given document occurs in a document from class $C$, and it can be computed as:

$$P(w_i \mid C) = \frac{T_c + \alpha}{M + N} \quad (4)$$

where $T_c$ is the number of times the word $w_i$ occurs in class $C$, $M$ is the number of words in category $C$, $N$ is the size of the vocabulary table, and $\alpha$ is a positive constant, usually 1 or 0.5, used to avoid zero probabilities.
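For illustration, the computation in Eqs. (1)-(4) can be sketched in a few lines of Python. This is only a minimal sketch with an invented toy corpus, not the authors' RapidMiner implementation; all function names, variable names, and example questions below are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, alpha=1.0):
    """docs: list of (tokens, label). Returns priors, word counts, and totals."""
    vocab = {w for tokens, _ in docs for w in tokens}
    class_docs = defaultdict(list)
    for tokens, label in docs:
        class_docs[label].append(tokens)

    priors, word_counts, class_totals = {}, {}, {}
    for label, token_lists in class_docs.items():
        priors[label] = len(token_lists) / len(docs)          # P(C)
        counts = Counter(w for tokens in token_lists for w in tokens)
        word_counts[label] = counts                           # T_c per word
        class_totals[label] = sum(counts.values())            # M
    return priors, word_counts, class_totals, len(vocab), alpha

def predict(question_tokens, model):
    priors, word_counts, class_totals, vocab_size, alpha = model
    scores = {}
    for label in priors:
        # log P(C) + sum_i log P(w_i | C), smoothed as in Eq. (4)
        log_p = math.log(priors[label])
        for w in question_tokens:
            tc = word_counts[label].get(w, 0)
            log_p += math.log((tc + alpha) / (class_totals[label] + vocab_size))
        scores[label] = log_p
    return max(scores, key=scores.get)

# Tiny illustrative corpus; labels follow Bloom's levels
docs = [(["define", "the", "term"], "Knowledge"),
        (["compare", "two", "commercials"], "Analysis"),
        (["design", "a", "poster"], "Synthesis")]
model = train_naive_bayes(docs)
print(predict(["compare", "micro", "and", "macro"], model))  # -> Analysis
```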
Naïve Bayes with Kernel Density Estimation
Kernel Density Estimation (KDE) allows naïve Bayes to handle quantitative attributes [16]. To deal with quantitative data, naïve Bayes uses the normal (Gaussian) distribution:

$$g(x, \mu_i, \sigma_c) = \frac{1}{\sqrt{2\pi}\,\sigma_c}\, e^{-\frac{(x - \mu_i)^2}{2\sigma_c^2}} \quad (5)$$

$$P(D = d \mid C = c) = \frac{1}{n}\sum_{i} g(x, \mu_i, \sigma_c) \quad (6)$$

where $\mu_i$ ranges over the training data for the attribute $x$ in class $C$, with $\mu_i = x_i$ and $\sigma_c = \frac{1}{\sqrt{n_c}}$, and $n_c$ is the number of documents in class $C$.
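A minimal sketch of Eqs. (5)-(6), assuming the bandwidth $\sigma_c = 1/\sqrt{n_c}$ described above; the function name and the example feature values are illustrative, not taken from the original experiment.

```python
import math

def kernel_density(x, training_values):
    """Estimate P(X = x | C) as in Eqs. (5)-(6): an average of Gaussian
    kernels centred at each training value x_i of the class, with
    bandwidth sigma_c = 1 / sqrt(n_c)."""
    n_c = len(training_values)
    sigma_c = 1.0 / math.sqrt(n_c)
    total = 0.0
    for x_i in training_values:
        total += (1.0 / (math.sqrt(2 * math.pi) * sigma_c)) * \
                 math.exp(-((x - x_i) ** 2) / (2 * sigma_c ** 2))
    return total / n_c

# Illustrative: density of a numeric feature value for one class
values_in_class = [0.12, 0.15, 0.10, 0.40]
print(kernel_density(0.13, values_in_class))
```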
Wrapper based Feature Selection
This paper proposes wrapper based feature selection to improve the performance of Naïve Bayes with Kernel Density Estimation. Wrapper based feature selection uses the classification algorithm itself to evaluate feature subsets, measured by cross-validation. The common wrapper based methods are forward selection and backward elimination. The forward selection algorithm starts with an empty feature subset; in each iteration, one feature is added per forward step until a predefined number of features is reached. In each step, every candidate feature is separately added to the current subset and evaluated, and the feature that induces the highest improvement is included in the resulting subset. Backward elimination differs from forward selection: it starts with the full feature subset, and a feature is removed when it decreases the accuracy of the classifier. Fig. 1 and Fig. 2 show the algorithms of forward selection and backward elimination, respectively.

Fig. 1. The algorithm of forward selection:
1. Start with the empty set $F_0 = \emptyset$, $k = 0$.
2. Iterate:
   a. Select the next best feature $j$ to add to $F_k$, i.e. the one with the most significant cost reduction: $j^{+} = \arg\max_{j \notin F_k} J(F_k \cup \{j\})$.
   b. Update $F_{k+1} = F_k \cup \{j^{+}\}$, $k = k + 1$.

Fig. 2. The algorithm of backward elimination:
1. Start with the full set $F_n = \{1, \dots, n\}$, $k = n$.
2. Iterate:
   a. Remove the worst feature: $j^{-} = \arg\max_{j \in F_k} J(F_k \setminus \{j\})$.
   b. Update $F_{k-1} = F_k \setminus \{j^{-}\}$, $k = k - 1$.
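The forward-selection loop of Fig. 1 can be sketched as a generic wrapper. The helper `criterion` below stands in for the evaluation function J (for example, the cross-validated accuracy of Naïve Bayes with KDE trained on the chosen features), and the toy usage at the end is purely illustrative; none of these names come from the original paper.

```python
def forward_selection(all_features, criterion, max_features):
    """Greedy wrapper forward selection (Fig. 1).

    criterion(feature_subset) -> evaluation score J of the classifier
    trained on that subset (e.g. cross-validated accuracy)."""
    selected = []                       # F_0 = empty set
    remaining = list(all_features)
    while remaining and len(selected) < max_features:
        # evaluate every candidate added to the current subset
        scored = [(criterion(selected + [f]), f) for f in remaining]
        best_score, best_feature = max(scored)
        # stop when no candidate improves the current subset
        if selected and best_score <= criterion(selected):
            break
        selected.append(best_feature)   # F_{k+1} = F_k U {j+}
        remaining.remove(best_feature)
    return selected

# Toy usage: pretend each feature's usefulness is known, so J is just their sum.
toy_gain = {"analyze": 0.3, "design": 0.25, "the": 0.01}
chosen = forward_selection(toy_gain.keys(),
                           lambda s: sum(toy_gain[f] for f in s), 2)
print(chosen)  # ['analyze', 'design']
```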
Experiment and Result

Question Dataset
The dataset of this paper was collected from Chai Jing Hui [6]. The question item dataset contains 274 items belonging to the six Bloom taxonomy levels; its distribution is shown in Table II. To obtain a reliable estimate of performance, k-fold cross-validation was used: we applied 10-fold cross-validation with stratified sampling in the experiment.

TABLE II. DISTRIBUTION OF THE QUESTION DATASET

Name of Category | Number of Questions
Knowledge | 28
Comprehension | 44
Application | 41
Analysis | 48
Synthesis | 59
Evaluation | 54
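A rough sketch of the stratified split is given below, assuming a simple per-class round-robin assignment; the authors used RapidMiner's stratified sampling, so this is only an approximation of the idea, and the fold sizes shown are a consequence of this simplified scheme.

```python
from collections import defaultdict

def stratified_folds(labels, k=10):
    """Assign each item index to one of k folds, class by class,
    so every fold roughly preserves the distribution of Table II."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

# 274 questions with the distribution of Table II
labels = (["Knowledge"] * 28 + ["Comprehension"] * 44 + ["Application"] * 41 +
          ["Analysis"] * 48 + ["Synthesis"] * 59 + ["Evaluation"] * 54)
folds = stratified_folds(labels)
print([len(f) for f in folds])  # ten roughly equal folds summing to 274 items
```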
Performance Measures
The performance in this study was measured using a confusion matrix (Table III). A confusion matrix yields two performance measurements: accuracy and the kappa coefficient. Kappa is a coefficient that evaluates the agreement between two different observers and can be employed as a classifier performance measure [17].

TABLE III. CONFUSION MATRIX

 | Actual Positive | Actual Negative
Predicted Positive | TP | FP
Predicted Negative | FN | TN

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (7)$$

$$\text{Kappa} = \frac{\text{Accuracy} - P_c}{1 - P_c} \quad (8)$$

$$P_c = \frac{(TP + FN)(TP + FP) + (FP + TN)(FN + TN)}{N \cdot N} \quad (9)$$

where TP is true positive, TN is true negative, FP is false positive, FN is false negative, and N is the total number of documents.
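As a worked check of Eqs. (7)-(9), the sketch below computes accuracy and kappa from an invented 2x2 confusion matrix; the numbers are illustrative and are not results from this study.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Accuracy and kappa from a 2x2 confusion matrix, Eqs. (7)-(9)."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n                                    # Eq. (7)
    # chance agreement Pc, Eq. (9)
    pc = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    kappa = (accuracy - pc) / (1 - pc)                          # Eq. (8)
    return accuracy, kappa

print(accuracy_and_kappa(tp=40, fp=10, fn=5, tn=45))  # approx. (0.85, 0.7)
```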
Experiment Results
This subsection presents the results of the experiment. The experiment used tokenization, stopword removal, and stemming to preprocess the question item documents, and stratified 10-fold cross-validation to test the classification results. Cross-validation is a standard evaluation technique in pattern classification in which the dataset is split into n parts (folds) of equal size; n-1 folds are used to train the classifier, and the held-out nth fold is then used to test it [18]. Experiments were conducted with RapidMiner 5.3.005. Three conditions of Naïve Bayes performance were observed: Naïve Bayes with Kernel Density Estimation without feature selection (NBK), Naïve Bayes with Kernel Density Estimation using Forward Selection (NBK+FS), and Naïve Bayes with Kernel Density Estimation using Backward Elimination (NBK+BE).

TABLE IV. THE PERFORMANCE OF FORWARD SELECTION AND BACKWARD ELIMINATION ON NAÏVE BAYES WITH KERNEL DENSITY ESTIMATION

Method | Accuracy | Kappa | Computation Time (seconds)
NBK | 57.61% +/- 5.64% | 0.484 +/- 0.072 | 2
NBK+FS | 67.18% +/- 7.77% | 0.598 +/- 0.096 | 748.8
NBK+BE | 60.9% +/- 11.95% | 0.525 +/- 0.142 | 1861.8

From Table IV, it can be seen that the use of forward selection and backward elimination improved the accuracy of Naïve Bayes with Kernel Density Estimation. Forward selection has the best accuracy and a lower computation time compared to backward elimination. Our approach also improves on the performance of Chai Jing Hui's approach [5]: the accuracy of the proposed method reaches 67.18%, whereas the performance of Chai Jing Hui only reached 65.9%. Table V and Table VI show the detailed confusion matrices of Naïve Bayes using Forward Selection and Naïve Bayes using Backward Elimination, respectively.
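The preprocessing chain mentioned above can be sketched as follows. This is a simplified stand-in for the RapidMiner text-processing operators that were actually used; the tiny stopword list and the suffix-stripping rules are assumptions made only for illustration, not the real resources.

```python
import re

STOPWORDS = {"a", "an", "the", "of", "to", "and", "in", "is", "are"}  # illustrative subset

def crude_stem(word):
    # very rough suffix stripping, standing in for a real stemmer (e.g. Porter)
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(question):
    tokens = re.findall(r"[a-z]+", question.lower())          # tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]        # stopword removal
    return [crude_stem(t) for t in tokens]                    # stemming

print(preprocess("Compare two dog food commercials."))
# ['compare', 'two', 'dog', 'food', 'commercial']
```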
Conclusion
This paper studied wrapper based feature selection on Naïve Bayes with Kernel Density Estimation for question classification according to the cognitive levels of Bloom's taxonomy. We compared the performance of forward selection and backward elimination in terms of accuracy, kappa, and computation time. The results show that Naïve Bayes using forward selection has better accuracy and a lower computation time compared to backward elimination. We expect that its performance can still be improved; in future work, more experiments should be conducted to improve the accuracy of Naïve Bayes and to reduce the computation time of wrapper based feature selection.
References
[1] Benjamin Samuel Bloom, Taxonomy of Educational Objectives, Handbook I: The Cognitive Domain. New York: David McKay Co Inc., 1956.
[2] Timothy Highley and Anne E. Edlin, "Discrete Mathematics Assessment Using Learning Objectives Based on Bloom's Taxonomy," in Proceedings of the 39th IEEE International Conference on Frontiers in Education, 2009.
[3] Anwar Ali Yahya and Addin Osman, "Automatic classification of questions into Bloom's cognitive levels using support vector machines," in The International Arab Conference on Information Technology, 2011.
[4] M. Janaki Meena and K. R. Chandran, "Naïve Bayes Text Classification with Positive Features Selected by Statistical Method," in First International Conference on Advanced Computing, 2009.
[5] Norazah Yusof and Chai Jing Hui, "Determination of Bloom's Cognitive Level of Question Items using Artificial Neural Network," in International Conference on Intelligent Systems Design and Applications, 2010, pp. 866-870.
[6] Chai Jing Hui, "Feature Reduction for Neural Network in Determining the Bloom's Cognitive Level of Question Items," Universiti Teknologi Malaysia, Computer Science Master's Project Report, 2009.
[7] Ji Peirong, Wang Peng, Zhao Qin, and Zhao Li, "A new parallel back-propagation algorithm for neural networks," in IEEE International Conference on Grey Systems and Intelligent Services, 2011, pp. 807-810.
[8] Li Lei, Huang Yu-guang, and Liu Zhong-wan, "Chinese text classification for small sample set," The Journal of China University of Posts and Communications, vol. 18, no. 1, pp. 83-89, 2011.
[9] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, no. 1-2, pp. 273-324, 1997.
[10] Yuh-Jye Lee, Chien-Chung Chang, and Chia-Huang Chao, "Incremental Forward Feature Selection with Application to Microarray Gene Expression Data," Journal of Biopharmaceutical Statistics, vol. 18, no. 5, pp. 827-840, 2008.
[11] K. Z. Mao, "Orthogonal Forward Selection and Backward Elimination Algorithms for Feature Subset Selection," IEEE Transactions on Systems, Man, and Cybernetics, vol. 34, no. 1, pp. 629-634, February 2004.
[12] Ranjit Abraham, Jay B. Simha, and S. Sitharama Iyengar, "Effective Discretization and Hybrid feature selection using Naïve Bayesian classifier for Medical datamining," International Journal of Computational Intelligence Research, vol. 5, no. 2, pp. 116-129, 2009.
[13] David Andrich, "Framework relating outcomes based education and the taxonomy of educational objectives," Studies in Educational Evaluation, vol. 28, pp. 35-59, 2002.
[14] C. Clark, "Learning Domains of Bloom's Taxonomy: The Three Types of Learning," 2001.
[15] Fadi Thabtah, Mohammad Ali H. Eljinini, Mannam Zamzeer, and Wa'el Musa Hadi, "Naïve Bayesian Based on Chi Square to Categorize Arabic Data," Communications of the IBIMA, vol. 10, no. 20, pp. 158-163, 2009.
[16] Jingli Lu, Ying Yang, and Geoffrey I. Webb, "Incremental discretization for Naïve-Bayes classifier," in Proceedings of the Second International Conference on Advanced Data Mining and Applications, 2006, pp. 223-238.
[17] R. Parimala and R. Nallaswamy, "A Study of Spam E-mail classification using Feature Selection package," Global Journal of Computer Science and Technology, vol. 11, no. 7, 2011.
[18] D. M. Chandwadkar and M. S. Sutaone, "Role of Features and Classifiers on Accuracy of Identification of Musical Instruments," in CISP 2012, 2012.
TABLE V. CONFUSION MATRIX FOR NAÏVE BAYES USING FORWARD SELECTION

 | true Knowledge | true Comprehension | true Application | true Analysis | true Synthesis | true Evaluation | class precision
pred. Knowledge | 22 | 0 | 0 | 1 | 2 | 1 | 84.62%
pred. Comprehension | 0 | 30 | 0 | 3 | 3 | 1 | 81.08%
pred. Application | 0 | 1 | 26 | 0 | 0 | 0 | 96.30%
pred. Analysis | 0 | 2 | 1 | 27 | 4 | 4 | 71.05%
pred. Synthesis | 6 | 11 | 12 | 17 | 49 | 18 | 43.36%
pred. Evaluation | 0 | 0 | 2 | 0 | 1 | 30 | 90.91%
class recall | 78.57% | 68.18% | 63.41% | 56.25% | 83.05% | 55.56% |
TABLE VI. CONFUSION MATRIX FOR NAÏVE BAYES USING BACKWARD ELIMINATION

 | true Knowledge | true Comprehension | true Application | true Analysis | true Synthesis | true Evaluation | class precision
pred. Knowledge | 18 | 2 | 3 | 1 | 0 | 1 | 72.00%
pred. Comprehension | 4 | 27 | 4 | 6 | 4 | 2 | 57.45%
pred. Application | 1 | 1 | 20 | 6 | 1 | 4 | 60.61%
pred. Analysis | 2 | 8 | 3 | 25 | 4 | 7 | 51.02%
pred. Synthesis | 1 | 6 | 5 | 3 | 40 | 3 | 68.97%
pred. Evaluation | 2 | 0 | 6 | 7 | 10 | 37 | 59.68%
class recall | 64.29% | 61.36% | 48.78% | 52.08% | 67.80% | 68.52% |