Introduction Problems Limitations Literature Goal Block Diagram Method Get news with RSS RegEx Text Mining Counting Keyword Correlation Analysis GUI Testing and Analysis Conclusion Demonstration Programs
Introduction
News plays an important role in human life because people need information. People can access news through various media: print, electronic, and the Internet. Most people want to access news anywhere and at any time, which raised the idea of making news portable. This final project develops a server application that crawls news and manages it. The news is stored in a database and then accessed through a WML server.
Problems
How to build a system that crawls news from Uniform Resource Locators (URLs) registered in advance.
How to build a system that groups news into the categories provided.
How to build a system that shows news and makes it accessible using WAP technology.
Limitations
URLs are registered manually. Only 3 URLs are registered. Each URL must provide an RSS feed. The news is in Indonesian. News is taken only from the RSS pages. Only 5 categories are provided: politics, economy, crime, sport, and disaster. The system output is intended specifically for WAP technology.
Literature
RSS RegEx Text Mining Correlation Analysis WML
Literature cont. RSS
RSS is a family of web feed formats used to publish frequently updated content. RSS formats are specified using XML.
Literature cont. RegEx
A regular expression, often referred to as a regex, is a formula for finding a pattern in text. It can also be used to remove the tags in an HTML document.
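As a minimal sketch of the tag-removal use mentioned above, the Java snippet below applies the reluctant pattern `<.*?>` to a small HTML fragment. The class name `TagStripper`, the method `stripTags`, and the sample fragment are assumptions made for illustration; they are not taken from the project.

```java
import java.util.regex.Pattern;

class TagStripper {
    // Reluctant match: everything from '<' up to the nearest '>'.
    private static final Pattern TAG = Pattern.compile("<.*?>");

    // Replaces every tag with a space, then collapses runs of whitespace.
    static String stripTags(String html) {
        String noTags = TAG.matcher(html).replaceAll(" ");
        return noTags.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Hypothetical fragment; the real page markup is not shown on the slides.
        String html = "<p>Presiden <b>menaikkan</b> gaji para pejabat</p>";
        System.out.println(stripTags(html)); // prints: Presiden menaikkan gaji para pejabat
    }
}
```

Because `.` does not match line breaks by default, a tag that spans several lines would survive this pattern; compiling it with `Pattern.DOTALL` is one way to cover that case.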
Literature cont. Text Mining
Text mining is the process of taking unstructured textual information and extracting meaningful numeric indices from the text.
Literature cont. Correlation Analysis
Correlation analysis is a set of techniques for measuring association. It measures the strength of the relationship between 2 variables.

$$\mathrm{correl}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$
Literature cont. WML
WML (Wireless Markup Language) is based on XML. It is a markup language intended for devices that implement WAP (Wireless Application Protocol).
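To make the XML basis of WML concrete, here is a hedged Java sketch that builds a one-card WML 1.1 deck for a single news item. The class `WmlPage`, the card id `news`, and the sample strings are hypothetical; the slides do not show how the project's WML server actually generates its pages.

```java
class WmlPage {
    // Escapes the characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Builds a WML deck containing one card with a title and one paragraph.
    static String newsCard(String title, String body) {
        return "<?xml version=\"1.0\"?>\n"
             + "<!DOCTYPE wml PUBLIC \"-//WAPFORUM//DTD WML 1.1//EN\" "
             + "\"http://www.wapforum.org/DTD/wml_1.1.xml\">\n"
             + "<wml>\n"
             + "  <card id=\"news\" title=\"" + escape(title) + "\">\n"
             + "    <p>" + escape(body) + "</p>\n"
             + "  </card>\n"
             + "</wml>\n";
    }

    public static void main(String[] args) {
        // Sample strings only; a real deck would be filled from the news database.
        System.out.print(newsCard("Keputusan Presiden", "Presiden menaikkan gaji para pejabat"));
    }
}
```

A deck is the unit a WAP browser downloads, and each card is one screen, so a news list and a news detail view could be two cards in the same deck.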
Goal
To provide an easy way to get up-to-date news using a mobile phone that supports WAP.
Block Diagram
Method
Get news with RSS Text Mining Counting Keyword Correlation Analysis
Method Cont. Get news with RSS : Use Case
Method Cont. Get news with RSS : RSS data
<title>keputusan presiden</title>
<link>http://detik.com/news/keputusan presiden.html</link>
<description>presiden menaikkan gaji</description>
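Collection items = rss.getChannel().getItems();
for (Object o : items) {
    Item item = (Item) o; // Item: assumed name of the feed library's item class
    title = item.getTitle();
    link = item.getLink();
    description = item.getDescription();
}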
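The slide relies on an RSS library that is not named. As a rough, library-free sketch of the same step, the snippet below reads the title, link, and description of every `<item>` with the JDK's built-in DOM parser. The class `RssReader`, its methods, and the inline sample feed (including the wrapping `<rss>` and `<channel>` elements) are assumptions for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class RssReader {
    // Returns one {title, link, description} array per <item> in the feed.
    static List<String[]> readItems(String rssXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));
            NodeList items = doc.getElementsByTagName("item");
            List<String[]> result = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                result.add(new String[] {
                        text(item, "title"), text(item, "link"), text(item, "description") });
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Feed could not be parsed", e);
        }
    }

    // Text of the first descendant element with the given tag, or "" if absent.
    static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Hypothetical feed built around the sample item shown on the slide.
        String feed = "<rss version=\"2.0\"><channel><item>"
                + "<title>keputusan presiden</title>"
                + "<link>http://detik.com/news/keputusan presiden.html</link>"
                + "<description>presiden menaikkan gaji</description>"
                + "</item></channel></rss>";
        for (String[] item : readItems(feed)) {
            System.out.println(item[0] + " | " + item[1] + " | " + item[2]);
        }
    }
}
```

In the crawler itself the XML would come from each registered URL rather than from a string; the parsing step stays the same.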
Method Cont. Get news with RSS : Browse to the HTML link
getHTMLPage(item.getLink(), item.getTitle());
Presiden menaikkan gaji para pejabat
Method Cont. Get news with RSS : Remove HTML tags
Pattern tag = Pattern.compile("<.*?>");
// replaceAll already replaces every match, so no find() loop is needed
content = tag.matcher(content).replaceAll(" ");
Presiden menaikkan gaji para pejabat
Presiden menaikkan gaji para pejabat
Method Cont. Get news with RSS : Create file createFile(title); Presiden akan menaikkan gaji para pejabat
Method Cont. Text Mining
Method Cont. Text Mining : Tokenizing
Input: presiden menaikkan gaji para pejabat.
Output: presiden, menaikkan, gaji, para, pejabat
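The tokenizing step can be sketched as below: the sentence is lower-cased and split wherever a character is neither a letter nor a digit, which also drops the final full stop. The class `Tokenizer` and this splitting rule are assumptions; the slides do not state the exact rule the project uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class Tokenizer {
    // Lower-cases the text and splits it on runs of non-letter, non-digit characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {   // a leading delimiter yields one empty string
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("presiden menaikkan gaji para pejabat."));
        // prints: [presiden, menaikkan, gaji, para, pejabat]
    }
}
```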
Method Cont. Text Mining : Filtering
Input: Presiden, Menaikkan, Gaji, Para, pejabat
Output: Presiden, Menaikkan, Gaji, pejabat
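Filtering here appears to be stop-word removal, since only the function word "para" is dropped. A small sketch under that assumption follows; the class `StopWordFilter` and its short stop list are hypothetical, and the project's real list is not shown on the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopWordFilter {
    // Hypothetical stop list of common Indonesian function words.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("para", "yang", "dan", "di", "ke"));

    // Keeps every token whose lower-cased form is not in the stop list.
    static List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP_WORDS.contains(t.toLowerCase(Locale.ROOT))) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(filter(Arrays.asList("Presiden", "Menaikkan", "Gaji", "Para", "pejabat")));
        // prints: [Presiden, Menaikkan, Gaji, pejabat]
    }
}
```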
Method Cont. Text Mining : Stemming
Input: Presiden, Menaikkan, Gaji, pejabat
Output: Presiden, Naik, Gaji, jabat
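The slides do not say which stemming algorithm is used. The toy sketch below only strips the two prefixes and one suffix needed to reproduce the example above, with a minimum stem length as a guard; `ToyStemmer` and its affix lists are assumptions, and a practical Indonesian stemmer needs many more rules and usually a root-word dictionary.

```java
import java.util.Locale;

class ToyStemmer {
    // Deliberately tiny affix lists, chosen only to cover the slide's example.
    private static final String[] PREFIXES = { "me", "pe" };
    private static final String[] SUFFIXES = { "kan" };
    private static final int MIN_STEM = 4; // never cut a word below this length

    // Strips at most one prefix and then at most one suffix.
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String p : PREFIXES) {
            if (w.startsWith(p) && w.length() - p.length() >= MIN_STEM) {
                w = w.substring(p.length());
                break;
            }
        }
        for (String s : SUFFIXES) {
            if (w.endsWith(s) && w.length() - s.length() >= MIN_STEM) {
                w = w.substring(0, w.length() - s.length());
                break;
            }
        }
        return w;
    }

    public static void main(String[] args) {
        for (String w : new String[] { "Presiden", "Menaikkan", "Gaji", "pejabat" }) {
            System.out.print(stem(w) + " ");
        }
        System.out.println(); // the loop prints: presiden naik gaji jabat
    }
}
```

Unlike the slide, this sketch lower-cases its output so that "Gaji" and "gaji" later count as the same keyword.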
Method Cont. Counting Keyword
Input: Presiden, Naik, Gaji, jabat
Output: Presiden=1, Naik=1, Gaji=1, Jabat=1
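Counting keywords amounts to building a frequency table over the stemmed words. A minimal sketch follows, assuming the stems arrive as a list of lower-case strings; the class `KeywordCounter` is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class KeywordCounter {
    // Counts how often each stem occurs, keeping first-seen order.
    static Map<String, Integer> count(List<String> stems) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String s : stems) {
            counts.merge(s, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("presiden", "naik", "gaji", "jabat")));
        // prints: {presiden=1, naik=1, gaji=1, jabat=1}
    }
}
```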
Method Cont. Correlation Analysis
$$\mathrm{correl}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$
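The correlation coefficient can be computed directly from its definition, as in the sketch below. What the two variables hold in the project is not stated on the slides; treating them as two equal-length arrays of keyword counts is an assumption, and the class `Correlation` is hypothetical.

```java
class Correlation {
    // Pearson correlation: sum of co-deviations divided by the square root
    // of the product of the two sums of squared deviations.
    static double correl(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("Arrays must be non-empty and of equal length");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy); // NaN if either array is constant
    }

    public static void main(String[] args) {
        // Made-up vectors: y rises exactly with x, so the result is 1.0.
        System.out.println(correl(new double[] { 1, 2, 3 }, new double[] { 2, 4, 6 }));
    }
}
```

The result always lies between -1 and 1, which is the scale on which a threshold such as -0.55 is applied.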
GUI Main page server crawling
GUI cont. Page setting URL
GUI cont. Page setting Clean News
GUI cont. Page setting Connection
Testing and Analysis
Testing
The test used a mixed set of news items together with the most recently updated data dictionary, a threshold of -0.55, and a minimum count limit greater than 0. The mixed data were taken from http://rss.detik.com/index.php on July 18, 2010 at 21:50.
Testing and Analysis Cont. Output
Testing and Analysis Cont. Output
News title (Judul Berita) | Category (Kategori)
2 Janda Pahlawan akan Gelar Aksi Diam di Depan Istana Besok | politik
BPN Pacitan 6 Ha Lahan Milik Negara, 4 Ha Milik Keluarga | politik
Bupati Pacitan Tuntutan Ganti Rugi Rp 40 M Sulit Terpenuhi | ekonomi dan bisnis
Butuh Pengadilan Khusus Tangani Kasus Pidana Pemilukada | bencana
Gempa Lagi 7,1 SR, Papua Nugini Terancam Dilanda Tsunami | bencana
Papua Nugini Gempa 7,2 SR | bencana
Patung Sudirman Dilelang, Pengunjung Kecewa | politik
PDIP Masyarakat Bukan Teroris, Patwal Presiden Diminta Lebih Manusiawi | politik
Pengembalian Sengketa Pemilukada ke PT Dinilai Melanggar UU | politik
Pernah Ikut Pemilihan DPD, Komisi II Akan Minta Keterangan Saut Sirait | politik
Testing and Analysis Cont. Analysis
In the experiment above, two of the ten news items were assigned to the wrong category, so the error rate in this experiment was 20%.