Research Topics for the Doctoral Program in Computer Science
INDRA BUDI
[email protected]
Topics
Information extraction
Information extraction on legal documents
Technology forecasting
Information Extraction
Information extraction (IE) systems:
  Find and understand limited relevant parts of texts
  Gather information from many pieces of text
  Produce a structured representation of relevant information: relations (in the database sense), a.k.a. a knowledge base
Goals:
1. Organize information so that it is useful to people
2. Put information in a semantically precise form that allows further inferences to be made by computer algorithms
Information Extraction (IE)
IE systems extract clear, factual information
Roughly: Who did what to whom when?
E.g.,
Gathering earnings, profits, board members, headquarters, etc. from company reports
  "The headquarters of BHP Billiton Limited, and the global headquarters of the combined BHP Billiton Group, are located in Melbourne, Australia."
  headquarters("BHP Billiton Limited", "Melbourne, Australia")
Learn drug-gene product interactions from medical research literature
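As a rough illustration of how a sentence becomes a relation, the sketch below uses one hand-written regular expression for the headquarters example. The pattern `HQ_PATTERN` and the function `extract_headquarters` are hypothetical names introduced here, and the pattern covers only this one phrasing; a real IE system would need many more patterns or a trained model.

```python
import re

# Hypothetical hand-written pattern for a single phrasing:
# "The headquarters of <ORG>, ... is/are located in <PLACE>."
HQ_PATTERN = re.compile(
    r"The headquarters of (?P<org>[A-Z][\w&. ]*?),.*?"
    r"(?:is|are) located in (?P<place>[A-Z][\w ,]*?)\."
)


def extract_headquarters(text):
    """Return (organisation, place) pairs matched by HQ_PATTERN."""
    return [(m.group("org"), m.group("place")) for m in HQ_PATTERN.finditer(text)]


sentence = (
    "The headquarters of BHP Billiton Limited, and the global headquarters "
    "of the combined BHP Billiton Group, are located in Melbourne, Australia."
)

for org, place in extract_headquarters(sentence):
    # Print the pair in the relation notation used on the slide.
    print(f'headquarters("{org}", "{place}")')
```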
IE Example: The Event Meeting Extraction
Questions to answer from the news text:
  Where was the location of the meeting?
  Who attended the meeting?
  When did the meeting occur?
Template to fill: <meeting>, <participants>
IE Example: The Event Meeting Extraction
Menteri Luar Negeri Inggris Mike O’Brien bertemu dengan Megawati Soekarnoputri di Istana Negara, Jakarta kemarin. Dalam pertemuan itu, dia menyampaikan undangan Tony Blair untuk berkunjung ke Inggris. Megawati yang merupakan wanita pertama yang menjadi presiden di Indonesia, menyambut baik undangan tersebut dan berjanji akan memenuhinya.
(British Foreign Office Minister Mike O'Brien met Megawati Soekarnoputri at the State Palace, Jakarta, yesterday. At the meeting, he conveyed Tony Blair's invitation to visit England. Megawati, the first woman to become president of Indonesia, welcomed the invitation and promised to fulfill it.)
<meeting>
  05/12/2003
  Istana Negara, Jakarta, Indonesia
  <participants>
    Megawati Soekarnoputri, Presiden Indonesia (President of Indonesia)
    Mike O'Brien, Menteri Luar Negeri Inggris (British Foreign Minister)
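The filled template above can be held as a typed record and serialized to XML. This is a minimal sketch: only the `meeting` and `participants` element names come from the slide, while `date`, `place`, `person`, `name`, and `role` are assumed names chosen for illustration.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Participant:
    name: str
    role: str


@dataclass
class Meeting:
    date: str
    place: str
    participants: list = field(default_factory=list)


def meeting_to_xml(meeting):
    """Serialize a Meeting to an XML string (element names partly assumed)."""
    root = ET.Element("meeting")
    ET.SubElement(root, "date").text = meeting.date
    ET.SubElement(root, "place").text = meeting.place
    parts = ET.SubElement(root, "participants")
    for p in meeting.participants:
        person = ET.SubElement(parts, "person")
        ET.SubElement(person, "name").text = p.name
        ET.SubElement(person, "role").text = p.role
    return ET.tostring(root, encoding="unicode")


# Values copied from the filled template on the slide.
meeting = Meeting(
    date="05/12/2003",
    place="Istana Negara, Jakarta, Indonesia",
    participants=[
        Participant("Megawati Soekarnoputri", "Presiden Indonesia"),
        Participant("Mike O'Brien", "Menteri Luar Negeri Inggris"),
    ],
)
xml_text = meeting_to_xml(meeting)
print(xml_text)
```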
IE Phases
Two Approaches to Building Extraction Systems
Knowledge engineering approach
  Grammars are constructed by hand
  Domain patterns are discovered by a human expert through introspection and inspection of a corpus
  Much laborious tuning and "hill climbing"
Automatically trainable systems
  Use statistical methods when possible
  Learn rules from annotated corpora
  Learn rules from interaction with the user
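The contrast between the two approaches can be sketched on the meeting example. `HAND_RULE` is a pattern written by hand; `learn_connectors` collects the phrases found between two annotated participants and `build_rule` turns them into a pattern. The two-sentence annotated corpus, the `NAME` pattern, and all function names are made up for illustration, and counting connector phrases is a deliberately simple stand-in for a real statistical learner.

```python
import re
from collections import Counter

# Assumption: a person name is a run of capitalized tokens.
NAME = r"[A-Z][\w'’]*(?: [A-Z][\w'’]*)*"

# Knowledge engineering: one pattern written by hand.
HAND_RULE = re.compile(rf"({NAME}) bertemu dengan ({NAME})")


def learn_connectors(annotated, min_count=1):
    """Collect the phrases that appear between two annotated participants."""
    counts = Counter()
    for sentence, a, b in annotated:
        start = sentence.find(a)
        end = sentence.find(b, start + len(a)) if start >= 0 else -1
        if start >= 0 and end >= 0:
            counts[sentence[start + len(a):end].strip()] += 1
    return [c for c, n in counts.items() if c and n >= min_count]


def build_rule(connectors):
    """Build a regex that accepts any of the learned connector phrases."""
    ordered = sorted(connectors, key=len, reverse=True)
    alternatives = "|".join(re.escape(c) for c in ordered)
    return re.compile(rf"({NAME}) (?:{alternatives}) ({NAME})")


# Toy annotated corpus (invented): sentence, participant 1, participant 2.
annotated = [
    ("Budi bertemu dengan Siti di Jakarta.", "Budi", "Siti"),
    ("Andi berjumpa dengan Rina kemarin.", "Andi", "Rina"),
]
learned_rule = build_rule(learn_connectors(annotated))

text = "Mike O'Brien bertemu dengan Megawati Soekarnoputri di Istana Negara."
print(HAND_RULE.search(text).groups())
print(learned_rule.search(text).groups())
# The hand rule misses a paraphrase that the learned rule has seen in training.
print(HAND_RULE.search(annotated[1][0]), learned_rule.search(annotated[1][0]).groups())
```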
Legal Document Extraction
[Diagram] Sources: paper-based legal documents (scanned without OCR, then either 1. re-written or 2. scanned with OCR) and legal documents in PDF format
Flow: convert to text-based legal document, then convert to XML-based legal document (IRS and IES)
Applications: legal document analysis, legal search engine, legal summarization
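The conversion from a text-based to an XML-based legal document can be sketched as follows, assuming each article starts with a heading line of the form "Pasal N". The element names `law` and `article`, the function `text_to_xml`, and the placeholder text are assumptions for illustration, not the schema used in the project.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: an article heading is a line containing only "Pasal <number>".
PASAL_HEADING = re.compile(r"^Pasal[ \t]+(\d+)[ \t]*$", re.MULTILINE)


def text_to_xml(text, title):
    """Split plain text into articles and wrap each one in an XML element."""
    root = ET.Element("law", {"title": title})
    matches = list(PASAL_HEADING.finditer(text))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = " ".join(text[m.end():end].split())  # normalize whitespace
        article = ET.SubElement(root, "article", {"number": m.group(1)})
        article.text = body
    return root


# Placeholder text, not taken from any real law.
sample = (
    "Pasal 1\n"
    "Placeholder text of the first article.\n"
    "\n"
    "Pasal 2\n"
    "Placeholder text of the second article.\n"
)
law = text_to_xml(sample, title="Example law")
print(ET.tostring(law, encoding="unicode"))

# A retrieval component could then look up one article by its number.
print(law.find("article[@number='2']").text)
```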
History of Amendments to Laws (UU)
Detailed search results: history of amendments to a law, by article (Pasal)
Technology Forecasting
Sources of science and technology (S&T) information are currently growing rapidly:
  Patents indicate development activity in the innovation cycle
  Scientific publications indicate basic and applied research activity
Policy makers who plan and manage the research and development (R&D) activities of an institution, whether government or private, are increasingly expected to make use of this large-scale information. For government, technology trends are needed to set policy priorities and research directions.
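One simple way to turn yearly patent or publication counts into a trend estimate is to average a few basic predictors. The sketch below is an assumption-laden illustration, not the method of the publications listed at the end: the three predictors, the function names, and the toy series are all invented here.

```python
def naive(series):
    """Forecast the next value as the last observed value."""
    return series[-1]


def moving_average(series, window=3):
    """Forecast the next value as the mean of the last `window` values."""
    tail = series[-window:]
    return sum(tail) / len(tail)


def linear_trend(series):
    """Fit a least-squares line over the time index and extrapolate one step.

    Needs at least two observations.
    """
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(series))
    slope = sxy / sxx
    return y_mean + slope * (n - x_mean)


def ensemble_forecast(series):
    """Average the three simple predictors with equal weights."""
    predictions = [naive(series), moving_average(series), linear_trend(series)]
    return sum(predictions) / len(predictions)


# Toy yearly document counts for one topic (invented numbers).
counts = [2, 4, 6, 8]
print(ensemble_forecast(counts))
```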
Summary of the Literature Review for Technology Forecasting
Future Research on IE
Law: court hearing transcripts
  suspect's name, case type, judge's name, laws applied, witnesses, charges, verdict, etc.
Medical: doctor's prescriptions
  drug name, dosage, diagnosis, etc.
Sport: football news
  names of the competing teams, score, goal scorers, referee, venue, etc.
Use case scenarios from use case descriptions
Generating ER diagrams from database specifications
Future Research (cont.)
NER/information extraction for social media
Grouping research topics based on search results from online databases
Comparison of prediction techniques for research topics
Time series feature selection for prediction
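Time series feature selection can be illustrated by choosing which past values (lags) to use as predictors. The sketch below ranks lags by absolute autocorrelation, a simple technique swapped in for illustration; it is not the multiple kernel learning approach named in the publications. The function names and the toy series are assumptions.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag.

    Assumes the series is not constant (non-zero variance).
    """
    n = len(series)
    mean = sum(series) / n
    var = sum((y - mean) ** 2 for y in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var


def select_lags(series, max_lag, k):
    """Return the k lags in 1..max_lag with the largest absolute autocorrelation."""
    scored = [(abs(autocorrelation(series, lag)), lag) for lag in range(1, max_lag + 1)]
    scored.sort(reverse=True)
    return sorted(lag for _, lag in scored[:k])


# Toy periodic series with period 4 (invented data).
series = [1, 0, -1, 0] * 5
print(select_lags(series, max_lag=4, k=2))
```

The selected lags would then define the input features, for example the values at t-2 and t-4 when predicting the value at t.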
Publications
Agus Widodo, Indra Budi & Belawati Widjaja, Automatic lag selection in time series forecasting using multiple kernel learning, International Journal of Machine Learning and Cybernetics, August 2015.
Indra Budi, Agus Widodo & Rizal F Aji, Prediction of Research Topics Using Ensemble of Best Predictors from Similar Dataset, World Academy of Science, Engineering and Technology, International Science Index 85, International Journal of Computer, Information, Systems and Control Engineering, (2014), 8(1), pp. 82-88.
Indra Budi, Agus Widodo & Rizal F Aji, Prediction of research topics on science & technology (S&T) using ensemble forecasting, International Journal of Software Engineering and its Applications, (2013), 7(5), pp. 253-268.
Indra Budi et al., Application of association rules mining to Named Entity Recognition and co-reference resolution for the Indonesian language, Inderscience Publishers, Int. Journal of Business Intelligence and Data Mining, 2007, Vol. 2, No. 4, pp. 426-446.