S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha
DM-MA/S1IF/FTI/UKM/2010
1
DSS membantu knowledge worker membuat keputusan dengan lebih cepat dan lebih baik ◦ “Bagaimana volume penjualan berdasarkan daerah dan produk pada tahun lalu ?” ◦ “Order mana saja yang harus dipenuhi supaya keuntungannya bisa maksimal ?”
OLAP adalah bagian dari DSS
DM-MA/S1IF/FTI/UKM/2010
2
DM-MA/S1IF/FTI/UKM/2010
3
1. 2. 3.
Warehouse DB Server OLAP Server Clients (tools)
DM-MA/S1IF/FTI/UKM/2010
4
Pendekatan QD : Single layer ◦ Tiap elemen data hanya disimpan sekali ◦ Virtual warehouse
Pendekatan DW : 2 layer ◦ Membedakan real time dan derived data ◦ Paling banyak digunakan di industri
DM-MA/S1IF/FTI/UKM/2010
5
Pendekatan DW : 3 layer ◦ Transformasi real-time data ke derived data seringkali membutuhkan 2 step
DM-MA/S1IF/FTI/UKM/2010
6
DM-MA/S1IF/FTI/UKM/2010
7
Enterprise warehouse: berisi seluruh informasi tentang subject-subject yang meliputi seluruh organisasi. Mis. Produk, sales, customer, lokasi ◦ Butuh business modelling skala besar ◦ Design & build-nya bisa tahunan
Data Marts: Departmental subsets/views dari enterprise warehouse yang berfokus hanya pada subject-subject tertentu. ◦ Misal. Marketing data mart: customer, product, sales ◦ Dapat diimplementasikan tanpa Enterprise WH. Implikasi : lebih cepat, tetapi kompleks dalam integrasinya (dlm jangka panjang)
Virtual warehouse/QD: views dari operational DBS ◦ Berisi berbagai summary view untuk efficient query processing ◦ Mudah dibuat tetapi membutuhkan kapabilitas besar dari operational DB servers
DM-MA/S1IF/FTI/UKM/2010
8
Menyediakan proses query yang cepat dan informal bagi business analyst dalam hal spreadsheets/cubes ◦ Misal. view sales data by geography, time, and/or product
Memperluas spreadsheet analysis model sehingga dapat bekerja dengan warehouse data ◦ Large data sets ◦ Dibuat sehingga dapat memahami business terms/business logic dan dapat melakukan statistical analysis ◦ Mengkombinasikan interactive queries dengan fungsi reporting
Multidimensional view of data adalah dasar OLAP, termasuk hierarchically structured domains ◦ Data model, operations, etc.
DM-MA/S1IF/FTI/UKM/2010
9
Multidimensional Conceptual View Intuitive Data Manipulation Accessibility: OLAP as a Mediator: ◦ OLAP engines sebagai middleware, berada di antara heterogeneous data sources/WH dan OLAP front-end
Batch Extraction vs Interpretive: ◦ menyediakan fasilitas untuk staging database for OLAP data maupun live access ke external data
OLAP Analysis Models: ◦ categorical (parameterised static reporting), exegetical (browsing), contemplative (“what if?” analysis) and formulaic (goal seeking models)
Client Server Architecture: ◦ satu OLAP server dapat menangani banyak client
dll DM-MA/S1IF/FTI/UKM/2010
10
Relational OLAP (ROLAP) ◦ Use relational or extended-relational DBMS to store and manage warehouse data and OLAP middle ware ◦ Include optimization of DBMS backend, implementation of aggregation navigation logic, and additional tools and services ◦ Greater scalability
Multidimensional OLAP (MOLAP) ◦ Sparse array-based multidimensional storage engine ◦ Fast indexing to pre-computed summarized data
Hybrid OLAP (HOLAP) (e.g., Microsoft SQLServer) ◦ Flexibility, e.g., low level: relational, high-level: array
Specialized SQL servers (e.g., Redbricks) ◦ Specialized support for SQL queries over star/snowflake schemas
DM-MA/S1IF/FTI/UKM/2010
11
DM-MA/S1IF/FTI/UKM/2010
12
Harus ada penanda level di tiap dimension Tiap query harus menggunakan penanda level tersebut
DM-MA/S1IF/FTI/UKM/2010
13
DM-MA/S1IF/FTI/UKM/2010
14
Multi-dimensional data disajikan dengan simple Jika menggunakan star, jumlah join yang harus ada relatif sedikit Lebih low maintenance Kelemahan : harus mengusahakan query optimization
DM-MA/S1IF/FTI/UKM/2010
15
Relational OLAP Server
sale
prodId p1 p2 p1
date 1 1 2
sum 62 19 48
tools
utilities
ROLAP server
Special indices, tuning; Schema is “denormalized”
relational DBMS
16
Multi-Dimensional OLAP Server Sales B A
M.D. tools
Product
milk soda eggs soap
1
2 3 Date
utilities multidimensional server
could also sit on relational DBMS
17
4
SELECT D1.d1, …, Dk.dk, agg1(F.f1,) FROM Dimension D1, …, Dimension Dk, Fact F WHERE D1.key = F.key1 AND … AND Dk.keyk = F.keyk AND otherPredicates GROUP BY D1.d1, …, Dk.dk HAVING groupPredicates
DM-MA/S1IF/FTI/UKM/2010
18
Skema : ◦ Fact : Sales, Dimensi : Produk, Toko, Waktu
Query RollUp : ◦ Tampilkan jumlah produk terjual yang lebih besar dari 50 unit per toko Toko SumJumlah
Hasil Query :
Toko1
85
Toko2
120
Query : SELECT t.kodet, sum(s.jmlunit)as SumJumlah FROM toko t, sales s WHERE t.kodet = s.kodet GROUP BY t.kodet HAVING sum(s.jmlunit) > 50
DM-MA/S1IF/FTI/UKM/2010
19
MDX = Multidimensional Expression FORMAT Query : [WITH [MEMBER <member-name> AS ’
’ | SET <set-name> AS ’<set-expression>’] . . .] SELECT [ [, ...]] FROM [<cube_specification>] [WHERE [<slicer_specification>]]
DM-MA/S1IF/FTI/UKM/2010
20
Cube : Jualan, Dimensi : Produk, Time Query : ◦ tampilkan total Count Jual untuk Produk 100 sampai dengan 150 untuk setiap bulan
Hasil Query :
MDX :
P100 P110 P120 1
50
30
25
2
20
25
20
3
30
45
35
with member [Produk].[Produk].Roll_Up as ' Sum( {[Produk].[Produk].[100] : [Produk].[Produk].[120]})' select { [Produk].[Produk].Roll_Up } on columns, {[Time New].[Month Of Year].members } on rows from Jualan DM-MA/S1IF/FTI/UKM/2010
21
Traditional Access Methods ◦ B-trees, hash tables, R-trees, grids, …
Popular in Warehouses ◦ ◦ ◦ ◦
inverted lists bit map indexes join indexes text indexes
22
18 19
20 21 22
23 25 26
age index
r5 r19 r37 r40
rId r4 r18 r19 r34 r35 r36 r5 r41
name joe fred sally nancy tom pat dave jeff
...
20 23
r4 r18 r34 r35
inverted lists
data records
23
age 20 20 21 20 20 25 21 26
Query: ◦ Get people with age = 20 and name = “fred”
List for age = 20: r4, r18, r34, r35 List for name = “fred”: r18, r52 Answer is intersection: r18
24
20 23
20 21 22
1 1 0 1 1 0 0 0 0
23 25 26
age index
bit maps
0 0 1 0 0 0 1 0 1 1
id 1 2 3 4 5 6 7 8
name joe fred sally nancy tom pat dave jeff
...
18 19
data records
25
age 20 20 21 20 20 25 21 26
Query: ◦ Get people with age = 20 and name = “fred”
List for age = 20: 1101100000 List for name = “fred”: 0100000001 Answer is intersection: 010000000000
Good if domain cardinality small Bit vectors can be compressed
26
join index product
sale
id p1 p2
rId r1 r2 r3 r4 r5 r6
nam e price bolt 10 nut 5
prodId p1 p2 p1 p2 p1 p1
jIndex r1,r3,r5,r6 r2,r4
storeId c1 c1 c3 c2 c1 c2
date 1 1 1 1 2 2
amt 12 11 50 8 44 4
27
What data is needed? Where does it come from? How to clean data? How to represent in warehouse (schema)? What to summarize? What to materialize? What to index?
28
Development ◦ design & edit: schemas, views, scripts, rules, queries, reports
Planning & Analysis ◦ what-if scenarios (schema changes, refresh rates), capacity planning
Warehouse Management ◦ performance monitoring, usage patterns, exception reporting
System & Network Management ◦ measure traffic (sources, warehouse, clients)
Workflow Management ◦ “reliable scripts” for cleaning & analyzing data
29