DATA WAREHOUSING DESIGN TRANSPARENCIES
Objectives 2
Issues associated with designing a data
warehouse. Technique for designing the database component of a data warehouse called dimensionality modeling. How a dimensional model (DM) differs from an ER model. A step-by-step methodology for designing a data warehouse. Criteria for assessing degree of dimensionality provided by a data warehouse.
Designing Data Warehouses 3
To begin a data warehouse project, need to find
answers for questions such as:
Kebutuhan user mana yang sangat mendesak Data apa saja yang pertama kali harus disediakan Projek yang mana yang seharusnya bisa diperkecil dan lebih mudah dikelola Infrastruktur mana yang dapat digunakan untuk awal yag nantinya dapat digunakan untuk dataware house sekala besar
Designing Data Warehouses 4
Banyak perusahaan menghindari kompleksitas
memulainya dengna satu atau beberapa data mart Data mart memperbolehkan didesain lebih
simple dan dapat digunakan untuk user yang spesifik
Designing Data Warehouses 5
Hanya sedikit desainer yang bersedia
berkomitmen untuk desain enterprise-wide yang harus memenuhi semua kebutuhan pengguna pada satu waktu.
Meskipun solusi sementara membangun data
mart, tujuan akhir tetap sama yaitu, penciptaan akhir dari sebuah gudang data sesuai dengan kebutuhan perusahaan.
Designing Data Warehouses 6
Pengumpulan persyaratan dan tahap analisis
dari proyek data warehouse melibatkan stakeholder perusahaan dengan mewawancarai staf (seperti pengguna pemasaran, keuangan pengguna, dan pengguna penjualan) untuk memungkinkan identifikasi sekumpulan prioritas persyaratan yang harus dipenuhi oleh warehouse.
Designing Data Warehouses 7
Pada saat yang sama, wawancara dilakukan
dengan anggota staf yang bertanggung jawab untuk sistem operasional, untuk mengidentifikasi sumber data yang bersih, data yang valid, dan konsisten yang akan tetap dapat digunakan selama beberapa tahun ke depan.
Designing Data Warehouses 8
Wawancara akan memberikan informasi yang
diperlukan untuk bentuk top-down (kebutuhan pengguna) dan pandangan bottom-up (sumber data yang tersedia) dari data warehouse. Komponen database dari data warehouse dijelaskan menggunakan teknik pemodelan dimensi.
Dimensionality modeling 9
Teknik desain logis bertujuan untuk menyajikan
data dalam format standar, bentuk intuitif yang memungkinkan untuk mengakses dengan kinerja yang cukup bagus Menggunakan konsep pemodelan ER dengan beberapa pembatasan Setiap model dimensi (DM) terdiri dari sebua table dengan kunci komposit primer, yang disebut tabel fakta, dan sebuah table yang disebut tabel dimensi.
Dimensionality modeling 10
Setiap tabel dimensi memiliki kunci (non-
komposit) primer sederhana yang sesuai dengan salah satu komponen kunci komposit dalam tabel fakta. ‘star scheme' Berbentuk seperti bintang
Dimensionality modeling 11
Semua kunci natural diganti dengan kunci
pengganti (surrogate). Berarti setiap penghubung antara fakta dan tabel dimensi didasarkan pada kunci pengganti, bukan kunci natural. Kunci pengganti memungkinkan data di
warehouse memiliki beberapa kebebasan dari data yang digunakan pada sistem OLTP.
Star schema for property sales of DreamHome 12
Dimensionality modeling 13
Star schema is a logical structure that
has a fact table containing factual data in the center, surrounded by dimension tables containing reference data, which can be denormalized. Facts are generated by events that occurred in
the past, and are unlikely to change, regardless of how they are analyzed.
Dimensionality modeling 14
Bulk of data in data warehouse is in fact tables,
which can be extremely large.
Important to treat fact data as read-only
reference data that will not change over time.
Most useful fact tables contain one or more
numerical measures, or ‘facts’ that occur for each record and are numeric and additive.
Dimensionality modeling 15
Dimension tables usually contain descriptive
textual information. Dimension attributes are used as the constraints
in data warehouse queries. Star schemas can be used to speed up query
performance by denormalizing reference information into a single dimension table.
Dimensionality modeling 16
Snowflake schema is a variant of the star
schema where dimension tables do not contain denormalized data. Starflake schema is a hybrid structure that
contains a mixture of star (denormalized) and snowflake (normalized) schemas. Allows dimensions to be present in both forms to cater for different query requirements.
Property sales with normalized version of Branch dimension table 17
Dimensionality modeling 18
Predictable and standard form of the
underlying dimensional model offers important advantages:
Efficiency Ability to handle changing requirements Extensibility Ability to model common business situations Predictable query processing.
Comparison of DM and ER models 19
A single ER model normally decomposes
into multiple DMs. Multiple DMs are then associated through
‘shared’ dimension tables.
20
Langkah-langkah dalam merancang DW :
Memilih proses Memilih item data yang diperlukan Mengidentifikasi dan membuat dimensi yang sesuai Memilih fakta Menyimpan pra-perhitungan dalam tabel fakta Melengkapi urutan dari tabel dimensi Memilih jangka waktu database Memungkinkan mengubah dimensi secara perlahan Menentukan prioritas query dan mode query.
Step 1: Choosing the process 21
Memilih Proses (fungsi) yang mengacu pada
subyek data mart tertentu. Data mart dibangun pertama kali harus menjadi salah satu yang paling mungkin tepat waktu, sesuai anggaran, dan menjawab pertanyaanpertanyaan bisnis yang paling penting secara komersial.
ER model of an extended version of DreamHome 22
ER model of property sales business process of DreamHome 23
Step 2: Choosing the grain 24
Putuskan record/kolom apa yang mewakili tabel
fakta Identifikasi dimensi dari table fakta. Keputusan detail tabel fakta juga menentukan field setiap tabel dimensi. Juga meliputi waktu sebagai inti dari table dimensi
sebagai mana pada skema star
Step 3: Identifying and conforming the dimensions 25
Dimensions set the context for asking questions
about the facts in the fact table.
If any dimension occurs in two data marts, they
must be exactly the same dimension, or one must be a mathematical subset of the other.
A dimension used in more than one data mart is
referred to as being conformed.
Star schemas for property sales and property advertising 26
Step 4: Choosing the facts 27
The grain of the fact table determines which facts
can be used in the data mart.
Facts should be numeric and additive. Unusable facts include: non-numeric facts non-additive facts fact at different granularity from other facts in table.
Property rentals with a badly structured 28fact table
Property rentals with fact table corrected 29
Step 5: Storing pre-calculations in the fact table 30
Once the facts have been selected each should be re-
examined to determine whether there are opportunities to use pre-calculations.
Step 6: Rounding out the dimension tables 31
Text descriptions are added to the dimension tables. Text descriptions should be as intuitive and
understandable to the users as possible.
Usefulness of a data mart is determined by the scope
and nature of the attributes of the dimension tables.
Step 7: Choosing the duration of the database 32 Duration measures how far back in time the fact
table goes. Very large fact tables raise at least two very
significant data warehouse design issues. Often difficult to source increasing old data. It is mandatory that the old versions of the important dimensions be used, not the most current versions. Known as the ‘Slowly Changing Dimension’ problem.
Step 8: Tracking slowly changing dimensions 33 Slowly changing dimension problem means that
the proper description of the old dimension data must be used with old fact data. Often, a generalized key must be assigned to
important dimensions in order to distinguish multiple snapshots of dimensions over a period of time.
Step 8: Tracking slowly changing dimensions 34 Three basic types of slowly changing dimensions:
Type 1, where a changed dimension attribute is overwritten Type 2, where a changed dimension attribute causes a new dimension record to be created Type 3, where a changed dimension attribute causes an alternate attribute to be created so that both the old and new values of the attribute are simultaneously accessible in the same dimension record
Step 9: Deciding the query priorities and the query modes 35
Most critical physical design issues
affecting the end-user’s perception includes:
physical sort order of the fact table on disk presence of pre-stored summaries or aggregations.
Additional physical design issues include
administration, backup, indexing performance, and security.
Database Design Methodology for Data Warehouses 36
Methodology designs a data mart that
supports requirements of particular business process and allows the easy integration with other related data marts to form the enterprise-wide data warehouse.
A dimensional model, which contains
more than one fact table sharing one or more conformed dimension tables, is referred to as a fact constellation.
Fact and dimension tables for business process of DreamHome 37
each
Dimensional model (fact constellation) for the DreamHome 38 data warehouse
Criteria for assessing the dimensionality of a data warehouse 39
Criteria proposed by Ralph Kimball to
measure the extent to which a system supports the dimensional view of data warehousing. Twenty criteria divided into three broad
groups: architecture, administration, and expression.
Criteria for assessing the dimensionality of 40 a data warehouse
Criteria for assessing the dimensionality of 41 a data warehouse Architectural criteria describes way the
entire system is organized.
Administration criteria are considered to
be essential to the ‘smooth running’ of a dimensionally-oriented data warehouse.
Expression criteria are mostly analytic
capabilities that are needed in real-life situations.