Database systems
Normal forms
An example of a bad model SSN
Surnam
OfficeNo
1001 1001 1001 1201 1221 1234
Novák Novák Novák Řezáč Falc Lupínek
238 238 238 204 216 308
City
Street
Liteň Hlavní Bystřice Benešova Benešov Jiráskova Praha Ječná Kladno Uhelná Kladno Rudná
No
ZIP
Region
10 112 124 21 234 23
26727 25751 25601 12000 27201 27202
Středočeský Středočeský Středočeský Praha Středočeský Středočeský
President_of_ Region Rath Rath Rath Bém Rath Rath
Paradox on update: • When Mr. Novak changes his office number, we will have to apply this change to all rows related to Mr. Novak. • Similarly, a change of the president of the region „Středočeský kraj“ would initiate modification of multiple rows.
Příklad špatného modelu SSN
Surnam
OfficeNo
1001 1001 1001 1201 1221 1234
Novák Novák Novák Řezáč Falc Lupínek
238 238 238 204 216 308
City
Street
Liteň Hlavní Bystřice Benešova Benešov Jiráskova Praha Ječná Kladno Uhelná Kladno Rudná
No
ZIP
Region
10 112 124 21 234 23
26727 25751 25601 12000 27201 27202
Středočeský Středočeský Středočeský Praha Středočeský Středočeský
President_of_ Region Rath Rath Rath Bém Rath Rath
Paradox on insert: • When we will hire the first employee with address in the region of „Jihočeský kraj“, we would have to acquire the name of the president of the region „Jihočeský kraj“, if this column is required (mandatory).
Příklad špatného modelu SSN
Surnam
OfficeNo
1001 1001 1001 1201 1221 1234
Novák Novák Novák Řezáč Falc Lupínek
238 238 238 204 216 308
City
Street
Liteň Hlavní Bystřice Benešova Benešov Jiráskova Praha Ječná Kladno Uhelná Kladno Rudná
No
ZIP
Region
10 112 124 21 234 23
26727 25751 25601 12000 27201 27202
Středočeský Středočeský Středočeský Praha Středočeský Středočeský
President_of_ Region Rath Rath Rath Bém Rath Rath
Paradox on delete: • If Mr, Řezáče (the only one living in Prague) leaves our company, we will loose the information that the president of the region „Praha“ is Mr. Bém.
Konvention Let X = { X1 , X2, … Xn } be a table with × columns X1 , X2, … Xn. Symbol X denotes the cartesian product × X = X1 × X2 × … × Xn . ×
Symbol x denotes an ordered tuple × × x ∈X .
Functional dependency Let R(A) be a table, where A is a set of all columns of the table R. Let X,Y be sets of columns X,Y ⊆ A. Y is said to functionally depend on X iff there is at × × × × most one value y ∈ Y for any value x ∈ X . Functional dependence is marked as X → Y. X → Y means that Y is not functionally dependent on X.
Partial functional dependency Let X and Y be some sets of columns of a table. Let X → Y. X → Y is a partial functional dependence iff tehre exists a set of columns X1⊂ X such that X1 → Y .
Transitive functional dependecy Let X and Y be sets of some columns of a table, and C be a column of the same table such that C∉X and C∉Y. The column C is said to be transitively dependent iff the following two conditions hold concurrently: • X → Y → C • Y → X .
First normal form A table is in the first normal form iff the following conditions hold concurrently: 1. 2. 3.
The columns are atomic, i.e. any cell of a table contains exactly one value from the respective domain. All columns are regular. It means that rows do not have any hidden columns such as row IDs, object IDs, or hidden timestamps. Rows are unique.
Příklad relace, která není v 1.NF:
Otec Jan Jan Josef
Rodiče Matka Děti Marta {Eliška, Lucie} Marie {Jiří, Josef} Marta {Leoš}
Second normal form A table is in the second normal form if the following conditions hold concurrently: 1. 2.
The table is in the 1st normal form. None non-key attribute (i.e. attribute that do not participate in any key) is partially functionally dependent on any key of the table.
Table {StudentId, CourseId, StudentName, Semester} is not in 2NF as: • {IdStudenta, IdPredmetu} is a key • Non-key column StudentName is functionally dependent on {StudentId}. It means it is partially functionally dependent on a key. • Non-key column Semester is functionally dependent on {CourseId}. It means it is partially functionally dependent on a key.
Second normal form (2NF) The table {StudentaId, CourseId, StudentName, Semester} is not in 2NF because of: • {StudentId, CourseId} is a key. • Non-key column StudentaName is functionally dependent on {StudentaId}. Hence, it is partially functionally dependent on a key. • Non-key attribute Semester is functionally dependent on {CourseId}. Hence, it is partially functionally dependent on a key. Solution: The relation formed by the table above shall be decomposed into three tables: • {StudentId, CourseId} • {StudentaId, StudentName} • {CourseId, Semester}
Third normal form (3NF) A table is in the third normal form iff the following holds concurrently: 1. 2.
The table is in 2NF. Non non-key attribute is transitively dependent on any key of the table.
E.g. the table {Citizen, City, CityPopulation} is not in 3NF, as: • It is in 2NF, but • Citizen is a key • City is functionally dependent on Citizen but not vice-versa. • CityPopulation is functionally dependent on City. • Hence, CityPopulation is transitively functionally dependent of Citizen.
Třetí normální forma E.g. the table {Citizen, City, CityPopulation} is not in 3NF, as: • It is in 2NF, but • Citizen is a key • City is functionally dependent on Citizen but not vice-versa. • CityPopulation is functionally dependent on City. • Hence, CityPopulation is transitively functionally dependent of Citizen.
Solution: The table will be split into: • {Citizen, City} • {City, CityPopulation}