Statistics, Political Science programme
Daróczi Gergely, Department of Political Science
27 April 2011
Outline
1. Review
2. Example
3. Theoretical background
4. Exercises
5. Limitations of the correlation coefficient
6. Correlation revisited
7. Crosstables
8. Simpson's paradox

Daróczi Gergely (PPKE BTK), Statistics, 2011-04-27
Review: Measures of central tendency
We measured the heights of 10 students in each of 2 classrooms.
[Figure: the measured heights in the two classrooms, each shown on a 170-190 cm axis.]
Based on estimates from the descriptive statistics, the students of which class are taller?
Review: Measures of central tendency
[Figure: dot plots of the measured heights in classrooms a and b; axis: Height (cm), 160-195.]
Survey in a primary school: Shoe size and IQ
In a primary school we surveyed the pupils' shoe sizes and their performance on a mathematics test. The results:

ID   Shoe size   Math     Age
 1   29.75        26.67    3
 2   29.75        33.33    7
 3   29.75        41.67    5
 4   31.50        35.00    8
 5   31.50        46.67   10
 6   31.50        63.33   11
 7   31.50        70.00   12
 8   33.25        30.00    7
 9   33.25        38.33    7
10   33.25        56.67   12
11   35.00        26.67    6
12   35.00        40.00    8
13   35.00        43.33    6
14   35.00        46.67   10
15   35.00        53.33   11
16   38.50        55.00    9
17   40.25        45.00    9
18   42.00        58.33    9
19   42.00        76.67   16
20   42.00        77.50   18
21   42.00       100.00   19
22   43.75        70.83   14
Survey in a primary school: Shoe size and IQ
[Figure: scatter plot of Result in math exam (30-100) against Shoe size (30-42).]
Theoretical background: Covariation
Measuring the joint variability of x and y:
$$\mathrm{COV}(x,y) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}$$
Remember:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n}}$$
[Figure: scatter plot of Stopping distance (ft) against speed. Data: Ezekiel, M. (1930) Methods of Correlation Analysis. Wiley.]
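A minimal Python sketch of the two formulas above, applied to the shoe size and math results of the primary-school survey. It assumes the data are plain lists of numbers; the function names are illustrative. Note that the covariance divides by n - 1 while the "remember" formula for σ divides by n, so the sketch keeps the two divisors apart.

```python
from statistics import mean

# Shoe size and math test result of the 22 pupils in the primary-school survey
shoe = [29.75, 29.75, 29.75, 31.50, 31.50, 31.50, 31.50, 33.25, 33.25, 33.25,
        35.00, 35.00, 35.00, 35.00, 35.00, 38.50, 40.25, 42.00, 42.00, 42.00,
        42.00, 43.75]
math_score = [26.67, 33.33, 41.67, 35.00, 46.67, 63.33, 70.00, 30.00, 38.33,
              56.67, 26.67, 40.00, 43.33, 46.67, 53.33, 55.00, 45.00, 58.33,
              76.67, 77.50, 100.00, 70.83]


def covariance(x, y):
    """Sample covariance: sum of products of deviations from the means, divided by n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)


def sigma(x):
    """Standard deviation with divisor n, as in the 'remember' formula."""
    mx = mean(x)
    return (sum((xi - mx) ** 2 for xi in x) / len(x)) ** 0.5


print("COV(shoe, math) =", round(covariance(shoe, math_score), 2))
print("sigma(shoe)     =", round(sigma(shoe), 2))
```

A positive covariance only says that larger shoe sizes tend to go with higher math results; its size depends on the units, which is why the correlation coefficient rescales it.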
Theoretical background: Covariance
[Figure, two scatter plots: Stopping distance (ft) against Speed (mph), data from Ezekiel, M. (1930): Methods of Correlation Analysis; Weight (lb/1000) against Rear axle ratio, data from Henderson & Velleman (1981): Building multiple regression models interactively.]
Theoretical background: Correlation
$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{(n-1)\,s_x s_y} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
where s_x and s_y are the sample standard deviations (divisor n − 1).
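Both forms of the correlation coefficient can be checked against each other. A minimal sketch, assuming plain lists of numbers and using the survey data from the shoe size example; `statistics.stdev` uses the n - 1 divisor, which is what makes the two forms agree.

```python
from math import sqrt
from statistics import mean, stdev

shoe = [29.75, 29.75, 29.75, 31.50, 31.50, 31.50, 31.50, 33.25, 33.25, 33.25,
        35.00, 35.00, 35.00, 35.00, 35.00, 38.50, 40.25, 42.00, 42.00, 42.00,
        42.00, 43.75]
math_score = [26.67, 33.33, 41.67, 35.00, 46.67, 63.33, 70.00, 30.00, 38.33,
              56.67, 26.67, 40.00, 43.33, 46.67, 53.33, 55.00, 45.00, 58.33,
              76.67, 77.50, 100.00, 70.83]


def pearson_r(x, y):
    """Right-hand form: cross-products over the root of the product of squared deviations."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def pearson_r_from_sd(x, y):
    """Left-hand form: cross-products over (n - 1) * s_x * s_y with sample SDs."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / ((len(x) - 1) * stdev(x) * stdev(y))


print("r(shoe, math) =", round(pearson_r(shoe, math_score), 3))
```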
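Theoretical background: Partial correlation
$$\hat{r}_{XY\cdot Z} = \frac{N\sum_{i=1}^{N} r_{X,i}\,r_{Y,i} - \sum_{i=1}^{N} r_{X,i}\sum_{i=1}^{N} r_{Y,i}}{\sqrt{N\sum_{i=1}^{N} r_{X,i}^{2} - \left(\sum_{i=1}^{N} r_{X,i}\right)^{2}}\;\sqrt{N\sum_{i=1}^{N} r_{Y,i}^{2} - \left(\sum_{i=1}^{N} r_{Y,i}\right)^{2}}}$$
where r_{X,i} and r_{Y,i} are the residuals of X and Y from their linear regressions on Z.
With three variables:
$$\hat{r}_{XY\cdot Z} = \frac{r_{XY} - r_{XZ}\,r_{YZ}}{\sqrt{(1-r_{XZ}^{2})(1-r_{YZ}^{2})}}$$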
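The two formulas give the same number, which a short sketch can confirm. It uses the shoe size, math and age columns of the primary-school survey and asks how much of the shoe size and math association remains once age is held fixed. The residuals come from simple least-squares lines on Z; the function names are illustrative.

```python
from math import sqrt
from statistics import mean

shoe = [29.75, 29.75, 29.75, 31.50, 31.50, 31.50, 31.50, 33.25, 33.25, 33.25,
        35.00, 35.00, 35.00, 35.00, 35.00, 38.50, 40.25, 42.00, 42.00, 42.00,
        42.00, 43.75]
math_score = [26.67, 33.33, 41.67, 35.00, 46.67, 63.33, 70.00, 30.00, 38.33,
              56.67, 26.67, 40.00, 43.33, 46.67, 53.33, 55.00, 45.00, 58.33,
              76.67, 77.50, 100.00, 70.83]
age = [3, 7, 5, 8, 10, 11, 12, 7, 7, 12, 6, 8, 6, 10, 11, 9, 9, 9, 16, 18, 19, 14]


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def residuals(y, z):
    """Residuals of y from the least-squares line y = a + b * z."""
    mz, my = mean(z), mean(y)
    b = (sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
         / sum((zi - mz) ** 2 for zi in z))
    a = my - b * mz
    return [yi - (a + b * zi) for yi, zi in zip(y, z)]


def partial_r_residuals(x, y, z):
    """General formula: correlation of the residuals r_X,i and r_Y,i."""
    rx, ry = residuals(x, z), residuals(y, z)
    n = len(rx)
    num = n * sum(p * q for p, q in zip(rx, ry)) - sum(rx) * sum(ry)
    den = (sqrt(n * sum(p * p for p in rx) - sum(rx) ** 2)
           * sqrt(n * sum(q * q for q in ry) - sum(ry) ** 2))
    return num / den


def partial_r_three(x, y, z):
    """Three-variable formula from the pairwise coefficients."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


print("r(shoe, math)       =", round(pearson_r(shoe, math_score), 3))
print("r(shoe, math | age) =", round(partial_r_three(shoe, math_score, age), 3))
```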
Exercises
Compute the correlation coefficients from the dataset below! Don't forget the partial coefficients!

Grade average   Scholarship (HUF)   Book spending (HUF)
3.05            22000               3500
3.2             25000               3000
3.35            27000               2800
3.35            24000               3700
3.45            25000               2200
3.55            28000               3200
3.7             28000               3700
45              30000               4100
3.8             27000               4000
3.8             29000               3800
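One way to check a solution: the sketch computes the three pairwise coefficients and, with the three-variable formula, each partial coefficient controlling for the remaining variable. The short variable names are illustrative. The eighth grade average is printed as 45 on the slide, which looks like a typing or extraction error among values between 3.05 and 3.8; it is kept as printed here, and since a single extreme value can dominate Pearson's r, check it before relying on the numbers.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

# Exercise data as printed on the slide. The eighth grade average reads "45"
# in the source; it is kept verbatim here, so check it before trusting the output.
data = {
    "grade_avg": [3.05, 3.2, 3.35, 3.35, 3.45, 3.55, 3.7, 45, 3.8, 3.8],
    "scholarship": [22000, 25000, 27000, 24000, 25000, 28000, 28000, 30000, 27000, 29000],
    "books": [3500, 3000, 2800, 3700, 2200, 3200, 3700, 4100, 4000, 3800],
}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def partial_r(rxy, rxz, ryz):
    """Partial correlation of X and Y controlling for Z, from the pairwise coefficients."""
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


names = list(data)
r = {}
for a, b in combinations(names, 2):
    r[a, b] = r[b, a] = pearson_r(data[a], data[b])
    print(f"r({a}, {b}) = {r[a, b]:.3f}")

partials = {}
for x, y in combinations(names, 2):
    z = next(v for v in names if v not in (x, y))  # the remaining, controlled variable
    partials[x, y, z] = partial_r(r[x, y], r[x, z], r[y, z])
    print(f"r({x}, {y} | {z}) = {partials[x, y, z]:.3f}")
```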
Exercises: Solution
Limitations of the correlation coefficient
- Causality
- The Lazarsfeld paradigm
- Linearity
Limitations of the correlation coefficient: Correlation does not imply causation!
Source: http://xkcd.com/552
Limitations of the correlation coefficient: Correlation does not imply causation! Theoretical background
- Aristotle: logic, syllogism: if (A → B) & (B → C) ⇒ A → C
- David Hume: scepticism, “only correlation can actually be perceived [not causality]”
  - see: our belief that the sun will rise tomorrow
  - see: “If I see a billiard ball moving towards another, on a smooth table, I can easily conceive [it] to stop upon contact.”
- Popper: falsification
- Pearl, J.: Causality: Models, Reasoning, and Inference, Cambridge University Press, 2000
Limitations of the correlation coefficient: The Lazarsfeld paradigm
Stouffer: The American Soldier
“Soldiers in branches with higher promotion rates are happier than soldiers in branches with lower rates of promotion.”
Limitations of the correlation coefficient: The Lazarsfeld paradigm
Stouffer: The American Soldier
H0: Soldiers in branches with higher promotion rates are happier than soldiers in branches with lower rates of promotion.
BUT: “Soldiers in branches with higher promotion rates were more pessimistic about their own chances of being promoted than soldiers in branches with lower rates of promotion.”
Limitations of the correlation coefficient: Linearity
Source: Anscombe, F. J. (1973) Graphs in statistical analysis. American Statistician,
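Anscombe's point can be checked numerically. The sketch below types in the four data sets of the quartet as they are commonly distributed (for example as R's anscombe data frame); that transcription is an assumption of this example, not part of the slide. All four sets give practically the same means and a correlation of about 0.816, although their scatter plots look completely different: only the first is a roughly linear cloud.

```python
from math import sqrt
from statistics import mean

# Anscombe's quartet as commonly distributed (e.g. R's `anscombe` data frame);
# the transcription is an assumption of this example, not taken from the slide.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


for name, (xs, ys) in quartet.items():
    print(f"{name:>3}: mean x = {mean(xs):.2f}, mean y = {mean(ys):.2f}, "
          f"r = {pearson_r(xs, ys):.3f}")
```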
Exercise
The data was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).
- mpg: Miles/(US) gallon
- cyl: Number of cylinders
- disp: Displacement (cu.in.)
- hp: Gross horsepower
- drat: Rear axle ratio
- wt: Weight (lb/1000)
- qsec: 1/4 mile time
- vs: V/S
- am: Transmission (0 = automatic, 1 = manual)
- gear: Number of forward gears
- carb: Number of carburetors
Source: Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391-411.
Exercise
[Figure: scatter-plot matrix of mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear and carb, with the pairwise correlation coefficients and significance stars.]
Source: Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391-411.
Exercise: Edgar Anderson’s Iris Data
[Figure: scatter-plot matrix of Sepal.Length, Sepal.Width, Petal.Length and Petal.Width.]
Source: Anderson, Edgar (1935). The irises of the Gaspe Peninsula, Bulletin of the American Iris Society, 59, 2-5.
Correlation revisited: a real relationship?
[Figure: scatter plot of Temperature (degrees Fahrenheit) against Wind (miles per hour).]
Crosstables: Discrete (qualitative) variables
ID    gender   color
1     Female   pink
2     Female   pink
3     Female   pink
4     Female   pink
5     Female   pink
6     Female   pink
···
95    Male     yellow
96    Male     yellow
97    Male     yellow
98    Male     yellow
99    Male     yellow
100   Male     yellow
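Building a crosstable from raw records is just counting how many records fall into each combination of categories. A minimal sketch with `collections.Counter`, using only the records shown on the slide (IDs 1-6 and 95-100); the elided rows are not reproduced, so the counts here are not the full table.

```python
from collections import Counter

# Only the records shown on the slide (IDs 1-6 and 95-100); the elided rows are left out.
records = ([(i, "Female", "pink") for i in range(1, 7)]
           + [(i, "Male", "yellow") for i in range(95, 101)])


def crosstab(records):
    """Count the records in each (color, gender) cell; absent cells count as 0."""
    return Counter((color, gender) for _id, gender, color in records)


table = crosstab(records)
colors = sorted({color for color, _ in table})
genders = sorted({gender for _, gender in table})
print("".ljust(8) + "".join(g.rjust(8) for g in genders))
for color in colors:
    print(color.ljust(8) + "".join(str(table[color, gender]).rjust(8) for gender in genders))
```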
Crosstables: Discrete (qualitative) variables
[Figure: proportions of color (green, pink, yellow) within each gender (Female, Male); vertical axis from 0.0 to 1.0.]
Crosstables: Discrete (qualitative) variables
          Female   Male
green       17      18
pink        30      10
yellow      13      12
Crosstables: Discrete (qualitative) variables
          Female     Male
green       17        18        marginal
pink        30        10        marginal
yellow      13        12        marginal
          marginal  marginal    N
The row and column totals are the marginals; N, the grand total, sits in the corner.
Crosstables: Discrete (qualitative) variables
          Female   Male    ∑
green       17      18     35
pink        30      10     40
yellow      13      12     25
∑           60      40    100
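The margins can be computed directly from the cell counts: 17 + 18 = 35 for green, 30 + 10 = 40 for pink and 13 + 12 = 25 for yellow. A minimal sketch, storing the table as a nested dict keyed by color and then gender (a layout chosen for this example only):

```python
# Counted values of the color-by-gender crosstable (rows: color, columns: gender)
counts = {
    "green": {"Female": 17, "Male": 18},
    "pink": {"Female": 30, "Male": 10},
    "yellow": {"Female": 13, "Male": 12},
}


def margins(table):
    """Return the row totals, the column totals and the grand total N."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, k in cells.items():
            col_totals[col] = col_totals.get(col, 0) + k
    return row_totals, col_totals, sum(row_totals.values())


rows, cols, n = margins(counts)
print("row totals:", rows)
print("column totals:", cols)
print("N =", n)
```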
Crosstables: Percentages
          Female   Male    ∑
green       17      18     35
pink        30      10     40
yellow      13      12     25
∑           60      40    100
Table 1. Counted values

          Female   Male    ∑
green      17 %    18 %    35 %
pink       30 %    10 %    40 %
yellow     13 %    12 %    25 %
∑          60 %    40 %   100 %
Table 2. Total percentages
Crosstables: Column percentages
          Female   Male    ∑
green       17      18     35
pink        30      10     40
yellow      13      12     25
∑           60      40    100
Table 3. Counted values

          Female    Male    ∑
green     28.3 %    45 %    35 %
pink      50 %      25 %    40 %
yellow    21.7 %    30 %    25 %
∑         100 %    100 %   100 %
Table 4. Column percentages (each gender column sums to 100 %)
Crosstables: Row percentages
          Female   Male    ∑
green       17      18     35
pink        30      10     40
yellow      13      12     25
∑           60      40    100
Table 5. Counted values

          Female    Male      ∑
green     48.6 %    51.4 %   100 %
pink      75 %      25 %     100 %
yellow    52 %      48 %     100 %
∑         60 %      40 %     100 %
Table 6. Row percentages (each color row sums to 100 %)
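The three kinds of percentages differ only in the denominator: the grand total N, the gender (column) total, or the color (row) total. A minimal sketch, using an illustrative nested-dict layout; for green/Female it gives 17 %, 28.3 % and 48.6 % respectively.

```python
# Counted values of the color-by-gender crosstable (rows: color, columns: gender)
counts = {
    "green": {"Female": 17, "Male": 18},
    "pink": {"Female": 30, "Male": 10},
    "yellow": {"Female": 13, "Male": 12},
}


def percentages(table):
    """Total, within-column and within-row percentages of every cell."""
    n = sum(sum(cells.values()) for cells in table.values())
    col_totals = {}
    for cells in table.values():
        for col, k in cells.items():
            col_totals[col] = col_totals.get(col, 0) + k
    total_pct, col_pct, row_pct = {}, {}, {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        for col, k in cells.items():
            total_pct[row, col] = 100 * k / n
            col_pct[row, col] = 100 * k / col_totals[col]  # each gender column sums to 100 %
            row_pct[row, col] = 100 * k / row_total        # each color row sums to 100 %
    return total_pct, col_pct, row_pct


total_pct, col_pct, row_pct = percentages(counts)
for key in total_pct:
    print(key, f"{total_pct[key]:.1f} %", f"{col_pct[key]:.1f} %", f"{row_pct[key]:.1f} %")
```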
Crosstables: Expected values
          Female   Male    ∑
green       17      18     35
pink        30      10     40
yellow      13      12     25
∑           60      40    100
Table 7. Counted values

          Female   Male    ∑
green       21      14     35
pink        24      16     40
yellow      15      10     25
∑           60      40    100
Table 8. Expected values
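Under independence the expected frequency of a cell is its row total times its column total divided by N, for example 35 · 60 / 100 = 21 for green/Female. A minimal sketch, with the same illustrative nested-dict layout:

```python
# Counted values of the color-by-gender crosstable (rows: color, columns: gender)
observed = {
    "green": {"Female": 17, "Male": 18},
    "pink": {"Female": 30, "Male": 10},
    "yellow": {"Female": 13, "Male": 12},
}


def expected(table):
    """Expected frequencies under independence: row total * column total / N."""
    row_totals = {r: sum(cells.values()) for r, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for c, k in cells.items():
            col_totals[c] = col_totals.get(c, 0) + k
    n = sum(row_totals.values())
    return {(r, c): row_totals[r] * col_totals[c] / n
            for r in row_totals for c in col_totals}


for (color, gender), e in expected(observed).items():
    print(f"{color:<7} {gender:<7} {e:5.1f}")
```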
Crosstables: Chi-square statistic
$$\chi^2 = \sum_{i=1}^{n}\frac{(O_i - E_i)^2}{E_i}$$
where:
- χ²: Pearson's cumulative test statistic,
- O_i: an observed (counted) frequency,
- E_i: an expected (theoretical) frequency,
- n: the number of cells in the table.
H0: the two variables are independent, i.e. the observed frequencies differ from the expected ones only by chance.
Requirements! The expected frequencies must not be too small (a common rule of thumb: at least 5 in every cell).
Crosstables: Computed chi-square
          Female          Male
green     (17−21)²/21     (18−14)²/14
pink      (30−24)²/24     (10−16)²/16
yellow    (13−15)²/15     (12−10)²/10
Table 9. Computed distances between observed and expected values
$$\chi^2 = \sum_{i=1}^{n}\frac{(O_i - E_i)^2}{E_i} = 6.321429$$
degrees of freedom: (3 − 1)(2 − 1) = 2
Crosstables: Computed chi-square
⇒ p = 0.04239545
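The whole test fits in a few lines of standard-library Python. This is a sketch, not a replacement for a statistics package: for the p-value it assumes an even number of degrees of freedom, where the chi-square upper tail has the closed form exp(−x/2) · Σ_{k<df/2} (x/2)^k / k!; with df = 2 this is simply exp(−χ²/2). The function names are illustrative.

```python
from math import exp, factorial

# Counted values of the color-by-gender crosstable (rows: color, columns: gender)
observed = {
    "green": {"Female": 17, "Male": 18},
    "pink": {"Female": 30, "Male": 10},
    "yellow": {"Female": 13, "Male": 12},
}


def chi_square(table):
    """Pearson's chi-square statistic and its degrees of freedom."""
    row_totals = {r: sum(cells.values()) for r, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for c, k in cells.items():
            col_totals[c] = col_totals.get(c, 0) + k
    n = sum(row_totals.values())
    stat = 0.0
    for r, cells in table.items():
        for c, o in cells.items():
            e = row_totals[r] * col_totals[c] / n  # expected frequency under independence
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles positive, even degrees of freedom")
    half = x / 2
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))


stat, df = chi_square(observed)
p = chi2_sf_even_df(stat, df)
print(f"chi2 = {stat:.6f}, df = {df}, p = {p:.8f}")
```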
Simpson’s paradox: Berkeley sex bias case
[Figure: admission outcome (Admitted, Deny) by gender (Female, Male).]
Simpson’s paradox: Berkeley sex bias case
            Female   Male    ∑
Admitted    1494     3738     5232
Deny        2827     4704     7531
∑           4321     8442    12763
Table 10. Observed values

            Female    Male     ∑
Admitted    34.6 %    44.3 %   41 %
Deny        65.4 %    55.7 %   59 %
∑           100 %     100 %   100 %
Table 11. Column percentages (each gender column sums to 100 %)

χ² = 110.8489; d.f. = 1; p = 6.385628e-26
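The 2×2 test can be sketched the same way. The reported χ² = 110.8489 is reproduced when Yates' continuity correction is applied (0.5 is subtracted from each |O − E| before squaring), which is what R's `chisq.test` does by default for 2×2 tables; without the correction the statistic is about 111.25. For 1 degree of freedom the upper-tail probability has the closed form erfc(√(x/2)), so the standard library suffices. The function names are illustrative.

```python
from math import erfc, sqrt

# Observed Berkeley admissions (rows: outcome, columns: gender)
observed = [[1494, 3738],   # Admitted: Female, Male
            [2827, 4704]]   # Deny:     Female, Male


def chi_square_2x2(table, yates=True):
    """Pearson's chi-square for a 2x2 table, optionally with Yates' correction."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            e = row[i] * col[j] / n
            d = abs(table[i][j] - e)
            if yates:
                d = max(d - 0.5, 0.0)  # continuity correction, never below zero
            stat += d ** 2 / e
    return stat


def chi2_sf_df1(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(x / 2))


with_yates = chi_square_2x2(observed)
without_yates = chi_square_2x2(observed, yates=False)
print(f"chi2 (Yates) = {with_yates:.4f}, p = {chi2_sf_df1(with_yates):.6e}")
print(f"chi2 (no correction) = {without_yates:.4f}")
```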
Simpson’s paradox: Berkeley sex bias case
         Applicants   Admitted
Men      8442         44 %
Women    4321         35 %

             Men                       Women
Department   Applicants   Admitted     Applicants   Admitted
A            825          62 %         108          82 %
B            560          63 %         25           68 %
C            325          37 %         593          34 %
D            417          33 %         375          35 %
E            191          28 %         393          24 %
F            272          6 %          341          7 %
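The reversal can be sketched using only the numbers on the slide: women are admitted at a higher rate in four of the six departments, yet pooled over the same departments men come out ahead, because women applied mostly to the departments with low admission rates. The pooled rates are approximations (admitted counts are rebuilt from rounded percentages), and these six departments cover fewer applicants than the 8442 and 4321 totals, so they will not match the 44 % and 35 % exactly.

```python
# Applicants and admission rate (%) per department, as given on the slide
men = {"A": (825, 62), "B": (560, 63), "C": (325, 37),
       "D": (417, 33), "E": (191, 28), "F": (272, 6)}
women = {"A": (108, 82), "B": (25, 68), "C": (593, 34),
         "D": (375, 35), "E": (393, 24), "F": (341, 7)}

# Departments where women were admitted at a higher rate than men
women_ahead = sorted(d for d in men if women[d][1] > men[d][1])


def pooled_rate(groups):
    """Approximate overall rate (%): admitted counts rebuilt from rounded percentages."""
    applicants = sum(n for n, _ in groups.values())
    admitted = sum(n * pct / 100 for n, pct in groups.values())
    return 100 * admitted / applicants


print("Women ahead in departments:", ", ".join(women_ahead))
print(f"Pooled over A-F: men {pooled_rate(men):.1f} %, women {pooled_rate(women):.1f} %")
```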
Simpson’s paradox: Batting averages in professional baseball
            Derek Jeter               David Justice
            Hits/At bats   %          Hits/At bats   %
1995        12/48          25.0 %     104/411        25.3 %
1996        183/582        31.4 %     45/140         32.1 %
Combined    195/630        31.0 %     149/551        27.0 %
Who is the better player?
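The paradox comes from the very different numbers of at bats behind each season's average: Justice is ahead in both seasons, but most of Jeter's at bats fall in his strong 1996 season and most of Justice's in his weaker 1995 season, so pooling reverses the order. A minimal sketch, reading each pair as hits over at bats (the usual definition of a batting average):

```python
# Batting records from the slide, read as (hits, at bats) per season
jeter = {"1995": (12, 48), "1996": (183, 582)}
justice = {"1995": (104, 411), "1996": (45, 140)}


def season_average(record, year):
    hits, at_bats = record[year]
    return hits / at_bats


def combined_average(record):
    """Pool hits and at bats over all seasons before dividing."""
    hits = sum(h for h, _ in record.values())
    at_bats = sum(ab for _, ab in record.values())
    return hits / at_bats


for year in ("1995", "1996"):
    print(year, f"Jeter {season_average(jeter, year):.3f}",
          f"Justice {season_average(justice, year):.3f}")
print("Combined", f"Jeter {combined_average(jeter):.3f}",
      f"Justice {combined_average(justice):.3f}")
```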
To be continued...
Daróczi Gergely