Statistics, Political Science programme
Daróczi Gergely, Department of Political Science
8 May 2012
Outline
1 Sampling (recap)
2 On the relationship between variables
3 Correlation: theoretical background, practice, on the limitations of the correlation coefficient, practice
4 Crosstabulation: theoretical background, Simpson's paradox
5 Standardization and decomposition
6 Graphs

Daróczi Gergely (PPKE BTK), Statistics, 2012-05-08
Sampling: probability vs. non-probability sampling

        Theoretical mathematics   Environmental science   Event management     ∑
Women            10                        40                     10            60
Men              10                        10                     20            40
∑                20                        50                     30           100
Research in a primary school: smart pupils in big shoes (example)
We carried out a mini-study of the pupils' shoe size and their preparedness in mathematics, and obtained the following results:

ID   Shoe size   Math result
 1     29.75        26.67
 2     29.75        33.33
 3     29.75        41.67
 4     31.50        35.00
 5     31.50        46.67
 6     31.50        63.33
 7     31.50        70.00
 8     33.25        30.00
 9     33.25        38.33
10     33.25        56.67
11     35.00        26.67
12     35.00        40.00
13     35.00        43.33
14     35.00        46.67
15     35.00        53.33
16     38.50        55.00
17     40.25        45.00
18     42.00        58.33
19     42.00        76.67
20     42.00        77.50
21     42.00       100.00
22     43.75        70.83
[Figure: result in math exam plotted against shoe size]
We carried out a mini-study of the pupils' shoe size, preparedness in mathematics and age, and obtained the following results:

ID   Shoe size   Math result   Age
 1     29.75        26.67        3
 2     29.75        33.33        7
 3     29.75        41.67        5
 4     31.50        35.00        8
 5     31.50        46.67       10
 6     31.50        63.33       11
 7     31.50        70.00       12
 8     33.25        30.00        7
 9     33.25        38.33        7
10     33.25        56.67       12
11     35.00        26.67        6
12     35.00        40.00        8
13     35.00        43.33        6
14     35.00        46.67       10
15     35.00        53.33       11
16     38.50        55.00        9
17     40.25        45.00        9
18     42.00        58.33        9
19     42.00        76.67       16
20     42.00        77.50       18
21     42.00       100.00       19
22     43.75        70.83       14
[Figure: result in math exam plotted against age]
[Figure: shoe size plotted against age]
[Figure: three panels side by side: shoe size against age, result in math exam against shoe size, and result in math exam against age]
[Figure: scatterplot matrix of shoe size (29.75 to 43.75), math result (26.67 to 100) and age (3 to 19)]
[Figure: scatterplot matrix of shoe size, math result and age, with pairwise correlations: math and shoe size 0.65**, shoe size and age 0.67***, math and age 0.93***]
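Partial correlations:
$r_{\text{math},\,\text{shoe}\cdot\text{age}} = 0.11$, $r_{\text{math},\,\text{age}\cdot\text{shoe}} = 0.87$, $r_{\text{shoe},\,\text{age}\cdot\text{math}} = 0.22$
Once age is held constant, the association between shoe size and math result almost disappears, while the association between age and math result remains strong.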
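Theoretical background: Covariance
The joint variability (covariance) of variables x and y:
$$\mathrm{COV}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$
Reminder:
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$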
[Figure: stopping distance (ft) plotted against speed (mph). Data: Ezekiel, M. (1930) Methods of Correlation Analysis. Wiley.]
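As a minimal sketch in plain Python (no statistics library assumed), the covariance formula and the reminder for σ translate directly into code. The data are the shoe-size and age columns of the school example; the function names `mean`, `cov` and `sigma` are illustrative only. Note the different denominators, exactly as in the formulas: n − 1 for the covariance, n for σ.

```python
from math import sqrt

# Shoe size and age of the 22 pupils in the school example
shoe = [29.75, 29.75, 29.75, 31.50, 31.50, 31.50, 31.50, 33.25, 33.25, 33.25, 35.00,
        35.00, 35.00, 35.00, 35.00, 38.50, 40.25, 42.00, 42.00, 42.00, 42.00, 43.75]
age = [3, 7, 5, 8, 10, 11, 12, 7, 7, 12, 6, 8, 6, 10, 11, 9, 9, 9, 16, 18, 19, 14]


def mean(v):
    return sum(v) / len(v)


def cov(x, y):
    """Sample covariance: sum of the products of deviations, divided by n - 1."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)


def sigma(x):
    """Standard deviation with n in the denominator, as in the reminder."""
    mx = mean(x)
    return sqrt(sum((xi - mx) ** 2 for xi in x) / len(x))


print(round(cov(shoe, age), 3))  # 12.568
```

The covariance of a variable with itself is its variance with the n − 1 denominator, so `cov(age, age)` equals `sigma(age) ** 2 * 22 / 21` rather than `sigma(age) ** 2`.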
[Figure: two panels: stopping distance (ft) against speed (mph), data from Ezekiel, M. (1930): Methods of Correlation Analysis; weight (lb/1000) against rear axle ratio, data from Henderson & Velleman (1981): Building multiple regression models interactively]
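The correlation coefficient:
$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{(n-1)\,s_x s_y} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \,\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
where $s_x$ and $s_y$ are the standard deviations of x and y computed with $n-1$ in the denominator, so that $r_{xy} = \mathrm{COV}(x, y)/(s_x s_y)$.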
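A small check in plain Python that the two forms of r above agree once s_x and s_y use the n − 1 denominator. It reuses the school example's data; the helper names `r_via_sd` and `r_via_sums` are illustrative, not a library API.

```python
from math import sqrt

# School example: shoe size, math result and age of 22 pupils
shoe = [29.75, 29.75, 29.75, 31.50, 31.50, 31.50, 31.50, 33.25, 33.25, 33.25, 35.00,
        35.00, 35.00, 35.00, 35.00, 38.50, 40.25, 42.00, 42.00, 42.00, 42.00, 43.75]
math_result = [26.67, 33.33, 41.67, 35.00, 46.67, 63.33, 70.00, 30.00, 38.33, 56.67, 26.67,
               40.00, 43.33, 46.67, 53.33, 55.00, 45.00, 58.33, 76.67, 77.50, 100.00, 70.83]
age = [3, 7, 5, 8, 10, 11, 12, 7, 7, 12, 6, 8, 6, 10, 11, 9, 9, 9, 16, 18, 19, 14]


def mean(v):
    return sum(v) / len(v)


def r_via_sd(x, y):
    """First form: cross-products / ((n - 1) * s_x * s_y), with n - 1 standard deviations."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sx = sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / ((n - 1) * sx * sy)


def r_via_sums(x, y):
    """Second form: cross-products / sqrt(sum of squares of x * sum of squares of y)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


pairs = {"math-shoe": (math_result, shoe), "shoe-age": (shoe, age), "math-age": (math_result, age)}
for name, (x, y) in pairs.items():
    print(name, round(r_via_sd(x, y), 2), round(r_via_sums(x, y), 2))
# math-shoe 0.65 0.65
# shoe-age 0.67 0.67
# math-age 0.93 0.93
```

These match the coefficients printed on the scatterplot matrix of the school example.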
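Partial correlation, computed from residuals:
$$\hat{r}_{XY\cdot Z} = \frac{N\sum_{i=1}^{N} r_{X,i}\,r_{Y,i} - \sum_{i=1}^{N} r_{X,i}\sum_{i=1}^{N} r_{Y,i}}{\sqrt{N\sum_{i=1}^{N} r_{X,i}^2 - \left(\sum_{i=1}^{N} r_{X,i}\right)^2}\;\sqrt{N\sum_{i=1}^{N} r_{Y,i}^2 - \left(\sum_{i=1}^{N} r_{Y,i}\right)^2}}$$
where $r_{X,i}$ and $r_{Y,i}$ are the residuals of X and Y after each has been linearly regressed on Z.
In the case of three variables:
$$\hat{r}_{XY\cdot Z} = \frac{r_{XY} - r_{XZ}\,r_{YZ}}{\sqrt{(1 - r_{XZ}^2)(1 - r_{YZ}^2)}}$$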
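A minimal sketch, assuming ordinary least squares with an intercept for the residuals, that computes the partial correlation both ways: from the residuals r_{X,i}, r_{Y,i} with the N-formula, and from the three-variable shortcut. The two should agree up to floating-point error. The data are again the school example; all function names are hypothetical helpers.

```python
from math import sqrt

# School example: shoe size, math result and age of 22 pupils
shoe = [29.75, 29.75, 29.75, 31.50, 31.50, 31.50, 31.50, 33.25, 33.25, 33.25, 35.00,
        35.00, 35.00, 35.00, 35.00, 38.50, 40.25, 42.00, 42.00, 42.00, 42.00, 43.75]
math_result = [26.67, 33.33, 41.67, 35.00, 46.67, 63.33, 70.00, 30.00, 38.33, 56.67, 26.67,
               40.00, 43.33, 46.67, 53.33, 55.00, 45.00, 58.33, 76.67, 77.50, 100.00, 70.83]
age = [3, 7, 5, 8, 10, 11, 12, 7, 7, 12, 6, 8, 6, 10, 11, 9, 9, 9, 16, 18, 19, 14]


def mean(v):
    return sum(v) / len(v)


def pearson(x, y):
    """Zero-order Pearson correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def residuals(y, z):
    """Residuals of the least-squares line of y on z (with intercept)."""
    mz, my = mean(z), mean(y)
    slope = sum((c - mz) * (b - my) for c, b in zip(z, y)) / sum((c - mz) ** 2 for c in z)
    intercept = my - slope * mz
    return [b - (intercept + slope * c) for b, c in zip(y, z)]


def partial_from_residuals(x, y, z):
    """First formula: correlate the residuals r_X,i and r_Y,i of X and Y on Z."""
    rx, ry = residuals(x, z), residuals(y, z)
    n = len(rx)
    num = n * sum(p * q for p, q in zip(rx, ry)) - sum(rx) * sum(ry)
    den = (sqrt(n * sum(p * p for p in rx) - sum(rx) ** 2)
           * sqrt(n * sum(q * q for q in ry) - sum(ry) ** 2))
    return num / den


def partial_three(x, y, z):
    """Three-variable shortcut built from the zero-order correlations."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


print(round(partial_three(math_result, age, shoe), 2))  # 0.87
print(round(partial_three(shoe, age, math_result), 2))  # 0.22
print(round(partial_three(math_result, shoe, age), 2))  # far below the zero-order r of 0.65
print(abs(partial_from_residuals(math_result, shoe, age)
          - partial_three(math_result, shoe, age)) < 1e-9)  # True
```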
Practice
1. What do the terms correlation and partial correlation mean?
2. Determine the correlation coefficient for each pair of the variables below.
3. To what extent do the values of the partial correlations differ?

Grade (average)   Scholarship (HUF)   Spending on books (HUF)
3.05              22000               3500
3.2               25000               3000
3.35              27000               2800
3.35              24000               3700
3.45              25000               2200
3.55              28000               3200
3.7               28000               3700
45                30000               4100
3.8               27000               4000
3.8               29000               3800
Practice: Solution
On the limitations of the correlation coefficient
- Correlation and linearity
- Correlation and causality
- The Lazarsfeld paradigm
On the limitations of the correlation coefficient: Correlation does not imply causation!
Source: http://xkcd.com/552
On the limitations of the correlation coefficient: Correlation does not imply causation! Theoretical background
- Aristotle: logic, the syllogism: if $(A \to B) \wedge (B \to C)$, then $A \to C$
- David Hume: scepticism, "only correlation can actually be perceived [not causality]"; cf. will the sun rise tomorrow?; cf. "If I see a billiard ball moving towards another, on a smooth table, I can easily conceive to stop upon contact."
- Popper: falsification
- Pearl, J.: Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.
On the limitations of the correlation coefficient: the Lazarsfeld paradigm
Stouffer: The American Soldier
H0: Soldiers in branches with higher promotion rates are happier than soldiers in branches with lower rates of promotion.
However: "Soldiers in branches with higher promotion rates were more pessimistic about their own chances of being promoted than soldiers in branches with lower rates of promotion."
Keywords: reference group, relative deprivation
On the limitations of the correlation coefficient: linearity
Source: Anscombe, F. J. (1973) Graphs in statistical analysis. American Statistician, 27, 17–21.
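Since the original figure is not reproduced here, a short Python sketch makes the point numerically. It computes r for the four data sets of Anscombe's quartet (values as distributed in R's `anscombe` data set): all four give almost the same r, about 0.816, although their scatterplots look completely different. It then shows the opposite failure: y = x² on a symmetric range is a perfect but non-linear dependence, yet r is exactly 0.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Anscombe's (1973) quartet, as distributed in R's `anscombe` data set
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}
for name, (x, y) in quartet.items():
    print(name, round(pearson(x, y), 2))  # 0.82 in every case

# A perfect but U-shaped relationship: r is exactly 0
xs = list(range(-5, 6))
ys = [v * v for v in xs]
print(pearson(xs, ys))  # 0.0
```

The lesson of both parts is the same: r measures only linear association, so it should always be read together with a scatterplot.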
Practice
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).
mpg: Miles/(US) gallon
cyl: Number of cylinders
disp: Displacement (cu.in.)
hp: Gross horsepower
drat: Rear axle ratio
wt: Weight (lb/1000)
qsec: 1/4 mile time
vs: V/S
am: Transmission (0 = automatic, 1 = manual)
gear: Number of forward gears
carb: Number of carburetors
Source: Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391-411.
[Figure: scatterplot matrix of the variables mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear and carb, with pairwise correlation coefficients and significance stars in the upper panels]
Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391-411.
Practice: Edgar Anderson's Iris Data
[Figure: scatterplot matrix of Sepal.Length, Sepal.Width, Petal.Length and Petal.Width]
Anderson, Edgar (1935). The irises of the Gaspe Peninsula, Bulletin of the American Iris Society, 59, 2-5.
Practice #2: Edgar Anderson's Iris Data
[Figure: scatterplot matrix of Sepal.Length, Sepal.Width, Petal.Length and Petal.Width]
Anderson, Edgar (1935). The irises of the Gaspe Peninsula, Bulletin of the American Iris Society, 59, 2-5.
Practice #3: A real association?
[Figure: temperature (degrees Fahrenheit) plotted against wind speed (miles per hour)]
[Figure: scatterplot matrix of Temp and Wind; the correlation coefficient shown is 0.46***]
[Figure: Temp and Wind each plotted against date, from May to October]
[Figure: scatterplot matrix of Temp, Wind and date; correlation coefficients shown: 0.46***, 0.39*** and 0.17*]
Crosstabulation: variables at a low level of measurement (qualitative)

ID    Gender   Favourite colour
1     Female   pink
2     Female   pink
3     Female   pink
4     Female   pink
5     Female   pink
6     Female   pink
···
95    Male     yellow
96    Male     yellow
97    Male     yellow
98    Male     yellow
99    Male     yellow
100   Male     yellow
[Figure: proportions (0 to 1) of favourite colours (green, pink, yellow) by gender (Female, Male)]
[Figure: another chart of favourite colour (green, pink, yellow) by gender (Female, Male)]
Crosstabulation: variables at a low level of measurement (qualitative)

         green   pink   yellow
Women     17      30      13
Men       18      10      12
The row totals and the column totals of the table are its marginals; the grand total where they meet is N.
         green   pink   yellow     ∑
Women     17      30      13       60
Men       18      10      12       40
∑         35      40      25      100
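A minimal Python sketch of how such a table and its marginals arise from respondent-level data like the ID / gender / favourite colour listing. The slide shows only the first and last six of the 100 respondents, so the records below are hypothetical: they are rebuilt from the cell counts of the table. `collections.Counter` does the cross-tabulation.

```python
from collections import Counter

genders = ["women", "men"]
colours = ["green", "pink", "yellow"]

# Hypothetical respondent-level records, rebuilt from the cell counts of the table
cell_counts = {("women", "green"): 17, ("women", "pink"): 30, ("women", "yellow"): 13,
               ("men", "green"): 18, ("men", "pink"): 10, ("men", "yellow"): 12}
records = [cell for cell, k in cell_counts.items() for _ in range(k)]

# Cross-tabulation: count every (gender, colour) combination
table = Counter(records)

# Marginals: row totals, column totals and the grand total N
row_totals = {g: sum(table[(g, c)] for c in colours) for g in genders}
col_totals = {c: sum(table[(g, c)] for g in genders) for c in colours}
n = len(records)

print(row_totals)  # {'women': 60, 'men': 40}
print(col_totals)  # {'green': 35, 'pink': 40, 'yellow': 25}
print(n)           # 100
```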
Crosstabulation: percentages

         green   pink   yellow     ∑
Women     17      30      13       60
Men       18      10      12       40
∑         35      40      25      100
Table 1. Observed values

         green   pink   yellow     ∑
Women    17 %    30 %    13 %     60 %
Men      18 %    10 %    12 %     40 %
∑        35 %    40 %    25 %    100 %
Table 2. Total percentages
Crosstabulation: row percentages

         green   pink   yellow     ∑
Women     17      30      13       60
Men       18      10      12       40
∑         35      40      25      100
Table 3. Observed values

          green     pink     yellow      ∑
Women    28.3 %    50 %     21.7 %    100 %
Men      45 %      25 %     30 %      100 %
∑        35 %      40 %     25 %      100 %
Table 4. Row percentages
Crosstabulation: column percentages

         green   pink   yellow     ∑
Women     17      30      13       60
Men       18      10      12       40
∑         35      40      25      100
Table 5. Observed values

          green     pink     yellow      ∑
Women    48.6 %    75 %     52 %       60 %
Men      51.4 %    25 %     48 %       40 %
∑        100 %     100 %    100 %     100 %
Table 6. Column percentages
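A plain-Python sketch of the three kinds of percentages for the gender × colour table: each cell divided by N (total percentage), by its row total (row percentage) or by its column total (column percentage). For example, women's share of green is 17/35 ≈ 48.6 %. Variable names are illustrative.

```python
genders = ["women", "men"]
observed = [[17, 30, 13],   # women: green, pink, yellow
            [18, 10, 12]]   # men: green, pink, yellow

row_totals = [sum(row) for row in observed]       # row marginals
col_totals = [sum(col) for col in zip(*observed)]  # column marginals
n = sum(row_totals)                                # grand total N

# Total percentage: each cell as a share of N
total_pct = [[100 * o / n for o in row] for row in observed]
# Row percentage: each cell as a share of its row total (each row sums to 100 %)
row_pct = [[100 * o / row_totals[i] for o in row] for i, row in enumerate(observed)]
# Column percentage: each cell as a share of its column total (each column sums to 100 %)
col_pct = [[100 * o / col_totals[j] for j, o in enumerate(row)] for row in observed]

for g, r_row, c_row in zip(genders, row_pct, col_pct):
    print(g, [round(p, 1) for p in r_row], [round(p, 1) for p in c_row])
# women [28.3, 50.0, 21.7] [48.6, 75.0, 52.0]
# men [45.0, 25.0, 30.0] [51.4, 25.0, 48.0]
```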
Crosstabulation: expected values

         green   pink   yellow     ∑
Women     17      30      13       60
Men       18      10      12       40
∑         35      40      25      100
Table 7. Observed values

         green   pink   yellow     ∑
Women     21      24      15       60
Men       14      16      10       40
∑         35      40      25      100
Table 8. Expected values

Each expected value is the product of its row and column marginals divided by N, e.g. 60 · 35 / 100 = 21.
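The rule "row total × column total / N" can be sketched in a few lines of plain Python; under independence the expected table keeps the same marginals as the observed one.

```python
observed = [[17, 30, 13],   # women: green, pink, yellow
            [18, 10, 12]]   # men

row_totals = [sum(row) for row in observed]       # [60, 40]
col_totals = [sum(col) for col in zip(*observed)]  # [35, 40, 25]
n = sum(row_totals)                                # 100

# Expected count under independence: row total * column total / N
expected = [[r * c / n for c in col_totals] for r in row_totals]
print(expected)  # [[21.0, 24.0, 15.0], [14.0, 16.0, 10.0]]
```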
Crosstabulation: the chi-square statistic
$$\chi^2 = \sum_{i=1}^{n}\frac{(O_i - E_i)^2}{E_i}$$
where:
$\chi^2$: Pearson's test statistic,
$O_i$: observed value,
$E_i$: expected value,
$n$: number of cells.
H0: the observed and the expected values are equal.
Requirements?
Crosstabulation: chi-square

           green          pink           yellow
Women   (17−21)²/21    (30−24)²/24    (13−15)²/15
Men     (18−14)²/14    (10−16)²/16    (12−10)²/10
Table 9. Computed distances between the expected and observed values

$$\chi^2 = \sum_{i=1}^{n}\frac{(O_i - E_i)^2}{E_i} = 6.321429$$
Degrees of freedom: (3 − 1)(2 − 1) = 2
⇒ p = 0.04239545
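A minimal sketch of the whole test in plain Python. For p it relies on a fact specific to 2 degrees of freedom: the upper tail of the χ² distribution with 2 d.f. is exp(−x/2), so no statistics library is needed. For other degrees of freedom a library routine would be required instead; `scipy.stats.chi2_contingency`, for instance, returns the statistic, p, the degrees of freedom and the expected table.

```python
from math import exp

observed = [[17, 30, 13],   # women: green, pink, yellow
            [18, 10, 12]]   # men

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson's statistic: sum over all cells of (O - E)^2 / E
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(col_totals) - 1) * (len(row_totals) - 1)

# Only valid for df == 2: the chi-square upper tail is then exp(-x / 2)
assert df == 2
p = exp(-chi2 / 2)
print(round(chi2, 6), df, round(p, 4))  # 6.321429 2 0.0424
```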
Simpson's paradox: the Berkeley case (Bickel et al.)
[Figure: mosaic plot of admission outcome (admitted, rejected) by gender (male, female)]
          Admitted   Rejected       ∑
Women       1494       2827       4321
Men         3738       4704       8442
∑           5232       7531      12763
Table 10. Observed values

          Admitted   Rejected       ∑
Women      34.6 %     65.4 %     100 %
Men        44.3 %     55.7 %     100 %
∑          41 %       59 %       100 %
Table 11. Row percentages

$\chi^2 = 110.8489$ (with Yates' continuity correction); d.f. = 1; $p = 6.385628 \times 10^{-26}$
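A plain-Python sketch reproducing Table 11 and the test statistic. The reported χ² = 110.8489 matches Pearson's statistic with Yates' continuity correction, computed here with the standard 2 × 2 shortcut formula; without the correction the statistic is about 111.25. For 1 degree of freedom the upper-tail probability equals erfc(√(x/2)), available in the standard library as `math.erfc`.

```python
from math import erfc, sqrt

# Table 10: rows = women, men; columns = admitted, rejected
a, b = 1494, 2827   # women
c, d = 3738, 4704   # men
n = a + b + c + d

# Row percentages of admitted applicants (Table 11)
row_pct_admitted = (100 * a / (a + b), 100 * c / (c + d))

margins = (a + b) * (c + d) * (a + c) * (b + d)
# Pearson chi-square for a 2x2 table, with and without Yates' continuity correction
chi2_yates = n * (abs(a * d - b * c) - n / 2) ** 2 / margins
chi2_plain = n * (a * d - b * c) ** 2 / margins

# With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2))
p = erfc(sqrt(chi2_yates / 2))

print([round(v, 1) for v in row_pct_admitted])  # [34.6, 44.3]
print(round(chi2_yates, 4), round(chi2_plain, 2))  # 110.8489 111.25
print(p)
```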
                        Men                        Women
Department        Applicants   Admitted    Applicants   Admitted
A                     825        62 %          108        82 %
B                     560        63 %           25        68 %
C                     325        37 %          593        34 %
D                     417        33 %          375        35 %
E                     191        28 %          393        24 %
F                     373         6 %          341         7 %
All departments      8442        44 %         4321        35 %
Simpson's paradox: batting averages in baseball

               Derek Jeter                David Justice
           Hits/At bats     %         Hits/At bats     %
1995          12/48       25.0 %        104/411      25.3 %
1996         183/582      31.4 %         45/140      32.1 %
Combined     195/630      31.0 %        149/551      27.0 %

Which of them is the better player?
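A short sketch with exact fractions (Python's `fractions.Fraction`) confirms the reversal: Justice has the higher average in each year, Jeter the higher combined average. The reason is the weighting: most of Jeter's at-bats fall in 1996, the year in which both players hit better, whereas most of Justice's fall in 1995. The helper names are illustrative.

```python
from fractions import Fraction

# (hits, at-bats) per season, from the table
jeter = {"1995": (12, 48), "1996": (183, 582)}
justice = {"1995": (104, 411), "1996": (45, 140)}


def average(hits, at_bats):
    """Batting average as an exact fraction."""
    return Fraction(hits, at_bats)


def combined(seasons):
    """Total hits / total at-bats (not the mean of the yearly averages)."""
    hits = sum(h for h, _ in seasons.values())
    at_bats = sum(ab for _, ab in seasons.values())
    return Fraction(hits, at_bats)


for year in ("1995", "1996"):
    print(year, average(*justice[year]) > average(*jeter[year]))  # True in both years
print("combined", combined(jeter) > combined(justice))  # True
```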
Standardization and decomposition: a simple example
[Figure: weight (t) plotted against horsepower. Data: Henderson & Velleman (1981): Building multiple regression models interactively]
[Figure: standardized weight plotted against standardized horsepower, same data (Henderson & Velleman, 1981)]
Standardization and decomposition: theoretical background
A standardized variable (z-values, z-scores, normal scores, standardized variables) shows how many standard deviations a given value lies from the mean:
$$z = \frac{x - \mu}{\sigma}$$
[Figure: two histograms of diamond prices (Diamonds data): frequency by price (USD) and frequency by standardized price]
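Since the diamond prices and the car data are not reproduced here, this sketch standardizes the shoe-size and age columns of the school example instead, using the population σ as in z = (x − µ)/σ. It also shows one consequence worth knowing: with such z-scores the correlation coefficient is simply the mean of the products z_x · z_y, so standardizing changes the scale of a scatterplot but not r.

```python
from math import sqrt

# Shoe size and age of the 22 pupils in the school example
shoe = [29.75, 29.75, 29.75, 31.50, 31.50, 31.50, 31.50, 33.25, 33.25, 33.25, 35.00,
        35.00, 35.00, 35.00, 35.00, 38.50, 40.25, 42.00, 42.00, 42.00, 42.00, 43.75]
age = [3, 7, 5, 8, 10, 11, 12, 7, 7, 12, 6, 8, 6, 10, 11, 9, 9, 9, 16, 18, 19, 14]


def z_scores(x):
    """Standardize with z = (x - mu) / sigma, sigma computed with n in the denominator."""
    n = len(x)
    mu = sum(x) / n
    sigma = sqrt(sum((v - mu) ** 2 for v in x) / n)
    return [(v - mu) / sigma for v in x]


z_shoe, z_age = z_scores(shoe), z_scores(age)

# A standardized variable has mean 0 and (population) standard deviation 1
mean_z = sum(z_shoe) / len(z_shoe)
sd_z = sqrt(sum((v - mean_z) ** 2 for v in z_shoe) / len(z_shoe))

# The correlation of the original variables equals the mean product of their z-scores
r = sum(a * b for a, b in zip(z_shoe, z_age)) / len(z_shoe)
print(round(r, 2))  # 0.67
```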
Standardization and decomposition: decomposition
Graphs
- Grouped bar chart
- Stacked bar chart
- Line chart
- Pie chart
- Area chart
- Composite chart
- Polar chart
- Heatmap
- Heatmap (calendar)
- Waterfall chart
- Dot plot
- Boxplot
- Violin plot
- Mosaic chart
- Word cloud
- "Crayola Color Chart, 1903-2010"
Graphs: interesting websites
http://www.visual-literacy.org/periodic_table/periodic_table.html
http://www.edwardtufte.com/tufte/
http://www.perceptualedge.com/
http://www.visualcomplexity.com/vc/
http://flowingdata.com/
http://infosthetics.com/
http://chartsgraphs.wordpress.com/
http://www.informationisbeautiful.net/
http://chartporn.org/
Thank you for your attention!
Daróczi Gergely