• Faculty of Economics •
Gazdaságelméleti és Módszertani Intézet
Correlation & Linear Regression in SPSS 4th seminar
Petra Petrovics
Types of dependence
• association – between two nominal variables
• mixed – between a nominal and a ratio variable
• correlation – among ratio variables
Correlation describes the strength of a relationship, the degree to which one variable is linearly related to another
Regression shows us how to determine the nature of a relationship between two or more variables
• X (or X1, X2, … , Xp): known variable(s) / independent variable(s) / predictor(s)
• Y: unknown variable / dependent variable
• causal relationship: X "causes" Y to change
Correlation Measures
1. Covariance
2. Coefficient of correlation
3. Coefficient of determination
4. Coefficient of rank correlation
1. Covariance
• A measure of the joint variation of the two variables;
• An average value of the product of the deviations of observations on 2 random variables from their sample means.
$$ C_{x,y} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1} $$
– ranges from −∞ to +∞;
– C = 0 when X and Y are uncorrelated;
– its sign shows the direction of correlation;
– it doesn't measure the degree of relationship!
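The covariance definition can be sketched in plain Python. This is only an illustration, not what SPSS runs internally; the function name `sample_covariance` and the toy data are hypothetical and do not come from Employee data.sav.

```python
def sample_covariance(xs, ys):
    """Sum of the products of deviations from the sample means, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

c_xy = sample_covariance(x, y)

# Changing the unit of y (here: times 1000) rescales the covariance,
# although the relationship itself is exactly as close as before.
c_rescaled = sample_covariance(x, [1000 * value for value in y])
```

The rescaled result is 1000 times larger for the same relationship, which is why the covariance shows only the direction and not the degree of the relationship.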
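2. Coefficient of correlation
$$ r = \frac{C}{s_x s_y} = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \sum d_y^2}} $$
where d_x = x − x̄ and d_y = y − ȳ are the deviations from the sample means.
• Pearson correlation
• A measure of how closely related two data series are.
• Its sign shows the direction of correlation.
• It measures the strength of correlation:
  0 < |r| < 1: statistical dependence
  r = 0: X and Y are uncorrelated
  r = −1: perfect negative linear relationship
  r = 1: perfect positive linear relationship
• You can use it only in case of a linear relationship!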
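A minimal Python sketch of the deviation form of the Pearson coefficient follows. The function name `pearson_r` and the toy data are hypothetical; the point is only that r, unlike the covariance, does not change when a variable is rescaled.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation from deviations: sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)


# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_rescaled = pearson_r(x, [1000 * value for value in y])  # same strength, other unit
```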
3. Coefficient of determination
• r²
• The square of the sample correlation coefficient between the outcomes and their predicted values.
• Measures the degree of correlation in percentage (%).
• It provides a measure of how well future outcomes are likely to be predicted by the model.
• Varies from 0 to 1.
$$ r^2 = \frac{S_{\hat{y}}}{S_y} = 1 - \frac{S_e}{S_y} $$
where S_ŷ, S_y and S_e denote the explained, total and residual variation of y.
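The two forms of the coefficient of determination can be checked numerically. The sketch below assumes that the S terms are sums of squared deviations (explained, total, residual) and that the predicted values come from a least-squares line; the function name and the toy data are hypothetical.

```python
def r_squared_two_ways(xs, ys):
    """Return (explained / total, 1 - residual / total) for a least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope and intercept
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    b0 = mean_y - b1 * mean_x
    predicted = [b0 + b1 * x for x in xs]
    s_total = sum((y - mean_y) ** 2 for y in ys)
    s_explained = sum((p - mean_y) ** 2 for p in predicted)
    s_residual = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    return s_explained / s_total, 1 - s_residual / s_total


# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

ratio_form, complement_form = r_squared_two_ways(x, y)
```

For these toy data both forms give 0.6, i.e. 60% of the variation of y is explained by the line.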
Exercise 1 – Correlation
File / Open / Employee data.sav
Is there any relation between
– current salary &
– beginning salary?
CORRELATION
Analyze / Correlate / Bivariate…
0 < |r| < 0.3: weak dependence
0.3 ≤ |r| < 0.7: medium-strong dependence
0.7 ≤ |r| ≤ 1: strong dependence
r: shows direction and strength
C: just direction (+ / −)!
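The rule of thumb for reading r can be written as a small helper. This is a sketch only: the function name `dependence_strength` is hypothetical, and assigning the boundary values 0.3 and 0.7 to the higher category is an assumption, since the slide does not say where exactly the limits belong.

```python
def dependence_strength(r):
    """Classify |r| with the thresholds 0.3 and 0.7 (boundaries go to the higher class)."""
    size = abs(r)
    if size > 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if size < 0.3:
        return "weak"
    if size < 0.7:
        return "medium-strong"
    return "strong"


def direction(r):
    """The sign of r (or of the covariance) gives the direction of the relationship."""
    if r > 0:
        return "direct"
    if r < 0:
        return "inverse"
    return "uncorrelated"
```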
Output

                    Mean          Std. Deviation   N
Current Salary      $34,419.57    $17,075.661      474
Beginning Salary    $17,016.09    $7,870.638       474

                                        Current Salary      Beginning Salary
Current Salary
  Pearson Correlation                   1                   .880(**)
  Sig. (2-tailed)                                           .000
  Sum of Squares and Cross-products     137916495436.340    55948605047.73
  Covariance                            291578214.45        118284577.27
  N                                     474                 474
Beginning Salary
  Pearson Correlation                   .880(**)            1
  Sig. (2-tailed)                       .000
  Sum of Squares and Cross-products     55948605047.73      29300904965.45
  Covariance                            118284577.27        61946944.96
  N                                     474                 474
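As a quick check, the Pearson correlation in the output can be reproduced from the covariance and the two standard deviations, using r = C / (s_x s_y). The figures below are copied from the output; only the variable names are made up for this sketch.

```python
# Values taken from the SPSS output above
covariance = 118284577.27      # Current Salary with Beginning Salary
sd_current = 17075.661         # Std. Deviation of Current Salary
sd_beginning = 7870.638        # Std. Deviation of Beginning Salary

# Coefficient of correlation: covariance divided by the product of the standard deviations
r = covariance / (sd_current * sd_beginning)

# Coefficient of determination
r_squared = r ** 2

print(round(r, 3))
```

Rounded to three decimals, r agrees with the ,880 reported by SPSS: a direct and strong dependence.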
Exercise 2 – Multiple Correlation
Is there any relation between
• the current salary,
• previous experience (months),
• months since hire,
• beginning salary?
MULTIPLE CORRELATION
Analyze / Correlate / Bivariate…
0 < |r| < 0.3: weak dependence
0.3 ≤ |r| < 0.7: medium-strong dependence
0.7 ≤ |r| ≤ 1: strong dependence
C: just direction (+ / −)!
r: shows direction and strength
Output View: Correlations Matrix

                                      Current Salary   Previous Experience (months)   Months since Hire   Beginning Salary
Current Salary
  Pearson Correlation                 1                -.097*                         .084                .880**
  Sig. (2-tailed)                                      .034                           .067                .000
  Sum of Squares and Cross-products   1.379E+011       -82332343.5                    6833347.5           5.59E+010
  Covariance                          291578214.5      -174064.151                    14446.823           118284577
  N                                   474              474                            474                 474
Previous Experience (months)
  Pearson Correlation                 -.097*           1                              .003                .045
  Sig. (2-tailed)                     .034                                            .948                .327
  Sum of Squares and Cross-products   -82332343.54     5173806.810                    1482.241            17573777
  Covariance                          -174064.151      10938.281                      3.134               37153.862
  N                                   474              474                            474                 474
Months since Hire
  Pearson Correlation                 .084             .003                           1                   -.020
  Sig. (2-tailed)                     .067             .948                                               .668
  Sum of Squares and Cross-products   6833347.489      1482.241                       47878.295           -739866.50
  Covariance                          14446.823        3.134                          101.223             -1564.200
  N                                   474              474                            474                 474
Beginning Salary
  Pearson Correlation                 .880**           .045                           -.020               1
  Sig. (2-tailed)                     .000             .327                           .668
  Sum of Squares and Cross-products   55948605048      17573776.7                     -739866.5           2.93E+010
  Covariance                          118284577.3      37153.862                      -1564.200           61946945
  N                                   474              474                            474                 474

*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).

Reading the matrix:
• r (Pearson Correlation): current salary and previous experience (-.097): inverse relationship & weak dependence; current salary and beginning salary (.880): direct relationship & strong dependence.
• C (Covariance): a negative value means an inverse relationship, a positive value a direct relationship.
Linear regression
ŷ = b0 + b1x
• b1: for every 1-unit increase in x, we expect y to change by b1 units on average
• b0: when x = 0, y = b0
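How b0 and b1 can be obtained is sketched below with the usual least-squares formulas (slope = sum of products of deviations divided by the sum of squared deviations of x). The slides leave the estimation to SPSS, so treat this as an illustration; the function names and the toy data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) of the line y-hat = b0 + b1 * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_dx2 = sum((x - mean_x) ** 2 for x in xs)
    b1 = sum_dxdy / sum_dx2          # average change in y per 1-unit increase in x
    b0 = mean_y - b1 * mean_x        # predicted y when x = 0
    return b0, b1


def predict(b0, b1, x):
    """Predicted value on the regression line."""
    return b0 + b1 * x


# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
```

Here `predict(b0, b1, 0)` returns b0, and moving one unit to the right along x changes the prediction by exactly b1.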
Exercise 3 – Linear Regression
File / Open / Employee data.sav
Determine a linear relationship between the salary and the age of the employees!
Create a new variable!
Transform / Compute Variable…
Create a new variable: age = this year – year of birth
(the current year is entered as a number in the expression)
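The same computation outside SPSS could look like the sketch below. The function name, the example birth date and the value used for "this year" are all hypothetical; in SPSS the calculation is done in the Compute Variable dialog.

```python
from datetime import date


def age_in_years(birth_date, this_year):
    """age = this year - year of birth (months and days are ignored, as on the slide)."""
    return this_year - birth_date.year


# Hypothetical employee born on 3 February 1952, with 2010 assumed as "this year"
example_age = age_in_years(date(1952, 2, 3), 2010)
```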
Regression
Analyze / Regression / Linear…
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .146a   .021       .019                $16,928.804
a. Predictors: (Constant), age

Multiple correlation coefficient (R)
• It expresses the combined effect of all the independent variables acting on the dependent variable.
• For two independent variables:
$$ R = \sqrt{\frac{r_{y1}^2 + r_{y2}^2 - 2 r_{y1} r_{y2} r_{12}}{1 - r_{12}^2}} $$
• Here R = .146: weak dependence.

Multiple determination coefficient (R²)
• What percentage of the variation of the dependent variable can be explained by the variation of all the independent variables.
• Here 2.1% of the variation of the dependent variable (current salary) is explained by the regression model.

Adjusted multiple determination coefficient
$$ \bar{R}^2 = 1 - \frac{n - 1}{n - p - 1} (1 - R^2) $$
• It makes it possible to compare the multiple determination coefficient among populations / samples of different sizes and with different numbers of independent variables, as it controls for the sample / population size (n) and the number of independent variables (p).
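The adjusted coefficient in the Model Summary can be recomputed by hand from R Square. The sketch below uses n = 474 (the N reported in the correlation output) and p = 1 (age is the only predictor); the function name is hypothetical.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R^2 = 1 - (n - 1) / (n - p - 1) * (1 - R^2)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than parameters")
    return 1 - (n - 1) / (n - p - 1) * (1 - r_squared)


# R Square from the Model Summary, 474 employees, one independent variable (age)
adjusted = adjusted_r_squared(0.021, n=474, p=1)

print(round(adjusted, 3))
```

Rounded to three decimals this reproduces the ,019 shown in the Adjusted R Square column.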
F-test: for model testing
The F ratio (in the Analysis of Variance table) is 10.241 and is significant at p = .001, so we can accept the model at every conventional significance level. This provides evidence of a linear relationship between the variables.
Coefficients(a)

                     Unstandardized Coefficients   Standardized Coefficients
Model                B            Std. Error       Beta                        t        Sig.
1   b0 (Constant)    41543.805    2358.686                                     17.613   .000
    b1 age           -211.609     66.124           -.146                       -3.200   .001
a. Dependent Variable: Current Salary

The regression line: ŷ = b0 + b1x = 41543.805 − 211.609x
We can accept the parameters at every conventional significance level.
• b0: the value of y when x is 0. If the employees are 0 years old, they earn $41,543.805. (This has no practical meaning.)
• b1: the change in y when x increases by 1 unit. If the employees are 1 year older, they earn $211.609 less on average.
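The figures in the Coefficients table are linked: each t value is the coefficient divided by its standard error, and in a regression with a single predictor the F ratio equals the squared t value of the slope. The sketch below checks this with the numbers from the output and uses the fitted line for a prediction; the variable and function names, and the age of 40, are hypothetical.

```python
# Unstandardized coefficients and standard errors from the Coefficients table
b0, se_b0 = 41543.805, 2358.686    # (Constant)
b1, se_b1 = -211.609, 66.124       # age

# t statistic = coefficient / standard error
t_b0 = b0 / se_b0
t_b1 = b1 / se_b1

# With one independent variable, F equals the square of the slope's t value
f_ratio = t_b1 ** 2


def predict_salary(age):
    """Predicted current salary (in $) on the fitted line for a given age in years."""
    return b0 + b1 * age


salary_at_40 = predict_salary(40)  # hypothetical 40-year-old employee
```

Rounded to three decimals, `t_b0`, `t_b1` and `f_ratio` match the 17,613, -3,200 and 10.241 reported by SPSS.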
Exercise 4 – Curve Estimation
File / Open / Employee data.sav
Determine the relationship between the salary and the age of the employees!
Which regression model fits best?
Analyze / Regression / Curve Estimation…
• Linear • Compound • Power
To get a chart
Output View: Model Summary

Model      R      R Square   Adjusted R Square   Std. Error of the Estimate
Linear     .146   .021       .019                16928.804
Compound   .215   .046       .044                .389
Power      .156   .024       .022                .393
The independent variable is age.

The highest R²: Compound.
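Picking the model with the highest R Square, as done on the slide, amounts to a one-line comparison. The dictionary below simply restates the three R Square values from the output; the variable names are made up for this sketch.

```python
# R Square of each curve-estimation model, copied from the Model Summary output
r_square = {"Linear": 0.021, "Compound": 0.046, "Power": 0.024}

# The model with the largest share of explained variation
best_model = max(r_square, key=r_square.get)

print(best_model, r_square[best_model])
```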
Also in the Output View…
• Weak dependence.
• Age explains 4.6% of the variation in current salary.
• The model is significant.
The compound model: ŷ = a · b^x = 40482.362 · 0.993^x
The parameters are significant.
• a: not interpreted.
• b: when an employee is 1 year older, the current salary is 0.993 times as high on average, i.e. about 0.7% lower.
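The meaning of b in the compound model can be checked with a few lines of Python. The parameters are the rounded values from the slide, so predictions are approximate; the function name and the ages used are hypothetical.

```python
# Rounded parameters of the compound model y-hat = a * b ** x from the output
a = 40482.362
b = 0.993


def predict_salary_compound(age):
    """Predicted current salary for a given age under the compound model."""
    return a * b ** age


# One extra year multiplies the prediction by b, whatever the starting age
ratio = predict_salary_compound(41) / predict_salary_compound(40)

# The same effect expressed as a percentage change per year
percent_change_per_year = (b - 1) * 100
```

The ratio equals b = 0.993, so each additional year of age goes with a salary about 0.7% lower on average.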
Thank You for Your Attention