Question No. 01: Linear Regression Analysis in SPSS Statistics
a. Assume a case study that uses simple linear regression, and precisely interpret the results of your study. Also, use Y = aX + b to predict the results.
b. Assume another case study that uses multiple linear regression, and carefully interpret the results. Also, use Z = aX + bY + c to predict the results. (Use screenshots as required.)
Simple linear regression model in SPSS:
Let Y: Price (dependent variable) and X: Income (independent variable).
There are four steps to analyse your data using linear regression in SPSS Statistics:
1. Click Analyze > Regression > Linear... on the top menu, as shown below:
You will be presented with the Linear Regression dialogue box.
2. Transfer the independent variable, Income, into the Independent(s): box and the dependent variable, Price, into the Dependent: box. You can do this either by drag-and-dropping the variables or by using the appropriate buttons. You will end up with the following screen:
3. You now need to check four of the assumptions discussed below:
a) no significant outliers;
b) independence of observations;
c) homoscedasticity;
d) normal distribution of errors/residuals.
You can do this using the Statistics and Plots buttons, selecting within these two dialogue boxes the options needed to test whether your data meets these four assumptions.
4. Click on the OK button. This will generate the results; an equivalent syntax sketch is given below.
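For reference, the same analysis can be run from a syntax window instead of the menus. The following is a minimal sketch, assuming the variables are named price and income; the /SCATTERPLOT and /RESIDUALS subcommands request the plots and the Durbin-Watson statistic used to check the assumptions in step 3:

    * Simple linear regression of Price on Income (variable names assumed).
    REGRESSION
      /MISSING LISTWISE
      /STATISTICS COEFF OUTS R ANOVA
      /CRITERIA=PIN(.05) POUT(.10)
      /NOORIGIN
      /DEPENDENT price
      /METHOD=ENTER income
      /SCATTERPLOT=(*ZRESID ,*ZPRED)
      /RESIDUALS DURBIN HISTOGRAM(ZRESID) NORMPROB(ZRESID).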
Output:
The first table of interest is the Model Summary table, as shown below:
This table provides the R and R² values. The R value represents the simple correlation and is 0.873, which indicates a high degree of correlation. The R² value (0.873² ≈ 0.762) indicates how much of the total variation in the dependent variable, Price, can be explained by the independent variable, Income. In this case, 76.2% of the variation can be explained, which is substantial.
The next table is the ANOVA table, which reports how well the regression equation fits the data (i.e., predicts the dependent variable) and is shown below:
Look at the "Regression" row and go to the "Sig." column. This indicates the statistical significance of the regression model that was run. Here, p < 0.0005, which is less than 0.05, and indicates that, overall, the regression model statistically significantly predicts the outcome variable (i.e., it is a good fit for the data).
The Coefficients table provides us with the necessary information to predict Price from Income, as well as to determine whether Income contributes statistically significantly to the model (by looking at the "Sig." column). Furthermore, we can use the values in the "B" column under the "Unstandardized Coefficients" heading to build the regression equation, as shown below:
The fitted simple linear regression equation, in the form Y = aX + b, is:
Price = 8287 + 0.564(Income)
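To predict with this equation, substitute an income value: for a hypothetical income of 20,000 (purely illustrative, not a case from the data), the predicted price is 8287 + 0.564 × 20,000 = 19,567. Predicted values for every case can also be computed in a syntax window; this is a sketch assuming the variable is named income:

    * Predicted price from the fitted equation (illustrative sketch).
    COMPUTE predicted_price = 8287 + 0.564 * income.
    EXECUTE.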
Multiple linear regression analysis in SPSS:
Open the data file. In our example, Y: murder rate (dependent variable), and population, burglary, larceny, and vehicle theft are the independent variables. We therefore enter "murder rate" as the dependent variable and the population, burglary, larceny, and vehicle theft variables as independent variables.
In the "Options…" dialogue we can set the stepwise criteria: a variable is entered into the multiple linear regression model if its probability of F is at most 0.05, and it is removed if its probability of F is at least 0.10.
The "Statistics…" menu allows us to include additional statistics that we need to assess the validity of our linear regression analysis. It is advisable to include the collinearity diagnostics and the Durbin-Watson test for auto-correlation. To test the assumptions of homoscedasticity and normality of residuals, we also include a special plot from the "Plots…" menu; all of these choices are reflected in the syntax sketch below.
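These dialogue choices correspond roughly to the following syntax. This is a sketch, not the study's exact command, and it assumes the variables are named murder_rate, population, burglary, larceny, and vehicle_theft:

    * Stepwise multiple linear regression (variable names assumed).
    REGRESSION
      /MISSING LISTWISE
      /STATISTICS COEFF OUTS R ANOVA COLLIN TOL
      /CRITERIA=PIN(.05) POUT(.10)
      /NOORIGIN
      /DEPENDENT murder_rate
      /METHOD=STEPWISE population burglary larceny vehicle_theft
      /SCATTERPLOT=(*ZRESID ,*ZPRED)
      /RESIDUALS DURBIN HISTOGRAM(ZRESID) NORMPROB(ZRESID).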
The next table shows the multiple linear regression model summary and overall fit statistics. We find that the adjusted R² of our model is .398, with R² = .407; this means that the linear regression explains 40.7% of the variance in the data. The Durbin-Watson d = 2.074, which lies between the two critical values (1.5 < d < 2.5), so we can assume that there is no first-order linear auto-correlation in our multiple linear regression data.
The F-test is highly significant, thus we can assume that the model explains a significant amount of the variance in murder rate.
The next table shows the multiple linear regression estimates including the intercept and the significance levels.
We find that only burglary and motor vehicle theft are significant predictors. We can also see that motor vehicle theft has a higher impact than burglary by comparing the standardized coefficients (beta = .507 versus beta = .333).
The information in the table above also allows us to check for multicollinearity in our multiple linear regression model. Tolerance should be greater than 0.1 (or, equivalently, VIF less than 10) for all variables, which is the case here.
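Finally, for part (b)'s prediction in the form Z = aX + bY + c, take Z as the murder rate and X and Y as the two significant predictors (burglary and motor vehicle theft), and plug in the unstandardized B values from the Coefficients table. A template sketch follows; b0, b1, and b2 are placeholders, not the study's actual estimates, and must be replaced with the B values from the table before running:

    * Prediction template: replace b0, b1, b2 with the unstandardized B
      values from the Coefficients table before running.
    COMPUTE predicted_murder_rate = b0 + b1 * burglary + b2 * vehicle_theft.
    EXECUTE.

Alternatively, adding /SAVE PRED to the REGRESSION command above saves the model's predicted murder rates directly, without typing the coefficients by hand.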