Multiple linear regression analysis makes several key assumptions:
- There must be a linear relationship between the outcome variable and the independent variables. Scatterplots can show whether the relationship is linear or curvilinear.
- Multivariate normality: multiple regression assumes that the residuals are normally distributed.
- No multicollinearity: multiple regression assumes that the independent variables are not highly correlated with each other. This assumption is tested using Variance Inflation Factor (VIF) values.
- Homoscedasticity: this assumption states that the variance of the error terms is similar across the values of the independent variables. A plot of standardized residuals against the predicted values can show whether the points are equally distributed across all values of the independent variables. Intellectus Statistics automatically includes assumption tests and plots when a regression is run.
---------------------------------------------------------------------------------------------------------------------------------------------------------
EXPLANATION
Multiple linear regression requires at least two independent variables, which can be nominal, ordinal, or interval/ratio variables. A rule of thumb for sample size is that regression analysis requires at least 20 cases per independent variable in the analysis.
- First, multiple linear regression requires that the relationship between the independent and dependent variables is linear. The linearity assumption is best tested with scatterplots; a short sketch of this check follows below.
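A minimal sketch of the scatterplot check in Python. The DataFrame `df` and the column names "y", "x1", and "x2" are illustrative placeholders, not names from the original text:

```python
# Sketch of the linearity check: plot the outcome against each predictor.
# "df", "y", "x1", "x2" are assumed, illustrative names.
import matplotlib.pyplot as plt
import pandas as pd

def linearity_scatterplots(df: pd.DataFrame, outcome: str, predictors: list) -> None:
    """Scatterplot the outcome against each predictor to spot curvilinear trends."""
    fig, axes = plt.subplots(1, len(predictors), figsize=(5 * len(predictors), 4), squeeze=False)
    for ax, col in zip(axes[0], predictors):
        ax.scatter(df[col], df[outcome], alpha=0.6)
        ax.set_xlabel(col)
        ax.set_ylabel(outcome)
        ax.set_title(f"{outcome} vs. {col}")
    plt.tight_layout()
    plt.show()

# Example usage (assumes df has already been loaded):
# linearity_scatterplots(df, "y", ["x1", "x2"])
```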
- Second, multiple linear regression analysis requires that the errors between observed and predicted values (i.e., the regression residuals) be normally distributed. This assumption can be checked with a histogram or a Q-Q plot of the residuals. Normality can also be checked with a goodness-of-fit test (e.g., the Kolmogorov-Smirnov test), although the test must be conducted on the residuals themselves; see the sketch below.
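A minimal sketch of the residual-normality check using statsmodels and scipy. The DataFrame `df` and the column names "y", "x1", "x2" are assumptions carried over from the sketch above:

```python
# Sketch: fit the regression, then inspect the residuals for normality.
import statsmodels.api as sm
from scipy import stats
import matplotlib.pyplot as plt

X = sm.add_constant(df[["x1", "x2"]])   # design matrix with intercept
model = sm.OLS(df["y"], X).fit()
resid = model.resid

# Visual checks: histogram and Q-Q plot of the residuals.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(resid, bins=30)
ax1.set_title("Histogram of residuals")
sm.qqplot(resid, line="s", ax=ax2)
ax2.set_title("Q-Q plot of residuals")
plt.tight_layout()
plt.show()

# Kolmogorov-Smirnov test on the standardized residuals against N(0, 1).
z = (resid - resid.mean()) / resid.std()
ks_stat, ks_p = stats.kstest(z, "norm")
print(f"KS statistic = {ks_stat:.3f}, p-value = {ks_p:.3f}")
```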
- Third, multiple linear regression assumes that there is no multicollinearity in the data. Multicollinearity occurs when the independent variables are too highly correlated with each other. Multicollinearity can be checked in several ways:
- Correlation matrix - when computing a matrix of Pearson bivariate correlations among all the independent variables, the magnitude of the correlation coefficients should be less than 0.80.
- Variance Inflation Factor (VIF) - linear regression VIFs indicate the degree to which the variances of the regression estimates are inflated due to multicollinearity. VIF values above 10 indicate that multicollinearity is a problem.
- If multicollinearity is found in the data, one possible solution is to center the data. To center the data, subtract the mean score from each observation for each independent variable. However, the simplest solution is to identify the variables causing the multicollinearity problems (i.e., through the correlations or VIF values) and remove those variables from the regression. A sketch of these checks follows below.
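A minimal sketch of the correlation-matrix and VIF checks, plus centering, using pandas and statsmodels. The DataFrame `df` and the predictor names "x1", "x2", "x3" are placeholders:

```python
# Sketch of the multicollinearity diagnostics described above.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

predictors = df[["x1", "x2", "x3"]]

# 1) Correlation matrix: flag any bivariate Pearson correlation at or above 0.80.
print(predictors.corr(method="pearson"))

# 2) VIF: values above 10 suggest problematic multicollinearity.
X = sm.add_constant(predictors)
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])],
    index=predictors.columns,
    name="VIF",
)
print(vif)

# 3) Centering: subtract each predictor's mean from its observations.
centered = predictors - predictors.mean()
```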
- Fourth, the last assumption of multiple linear regression is homoscedasticity. A scatterplot of the residuals against the predicted values is a good way to check for homoscedasticity; see the sketch below. There should be no clear pattern in the distribution; if there is a cone-shaped pattern, the data are heteroscedastic. If the data are heteroscedastic, a non-linear transformation of the data or the addition of a quadratic term might fix the problem.
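A minimal sketch of the residuals-versus-predicted plot. The variable `model` is an assumption carried over from the normality sketch above:

```python
# Sketch of the homoscedasticity check: residuals vs. fitted values.
import matplotlib.pyplot as plt

fitted = model.fittedvalues   # predicted values from the fitted OLS model
resid = model.resid

plt.scatter(fitted, resid, alpha=0.6)
plt.axhline(0, linewidth=1)
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.title("Residuals vs. predicted values")
plt.show()
# A roughly even band around zero supports homoscedasticity;
# a cone or fan shape suggests heteroscedasticity.
```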
--------------------------------------------------------------------------------------------------------------------------------------------------------