Write down the two main reasons for analysing the model y = α + βx + u.
What is R² and why is it important? What does it show us?
Why do we need normality in our data and how can we test for it?
What is heteroskedasticity and why do we need to detect and correct for
it when it is present in our model? How do we test for it?
What is autocorrelation, which assumption of the linear model does it violate, and how can we test for it?
Hi there, you have asked too many questions in one go! I will try to answer most of them:
3. Normality:
In statistics, normality tests are used to check whether a data set is well modelled by a normal distribution, and to estimate how likely it is that the random variable underlying the data is normally distributed.
The normal distribution is important because, more often than not, a process whose output closely follows a normal distribution can be regarded as being in control.
The normal distribution also makes it easy to calculate other quantities, such as moments and correlations between variables. In a regression model, the usual t and F tests rely on normally distributed errors, especially in small samples. Because of this, normal-based methods are often used to analyse data even when the distribution is only approximately normal.
To check whether the data are normally distributed, look at the shape of the histogram. If it forms a symmetric bell shape (an inverted U) that follows the normal curve closely, the data are approximately normal. If the shape is uneven, or some points lie far out in the tails (these may be described as outliers), the data depart from normality.
When assessing normality, skewness and kurtosis should be considered together: a normal distribution has skewness 0 and kurtosis 3.
We can also check a normal Q-Q plot: if the data points lie close to the straight line, the data are approximately normally distributed.
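To make the skewness and kurtosis check concrete, here is a minimal sketch of the Jarque-Bera test, which combines the two into one statistic. The answer above does not name this test, so treat it as one possible choice; the function name `jarque_bera` and the simulated samples are made up for illustration, and only the Python standard library is assumed.

```python
import math
import random

def jarque_bera(data):
    """Return (skewness, kurtosis, JB statistic, p-value) for a sample."""
    n = len(data)
    mean = sum(data) / n
    # Central moments, dividing by n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # Under normality JB is approximately chi-square with 2 degrees of
    # freedom, whose upper-tail probability is exactly exp(-jb / 2).
    p_value = math.exp(-jb / 2)
    return skew, kurt, jb, p_value

# Simulated data (an assumption, purely for illustration)
random.seed(0)
normal_sample = [random.gauss(0, 1) for _ in range(500)]
skewed_sample = [random.expovariate(1.0) for _ in range(500)]

for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    s, k, jb, p = jarque_bera(sample)
    print(f"{name}: skew={s:.2f} kurtosis={k:.2f} JB={jb:.1f} p={p:.4f}")
```

A small p-value (say below 0.05) is evidence against normality. The chi-square approximation is only reliable in reasonably large samples, so for small data sets the Q-Q plot remains a useful cross-check.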
Heteroskedasticity:
It describes data in which the variability of one variable is unequal across the values of a second variable that predicts it.
On a scatter plot it shows up as a cone-like shape, with the points fanning out around the regression line.
When you run a regression on heteroskedastic data, it can give you misleading results: the coefficient estimates themselves stay unbiased, but their standard errors do not. To spot heteroskedasticity, take a thorough look at the scatter plot: if a cone-like shape exists, the model is heteroskedastic.
Detecting heteroskedasticity:
When you perform a regression, you get a best-fit line. The data points are usually scattered around the line. A residual is the vertical distance between a data point and the regression line. It is positive if the point is above the regression line and negative if it is below.
In other words, a residual value is how much the regression line vertically misses a data point.
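As a rough illustration of the definition above, the sketch below fits a best-fit line by ordinary least squares and computes each residual. The helper `fit_line` and the five data points are made up for this example.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Made-up illustrative data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(x, y)
# Residual = observed value minus the value predicted by the line
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

for xi, yi, e in zip(x, y, residuals):
    side = "above" if e > 0 else "below"
    print(f"x={xi} y={yi} residual={e:+.3f} ({side} the line)")
```

With an intercept in the model, the residuals always sum to zero, so some points fall above the line and some below.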
A residual plot has the residual values on the vertical axis and an independent variable on the horizontal axis. A residual plot can suggest heteroskedasticity. One way to create it is to calculate the squared residuals and plot them against an explanatory variable, making a separate plot for each explanatory variable that may be contributing to the error variance.
Formal tests such as the Park test and the White test can also detect it.
If heteroskedastic data are used without correction, the standard errors are biased, so the test statistics in significance tests will be either too high or too low, and the estimators will no longer be efficient.
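The White test mentioned above can be sketched for a one-regressor model as follows: regress the squared residuals on x and x², then compare n·R² from that auxiliary regression with a chi-square distribution with 2 degrees of freedom. This is a minimal standard-library sketch, not a replacement for a statistics package. The helpers `solve`, `ols` and `white_test` are hypothetical names, and the data are simulated.

```python
import math
import random

def solve(A, b):
    """Solve the linear system A z = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z

def ols(X, y):
    """OLS via the normal equations; returns (coefficients, residuals, R^2)."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    y_bar = sum(y) / len(y)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    ss_res = sum(e ** 2 for e in resid)
    return beta, resid, 1 - ss_res / ss_tot

def white_test(x, y):
    """White test for y = a + b*x; returns (LM statistic, p-value)."""
    _, resid, _ = ols([[1.0, xi] for xi in x], y)
    e2 = [e ** 2 for e in resid]
    # Auxiliary regression of the squared residuals on x and x^2
    _, _, r2 = ols([[1.0, xi, xi ** 2] for xi in x], e2)
    lm = len(x) * r2
    # Under constant variance LM is approximately chi-square(2),
    # whose upper-tail probability is exactly exp(-lm / 2).
    return lm, math.exp(-lm / 2)

# Simulated data (an assumption, purely for illustration)
random.seed(1)
x = [random.uniform(1, 10) for _ in range(400)]
y_homo = [2 + 3 * xi + random.gauss(0, 2) for xi in x]
y_hetero = [2 + 3 * xi + random.gauss(0, 0.8 * xi) for xi in x]  # spread grows with x

for name, y in [("constant variance", y_homo), ("cone-shaped", y_hetero)]:
    lm, p = white_test(x, y)
    print(f"{name}: LM={lm:.2f} p={p:.4f}")
```

A small p-value suggests the error variance is not constant. With more than one explanatory variable the auxiliary regression also includes squares and cross-products of all regressors, and the degrees of freedom change accordingly, so the `exp(-lm / 2)` shortcut used here applies only to the two-term case.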