Based on question 1e above, do you think the following scatter plots contain any outliers or any influential data points? Justify your answers on each plot.
[Figure: four scatter plots, panels (i), (ii), (iii), (iv)]
(a)
Serial correlation (autocorrelation) means there is nonzero correlation between the values of a variable at different time points. It is usually defined only for weakly stationary processes.
Heteroskedasticity means that the random variables do not all have the same variance.
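As a rough illustration of the two definitions above, the sketch below computes a lag-1 sample autocorrelation and compares the variance of the early and late halves of a series. The function name and all data values are made up for illustration, and the half-versus-half variance comparison is only an informal check, not a formal test.

```python
from statistics import mean, pvariance

def lag1_autocorrelation(series):
    """Lag-1 sample autocorrelation of a series (hypothetical helper)."""
    m = mean(series)
    # Products of neighbouring deviations from the mean
    num = sum((series[t] - m) * (series[t - 1] - m) for t in range(1, len(series)))
    den = sum((v - m) ** 2 for v in series)
    return num / den

# Made-up series: a steady trend and an alternating pattern
r_trend = lag1_autocorrelation([1, 2, 3, 4, 5, 6])            # positive
r_alternating = lag1_autocorrelation([1, -1, 1, -1, 1, -1])   # negative

# Made-up residuals whose spread grows over time (heteroskedastic pattern)
residuals = [0.1, -0.2, 0.1, 2.0, -3.0, 1.0]
var_early = pvariance(residuals[:3])
var_late = pvariance(residuals[3:])
```

A value of `r_trend` or `r_alternating` far from zero suggests serial correlation, and a large gap between `var_early` and `var_late` suggests the variance is not constant.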
(b)
Specified regression model: a regression model is used to investigate the relationship between two or more variables and to estimate one variable based on the others. The specified model is the hypothesized relationship, written with unknown parameters, for example y = β0 + β1x + ε.
Estimated regression equation: in statistics, an equation constructed to model the relationship between the dependent and independent variables. Either a simple or a multiple regression model is initially posed as a hypothesis concerning the relationship among the dependent and independent variables. The least squares method is the most widely used procedure for estimating the model parameters. For simple linear regression, the least squares estimates of the model parameters β0 and β1 are denoted b0 and b1. Using these estimates, the estimated regression equation is constructed: ŷ = b0 + b1x. The graph of the estimated regression equation for simple linear regression is a straight-line approximation to the relationship between y and x.
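To make the least squares step concrete, here is a minimal sketch that computes b0 and b1 for simple linear regression using the standard formulas b1 = Sxy / Sxx and b0 = ȳ - b1·x̄. The function name `least_squares` and the data are assumptions chosen for illustration; they are not part of the question.

```python
from statistics import mean

def least_squares(x, y):
    """Return (b0, b1) for the estimated line y-hat = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up data lying exactly on y = 1 + 2x
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
b0, b1 = least_squares(x, y)
```

Because the made-up points lie exactly on a line, the estimates recover its intercept and slope; with real data, b0 and b1 would only approximate the underlying relationship.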
(c)
Data type:
Quantitative data consist of values representing counts or measurements. Variable: year in school.
Qualitative (or non-numeric) data consist of values that can be placed into non-numeric categories. Variable: political affiliation (rep, dem, ind).
Levels of measurement for qualitative data (two levels):
Nominal level (by name): no natural ranking or ordering of the data exists, e.g. political affiliation (dem, rep, ind).
Ordinal level (by order): provides an order, but a precise mathematical difference between levels cannot be obtained, e.g. heat (low, medium, high) or level of pain (low, med, high).
(d)
ANOVA is the statistical model used to predict a continuous outcome on the basis of one or more categorical predictor variables.
Multiple regression is the statistical model used to predict a continuous outcome on the basis of two or more predictor variables, which are typically continuous.
(e)
Outliers are data points that diverge by a wide margin from the overall pattern. An outlier can have an extreme X value, an extreme Y value, or both, compared with the other values.
An influential point is an outlier that noticeably changes the slope of the regression line. To test the influence of an outlier, compute the regression equation with and without the outlier and compare the results.
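The with-and-without comparison described above can be sketched as follows. The helper `fit_line` and the data are made up for illustration: five points lie on y = 2x, and one extra point with an extreme X value is assumed to be the suspected outlier.

```python
from statistics import mean

def fit_line(x, y):
    """Least squares fit; returns (intercept, slope). Hypothetical helper."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up data: the last point (20, 10) is the suspected outlier
x = [1, 2, 3, 4, 5, 20]
y = [2, 4, 6, 8, 10, 10]

_, slope_with = fit_line(x, y)                 # fit using all points
_, slope_without = fit_line(x[:-1], y[:-1])    # fit with the outlier dropped

# A large change in slope suggests the point is influential
slope_change = slope_without - slope_with
```

Here the slope drops sharply once the extreme point is included, so that point would be judged influential. If the two slopes were nearly equal, the point would be an outlier without being influential.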