For one of the final projects, you will, individually or with a partner, demonstrate that you can connect statistics topics in a short but coherent presentation. Your project's data set will need at least four variables: at least two categorical and at least two quantitative. For example, you might consider the following variables for American participants in a survey: birth month (categorical), state of birth (categorical), average number of bowls of cereal eaten per week (quantitative), and amount spent on groceries (quantitative).
(a) First, formulate a research question relating to two of your quantitative variables along the lines of "how does *quantitative variable 1* relate to *quantitative variable 2*?" For example, you might ask "Does the average height for students relate to the average number of hours slept by students?" Include the question in your Word document.
(b) Create a least-squares regression line that answers the research question posed in part (a). Your answer will be graded on: (i) an appropriate scatterplot of the two variables, (ii) the correlation coefficient r and the coefficient of determination r² between the two variables, (iii) a determination of whether the correlation coefficient is significant, and (iv) whether your line (slope and intercept) is correct based on the data provided.
Hours of sleep | Height (inches) |
4 | 62 |
5 | 65 |
6 | 65 |
6 | 62 |
7 | 63 |
7 | 67 |
7 | 60 |
7 | 74 |
7 | 64 |
7 | 63 |
8 | 73 |
8 | 62 |
8 | 66 |
8 | 70 |
8 | 72 |
8 | 69 |
8 | 63 |
9 | 60 |
9 | 67 |
10 | 73 |
A) From the data given above, we need to formulate a research question relating two of the quantitative variables, such as:
"Does the average height for students relate to the average number of hours slept by students?"
Answer:
Based on the data, we can formulate the research question: "Is there any relationship between the number of hours slept by students and the height of the students?"
To answer this research question, we use simple linear regression (SLR), i.e. ordinary least-squares regression.
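As a cross-check that does not depend on Excel, the least-squares estimates can be computed from their standard formulas: slope b1 = Sxy / Sxx and intercept b0 = ȳ - b1·x̄, where Sxy and Sxx are sums of cross-products and squares of deviations from the means. Below is a minimal pure-Python sketch using the 20 rows of the data table; the helper name `least_squares_line` is hypothetical, chosen only for illustration.

```python
# Data from the table above: hours of sleep (x) and height in inches (y).
hours = [4, 5, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 9, 9, 10]
height = [62, 65, 65, 62, 63, 67, 60, 74, 64, 63, 73, 62, 66, 70, 72, 69, 63, 60, 67, 73]


def least_squares_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sxy: sum of cross-products of deviations; Sxx: sum of squared x deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


b0, b1 = least_squares_line(hours, height)
print(f"Height = {b0:.4f} + {b1:.4f} * Hours of sleep")
```

Run as-is, it should print an intercept of about 57.3529 and a slope of about 1.1765, matching the model reported in part 4.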
B) Fitting the simple linear regression to our data (α = 0.05).
Note: Excel is used to solve this problem.
1) Scatter plot:
From this scatter plot we can see a weak, positive, roughly linear relationship between the variables.
2) & 3) Obtain the correlation coefficient r and the coefficient of determination r² between the two variables, and determine whether the correlation coefficient is significant.
Answer:
These values come from the "Regression" tool in Excel's Analysis ToolPak. From the regression output:
r = 0.365829
r² = 0.133831
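Both values can be reproduced without Excel from the definition r = Sxy / √(Sxx·Syy); in simple linear regression the coefficient of determination is just the square of r. A minimal plain-Python sketch, where `correlation` is an illustrative helper rather than a library function:

```python
import math

# Data from the table above: hours of sleep (x) and height in inches (y).
hours = [4, 5, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 9, 9, 10]
height = [62, 65, 65, 62, 63, 67, 60, 74, 64, 63, 73, 62, 66, 70, 72, 69, 63, 60, 67, 73]


def correlation(x, y):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


r = correlation(hours, height)
r_squared = r ** 2  # for a one-predictor line with intercept, R^2 equals r^2
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")  # prints: r = 0.3658, r^2 = 0.1338
```

On Python 3.10 or later, the standard library's `statistics.correlation` returns the same Pearson r.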
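To check significance, the "Significance F" value (the p-value of the regression F-test) is compared to α = 0.05. In simple linear regression this F-test is equivalent to the t-test of H0: ρ = 0, so it tests the correlation coefficient directly.
Here, Significance F = 0.1126 > 0.05.
Hence the correlation coefficient is not significant at α = 0.05; a p-value of 0.1126 means it would only be judged significant at a significance level of about 11.26% or higher.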
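The Significance F value can be cross-checked by hand. With n - 2 = 18 degrees of freedom, the test statistic is t = r·√(n - 2) / √(1 - r²), and F = t² in simple linear regression. The p-value below uses the classical closed-form series for Student's t with an even number of degrees of freedom; this is an assumption of the sketch, chosen only so it needs nothing beyond Python's `math` module, and is not a claim about how Excel computes it. With SciPy available, `scipy.stats.t.sf` would be the usual choice. The helper `t_two_sided_p` is a hypothetical name.

```python
import math

# Data from the table above: hours of sleep (x) and height in inches (y).
hours = [4, 5, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 9, 9, 10]
height = [62, 65, 65, 62, 63, 67, 60, 74, 64, 63, 73, 62, 66, 70, 72, 69, 63, 60, 67, 73]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(height) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, height))
s_xx = sum((x - x_bar) ** 2 for x in hours)
s_yy = sum((y - y_bar) ** 2 for y in height)
r = s_xy / math.sqrt(s_xx * s_yy)

df = n - 2
# t statistic for H0: rho = 0; in simple linear regression F = t^2.
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
f_stat = t_stat ** 2


def t_two_sided_p(t, nu):
    """Two-sided p-value P(|T| > |t|) for Student's t with an even number of df.

    Closed form for even nu:
    P(|T| <= t) = sin(theta) * sum_{k=0}^{nu/2 - 1} c_k * cos(theta)^(2k),
    with theta = atan(|t| / sqrt(nu)), c_0 = 1, c_k = c_{k-1} * (2k - 1) / (2k).
    """
    if nu % 2 != 0:
        raise ValueError("this closed form only covers even degrees of freedom")
    theta = math.atan(abs(t) / math.sqrt(nu))
    cos2 = math.cos(theta) ** 2
    term, total = 1.0, 1.0
    for k in range(1, nu // 2):
        term *= cos2 * (2 * k - 1) / (2 * k)  # builds c_k * cos(theta)^(2k)
        total += term
    return 1.0 - math.sin(theta) * total


p_value = t_two_sided_p(t_stat, df)
alpha = 0.05
print(f"t = {t_stat:.4f}, F = {f_stat:.4f}, p = {p_value:.4f}")
print("significant" if p_value < alpha else "not significant", "at alpha =", alpha)
```

For this data it gives t ≈ 1.67 and p ≈ 0.113, in line with the Significance F above and still greater than α = 0.05.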
4) Whether our line (slope and intercept) is correct based on the data provided
Answer:
From the Excel SLR output, the fitted coefficients give the model:
Height of student (y) = 57.3529 + 1.1765 * Hours of sleep (x)
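The coefficients can also be checked by hand. The sums below are computed directly from the 20 rows of the data table, not copied from the Excel output (S_yy is included so the same sums also give r):

```latex
\bar{x} = \frac{147}{20} = 7.35, \qquad \bar{y} = \frac{1320}{20} = 66
S_{xx} = \sum (x_i - \bar{x})^2 = 36.55, \qquad S_{yy} = \sum (y_i - \bar{y})^2 = 378, \qquad S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = 43
b_1 = \frac{S_{xy}}{S_{xx}} = \frac{43}{36.55} \approx 1.1765, \qquad b_0 = \bar{y} - b_1 \bar{x} = 66 - \frac{43}{36.55}(7.35) \approx 57.3529
r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{43}{\sqrt{36.55 \times 378}} \approx 0.3658
```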
To check this, we also fitted other regression curves to the same data: an exponential regression curve, the linear regression line, and a polynomial regression curve. The alternative curves only increase the complexity of the model without a significant increase in the R² value.
Hence we conclude that the fitted linear line is appropriate for the given data.
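One way to make the model comparison concrete is to compare the linear fit with a quadratic (degree-2 polynomial) fit. Because plain R² can never decrease when a term is added, the sketch also reports adjusted R², which penalizes extra terms; using adjusted R² is this sketch's own choice, not necessarily what was done in Excel. It solves the least-squares normal equations in pure Python (the helpers `solve`, `poly_fit_r2`, and `adjusted_r2` are hypothetical names) and leaves out the exponential fit for brevity.

```python
# Data from the table above: hours of sleep (x) and height in inches (y).
hours = [4, 5, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 9, 9, 10]
height = [62, 65, 65, 62, 63, 67, 60, 74, 64, 63, 73, 62, 66, 70, 72, 69, 63, 60, 67, 73]


def solve(a, b):
    """Solve a @ coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    coef = [0.0] * n
    for i in reversed(range(n)):
        coef[i] = (m[i][n] - sum(m[i][j] * coef[j] for j in range(i + 1, n))) / m[i][i]
    return coef


def poly_fit_r2(x, y, degree):
    """Least-squares polynomial fit via the normal equations; returns (coefficients, R^2)."""
    # Normal equations: sum_j (sum_i x_i^(j+k)) c_j = sum_i x_i^k y_i, for k = 0..degree.
    a = [[sum(xi ** (j + k) for xi in x) for j in range(degree + 1)] for k in range(degree + 1)]
    b = [sum((xi ** k) * yi for xi, yi in zip(x, y)) for k in range(degree + 1)]
    coef = solve(a, b)  # coef[j] multiplies x^j
    y_bar = sum(y) / len(y)
    fitted = [sum(c * xi ** j for j, c in enumerate(coef)) for xi in x]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return coef, 1 - ss_res / ss_tot


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (excluding the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


n = len(hours)
results = {}
for degree in (1, 2):
    coef, r2 = poly_fit_r2(hours, height, degree)
    results[degree] = (coef, r2)
    print(f"degree {degree}: R^2 = {r2:.4f}, adjusted R^2 = {adjusted_r2(r2, n, degree):.4f}")
```

The degree-1 row should reproduce R² ≈ 0.1338 from the Excel output; whether the quadratic is worth its extra term depends on whether its adjusted R² is any better.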