A person wants to see whether there is a correlation between a person's height (X) and their foot/shoe size (Y), that is, whether height has any bearing on foot size.
The data is:
Height (X)    Foot/shoe size (Y, U.S.)
6'3"          12.5
6'4"          11.5
5'10"         12
5'7"          9.5
6'7"          15
5'3"          7
6'0"          10.5
6'3"          13
Discuss why a regression analysis could be appropriate for this problem.
Specifically, what statistical questions are you asking? Why would you want to predict the value of Y? What if you wanted to predict a value of Y that's beyond the highest value of X (for example, if X is time and you want to forecast Y in the future)? Examine your classmate's problem to assess the appropriateness and accuracy of using a linear regression model. Discuss the meaning of the standard error of the estimate and how it affects the predicted values of Y for that analysis.
According to the guidelines, only the first four parts will be answered.
a) Let us put statistics aside for a moment and think from a layman's perspective. We are given data on a person's height and his/her foot size. From day-to-day experience, we can say that the taller a person is, the larger his/her foot size tends to be. It is hard to imagine a person who is 6'6" tall with a U.S. foot size of 2. So, from common sense, a person's height is likely to be related to his/her foot size.
Now, let us get back to the statistical world. Regression is a powerful tool for examining the relationship between two or more variables that are logically related. Here the word "logically" is very important. Keep in mind that regression is only a tool: it examines the relationship between two or more variables, but it cannot decide whether that relationship is logical. As statisticians, it is our duty to decide whether the variables are logically related. For example, the world's average temperature and the literacy rate are both increasing over time, but that does not mean the literacy rate is causing the world's temperature to rise. There is no logical relationship between the two.
So, as discussed in the first paragraph, there is a logical relationship between a person's height and his/her foot size. If we want to establish a relationship between a person's height and his/her foot size, regression methods can be used.
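b) To answer this question, let us think about what regression does. It mainly does two things: 1) it establishes the relationship between two or more variables, and 2) it predicts values of the dependent variable (Y) for given values of the independent variable(s), including future values.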
So, broadly, two statistical questions can be asked:
1) Is there a statistically significant relationship between a person's height and his/her foot size?
2) For a given value of a person's height, can we predict his/her foot size?
c) This is an interesting question: why would we want to predict Y? From the perspective of this data, suppose you want to start a shoe business. The major questions are how much inventory to keep and what range of shoe sizes to stock. If, in the locality where you have opened your store, all the people are over 7' tall and you don't stock shoes in their sizes, your business will not be profitable. If you can gather the heights of all the people in the locality and predict their shoe sizes from those heights, you can stock the right sizes and your business will be more profitable.
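d) Suppose we are given values of x for, say, 5 data points, and we want to predict y for an x outside this range. First, establish the relationship between x and y by fitting a regression line. Then extend this line beyond the observed range of x and read off the value of y for an x value that is not present in the data. Keep in mind that this extrapolation assumes the linear relationship continues to hold beyond the observed data, which the data themselves cannot confirm, so such predictions are less reliable than predictions made within the observed range.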