Discussion 1: Searching for Causes
This week examines how to use correlation and simple linear regression to test the relationship between two variables. In both of these tests you can use the data points in a scatterplot to draw a line of best fit; the closer the points lie to the line, the stronger the association between the variables. It is important to recognize, however, that even the strongest correlation cannot prove causation.
For this Discussion, review this week’s Learning Resources and consider a true relationship between variables.
By Day 3
Post a brief explanation of when an observed correlation might represent a true relationship between variables and why. Be specific and provide examples.
With the simple linear regression model
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$
the observed value of the dependent variable $y_i$ is composed of a linear function $\beta_0 + \beta_1 x_i$ of the explanatory variable $x_i$, together with an error term $\epsilon_i$.
The error terms $\epsilon_1, \ldots, \epsilon_n$ are generally taken to be independent observations from a $N(0, \sigma^2)$ distribution, for some error variance $\sigma^2$. This implies that the values $y_1, \ldots, y_n$ are observations from the independent random variables
$$Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$$
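To make the model concrete, here is a minimal Python sketch that simulates observations from it and then recovers the intercept and slope by least squares. The parameter values, the seed, and the helper name `fit_line` are assumptions chosen only for illustration; they do not come from the course material.

```python
import random


def fit_line(xs, ys):
    """Return (b0, b1): the least-squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx              # slope estimate
    b0 = y_bar - b1 * x_bar     # intercept estimate
    return b0, b1


# Hypothetical "true" parameters, chosen only for illustration.
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0

rng = random.Random(0)                 # fixed seed so the run is repeatable
xs = [i / 10 for i in range(200)]      # x values from 0.0 to 19.9
# Each y is the linear part plus an independent N(0, SIGMA^2) error term.
ys = [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for x in xs]

b0, b1 = fit_line(xs, ys)
print(f"estimated intercept: {b0:.3f}, estimated slope: {b1:.3f}")
```

With 200 points the estimates should land close to the assumed values of 2.0 and 0.5, though they will not match them exactly because of the random error terms.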
INTERPRETATION depends on Significance F and P-values of regression model
To check whether your results are reliable (statistically significant), look at Significance F, which is the p-value of the overall F-test for the regression. => If this value is less than 0.05, the model as a whole is significant at the 5% level. If Significance F is greater than 0.05, it's probably better to stop using this set of independent variables. In a model with several predictors, delete a variable with a high P-value (greater than 0.05) and rerun the regression until Significance F drops below 0.05.
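For a regression with a single explanatory variable, Significance F can be worked out by hand. The sketch below is one way to do it using only the Python standard library, under these assumptions: the data are hypothetical, the function names `two_sided_t_p` and `significance_f` are made up for illustration, and the p-value is obtained by numerical integration (Simpson's rule) instead of a statistics package. It relies on the fact that with one predictor F = t², so the upper tail of the F(1, n-2) distribution equals the two-sided tail of a t distribution with n-2 degrees of freedom.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_t_p(t, df, steps=2000):
    """Approximate P(|T| > t) using Simpson's rule on [0, |t|]."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3                # integral of the density from 0 to t
    return max(0.0, 1.0 - 2.0 * area)


def significance_f(xs, ys):
    """Return (F statistic, Significance F) for a one-predictor regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((b0 + b1 * x - y_bar) ** 2 for x in xs)               # explained
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))     # residual
    f_stat = ssr / (sse / (n - 2))      # 1 and n-2 degrees of freedom
    return f_stat, two_sided_t_p(math.sqrt(f_stat), n - 2)


# Hypothetical data, invented for this example only.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9, 7.2, 7.8, 9.1, 9.9]

f_stat, p_value = significance_f(xs, ys)
print(f"F = {f_stat:.1f}, Significance F = {p_value:.3g}")
print("significant at the 5% level" if p_value < 0.05 else "not significant")
```

Because the p-value is computed as one minus a numerically integrated area, very small p-values are only accurate to roughly the precision of that subtraction; that is adequate for comparing against 0.05.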
CORRELATION
When testing the relationship between two variables, it is a good idea to compute the Pearson correlation coefficient to determine how strong the linear relationship between those two variables is.
FORMULA AND INTERPRETATION ON COEFFICIENT VALUE
In order to determine how strong the relationship is between two variables, a formula must be followed to produce what is referred to as the coefficient value.
The coefficient value can range between -1.00 and 1.00.
1) If the coefficient value is in the negative range => then the variables are negatively correlated: as one value increases, the other tends to decrease.
2) If the value is in the positive range => then the variables are positively correlated: both values tend to increase or decrease together. Let's look at the formula for computing the Pearson correlation coefficient.
Example :-
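Suppose you are analyzing the relationship between your participants' age and reported level of income.
=> The formula for the correlation between paired observations $(x_i, y_i)$, $i = 1, \ldots, n$, is
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
where $\bar{x}$ and $\bar{y}$ are the sample means of the two variables (here, age and income).
After conducting the test,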
your Pearson correlation coefficient value is r = +.20.
Therefore, you would have a slightly positive correlation between the two variables:
the direction of the relationship is positive, but its strength is weak. An r of .20 corresponds to r² = .04, so only about 4% of the variation in one variable is accounted for by the other.
Interpretation => you could conclude that there is a weak positive correlation between one's age and income: as people grow older, their income tends to increase slightly. On its own, this correlation does not show that age causes higher income.
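The coefficient in an example like this can be computed directly from paired data. The following is a minimal sketch using only the Python standard library; the ages and incomes are hypothetical values invented for illustration (they are not the data behind the r = +.20 result above), and `pearson_r` is an illustrative helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical participants: age in years, income in thousands of dollars.
ages = [23, 31, 38, 45, 52, 60, 27, 49]
incomes = [41, 38, 55, 36, 62, 40, 58, 47]

r = pearson_r(ages, incomes)
# r gives direction and strength; r squared is the share of variance explained.
print(f"r = {r:+.2f}, r^2 = {r * r:.2f}")
```

The sign of `r` gives the direction of the relationship and its magnitude gives the strength, so a value near zero should be read as weak regardless of its sign.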
SCATTERPLOT
1) If the data show an uphill pattern as you move from left to right => this indicates a positive relationship between X and Y. As the X-values increase (move right), the Y-values tend to increase (move up).
2) If the data show a downhill pattern as you move from left to right => this indicates a negative relationship between X and Y. As the X-values increase (move right) the Y-values tend to decrease (move down).
3) If the data don't seem to resemble any kind of pattern (even a vague one) => then the plot gives no evidence of a relationship between X and Y.
EXAMPLE :-
These two scatter plots show the average income for adults based on the number of years of education completed (2006 data). 16 years of education means graduating from college. 21 years means landing a Ph.D.
What type of correlation does each graph represent?
=> Both graphs are positively correlated. As years of education increase, so does income.
Draw a line of best fit for each graph. Then, estimate and compare the earnings for each gender with 11 years of education completed.
Based on these plots it looks like a female who completes 11 years of school can expect to earn around $14,000/year while a male can expect to earn around $23,000/year.
INTERPRETATION :- These graphs show two important things. First, more education is associated with a higher income in general, although the plots alone cannot show that education causes the increase. Second, there is a gender gap in income. While women have begun to close this gap, there is more work to do.