In: Computer Science
2. Instructions: Use Excel to complete the following questions. Attach copies of the Excel pages or embed the Excel work in your write-up.
The marketing department at a large state university undertook a study to relate the starting salary (y) after graduation for marketing majors to grade point average (GPA) in major courses. To do this, records of seven recent marketing graduates were randomly selected, and the data shown above were obtained. For this project, do the following: 1. Input the data into Excel 2. Create a scatter plot of the data points 3. Perform a linear regression analysis to determine the regression equation and the coefficient of determination. 4. Plot the line of the regression equation on your scatter plot. Respond in writing to the following: 1. Report the linear regression equation produced from Excel. (Display the scatterplot with the regression line included) 2. What is the value of Pearson’s correlation coefficient r? 3. Is Pearson’s correlation coefficient r positive or negative? What does this imply about the relationship between the factors in this study? 4. What is the implication of any correlation found between the variables in this analysis? 5. Does this correlation imply a causal relationship? Explain. Excel output and a written report. |
Below is the table and scattered plot of the given data.
Mathematically, a linear regression is defined by this equation:
y = bx + a + ε
Where:
The linear regression equation always has an error term because, in real life, predictors are never perfectly precise. However, some programs, including Excel, do the error term calculation behind the scenes. So, in Excel, you do linear regression using the least squares method and seek coefficients a and b such that:
y = bx + a
For our example, the linear regression equation takes the following shape:
Starting Salary = b * GPA + a
Now to perform linear regression anaylsis follow below steps:
7. Click OK and observe the regression analysis output created by Excel.
Regression analysis output: coefficients
This section provides specific information about the components
of your analysis:
The most useful component in this section is Coefficients. It enables you to build a linear regression equation in Excel:
y = bx + a
For our data set, where y is the number of umbrellas sold and x is an average monthly rainfall, our linear regression formula goes as follows:
Y =Coefficient * x + Intercept
Equipped with a and b values rounded to three decimal places, it turns into:
y = 5.7066x + 14.816
4. Now to plot linear regression follow below steps:
1. Right click on any point of scattered graph and choose Add Trendline… from the context menu and select below details.
After this your chart will look like below having regression line eqaution above the plotted line.
Now for Pearson’s correlation coefficient r. Go back to Regression analysis output sheet.
Regression analysis output: Summary Output
This part tells you how well the calculated linear regression
equation fits your source data.
Here is what they means and implicates
Multiple R. It is the Correlation Coefficient that measures the strength of a linear relationship between two variables. The correlation coefficient can be any value between -1 and 1, and its absolute value indicates the relationship strength. The larger the absolute value, the stronger the relationship:
A correlation is a measure or degree of relationship between two variables. A set of data can be positively correlated, negatively correlated or not correlated at all.
As one set of values increases the other set tends to increase then it is called a positive correlation or As one set of values increases the other set tends to decrease then it is called a negative correlation or If the change in values of one set doesn't affect the values of the other, then the variables are said to have "no correlation" or "zero correlation.".
A causal relation between two events exists if the occurrence of the first causes the other. The first event is called the cause and the second event is called the effect.
Here it seems that gpa and starting salary may have a casual relation as it effect the salary. But there may be the case that company would be considering any other third, lurking variable( Like interview or entrance exams or certifictaes or experience) that that makes the relationship appear stronger (or weaker) than it actually is. So correlation between two variables does not imply causation
.