This is an exploratory problem intended to introduce the idea of
curvilinear regression. Personally, I was a bit shocked to discover
that multiple LINEAR regression is the main vehicle for fitting
data with nonlinear relationships, which sounds a bit
counter-intuitive. However, if we think of the higher-power terms
(quadratic, cubic, etc.) as distinct variables, the ideas work well
together: the model is still linear in its coefficients.
Here is a data set for students in a gifted program. The first
score (X1=GPA) is the students’ math grade from last year, and the
second score (Y=SAT) is their SAT-M score. As this is a
non-representative group (when considering the population of all
students taking math classes in high school), it is not unexpected
to see range-restriction effects (generally all high performing,
few lower performing representatives) or ceiling effects (maximum
score on the SAT-M is 800). In data such as this, it is not
uncommon to see non-linear trends.
GPA | SAT |
---|---|
3.6 | 745 |
3.4 | 740 |
3.5 | 735 |
2.8 | 720 |
3.2 | 735 |
3.6 | 740 |
4 | 745 |
3.1 | 735 |
3.6 | 740 |
3 | 735 |
3.8 | 740 |
3.4 | 745 |
3.8 | 750 |
2.1 | 700 |
2.9 | 725 |
3.5 | 740 |
3.2 | 745 |
2.8 | 730 |
3.6 | 730 |
3.2 | 740 |
3.4 | 745 |
3.3 | 740 |
2.8 | 735 |
2.2 | 695 |
3.3 | 740 |
Step 1: Copy the data into your preferred
statistical software program. Change the variable names to GPA and
SAT if need be. Before doing any analysis, look at a scatterplot of
the data with GPA on the horizontal axis and SAT on the vertical
axis. Be sure to note any trends.
The following includes information for Excel users. If you are
not using Excel, please disregard.
Step 2: Run a regression (Data Analysis >
Regression) with SAT as the Y variable and GPA as the X variable.
Again, be sure to note what evidence supports the assumptions for a
regression analysis. Report the regression equation and the
requested statistics:
SAT=___+___×GPA
(Report regression coefficients accurate to 3 decimal
places.)
R^2_adj=
(Report accurate to 3 decimal places.)
Step 3: Create a third variable called GPAsq (for
squared GPA). In Excel, use a formula, something like =B1^2 and
fill down the rest of the column.
Step 4: Run the quadratic regression by adding the
independent variable GPAsq to the model. Report the regression
equation and the requested statistics:
SAT=___+___×GPA+___×GPA^2
(Report regression coefficients accurate to 3 decimal
places.)
R^2_adj=
(Report accurate to 3 decimal places.)
Step 5: Notice how the adjusted coefficient of
multiple determination changed from the bivariate regression to the
quadratic (multiple) regression. The next step is to determine if
this more complicated model is statistically significantly better
than the more parsimonious linear model.
For the multiple regression model, what was the F-ratio
and the resulting P-value?
Fmodel=
(Report accurate to 2 decimal places.)
P=
(Report accurate to 3 decimal places.)
1)
2)
SUMMARY OUTPUT

Regression Statistics | Value |
---|---|
Multiple R | 0.84951 |
R Square | 0.721668 |
Adjusted R Square | 0.709566 |
Standard Error | 7.06443 |
Observations | 25 |

ANOVA | df | SS | MS | F | Significance F |
---|---|---|---|---|---|
Regression | 1 | 2976.158 | 2976.158 | 59.63508 | 7.82E-08 |
Residual | 23 | 1147.842 | 49.90617 | | |
Total | 24 | 4124 | | | |

Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
---|---|---|---|---|---|---|---|---|
Intercept | 656.447 | 10.24413 | 64.08029 | 1.97E-27 | 635.2554 | 677.6386 | 635.2554 | 677.6386 |
GPA | 24.15321 | 3.127691 | 7.722375 | 7.82E-08 | 17.68308 | 30.62333 | 17.68308 | 30.62333 |
SAT = 656.447 + 24.153*GPA
Adjusted R Square = 0.710
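If you are not using Excel, the same fit can be checked by hand. The following is only a minimal sketch in plain Python (standard library only); the helper name `fit_line` is my own and not part of the problem. It uses the usual closed-form least-squares formulas and should agree with the Excel output above up to rounding.

```python
# Simple least-squares fit of SAT on GPA, standard library only.
gpa = [3.6, 3.4, 3.5, 2.8, 3.2, 3.6, 4.0, 3.1, 3.6, 3.0, 3.8, 3.4, 3.8,
       2.1, 2.9, 3.5, 3.2, 2.8, 3.6, 3.2, 3.4, 3.3, 2.8, 2.2, 3.3]
sat = [745, 740, 735, 720, 735, 740, 745, 735, 740, 735, 740, 745, 750,
       700, 725, 740, 745, 730, 730, 740, 745, 740, 735, 695, 740]


def fit_line(x, y):
    """Return (intercept, slope, r2, adj_r2) for the model y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)          # total sum of squares
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2    # residual sum of squares
              for xi, yi in zip(x, y))
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)         # one predictor, so n - 1 - 1
    return intercept, slope, r2, adj_r2


b0, b1, r2, adj_r2 = fit_line(gpa, sat)
print(f"SAT = {b0:.3f} + {b1:.3f}*GPA, adjusted R^2 = {adj_r2:.3f}")
```

The adjusted R Square line is just R Square penalized for the number of predictors: 1 - (1 - 0.721668) × 24/23 ≈ 0.710.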
..............
3)
GPA^2 | GPA | SAT |
---|---|---|
12.96 | 3.6 | 745 |
11.56 | 3.4 | 740 |
12.25 | 3.5 | 735 |
7.84 | 2.8 | 720 |
10.24 | 3.2 | 735 |
12.96 | 3.6 | 740 |
16 | 4 | 745 |
9.61 | 3.1 | 735 |
12.96 | 3.6 | 740 |
9 | 3 | 735 |
14.44 | 3.8 | 740 |
11.56 | 3.4 | 745 |
14.44 | 3.8 | 750 |
4.41 | 2.1 | 700 |
8.41 | 2.9 | 725 |
12.25 | 3.5 | 740 |
10.24 | 3.2 | 745 |
7.84 | 2.8 | 730 |
12.96 | 3.6 | 730 |
10.24 | 3.2 | 740 |
11.56 | 3.4 | 745 |
10.89 | 3.3 | 740 |
7.84 | 2.8 | 735 |
4.84 | 2.2 | 695 |
10.89 | 3.3 | 740 |
4)
SUMMARY OUTPUT

Regression Statistics | Value |
---|---|
Multiple R | 0.923376 |
R Square | 0.852624 |
Adjusted R Square | 0.839226 |
Standard Error | 5.256076 |
Observations | 25 |

ANOVA | df | SS | MS | F | Significance F |
---|---|---|---|---|---|
Regression | 2 | 3516.221 | 1758.11 | 63.63893 | 7.12E-10 |
Residual | 22 | 607.7793 | 27.62633 | | |
Total | 24 | 4124 | | | |

Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
---|---|---|---|---|---|---|---|---|
Intercept | 500.8541 | 36.00674 | 13.91001 | 2.22E-12 | 426.1807 | 575.5276 | 426.1807 | 575.5276 |
GPA^2 | -17.1422 | 3.877083 | -4.42141 | 0.000216 | -25.1827 | -9.10158 | -25.1827 | -9.10158 |
GPA | 128.804 | 23.78323 | 5.415747 | 1.94E-05 | 79.48056 | 178.1274 | 79.48056 | 178.1274 |
SAT = 500.854 + 128.804*GPA + (-17.142)*GPA^2
Adjusted R Square = 0.839
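For anyone working outside Excel, here is a rough sketch of the same quadratic fit in plain Python. It treats GPA and GPA^2 as two separate predictors and solves the normal equations (X'X)b = X'y with a small Gaussian-elimination helper. The helper `solve` is my own illustration, not a library routine; in practice a statistics package would do this step for you.

```python
# Quadratic regression SAT ~ GPA + GPA^2 via the normal equations (stdlib only).
gpa = [3.6, 3.4, 3.5, 2.8, 3.2, 3.6, 4.0, 3.1, 3.6, 3.0, 3.8, 3.4, 3.8,
       2.1, 2.9, 3.5, 3.2, 2.8, 3.6, 3.2, 3.4, 3.3, 2.8, 2.2, 3.3]
sat = [745, 740, 735, 720, 735, 740, 745, 735, 740, 735, 740, 745, 750,
       700, 725, 740, 745, 730, 730, 740, 745, 740, 735, 695, 740]


def solve(a, b):
    """Solve the square system a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]     # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for i in reversed(range(n)):                       # back substitution
        tail = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - tail) / m[i][i]
    return coef


# Design matrix columns: 1 (intercept), GPA, GPAsq
rows = [[1.0, g, g * g] for g in gpa]
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * y for r, y in zip(rows, sat)) for i in range(3)]
b0, b1, b2 = solve(xtx, xty)

n, p = len(sat), 2                                     # 25 observations, 2 predictors
y_bar = sum(sat) / n
sse = sum((y - (b0 + b1 * g + b2 * g * g)) ** 2 for g, y in zip(gpa, sat))
sst = sum((y - y_bar) ** 2 for y in sat)
r2 = 1 - sse / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(f"SAT = {b0:.3f} + {b1:.3f}*GPA + {b2:.3f}*GPA^2, adjusted R^2 = {adj_r2:.3f}")
```

The negative coefficient on GPA^2 means the fitted curve bends downward, i.e. SAT gains flatten out at the higher GPAs, which is what a ceiling effect would look like.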
.........
5)
ANOVA | df | SS | MS | F | Significance F |
---|---|---|---|---|---|
Regression | 2 | 3516.221 | 1758.11 | 63.63893 | 0.00000 |
Residual | 22 | 607.7793 | 27.62633 | | |
Total | 24 | 4124 | | | |
F ratio = 63.64
p value = 0.000 (Significance F = 7.12E-10, so P < 0.001)
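The F ratio above tests the quadratic model as a whole. To judge whether adding GPA^2 improves on the linear model specifically, one common approach is a partial (nested-model) F test on the drop in residual sum of squares. The sketch below is only an illustration of that arithmetic, using numbers copied from the two Excel outputs; the variable names are mine.

```python
# Partial F test for adding GPA^2, using values from the two ANOVA tables.
sse_linear, df_linear = 1147.842, 23   # residual SS and df for SAT ~ GPA
sse_quad, df_quad = 607.7793, 22       # residual SS and df for SAT ~ GPA + GPA^2

extra_ss = sse_linear - sse_quad       # SS explained by adding GPA^2
extra_df = df_linear - df_quad         # 1 extra term
mse_quad = sse_quad / df_quad          # residual mean square of the larger model
f_change = (extra_ss / extra_df) / mse_quad

t_gpa_sq = -4.42141                    # t Stat for GPA^2 in the quadratic output
print(f"F for adding GPA^2 = {f_change:.2f}")
print(f"t^2 for GPA^2      = {t_gpa_sq ** 2:.2f}")
```

With a single added term, this F is the square of the t Stat for GPA^2, so its P-value is the 0.000216 already shown in the GPA^2 row of the quadratic output. On that basis the quadratic term is a statistically significant improvement over the linear model.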