Problem 3 (A Real Data Application). Recall that in the simple linear regression model in Module 3, I gave a real data example using the Nobel-winning Capital Asset Pricing Model (CAPM). In that example, we obtained $R^2$ = 0.108, or 10.8%, which is far below 100%. This means that the single independent variable, the market return RM, does not explain the return of an individual stock or portfolio very well in this simple linear regression model. Researchers have been developing new methodologies that add other independent variables to better capture the relationship between the returns of an individual asset and these independent variables. Fama and French (1992) [1] developed a three-factor model by adding two other variables to the CAPM.
The model takes the form: $R = \alpha + \beta_1\,\mathrm{RM} + \beta_2\,\mathrm{SMB} + \beta_3\,\mathrm{HML} + \varepsilon$
where R is the return of an individual financial asset (i.e. a stock or a portfolio), RM is the market return (such as the S&P 500's return, as we used in the CAPM), SMB is Small (market capitalization) Minus Big, and HML is High (book-to-market ratio) Minus Low. Here RM, SMB, and HML are the three factors. This is a typical multiple linear regression model.
This three-factor model can be used in the mutual fund industry to explain the return of an individual asset by the three factors. I have uploaded one real data set in EXCEL into the Homework Assignment #4 area in Canvas. Please download the data file to work on the following question. In the file, we look at a famous mutual fund called the Fidelity Magellan fund. The data are monthly, spanning January 1979 to January 2006.
Question: Using EXCEL, please run the estimation procedure for the above-mentioned three-factor model, and illustrate your findings/comments based on the estimation of the model. Please pay special attention to the $R^2$ (35 points). This is not all of the data but if you could just show me how to run the model that would be great!
Date | RM | SMB | HML | R |
197901 | 4.18 | 3.5 | 2.18 | 9.62 |
197902 | -3.41 | 0.43 | 1.14 | -5.14 |
197903 | 5.75 | 3.2 | -0.61 | 11.91 |
197904 | 0.05 | 2.11 | 1.11 | 1.26 |
197905 | -2.18 | 0.12 | 1.24 | -3.43 |
197906 | 3.88 | 1.06 | 1.34 | 6.59 |
197907 | 0.73 | 1.32 | 1.77 | 1.8 |
197908 | 5.7 | 1.97 | -1.6 | 10.95 |
197909 | -0.69 | -0.29 | -0.89 | -1.47 |
197910 | -8.14 | -3.29 | -1.89 | -10.46 |
197911 | 5.37 | 2.75 | -3.24 | 9.27 |
197912 | 1.87 | 4.09 | -2 | 4.35 |
198001 | 5.76 | 1.56 | 1.7 | 7.85 |
198002 | -0.79 | -1.89 | 0.63 | -0.49 |
198003 | -13.23 | -6.51 | -1.04 | -14.71 |
Using EXCEL, please run the estimation procedure for the above-mentioned three-factor model, and illustrate your findings/comments based on the estimation of the model.
We run the following steps in EXCEL:
1. Enter the given data with RM in column A, SMB in column B, HML in column C, and R in column D, with the column labels in row 1.
2. Select the Data tab, then Data Analysis (this requires the Analysis ToolPak add-in to be enabled).
3. Select Regression and click OK.
4. Enter the Y range as the R values (D1:D16) and the X range as the RM, SMB, and HML values together (A1:C16), and check Labels.
5. Click OK.
After running the above steps we get the following output:
SUMMARY OUTPUT |
Regression Statistics |
Multiple R | 0.982126 |
R Square | 0.964572 |
Adjusted R Square | 0.95491 |
Standard Error | 1.686743 |
Observations | 15 |
ANOVA |
 | df | SS | MS | F | Significance F |
Regression | 3 | 852.0716 | 284.0239 | 99.82911 | 2.92E-08 |
Residual | 11 | 31.29611 | 2.845101 |
Total | 14 | 883.3677 |
 | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
Intercept | 1.380736 | 0.472126 | 2.924511 | 0.013825 | 0.341595 | 2.419878 | 0.341595 | 2.419878 |
RM | 1.416683 | 0.174563 | 8.115606 | 5.7E-06 | 1.032473 | 1.800894 | 1.032473 | 1.800894 |
SMB | 0.026413 | 0.340442 | 0.077584 | 0.939552 | -0.7229 | 0.775721 | -0.7229 | 0.775721 |
HML | -0.253 | 0.26869 | -0.94159 | 0.366621 | -0.84438 | 0.338386 | -0.84438 | 0.338386 |
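As a cross-check outside Excel, the same least-squares fit can be sketched in plain Python. This is a minimal sketch and not part of the original answer: it assumes the 15 rows listed in the question, and the helper names `solve` and `fit_ols` are hypothetical. It solves the normal equations (X'X)b = X'y directly, which is adequate for a small problem like this one.

```python
# Monthly observations typed from the question: (RM, SMB, HML, R), 197901 to 198003.
rows = [
    (4.18, 3.5, 2.18, 9.62), (-3.41, 0.43, 1.14, -5.14), (5.75, 3.2, -0.61, 11.91),
    (0.05, 2.11, 1.11, 1.26), (-2.18, 0.12, 1.24, -3.43), (3.88, 1.06, 1.34, 6.59),
    (0.73, 1.32, 1.77, 1.8), (5.7, 1.97, -1.6, 10.95), (-0.69, -0.29, -0.89, -1.47),
    (-8.14, -3.29, -1.89, -10.46), (5.37, 2.75, -3.24, 9.27), (1.87, 4.09, -2.0, 4.35),
    (5.76, 1.56, 1.7, 7.85), (-0.79, -1.89, 0.63, -0.49), (-13.23, -6.51, -1.04, -14.71),
]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (hypothetical helper)."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(data):
    """Fit R = a + b1*RM + b2*SMB + b3*HML by ordinary least squares."""
    X = [[1.0, row[0], row[1], row[2]] for row in data]  # leading 1.0 is the intercept
    y = [row[3] for row in data]
    n, k = len(y), len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * xi for b, xi in zip(beta, x)) for x, yi in zip(X, y)]
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares
    sse = sum(e ** 2 for e in resid)          # residual sum of squares
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    return beta, r2, adj_r2, sst, resid


beta, r2, adj_r2, sst, resid = fit_ols(rows)
for name, b in zip(["Intercept", "RM", "SMB", "HML"], beta):
    print(f"{name:9s} {b: .6f}")
print(f"R Square          {r2:.6f}")
print(f"Adjusted R Square {adj_r2:.6f}")
```

The printed coefficients and R Square can be compared with the Excel table; any small differences would suggest that the spreadsheet rows differ slightly from the values typed here.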
We use the rule that a coefficient is statistically significant at the 5% level if its p-value is at most 0.05.
According to the output, the p-value of RM is 0.0000057, which is less than 0.05, so RM is significant. The intercept is also significant (p-value = 0.013825). The p-values of SMB (0.939552) and HML (0.366621) are greater than 0.05, so these two factors are not significant.
The overall p-value of the model (Significance F) is 0.000000029, which is less than 0.05, so the model as a whole is significant.
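The significance calls can be double-checked from the t statistics, since each t Stat is the coefficient divided by its standard error. The sketch below copies the coefficients and standard errors from the Excel output; the critical value 2.201 (two-sided 5% level, 11 residual degrees of freedom) is taken from a standard t table and is an assumption not stated in the original answer.

```python
# (coefficient, standard error) pairs copied from the Excel output.
estimates = {
    "Intercept": (1.380736, 0.472126),
    "RM": (1.416683, 0.174563),
    "SMB": (0.026413, 0.340442),
    "HML": (-0.253, 0.26869),
}

# Assumed two-sided 5% critical value of Student's t with 15 - 3 - 1 = 11 df.
T_CRIT = 2.201

# t Stat = coefficient / standard error; significant if |t| exceeds the critical value.
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}
significant = {name: abs(t) > T_CRIT for name, t in t_stats.items()}

for name, t in t_stats.items():
    verdict = "significant" if significant[name] else "not significant"
    print(f"{name:9s} t = {t: .4f}  ({verdict} at the 5% level)")
```

Comparing |t| with the critical value is equivalent to comparing the p-value with 0.05, so it should lead to the same conclusions as the p-value rule.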
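The $R^2$ obtained is 0.9646, which is close to 1: the three factors together explain about 96.5% of the variation in the fund's return over these 15 months, and the adjusted $R^2$ is 0.9549. Hence we say that the model is a good fit, keeping in mind that this estimate uses only the 15 observations shown rather than the full data set.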
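For reference, the fit statistics can be recovered by hand from the ANOVA table. The worked arithmetic below uses the standard definitions, with $n = 15$ observations and $k = 3$ factors; the symbols SSR, SSE, SST, MSR, and MSE are labels introduced here for the Regression, Residual, and Total rows.

```latex
\begin{align*}
R^2 &= \frac{\mathrm{SSR}}{\mathrm{SST}} = \frac{852.0716}{883.3677} \approx 0.9646 \\
R^2_{\mathrm{adj}} &= 1 - \frac{\mathrm{SSE}/(n-k-1)}{\mathrm{SST}/(n-1)}
  = 1 - \frac{31.29611/11}{883.3677/14} \approx 0.9549 \\
F &= \frac{\mathrm{MSR}}{\mathrm{MSE}} = \frac{284.0239}{2.845101} \approx 99.83
\end{align*}
```

The adjusted $R^2$ penalizes the number of regressors, which is why it is slightly below the plain $R^2$.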