Use the file andy.dta, which contains data on hamburger franchises in 75 cities from Big Andy's Burger Barn.
Set up the model
ln(Si)=b1 + b2ln(Ai) + ei,
where
Si = Monthly sales revenue ($1000s) for the i-th firm
Ai = Expenditure on advertising ($1000s) for the i-th firm
(a) Interpret the estimates of slope and intercept.
(b) How well does the model fit the data? Use any tests and measures presented in class.
(c) Perform any test for heteroscedasticity in your data.
sales | price | advert |
73.2 | 5.69 | 1.3 |
71.8 | 6.49 | 2.9 |
62.4 | 5.63 | 0.8 |
67.4 | 6.22 | 0.7 |
89.3 | 5.02 | 1.5 |
70.3 | 6.41 | 1.3 |
73.2 | 5.85 | 1.8 |
86.1 | 5.41 | 2.4 |
81 | 6.24 | 0.7 |
76.4 | 6.2 | 3 |
76.6 | 5.48 | 2.8 |
82.2 | 6.14 | 2.7 |
82.1 | 5.37 | 2.8 |
68.6 | 6.45 | 2.8 |
76.5 | 5.35 | 2.3 |
80.3 | 5.22 | 1.7 |
70.7 | 5.89 | 1.5 |
75 | 5.21 | 0.8 |
73.7 | 6 | 2.9 |
71.2 | 6.37 | 0.5 |
84.7 | 5.33 | 2.1 |
73.6 | 5.23 | 0.8 |
73.7 | 5.88 | 1.1 |
78.1 | 6.24 | 1.9 |
75.7 | 5.59 | 2.1 |
74.4 | 6.22 | 1.3 |
68.7 | 6.41 | 1.1 |
83.9 | 4.96 | 1.1 |
86.1 | 4.83 | 2.9 |
73.7 | 6.35 | 1.4 |
75.7 | 6.47 | 2.5 |
78.8 | 5.69 | 3 |
73.7 | 5.56 | 1 |
80.2 | 6.41 | 3.1 |
69.9 | 5.54 | 0.5 |
69.1 | 6.47 | 2.7 |
83.8 | 4.94 | 0.9 |
84.3 | 6.16 | 1.5 |
66 | 5.93 | 2.8 |
84.3 | 5.2 | 2.3 |
79.5 | 5.62 | 1.2 |
80.2 | 5.28 | 3.1 |
67.6 | 5.46 | 1 |
86.5 | 5.11 | 2.5 |
87.6 | 5.04 | 2.1 |
84.2 | 5.08 | 2.8 |
75.2 | 5.86 | 3.1 |
84.7 | 4.89 | 3.1 |
73.7 | 5.68 | 0.9 |
81.2 | 5.83 | 1.8 |
69 | 6.33 | 3.1 |
69.7 | 6.47 | 1.9 |
78.1 | 5.7 | 0.7 |
88 | 5.22 | 1.6 |
80.4 | 5.05 | 2.9 |
79.7 | 5.76 | 2.3 |
73.2 | 6.25 | 1.7 |
85.9 | 5.34 | 1.8 |
83.3 | 4.98 | 0.6 |
73.6 | 6.39 | 3.1 |
79.2 | 6.22 | 1.2 |
88.1 | 5.1 | 2.1 |
64.5 | 6.49 | 0.5 |
84.1 | 4.86 | 2.9 |
91.2 | 5.1 | 1.6 |
71.8 | 5.98 | 1.5 |
80.6 | 5.02 | 2 |
73.1 | 5.08 | 1.3 |
81 | 5.23 | 1.1 |
73.7 | 6.02 | 2.2 |
82.2 | 5.73 | 1.7 |
74.2 | 5.11 | 0.7 |
75.4 | 5.71 | 0.7 |
81.3 | 5.45 | 2 |
75 | 6.05 | 2.2 |
We can fit this model using the open-source statistical package R. The R snippet is as follows:
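## read the data (assumes andy.dta was exported to CSV with columns sales, price, advert;
## to read the Stata file directly, haven::read_dta("andy.dta") is one option)
andy <- read.csv(file.choose())
## fit the log-log regression
fit <- lm(log(sales) ~ log(advert), data = andy)
summary(fit)
fit$coefficients
## diagnostic plots: 4 charts in 1 panel
par(mfrow = c(2, 2))
plot(fit)
## test for heteroscedasticity (requires the car package)
car::ncvTest(fit)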
The results are:
summary(fit)
Call:
lm(formula = log(sales) ~ log(advert), data = andy)
Residuals:
      Min        1Q    Median        3Q       Max
-0.180134 -0.054071  0.000106  0.062748  0.168751
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.32290    0.01283 337.059   <2e-16 ***
log(advert)  0.04554    0.01784   2.553   0.0128 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.08146 on 73 degrees of freedom
Multiple R-squared: 0.08197, Adjusted R-squared: 0.06939 # the model explains only about 8% of the variation in log(sales), so the fit is weak even though the slope is significant at the 5% level
F-statistic: 6.518 on 1 and 73 DF, p-value: 0.01277
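As a rough consistency check on the fit statistics above: in a simple regression with a single regressor, the overall F statistic equals the square of the slope's t statistic, and R-squared can be recovered from F and the residual degrees of freedom. The sketch below uses only numbers from the summary output, rounded.

```latex
\begin{align*}
F &= t^2 = (2.553)^2 \approx 6.518, \\
R^2 &= \frac{F}{F + (n - 2)} = \frac{6.518}{6.518 + 73} \approx 0.082 .
\end{align*}
```

Both results agree with the reported F-statistic and Multiple R-squared. A significant slope together with a low R-squared is not a contradiction: advertising has a detectable effect, but most of the variation in log sales is left unexplained by this one-regressor model.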
fit$coefficients
(Intercept) log(advert)
4.32290054 0.04553909
Using these coefficients, the fitted regression equation is
ln(sales) = 4.3229 + 0.0455*ln(advert)
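For part (a), a short worked interpretation may help. In a log-log model the slope is an elasticity, and the intercept is the predicted log of sales when advertising equals 1 (that is, $1000 of advertising, since ln(1) = 0). The numbers come from the coefficient estimates above; the rounding is approximate.

```latex
\begin{align*}
\frac{\partial \ln S}{\partial \ln A} &= b_2 \approx 0.0455
  \quad\Rightarrow\quad \text{a 1\% rise in } A \text{ is associated with roughly a } 0.0455\% \text{ rise in } S, \\
\left.\widehat{\ln S}\right|_{A = 1} &= b_1 \approx 4.3229
  \quad\Rightarrow\quad e^{4.3229} \approx 75.4 .
\end{align*}
```

In words: sales are very inelastic with respect to advertising, and a franchise spending $1000 on advertising is predicted to have monthly sales of roughly $75,400. Strictly, under the usual normal-error assumption exp(b1) estimates the median rather than the mean of sales at that advertising level.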
car::ncvTest(fit)
Non-constant Variance Score Test
Variance formula: ~ fitted.values
Chisquare = 0.01313877 Df = 1 p = 0.9087428
The test's p-value (0.909) is well above the 0.05 significance level, so we cannot reject the null hypothesis that the variance of the residuals is constant. There is therefore no evidence of heteroscedasticity, which is consistent with the diagnostic plots produced by plot(fit).
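The decision rule behind the score test can be written out as a quick check. ncvTest reports a chi-square statistic with 1 degree of freedom; the 5% critical value of 3.841 used below is the standard tabulated value and is not part of the R output.

```latex
\begin{align*}
H_0 &: \operatorname{Var}(e_i) = \sigma^2 \ \text{for all } i
  \qquad \text{vs.} \qquad
  H_1 : \operatorname{Var}(e_i) \text{ varies with the fitted values}, \\
\chi^2_{\text{obs}} &= 0.0131 \;<\; \chi^2_{1,\,0.95} = 3.841
  \quad\Rightarrow\quad \text{do not reject } H_0 \ (p \approx 0.909).
\end{align*}
```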