The Marseille Water Taxi ferries tourists from the harbor at Marseille, France, to the Frioul Islands in the Mediterranean Sea. The table below shows the number of passengers on the noontime ferry over six randomly selected days, along with the ambient temperature at that time in degrees Celsius.
Temperature (°C)   Passengers
16                 15
19                 20
22                 20
26                 22
18                 10
24                 18
Run a regression where "Temperature" is the independent variable and "Passengers" is the dependent variable. Interpret the coefficient of determination in your own words.
Sol:
1. Create a data frame df2 with the given data.
2. Use the lm function in R to fit a linear model.
3. Use the plot function to draw the scatterplot.
4. Use abline to add the regression line to the plot.
5. Use the summary function in R to get R-squared.
R code:
## enter the data (six days)
df2 <- read.table(header = TRUE, text = "
Temperature Passengers
16 15
19 20
22 20
26 22
18 10
24 18
")
df2
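## fit the simple linear regression of Passengers on Temperature
linreg <- lm(Passengers ~ Temperature, data = df2)
## scatterplot with the fitted regression line in red
plot(x = df2$Temperature, y = df2$Passengers,
     main = "Scatterplot of Passengers vs Temperature",
     xlab = "Temperature", ylab = "Passengers")
abline(linreg, col = "red")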
## rounded coefficients for better output
cf <- round(coef(linreg), 4)
## check the sign of the slope to avoid printing "+ -" for a negative coefficient
eq <- paste0("Passengers = ", cf[1],
             ifelse(sign(cf[2]) == 1, " + ", " - "), abs(cf[2]), " * Temperature")
## print the equation above the plot
mtext(eq, side = 3, line = 0)
## model summary: coefficients, R-squared and F-test
summary(linreg)
Call:
lm(formula = Passengers ~ Temperature, data = df2)
Residuals:
1 2 3 4 5 6
1.249 3.922 1.595 0.492 -5.302 -1.957
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.3387 8.8956 0.150 0.888
Temperature 0.7757 0.4211 1.842 0.139
Residual standard error: 3.594 on 4 degrees of freedom
Multiple R-squared: 0.4589, Adjusted R-squared: 0.3237
F-statistic: 3.393 on 1 and 4 DF, p-value: 0.1393
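The R-squared shown in the summary can also be extracted programmatically. A minimal sketch, assuming the same six observations are re-entered as df2: with a single predictor, R-squared equals the squared Pearson correlation between Temperature and Passengers, so both routes should give 0.4589.

```r
# Re-create the six observations used above
df2 <- data.frame(
  Temperature = c(16, 19, 22, 26, 18, 24),
  Passengers  = c(15, 20, 20, 22, 10, 18)
)

linreg <- lm(Passengers ~ Temperature, data = df2)

# R-squared as stored in the summary object
r2_summary <- summary(linreg)$r.squared

# In simple linear regression, R-squared equals the squared correlation
r2_cor <- cor(df2$Temperature, df2$Passengers)^2

print(round(c(summary = r2_summary, cor_squared = r2_cor), 4))
```

This equality holds only for simple regression with one predictor; with several predictors, use summary(...)$r.squared.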
Fitted regression line for predicting passengers:
Passengers = 1.3387 + 0.7757 × Temperature
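The fitted line can be used for prediction with predict(). A sketch in which the 20 °C input is a hypothetical value chosen only for illustration (it lies inside the observed 16 to 26 °C range); plugging it into the equation by hand gives 1.3387 + 0.7757 × 20 ≈ 16.85 passengers, and predict() should agree.

```r
# Re-create the six observations used above
df2 <- data.frame(
  Temperature = c(16, 19, 22, 26, 18, 24),
  Passengers  = c(15, 20, 20, 22, 10, 18)
)
linreg <- lm(Passengers ~ Temperature, data = df2)

# Hypothetical new day at 20 degrees Celsius (illustrative input only)
new_day <- data.frame(Temperature = 20)

# Point prediction from the fitted line
pred <- predict(linreg, newdata = new_day)

# Same value computed by hand from the unrounded coefficients
by_hand <- coef(linreg)[["(Intercept)"]] + coef(linreg)[["Temperature"]] * 20

print(round(c(predict = unname(pred), by_hand = by_hand), 2))
```

With R-squared below 0.5 and only six days of data, such a point prediction is rough; predict() also accepts interval = "prediction" to show that uncertainty.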
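R-squared = 0.4589
Interpretation: about 45.89% of the day-to-day variation in the number of noontime passengers is explained by the linear relationship with temperature. The remaining 54.11% is due to factors the model does not include (or to random variation), so temperature alone accounts for less than half of the differences in ridership.
Explained variation = 45.89%
Unexplained variation = 100% - 45.89% = 54.11%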
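The 45.89% / 54.11% split comes from partitioning the total variation in passengers into an explained part and an unexplained part. A worked check using the six observations (x = Temperature, y = Passengers, with x̄ = 125/6 ≈ 20.83 and ȳ = 17.5); values are rounded:

```latex
\begin{aligned}
S_{xx} &= \sum (x_i-\bar{x})^2 \approx 72.833, \qquad S_{xy} = \sum (x_i-\bar{x})(y_i-\bar{y}) = 56.5\\
SST &= \sum (y_i-\bar{y})^2 = 95.5\\
SSR &= \frac{S_{xy}^2}{S_{xx}} = \frac{56.5^2}{72.833} \approx 43.83\\
SSE &= SST - SSR \approx 95.5 - 43.83 = 51.67\\
R^2 &= \frac{SSR}{SST} \approx \frac{43.83}{95.5} \approx 0.4589, \qquad 1 - R^2 = \frac{SSE}{SST} \approx \frac{51.67}{95.5} \approx 0.5411
\end{aligned}
```

As a consistency check, \(\sqrt{SSE/4} = \sqrt{51.67/4} \approx 3.594\), which matches the residual standard error on 4 degrees of freedom in the summary output.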