A small “Delivery Service” business offers same-day delivery for letters and small packages. The company groups individual deliveries into one trip to reduce transportation cost, so some trips include more than one delivery. Data on ten randomly selected past trips, together with some R output, are given below:
Where:
TravTime = total time for one trip, in hours.
Distance = total distance for one trip, in km.
NumDeliv = number of deliveries on each trip.
TravTime | Distance | NumDeliv
7.1 | 91 | 3
6.4 | 66 | 1
6.2 | 78 | 3
7.4 | 111 | 7
5.7 | 44 | 2
6.6 | 77 | 3
6.5 | 80 | 3
6.1 | 66 | 2
7.2 | 105 | 5
6.5 | 76 | 3
Model-1 Simple Regression: lm(formula = TravTime ~ Distance, data = Dataset)
[R output for this model is not preserved here]
a. What is the independent variable (X)?
b. What is the dependent variable (Y)?
c. What is the regression equation? [You can find the regression coefficients in the R output above]
d. What is the standard error of the estimate?
e. Estimate the value of ŷ when x = 4.
f. Explain the result of the simple linear regression between Travel Time and Distance. Is it a good or a bad model?
a.
The independent variable (X) in this regression is Distance. It is the predictor: the model uses Distance to explain TravTime and does not try to explain Distance itself.
b.
The dependent variable (Y) in this regression is TravTime (travel time). It is the response: the model treats TravTime as depending on the predictor, in this case Distance.
c.
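The regression equation has the form ŷ = b0 + b1x, where b0 is the intercept and b1 is the slope.
The coefficients can be read from the regression output.
Thus, the regression equation is ŷ = 4.5151 + 0.02588x, that is, TravTime = 4.5151 + 0.02588 * Distance.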
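As a cross-check on the coefficients, the least-squares formulas b1 = Sxy / Sxx and b0 = ȳ − b1·x̄ can be applied directly to the ten trips in the table. This is a minimal sketch in plain Python rather than R; the variable names are chosen here for illustration and are not part of the original output.

```python
# Data from the ten trips in the table above.
distance = [91, 66, 78, 111, 44, 77, 80, 66, 105, 76]            # km
trav_time = [7.1, 6.4, 6.2, 7.4, 5.7, 6.6, 6.5, 6.1, 7.2, 6.5]   # hours

n = len(distance)
x_bar = sum(distance) / n
y_bar = sum(trav_time) / n

# Sum of squares of x and sum of cross-products, both about the means.
s_xx = sum((x - x_bar) ** 2 for x in distance)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(distance, trav_time))

b1 = s_xy / s_xx           # slope: hours per km
b0 = y_bar - b1 * x_bar    # intercept: hours

print(f"intercept b0 = {b0:.4f}")   # 4.5151
print(f"slope     b1 = {b1:.5f}")   # 0.02588
```

Both values agree with the coefficients quoted from the regression output.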
d.
The standard error of the estimate is the residual standard error in the regression output: 0.1696 hours, on n − 2 = 8 degrees of freedom.
For reference, the output also gives the standard errors of the coefficients:
The standard error of the intercept is 0.23641.
The standard error of the slope coefficient is 0.00290.
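The standard errors above can be reproduced from the table with the usual simple-regression formulas: s = sqrt(SSE / (n − 2)), SE(b1) = s / sqrt(Sxx), and SE(b0) = s · sqrt(1/n + x̄² / Sxx). The following is a sketch in plain Python under those textbook formulas, with illustrative variable names.

```python
import math

# Data from the ten trips in the table above.
distance = [91, 66, 78, 111, 44, 77, 80, 66, 105, 76]            # km
trav_time = [7.1, 6.4, 6.2, 7.4, 5.7, 6.6, 6.5, 6.1, 7.2, 6.5]   # hours

n = len(distance)
x_bar = sum(distance) / n
y_bar = sum(trav_time) / n
s_xx = sum((x - x_bar) ** 2 for x in distance)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(distance, trav_time))

# Least-squares fit.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residual sum of squares and residual standard error (n - 2 degrees of freedom).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(distance, trav_time))
s = math.sqrt(sse / (n - 2))

# Standard errors of the slope and the intercept.
se_b1 = s / math.sqrt(s_xx)
se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)

print(f"standard error of the estimate = {s:.4f}")
print(f"standard error of the slope    = {se_b1:.5f}")
print(f"standard error of the intercept = {se_b0:.5f}")
```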
e.
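When x = 4, we substitute this value into the regression equation to get the predicted value ŷ.
Thus, ŷ = 4.5151 + 0.02588 * 4
= 4.5151 + 0.10352
= 4.61862
So the estimated travel time is about 4.62 hours. Note that x = 4 km is far below the observed distances (44 to 111 km), so this is an extrapolation and should be treated with caution.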
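The substitution can be written as a small helper. In this sketch the coefficients are copied from the answer to part c, and the function names `predict_trav_time` and `is_extrapolation` are hypothetical, introduced only for illustration; the second one flags inputs that fall outside the range of distances in the table.

```python
# Coefficients taken from the regression equation in part c.
B0 = 4.5151    # intercept, hours
B1 = 0.02588   # slope, hours per km

# Distances observed in the ten trips, used only to detect extrapolation.
OBSERVED_DISTANCES = [91, 66, 78, 111, 44, 77, 80, 66, 105, 76]


def predict_trav_time(distance_km):
    """Predicted travel time in hours for a trip of the given distance."""
    return B0 + B1 * distance_km


def is_extrapolation(distance_km):
    """True when the distance lies outside the observed range of the data."""
    return not (min(OBSERVED_DISTANCES) <= distance_km <= max(OBSERVED_DISTANCES))


x = 4
print(f"predicted TravTime at x = {x}: {predict_trav_time(x):.5f} hours")
print(f"outside observed range: {is_extrapolation(x)}")
```

For x = 4 the helper reproduces the hand calculation and reports that the input is outside the observed range.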
f.
The intercept means that when the distance travelled is 0 km, the predicted travel time is 4.5151 hours. It can be read as time that does not depend on distance, although a distance of 0 lies outside the observed data. The slope means that for every additional kilometre of distance, the travel time increases by 0.02588 hours, or about 1.6 minutes.
From the given regression output, we can use the R-squared of the model to judge how well it fits. The R-squared is 0.9087, or about 90.9%. This means that roughly 91% of the variation in travel time is explained by distance. The slope is also large relative to its standard error (0.02588 / 0.00290 ≈ 8.9). Together, these indicate that this is a good model.
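The R-squared value can also be checked from the table as 1 − SSE / SST, where SST is the total variation in TravTime about its mean. This is a sketch in plain Python with illustrative variable names, not a reproduction of the R output.

```python
# Data from the ten trips in the table above.
distance = [91, 66, 78, 111, 44, 77, 80, 66, 105, 76]            # km
trav_time = [7.1, 6.4, 6.2, 7.4, 5.7, 6.6, 6.5, 6.1, 7.2, 6.5]   # hours

n = len(distance)
x_bar = sum(distance) / n
y_bar = sum(trav_time) / n
s_xx = sum((x - x_bar) ** 2 for x in distance)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(distance, trav_time))

# Least-squares fit.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Total variation in y and the part left unexplained by the fitted line.
sst = sum((y - y_bar) ** 2 for y in trav_time)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(distance, trav_time))

r_squared = 1 - sse / sst
print(f"R-squared = {r_squared:.4f}")
```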