Question

Use the data set named Store_Visits located in the folder Data Files for HW Assignment (outside of Minitab folder) in the K-drive. The response variable y is the number of visits of a customer to a particular food store in a large suburban area within the period of a month, and the independent variable x is the distance (in miles) of the customer’s home to the store.

Fit a simple linear regression model to the data, and answer the following questions.

a) Give the proportion of the variation in the number of visits per month of a customer explained by the distance of the customer’s home to the store.

b) Submit the residual plot. It appears from the plot that there is a problem with one of the model assumptions. Which one is it, and what would you suggest to remedy the problem?

c) Carry out your suggestion to fix the problem of part (b) and submit a new residual plot. Does your suggested remedy work?

d) Based on your new model, what is the proportion of the variation in the number of visits per month of a customer explained by the distance of the customer’s home to the store? How does it compare to that of the original model?

e) Based on your new model, construct a 95% prediction interval for y, the number of visits to the store for a customer who lives 2.5 miles from the store. Interpret the P.I.

K-drive data (Minitab):

y   x
12   0.8
5   1.2
6   2.3
8   1.5
3   3.2
2   6.3
1   7.9
2   5.3
6   1.5
3   1.9
10   1.7
5   2.6
3   2.9
6   4.2
2   3.9
4   3.1
3   5.8
6   1.7
7   2.2
2   4.5
1   6.1
1   5.8
1   7.4
3   6.4
2   4.7
2   3.9
3   4
4   4.6

Solutions

Expert Solution

The response variable y is the number of visits of a customer to a particular food store in a large suburban area within the period of a month, and the independent variable x is the distance (in miles) of the customer’s home to the store.

Fit a simple linear regression model to the data using Minitab.

Steps

1) Enter the given data in Minitab columns.

2) Select the following menu options:

     Stat > Regression > Regression > Fit Regression Model

3) In the dialog box that opens, select

   Responses: column of variable y

   Continuous predictors: column of variable x

4) Under Graphs, select the plots you need (here, the residual plots), and under Storage, select the residuals if you want them stored.

5) Click OK.
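The same fit can be cross-checked outside Minitab. The following is a minimal sketch in plain Python, assuming the data are typed in exactly as listed in the question; the function name fit_simple_regression is an illustrative choice, not part of Minitab.

```python
# Store_Visits data as listed in the question (y = visits, x = distance in miles).
y = [12, 5, 6, 8, 3, 2, 1, 2, 6, 3, 10, 5, 3, 6,
     2, 4, 3, 6, 7, 2, 1, 1, 1, 3, 2, 2, 3, 4]
x = [0.8, 1.2, 2.3, 1.5, 3.2, 6.3, 7.9, 5.3, 1.5, 1.9, 1.7, 2.6, 2.9, 4.2,
     3.9, 3.1, 5.8, 1.7, 2.2, 4.5, 6.1, 5.8, 7.4, 6.4, 4.7, 3.9, 4.0, 4.6]


def fit_simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                       # slope
    b0 = y_bar - b1 * x_bar              # intercept
    r_squared = sxy ** 2 / (sxx * syy)   # proportion of variation explained
    return b0, b1, r_squared


b0, b1, r_squared = fit_simple_regression(x, y)
print(f"y = {b0:.3f} + ({b1:.3f}) x,  R-sq = {100 * r_squared:.2f}%")
```

The printed intercept, slope and R-sq can be compared with the Coefficients and Model Summary tables in the Minitab output.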

These are our data, copied from Minitab together with the stored residuals (RESI1):

y x RESI1
12 0.8 4.721378
5 1.2 -1.85132
6 2.3 0.323756
8 1.5 1.469155
3 3.2 -1.71482
2 6.3 0.596763
1 7.9 1.305966
2 5.3 -0.47149
6 1.5 -0.53085
3 1.9 -3.10354
10 1.7 3.682805
5 2.6 -0.35577
3 2.9 -2.03529
6 4.2 2.353435
2 3.9 -1.96704
4 3.1 -0.82164
3 5.8 1.062638
6 1.7 -0.31719
7 2.2 1.216931
2 4.5 -1.32609
1 6.1 -0.61689
1 5.8 -0.93736
1 7.4 0.77184
3 6.4 1.703589
2 4.7 -1.11244
2 3.9 -1.96704
3 4 -0.86022
4 4.6 0.780735

This is the Minitab output:

Regression Analysis: y versus x

Analysis of Variance

Source         DF  Seq SS  Contribution  Adj SS   Adj MS  F-Value  P-Value
Regression      1  122.25        58.50%  122.25  122.246    36.65    0.000
  x             1  122.25        58.50%  122.25  122.246    36.65    0.000
Error          26   86.72        41.50%   86.72    3.335
  Lack-of-Fit  22   74.72        35.76%   74.72    3.396     1.13    0.510
  Pure Error    4   12.00         5.74%   12.00    3.000
Total          27  208.96       100.00%


Model Summary

      S    R-sq R-sq(adj)    PRESS R-sq(pred)
1.82628 58.50%     56.90% 103.156      50.63%


Coefficients

Term        Coef  SE Coef        95% CI        T-Value  P-Value   VIF
Constant   8.133    0.760  ( 6.572,  9.695)    10.71    0.000
x         -1.068    0.176  (-1.431, -0.706)    -6.05    0.000  1.00


Regression Equation

y = 8.133 - 1.068 x


Fits and Diagnostics for Unusual Observations

                                                      Std    Del
Obs       y    Fit  SE Fit       95% CI     Resid  Resid  Resid        HI  Cook’s D    DFITS
  1  12.000  7.279   0.637  (5.969, 8.588)  4.721   2.76   3.22  0.121741      0.53  1.19750  R
 11  10.000  6.317   0.511  (5.267, 7.368)  3.683   2.10   2.26  0.078294      0.19  0.65879  R

R  Large residual

[Figure: Residual Plots for y]

a)

The proportion of the variation in the number of visits per month of a customer explained by the distance of the customer’s home to the store is R-sq = 58.50%, read from the Model Summary table of the output:

Model Summary

      S       R-sq        R-sq(adj)    PRESS   R-sq(pred)
1.82628 58.50%     56.90%    103.156      50.63%

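b) Residual plot

It appears from the plot that there is a problem with one of the model assumptions.

The residual plot shows a U-shaped pattern, i.e. the residuals are not randomly scattered about zero. A curved pattern indicates that a straight line in x does not describe the relationship adequately, and the variance of the residuals is not constant.

This suggests that a nonlinear model would fit better.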
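The U shape can also be checked numerically rather than by eye. The sketch below is one informal way to do it, not a Minitab feature: it refits the straight line, sorts the residuals by distance, and averages them over the nearest, middle and farthest thirds of the customers. The three-way split and all names are assumptions made for illustration.

```python
# Store_Visits data as listed in the question (y = visits, x = distance in miles).
y = [12, 5, 6, 8, 3, 2, 1, 2, 6, 3, 10, 5, 3, 6,
     2, 4, 3, 6, 7, 2, 1, 1, 1, 3, 2, 2, 3, 4]
x = [0.8, 1.2, 2.3, 1.5, 3.2, 6.3, 7.9, 5.3, 1.5, 1.9, 1.7, 2.6, 2.9, 4.2,
     3.9, 3.1, 5.8, 1.7, 2.2, 4.5, 6.1, 5.8, 7.4, 6.4, 4.7, 3.9, 4.0, 4.6]


def linear_residuals(x, y):
    """Residuals y - (b0 + b1*x) from the least-squares straight line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


res = linear_residuals(x, y)

# Order the residuals by distance and split them into three groups.
ordered = [r for _, r in sorted(zip(x, res))]
k = len(ordered) // 3
low, mid, high = ordered[:k], ordered[k:-k], ordered[-k:]

low_mean = sum(low) / len(low)
mid_mean = sum(mid) / len(mid)
high_mean = sum(high) / len(high)
print(f"mean residual: near {low_mean:.2f}, middle {mid_mean:.2f}, far {high_mean:.2f}")
```

Group means that are positive at both ends and negative in the middle are what a U-shaped residual plot looks like in numbers.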

c)

The suggestion for fixing the problem in part (b) is to apply a nonlinear transformation to the response (here, a square root transformation).

A nonlinear transformation changes the form of the relationship between the variables and therefore also changes the correlation between them.

Steps in Minitab

1) Keep the data in the Minitab columns.

2) Select the following menu options:

     Stat > Regression > Regression > Fit Regression Model

3) In the dialog box that opens, select

   Responses: column of variable y

   Continuous predictors: column of variable x

4) Under Graphs, select the plots you need (here, the residual plots), and under Storage, select the residuals if you want them stored.

5) Click Options. The Box-Cox transformation box shows "No transformation"; change it to "λ = 0.5 (square root)" for the square root transformation.

6) Click OK.
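As a cross-check on the transformed fit, the sketch below regresses the square root of y on x in plain Python. It assumes the data are entered as listed in the question; the helper name fit_simple_regression is illustrative only.

```python
import math

# Store_Visits data as listed in the question (y = visits, x = distance in miles).
y = [12, 5, 6, 8, 3, 2, 1, 2, 6, 3, 10, 5, 3, 6,
     2, 4, 3, 6, 7, 2, 1, 1, 1, 3, 2, 2, 3, 4]
x = [0.8, 1.2, 2.3, 1.5, 3.2, 6.3, 7.9, 5.3, 1.5, 1.9, 1.7, 2.6, 2.9, 4.2,
     3.9, 3.1, 5.8, 1.7, 2.2, 4.5, 6.1, 5.8, 7.4, 6.4, 4.7, 3.9, 4.0, 4.6]


def fit_simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1, sxy ** 2 / (sxx * syy)


# Square root transformation of the response (Box-Cox with lambda = 0.5).
root_y = [math.sqrt(v) for v in y]

b0_t, b1_t, r_squared_t = fit_simple_regression(x, root_y)
print(f"sqrt(y) = {b0_t:.3f} + ({b1_t:.4f}) x,  R-sq = {100 * r_squared_t:.2f}%")
```

The result can be compared with the "Coefficients for Transformed Response" and "Model Summary for Transformed Response" tables in the Minitab output.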

Here RESI2 is the residual column on the transformed scale, $e = \sqrt{y} - \widehat{\sqrt{y}}$:

y x RESI2
12 0.8 0.745803
5 1.2 -0.37464
6 2.3 0.134675
8 1.5 0.298421
3 3.2 -0.34067
2 6.3 0.175358
1 7.9 0.191528
2 5.3 -0.09363
6 1.5 -0.08052
3 1.9 -0.69036
10 1.7 0.68607
5 2.6 0.001951
3 2.9 -0.42137
6 4.2 0.645756
2 3.9 -0.47022
4 3.1 -0.09962
3 5.8 0.358701
6 1.7 -0.02672
7 2.2 0.304038
2 4.5 -0.30882
1 6.1 -0.29265
1 5.8 -0.37335
1 7.4 0.057034
3 6.4 0.520095
2 4.7 -0.25503
2 3.9 -0.47022
3 4 -0.12548
4 4.6 0.303862

Minitab Output

Regression Analysis: y versus x

Method

Box-Cox transformation λ = 0.5


Analysis of Variance for Transformed Response

Source         DF   Seq SS  Contribution  Adj SS  Adj MS  F-Value  P-Value
Regression      1   7.7510        66.04%  7.7510  7.7510    50.56    0.000
  x             1   7.7510        66.04%  7.7510  7.7510    50.56    0.000
Error          26   3.9856        33.96%  3.9856  0.1533
  Lack-of-Fit  22   3.3918        28.90%  3.3918  0.1542     1.04    0.553
  Pure Error    4   0.5938         5.06%  0.5938  0.1484
Total          27  11.7366       100.00%


Model Summary for Transformed Response

       S         R-sq      R-sq(adj)    PRESS      R-sq(pred)
0.391525 66.04%     64.74%     4.64139      60.45%


Coefficients for Transformed Response

Term         Coef  SE Coef         95% CI         T-Value  P-Value   VIF
Constant    2.933    0.163  ( 2.599,   3.268)     18.01    0.000
x         -0.2690   0.0378  (-0.3467, -0.1912)    -7.11    0.000  1.00


Regression Equation

y^0.5 = 2.933 - 0.2690 x


Fits and Diagnostics for Unusual Observations

Original Response

Obs        y     Fit       95% CI
  1  12.0000  7.3891  (5.9414, 8.9946)


Transformed Response

                                                    Std    Del
Obs     y'    Fit  SE Fit       95% CI     Resid  Resid  Resid        HI  Cook’s D     DFITS
  1  3.464  2.718   0.137  (2.437, 2.999)  0.746   2.03   2.17  0.121741      0.29  0.809136  R

y' = transformed response
R  Large residual


[Figure: Residual Plots for y]

In the new residual plot the residuals are scattered randomly with no systematic pattern, so the variance now appears constant.

Yes, the suggested remedy (the square root transformation) works.

d) For the new model, the proportion of the variation in the number of visits per month of a customer explained by the distance of the customer’s home to the store is 66.04%:

Model Summary for Transformed Response

       S         R-sq      R-sq(adj)    PRESS      R-sq(pred)
0.391525 66.04%     64.74%     4.64139      60.45%

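Compared with the original model, the proportion of variation explained increases by about 7.5 percentage points (66.04% - 58.50% = 7.54%). Note that the new R-sq measures the variation explained in the transformed response, $\sqrt{y}$.

For the original model, R-sq = 58.50%

For the new model, R-sq = 66.04%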
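Because the two R-sq values are computed on different response scales (y and the square root of y), they are not strictly like-for-like. One hedged way to put them on the same footing is to square the fitted values of the transformed model and compute 1 - SSE/SST on the original visits scale. This is a sketch of that idea only; it is not a statistic that the Minitab output above reports, and the variable names are illustrative.

```python
import math

# Store_Visits data as listed in the question (y = visits, x = distance in miles).
y = [12, 5, 6, 8, 3, 2, 1, 2, 6, 3, 10, 5, 3, 6,
     2, 4, 3, 6, 7, 2, 1, 1, 1, 3, 2, 2, 3, 4]
x = [0.8, 1.2, 2.3, 1.5, 3.2, 6.3, 7.9, 5.3, 1.5, 1.9, 1.7, 2.6, 2.9, 4.2,
     3.9, 3.1, 5.8, 1.7, 2.2, 4.5, 6.1, 5.8, 7.4, 6.4, 4.7, 3.9, 4.0, 4.6]

n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# Fit sqrt(y) = b0 + b1*x by least squares.
root_y = [math.sqrt(v) for v in y]
r_bar = sum(root_y) / n
b1 = sum((xi - x_bar) * (ri - r_bar) for xi, ri in zip(x, root_y)) / sxx
b0 = r_bar - b1 * x_bar

# Back-transform the fitted values to the visits scale by squaring.
# (All fitted square roots are positive over the observed range of x.)
fitted_visits = [(b0 + b1 * xi) ** 2 for xi in x]

y_bar = sum(y) / n
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted_visits))
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared_back = 1 - sse / sst
print(f"R-sq of the square-root model on the original scale: {100 * r_squared_back:.1f}%")
```

If this back-transformed value is also above 58.50%, the transformed model describes the visit counts themselves better, not just their square roots.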

e) Based on your new model, construct a 95% prediction interval for y, the number of visits to the store for a customer who lives 2.5 miles from the store. Interpret the P.I.

For the new model, the 95% confidence interval for the coefficient of x is (-0.3467, -0.1912) and that for the constant (intercept) is (2.599, 3.268):

Coefficients for Transformed Response

Term         Coef  SE Coef         95% CI         T-Value  P-Value   VIF
Constant    2.933    0.163  ( 2.599,   3.268)     18.01    0.000
x         -0.2690   0.0378  (-0.3467, -0.1912)    -7.11    0.000  1.00

Our new model is

Regression Equation

y^0.5 = 2.933 - 0.2690 x

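A prediction interval (P.I.) is an interval estimate within which a single future observation will fall with a stated probability (here 0.95).

Construct a 95% prediction interval for y, the number of visits to the store for a customer who lives x' = 2.5 miles from the store.

Because the model is fitted to $\sqrt{y}$, the interval is built on the square-root scale and its endpoints are then squared to return to the number of visits.

Fitted value at x' = 2.5:

$\widehat{\sqrt{y}} = 2.933 - 0.2690 \times 2.5 = 2.2605$

so the predicted number of visits is $\hat{y} = (2.2605)^2 = 5.10986$.

The 95% prediction interval for $\sqrt{y}$ is given by

$\widehat{\sqrt{y}} \pm t_{\alpha/2,\,n-k-1} \cdot SE_{pred}$

Here n = 28 is the number of observations

   and k = 1 is the number of independent variables.

At the 5% level of significance,

$t_{\alpha/2,\,n-k-1} = t_{0.025,\,26} = 2.056$

You can find it from software such as Minitab or R, or from statistical tables.

The standard error of the fitted value is

$SE_{fit} = S \sqrt{\frac{1}{n} + \frac{(x' - \bar{x})^2}{S_{xx}}}$

with

n = 28

x' = 2.5

$\bar{x}$ = mean(x) = 3.835714                (can be calculated manually)

$S_{xx} = \sum (x_i - \bar{x})^2 = 107.1243$

S = 0.391525 (from the output table)

    S           R-sq      R-sq(adj)    PRESS      R-sq(pred)
0.391525   66.04%     64.74%     4.64139      60.45%

Hence $S^2 = (0.391525)^2 = 0.1532918$

and

$SE_{fit} = 0.391525 \times \sqrt{\frac{1}{28} + \frac{(2.5 - 3.835714)^2}{107.1243}} = 0.391525 \times 0.228843 = 0.089598$

so $SE_{fit}^2 = 0.008028$.

The standard error of prediction adds the error variance of a single new observation:

$SE_{pred} = \sqrt{S^2 + SE_{fit}^2} = \sqrt{0.1532918 + 0.008028} = 0.401646$

Hence the 95% prediction interval for $\sqrt{y}$ at x = 2.5 is

$2.2605 \pm 2.056 \times 0.401646 = 2.2605 \pm 0.8258 = (1.4347,\ 3.0863)$

Squaring the endpoints gives the 95% prediction interval for y at x = 2.5:

P.I. = ( (1.4347)^2 , (3.0863)^2 )

         = ( 2.06 , 9.53 )   (approximately)

Interpretation: we can be 95% confident that a single customer who lives 2.5 miles from the store will make between about 2 and 9.5 visits to the store in a month.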
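The prediction-interval arithmetic can be sketched in plain Python. This sketch works on the square-root scale, on which the model was fitted, and squares the endpoints to express the interval in visits. The critical value T_CRIT is hard-coded from a t-table as an assumption (26 degrees of freedom, two-sided 95%), and the function name is illustrative.

```python
import math

# Store_Visits data as listed in the question (y = visits, x = distance in miles).
y = [12, 5, 6, 8, 3, 2, 1, 2, 6, 3, 10, 5, 3, 6,
     2, 4, 3, 6, 7, 2, 1, 1, 1, 3, 2, 2, 3, 4]
x = [0.8, 1.2, 2.3, 1.5, 3.2, 6.3, 7.9, 5.3, 1.5, 1.9, 1.7, 2.6, 2.9, 4.2,
     3.9, 3.1, 5.8, 1.7, 2.2, 4.5, 6.1, 5.8, 7.4, 6.4, 4.7, 3.9, 4.0, 4.6]

T_CRIT = 2.0555  # assumed table value of t(0.025, 26); not computed here


def sqrt_model_prediction_interval(x, y, x_new, t_crit):
    """Prediction interval for y at x_new from the model sqrt(y) = b0 + b1*x."""
    n = len(x)
    root_y = [math.sqrt(v) for v in y]
    x_bar = sum(x) / n
    r_bar = sum(root_y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (ri - r_bar) for xi, ri in zip(x, root_y)) / sxx
    b0 = r_bar - b1 * x_bar

    # Residual standard deviation S on the square-root scale (n - 2 d.f.).
    sse = sum((ri - (b0 + b1 * xi)) ** 2 for xi, ri in zip(x, root_y))
    s = math.sqrt(sse / (n - 2))

    fit = b0 + b1 * x_new
    se_pred = s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    lower_root = fit - t_crit * se_pred
    upper_root = fit + t_crit * se_pred

    # Squaring preserves order only for non-negative values, so clamp at 0.
    return max(lower_root, 0.0) ** 2, upper_root ** 2


lower, upper = sqrt_model_prediction_interval(x, y, 2.5, T_CRIT)
print(f"95% P.I. for visits at x = 2.5: ({lower:.2f}, {upper:.2f})")
```

Squaring the endpoints is valid here because the square root is a monotone transformation, so a 95% interval for the square root of y maps to a 95% interval for y.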

