Question

The dataset HomesForSaleCA contains a random sample of 30 houses for sale in California. Suppose that we are interested in predicting the Size (in thousands of square feet) for such homes.

State   Price   Size    Beds    Baths
CA      500     3.2     5       3.5
CA      995     3.7     4       3.5
CA      609     2.2     4       3
CA      1199    2.8     3       2.5
CA      949     1.4     3       2
CA      415     1.7     3       2.5
CA      895     2.1     3       2
CA      775     1.6     3       3
CA      109     0.6     1       1
CA      5900    4.8     4       4.5
CA      219     1.1     3       2
CA      255     1.2     3       2
CA      86      0.6     1       1
CA      62      1.2     3       2
CA      165     1.9     5       3.5
CA      1695    6.9     5       5.5
CA      499     1.4     3       2
CA      47      1.5     3       2
CA      195     2       3       2.5
CA      775     1       2       2
CA      199     1.4     3       2
CA      480     3       5       3
CA      173     0.9     3       1
CA      189     2.5     2       2
CA      230     1.7     3       2
CA      380     2.1     5       3
CA      110     0.8     2       1
CA      499     1.3     3       2
CA      399     1.4     3       2
CA      2450    5       4       5


1. What is the total variability in the sizes of the 30 homes in this sample? (Hint: Try a regression ANOVA with any of the other variables as a predictor.)

2. What other variable in the HomesForSaleCA dataset explains the greatest amount of the total variability in home sizes? Explain how you decide on the variable.

3. How much of the total variability in home sizes is explained by the "best" variable identified in question 2? Give the answer both as a raw number and as a percentage.

4. Which of the variables in the dataset is the weakest predictor of home sizes? How much of the variability does it explain?

5. Is the weakest predictor identified in question 4 still an effective predictor of home sizes? Include some justification for your answer.

thank you for your help!

Solutions

Expert Solution

Solution :

The regression with Price is:

r²           0.430
r            0.656
Std. Error   1.096
n            30
k            1
Dep. Var.    Size

ANOVA table
Source       SS            df   MS            F       p-value
Regression   25.36173499   1    25.36173499   21.10   .0001
Residual     33.65826501   28   1.20208089
Total        59.02000000   29

Regression output (with 95% confidence interval)
Variable    Coefficient   Std. Error   t (df=28)   p-value   95% lower   95% upper
Intercept   1.4987
Price       0.0008        0.00018305   4.593       .0001     0.0005      0.0012
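
As a check on the "Total" row, the total variability can be computed directly from the Size column, without running any regression. The snippet below is a minimal sketch in plain Python; the variable names are illustrative, and the regression and residual sums of squares are copied from the Price ANOVA table above rather than recomputed.

```python
# Size column (thousands of square feet), copied from the data table.
sizes = [3.2, 3.7, 2.2, 2.8, 1.4, 1.7, 2.1, 1.6, 0.6, 4.8,
         1.1, 1.2, 0.6, 1.2, 1.9, 6.9, 1.4, 1.5, 2.0, 1.0,
         1.4, 3.0, 0.9, 2.5, 1.7, 2.1, 0.8, 1.3, 1.4, 5.0]

n = len(sizes)
mean_size = sum(sizes) / n

# Total sum of squares: squared deviations of Size from its mean.
sst = sum((y - mean_size) ** 2 for y in sizes)

# Regression and residual SS as reported in the Price ANOVA table.
ssr_price = 25.36173499
sse_price = 33.65826501

# The ANOVA identity SST = SSR + SSE, and r^2 = SSR / SST.
r2_price = ssr_price / sst

print(f"n = {n}, mean Size = {mean_size:.2f}")
print(f"SST = {sst:.4f}")
print(f"SSR + SSE = {ssr_price + sse_price:.4f}")
print(f"r^2 (Price) = {r2_price:.3f}")
```

Because the total sum of squares depends only on Size, it is the same whichever predictor is used, which is why the hint says any predictor will do.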

The regression with Beds is:

r²           0.412
r            0.642
Std. Error   1.113
n            30
k            1
Dep. Var.    Size

ANOVA table
Source       SS        df   MS        F       p-value
Regression   24.3432   1    24.3432   19.66   .0001
Residual     34.6768   28   1.2385
Total        59.0200   29

Regression output (with 95% confidence interval)
Variable    Coefficient   Std. Error   t (df=28)   p-value   95% lower   95% upper
Intercept   -0.6617
Beds        0.8541        0.1927       4.434       .0001     0.4595      1.2488

The regression with Baths is:

r²           0.846
r            0.920
Std. Error   0.570
n            30
k            1
Dep. Var.    Size

ANOVA table
Source       SS        df   MS        F        p-value
Regression   49.9270   1    49.9270   153.74   6.86E-13
Residual     9.0930    28   0.3247
Total        59.0200   29

Regression output (with 95% confidence interval)
Variable    Coefficient   Std. Error   t (df=28)   p-value    95% lower   95% upper
Intercept   -0.8648
Baths       1.1859        0.0956       12.399      6.86E-13   0.9900      1.3818
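
The three regressions above can be reproduced from the raw data with the usual least-squares formulas for a single predictor. The following is a sketch in plain Python rather than the software that produced the output above; the helper name `simple_regression` and the dictionary layout are illustrative choices.

```python
# (Price, Size, Beds, Baths) for the 30 homes, copied from the data table.
rows = [
    (500, 3.2, 5, 3.5), (995, 3.7, 4, 3.5), (609, 2.2, 4, 3),
    (1199, 2.8, 3, 2.5), (949, 1.4, 3, 2), (415, 1.7, 3, 2.5),
    (895, 2.1, 3, 2), (775, 1.6, 3, 3), (109, 0.6, 1, 1),
    (5900, 4.8, 4, 4.5), (219, 1.1, 3, 2), (255, 1.2, 3, 2),
    (86, 0.6, 1, 1), (62, 1.2, 3, 2), (165, 1.9, 5, 3.5),
    (1695, 6.9, 5, 5.5), (499, 1.4, 3, 2), (47, 1.5, 3, 2),
    (195, 2.0, 3, 2.5), (775, 1.0, 2, 2), (199, 1.4, 3, 2),
    (480, 3.0, 5, 3), (173, 0.9, 3, 1), (189, 2.5, 2, 2),
    (230, 1.7, 3, 2), (380, 2.1, 5, 3), (110, 0.8, 2, 1),
    (499, 1.3, 3, 2), (399, 1.4, 3, 2), (2450, 5.0, 4, 5),
]

def simple_regression(x, y):
    """Least-squares fit of y on one predictor x; returns a dict of summaries."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    ssr = slope * sxy  # variability in y explained by x
    return {"slope": slope, "intercept": my - slope * mx,
            "ssr": ssr, "sst": sst, "r2": ssr / sst}

size = [r[1] for r in rows]
predictors = {
    "Price": [r[0] for r in rows],
    "Beds": [r[2] for r in rows],
    "Baths": [r[3] for r in rows],
}

results = {name: simple_regression(x, size) for name, x in predictors.items()}

# Rank predictors from most to least variability explained.
ranked = sorted(results.items(), key=lambda kv: kv[1]["r2"], reverse=True)
for name, res in ranked:
    print(f"{name:5s}  SSR = {res['ssr']:.4f}  r^2 = {res['r2']:.3f}  "
          f"slope = {res['slope']:.4f}  intercept = {res['intercept']:.4f}")
```

Ranking by regression sum of squares and ranking by r² give the same order here, since all three fits share the same total sum of squares.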

(1) Total variability = 59.0200

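(2) Baths explains the greatest share of the variability in Size: it has the largest regression sum of squares (49.9270) and the highest r² (0.846) of the three predictors.

(3) 49.9270 of the total 59.0200, which is 84.6%.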

(4) Beds is the weakest predictor. It explains 41.2% of the variability.

(5) Yes, it is still an effective predictor: its p-value (.0001) is well below 0.05, so there is a significant linear relationship between Beds and Size.
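
The justification in (5) can be checked by rebuilding the F statistic from the Beds ANOVA table. This is a small sketch using only the sums of squares reported above. The 5% significance level and the critical value of about 4.20 for an F distribution with 1 and 28 degrees of freedom are assumptions taken from a standard F table, not from the original output.

```python
# Values copied from the Beds ANOVA table.
n, k = 30, 1
ssr_beds = 24.3432
sse_beds = 34.6768
t_beds = 4.434  # reported t statistic for the Beds slope

msr = ssr_beds / k            # mean square for regression
mse = sse_beds / (n - k - 1)  # mean square for residual, df = 28
f_stat = msr / mse
r2_beds = ssr_beds / (ssr_beds + sse_beds)

# Assumed: alpha = 0.05, tabulated critical value for F(1, 28).
F_CRIT_05 = 4.20

print(f"F = {f_stat:.2f}, r^2 = {r2_beds:.3f}")
print(f"t^2 = {t_beds ** 2:.2f} (equals F for a single predictor, up to rounding)")
print("Significant at the 5% level:", f_stat > F_CRIT_05)
```

With one predictor, the F test for the regression and the two-sided t test for the slope are the same test, so the two p-values in the output agree.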


