Question

The dataset HomesForSaleCA contains a random sample of 30 houses for sale in California. Suppose that we are interested in predicting the Size (in thousands of square feet) for such homes.

State   Price   Size    Beds    Baths
CA      500     3.2     5       3.5
CA      995     3.7     4       3.5
CA      609     2.2     4       3
CA      1199    2.8     3       2.5
CA      949     1.4     3       2
CA      415     1.7     3       2.5
CA      895     2.1     3       2
CA      775     1.6     3       3
CA      109     0.6     1       1
CA      5900    4.8     4       4.5
CA      219     1.1     3       2
CA      255     1.2     3       2
CA      86      0.6     1       1
CA      62      1.2     3       2
CA      165     1.9     5       3.5
CA      1695    6.9     5       5.5
CA      499     1.4     3       2
CA      47      1.5     3       2
CA      195     2       3       2.5
CA      775     1       2       2
CA      199     1.4     3       2
CA      480     3       5       3
CA      173     0.9     3       1
CA      189     2.5     2       2
CA      230     1.7     3       2
CA      380     2.1     5       3
CA      110     0.8     2       1
CA      499     1.3     3       2
CA      399     1.4     3       2
CA      2450    5       4       5

1. What is the total variability in the sizes of the 30 homes in this sample? (Hint: Try a regression ANOVA with any of the other variables as a predictor.)

2. What other variable in the HomesForSaleCA dataset explains the greatest amount of the total variability in home sizes? Explain how you decide on the variable.

3. How much of the total variability in home sizes is explained by the "best" variable identified in question 2? Give the answer both as a raw number and as a percentage.

4. Which of the variables in the dataset is the weakest predictor of home sizes? How much of the variability does it explain?

5. Is the weakest predictor identified in question 4 still an effective predictor of home sizes? Include some justification for your answer.

thank you for your help!

Expert Solution

Solution:

The regression with Price is:

	r²	0.430
	r	0.656
	Std. Error	1.096
	n	30
	k	1
	Dep. Var.	Size

ANOVA table
Source	SS	df	MS	F	p-value
Regression	25.36173499	1	25.36173499	21.10	.0001
Residual	33.65826501	28	1.20208089
Total	59.02000000	29
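
As a quick consistency check on the table above, the rows are tied together by the usual simple-regression identities. The symbols SST, SSR, and SSE are shorthand introduced here for the Total, Regression, and Residual sums of squares; they do not appear in the original output.

```latex
\begin{aligned}
\mathrm{SST} &= \mathrm{SSR} + \mathrm{SSE} = 25.3617 + 33.6583 = 59.0200,\\
r^2 &= \frac{\mathrm{SSR}}{\mathrm{SST}} = \frac{25.3617}{59.0200} \approx 0.430,\\
F &= \frac{\mathrm{SSR}/1}{\mathrm{SSE}/28} = \frac{25.3617}{1.2021} \approx 21.10.
\end{aligned}
```

SST depends only on the Size column, which is why the Total row is the same whichever predictor is used.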


Regression output				confidence interval
variables	coefficients	std. error	t (df=28)	p-value	95% lower	95% upper
Intercept	1.4987
Price	0.0008	0.00018305	4.593	.0001	0.0005	0.0012

The regression with Beds is:

	r²	0.412
	r	0.642
	Std. Error	1.113
	n	30
	k	1
	Dep. Var.	Size

ANOVA table
Source	SS	df	MS	F	p-value
Regression	24.3432	1	24.3432	19.66	.0001
Residual	34.6768	28	1.2385
Total	59.0200	29


Regression output				confidence interval
variables	coefficients	std. error	t (df=28)	p-value	95% lower	95% upper
Intercept	-0.6617
Beds	0.8541	0.1927	4.434	.0001	0.4595	1.2488

The regression with Baths is:

	r²	0.846
	r	0.920
	Std. Error	0.570
	n	30
	k	1
	Dep. Var.	Size

ANOVA table
Source	SS	df	MS	F	p-value
Regression	49.9270	1	49.9270	153.74	6.86E-13
Residual	9.0930	28	0.3247
Total	59.0200	29


Regression output				confidence interval
variables	coefficients	std. error	t (df=28)	p-value	95% lower	95% upper
Intercept	-0.8648
Baths	1.1859	0.0956	12.399	6.86E-13	0.9900	1.3818

(1) Total variability = 59.0200
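
The total variability can also be checked directly from the Size column, without fitting any regression. The following is a minimal sketch in plain Python; the function name `total_sum_of_squares` is made up for this illustration.

```python
# Size column (thousands of square feet), copied from the data table in the question.
sizes = [3.2, 3.7, 2.2, 2.8, 1.4, 1.7, 2.1, 1.6, 0.6, 4.8,
         1.1, 1.2, 0.6, 1.2, 1.9, 6.9, 1.4, 1.5, 2.0, 1.0,
         1.4, 3.0, 0.9, 2.5, 1.7, 2.1, 0.8, 1.3, 1.4, 5.0]


def total_sum_of_squares(values):
    """Sum of squared deviations from the mean (the ANOVA 'Total' SS)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


sst = total_sum_of_squares(sizes)
print(f"n = {len(sizes)}, SS Total = {sst:.4f}")
```

The printed SS Total should agree with the 59.0200 shown in the Total row of each ANOVA table.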

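(2) Baths explains the greatest share of the variability in Size. Fitting a separate simple regression of Size on each other variable and comparing r² gives 0.846 for Baths, 0.430 for Price, and 0.412 for Beds, so Baths has the largest SS Regression.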

(3) Baths explains 49.9270 of the total 59.0200 (SS Regression / SS Total), which is 84.6%.

(4) Beds is the weakest predictor. It explains 24.3432 of the total 59.0200, which is 41.2% of the variability (slightly less than Price at 43.0%).
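
The comparison behind answers (2) through (4) can be reproduced by fitting one simple regression per predictor and ranking them by r². This is a sketch in plain Python rather than the software that produced the output above; the helper name `explained_ss` is hypothetical.

```python
# Columns copied from the data table in the question (30 California homes).
price = [500, 995, 609, 1199, 949, 415, 895, 775, 109, 5900,
         219, 255, 86, 62, 165, 1695, 499, 47, 195, 775,
         199, 480, 173, 189, 230, 380, 110, 499, 399, 2450]
size = [3.2, 3.7, 2.2, 2.8, 1.4, 1.7, 2.1, 1.6, 0.6, 4.8,
        1.1, 1.2, 0.6, 1.2, 1.9, 6.9, 1.4, 1.5, 2.0, 1.0,
        1.4, 3.0, 0.9, 2.5, 1.7, 2.1, 0.8, 1.3, 1.4, 5.0]
beds = [5, 4, 4, 3, 3, 3, 3, 3, 1, 4, 3, 3, 1, 3, 5,
        5, 3, 3, 3, 2, 3, 5, 3, 2, 3, 5, 2, 3, 3, 4]
baths = [3.5, 3.5, 3, 2.5, 2, 2.5, 2, 3, 1, 4.5, 2, 2, 1, 2, 3.5,
         5.5, 2, 2, 2.5, 2, 2, 3, 1, 2, 2, 3, 1, 2, 2, 5]


def explained_ss(x, y):
    """Return (SS Regression, SS Total) for the least-squares line of y on x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy ** 2 / sxx, syy


results = {}
for name, x in [("Price", price), ("Beds", beds), ("Baths", baths)]:
    ssr, sst = explained_ss(x, size)
    results[name] = (ssr, ssr / sst)

# Strongest predictor first, weakest last.
ranking = sorted(results, key=lambda k: results[k][1], reverse=True)
for name in ranking:
    ssr, r2 = results[name]
    print(f"{name:5s}  SS Regression = {ssr:8.4f}  r^2 = {r2:.3f}")
```

The first line printed is the "best" predictor and the last is the weakest; the SS Regression values should match the ANOVA tables in the solution.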

(5) Yes. Beds is still an effective predictor: its p-value (.0001) is far below 0.05, so there is a statistically significant linear relationship between Beds and Size.
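
The significance claim for Beds can be checked by recomputing the slope's t statistic from the data. The sketch below uses plain Python and, as a stated assumption, compares against the tabulated two-sided 5% critical value of about 2.048 for 28 degrees of freedom instead of computing an exact p-value.

```python
import math

# Columns copied from the data table in the question.
size = [3.2, 3.7, 2.2, 2.8, 1.4, 1.7, 2.1, 1.6, 0.6, 4.8,
        1.1, 1.2, 0.6, 1.2, 1.9, 6.9, 1.4, 1.5, 2.0, 1.0,
        1.4, 3.0, 0.9, 2.5, 1.7, 2.1, 0.8, 1.3, 1.4, 5.0]
beds = [5, 4, 4, 3, 3, 3, 3, 3, 1, 4, 3, 3, 1, 3, 5,
        5, 3, 3, 3, 2, 3, 5, 3, 2, 3, 5, 2, 3, 3, 4]

n = len(size)
mean_x, mean_y = sum(beds) / n, sum(size) / n
sxx = sum((x - mean_x) ** 2 for x in beds)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(beds, size))
syy = sum((y - mean_y) ** 2 for y in size)

slope = sxy / sxx                 # least-squares slope of Size on Beds
sse = syy - sxy ** 2 / sxx        # residual sum of squares
mse = sse / (n - 2)               # residual mean square, df = n - 2 = 28
se_slope = math.sqrt(mse / sxx)   # standard error of the slope
t_stat = slope / se_slope

# Assumption: standard t-table value for a two-sided test at the 5% level, 28 df.
T_CRIT_28 = 2.048

print(f"slope = {slope:.4f}, std. error = {se_slope:.4f}, t = {t_stat:.3f}")
print("significant at 5%:", abs(t_stat) > T_CRIT_28)
```

In simple regression the ANOVA F statistic equals the square of this t statistic, so the conclusion is the same as reading the p-value from the Beds ANOVA table.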


orchestra answered 1 year ago
