Question


House Prices:

Notes on variables:

Price: In thousands of dollars

Ac: air conditioner (1 if yes, 0 if no)

Size: In square feet

Age: In years

Pool: (1 if yes, 0 if no)

Bedrms: Number of Bedrooms

Baths: Number of Bathrooms

4)   a. Make a Scatter plot where x=size and y=price

       b. Calculate the Least Squares Regression (LSR) equation:

       c. Does the intercept make sense? Why or why not?

       c. Find and interpret the R2

d. Predict the average price of a home that is 1000 square feet.

e. Better interpretation of the slope: If the square footage increased by 40 square feet, then how much more would you expect the house to cost?

f. Plot the residuals vs. X-values.

g. Use the residuals plot or the regression line to identify any outliers. (If there are any.)

5) Looking at Heights again. Remember women have a mean height m= 65 in. and a standard deviation s =2.5 in. and men have a mean height of m=70 and a standard deviation s= 3

       a. What proportion of women are taller than 69 in?

b. What height for men is the 65th percentile (larger than 65% of the data)?

c. What proportion/percentage of men have height between 68 to 75?

price ac size age pool bedrms baths
107 0 736 39 0 2 1
133 0 720 63 0 2 1
141 0 768 66 0 2 1
165 0 929 41 0 3 1
170 0 1080 44 0 3 1
173 0 942 65 0 2 1
182 0 1000 40 0 3 1
200 0 1472 66 0 3 2
220 0 1200 69 0 3 1
226 0 1302 49 0 3 2
260 1 2109 37 0 3 2
275 0 1528 41 0 2 2
280 0 1421 41 1 3 2
289 1 1753 1 0 3 2
295 0 1528 32 0 3 2
300 0 1643 29 1 3 2
310 0 1675 63 0 3 2
315 1 1714 38 0 3 2
350 0 2150 75 0 4 2
365 1 2206 28 0 4 2.5
503 1 3269 5 0 4 2.5
135 0 936 75 0 2 1
147 0 728 40 0 2 1
165 0 1014 26 0 2 1
175 1 1661 27 0 3 2
190 0 1248 42 0 3 1
191 0 1834 40 1 3 2
195 0 989 41 0 3 1
205 0 1232 43 0 2 2
210 0 1017 38 0 2 1
215 0 1216 77 0 2 1
228 0 1447 44 0 2 2
242 0 1974 65 1 4 2
250 1 1600 63 0 3 2
250 0 1168 63 1 3 1
255 0 1478 50 0 3 2
255 0 1756 36 1 3 2
265 0 1542 38 0 3 2
265 0 1633 32 1 4 2
275 1 1500 42 0 2 2
285 0 1734 62 1 3 2
365 1 1900 42 0 3 2
397 1 2468 10 0 4 2.5

Solutions

Expert Solution

4. Using simple R coding, we may complete the required objectives. But first, we should copy the above data to a text file (like notepad), and save it.

The R code to enter the data to a data frame would be as below.

> library(readr)

> dat <- read_delim("<location of the text file>", "\t", escape_double = FALSE, trim_ws = TRUE)
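If saving a separate file is inconvenient, one alternative (an assumption of this sketch, not part of the original solution) is to paste the table directly into the script and parse it with base R's read.table(). The object name dat matches the name used in the rest of the solution, so the later commands should work on it unchanged.

```r
# Sketch: build the data frame inline instead of reading a file.
# The rows are copied from the table in the question.
dat <- read.table(header = TRUE, text = "
price ac size age pool bedrms baths
107 0 736 39 0 2 1
133 0 720 63 0 2 1
141 0 768 66 0 2 1
165 0 929 41 0 3 1
170 0 1080 44 0 3 1
173 0 942 65 0 2 1
182 0 1000 40 0 3 1
200 0 1472 66 0 3 2
220 0 1200 69 0 3 1
226 0 1302 49 0 3 2
260 1 2109 37 0 3 2
275 0 1528 41 0 2 2
280 0 1421 41 1 3 2
289 1 1753 1 0 3 2
295 0 1528 32 0 3 2
300 0 1643 29 1 3 2
310 0 1675 63 0 3 2
315 1 1714 38 0 3 2
350 0 2150 75 0 4 2
365 1 2206 28 0 4 2.5
503 1 3269 5 0 4 2.5
135 0 936 75 0 2 1
147 0 728 40 0 2 1
165 0 1014 26 0 2 1
175 1 1661 27 0 3 2
190 0 1248 42 0 3 1
191 0 1834 40 1 3 2
195 0 989 41 0 3 1
205 0 1232 43 0 2 2
210 0 1017 38 0 2 1
215 0 1216 77 0 2 1
228 0 1447 44 0 2 2
242 0 1974 65 1 4 2
250 1 1600 63 0 3 2
250 0 1168 63 1 3 1
255 0 1478 50 0 3 2
255 0 1756 36 1 3 2
265 0 1542 38 0 3 2
265 0 1633 32 1 4 2
275 1 1500 42 0 2 2
285 0 1734 62 1 3 2
365 1 1900 42 0 3 2
397 1 2468 10 0 4 2.5
")

# Quick sanity check of the structure (43 houses, 7 variables)
str(dat)
```

Note that read.table() returns a plain data frame rather than a tibble, so printed output will look slightly different from the tibble output shown later.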

(a) The scatter plot and the required R commands would be as below.

> plot(x=dat$size,y=dat$price)

(b) The R command and regression result would be as below.

> lm(price ~ size, data = dat)

Call:
lm(formula = price ~ size, data = dat)

Coefficients:
(Intercept)         size  
    39.9621       0.1376

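The regression equation would be price = 39.9621 + 0.1376 × size.

The intercept represents the average price when size is zero, i.e. price = 39.9621 (about $39,962, since price is in thousands of dollars). The intercept does not make sense as a literal prediction because the size of a house cannot be zero; the smallest house in the data is 720 square feet, so size = 0 lies far outside the observed range.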

(c) The command to find the R-squared and the results are as below.

> summary(lm(price ~ size, data = dat))$r.squared
[1] 0.7940702

Hence, the R-squared is 0.7940702, or about 0.79. That is, about 79% of the variation in price is explained by the linear relationship with size.
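In a simple regression with one predictor, R-squared equals the squared correlation between x and y. The sketch below checks that identity on a small subset (the first six rows of the table, used here only to keep the example short; the variable names are illustrative). Its values will differ from the full-data result of 0.794.

```r
# First six rows of the table: price (thousands of dollars) and size (sq ft)
price <- c(107, 133, 141, 165, 170, 173)
size  <- c(736, 720, 768, 929, 1080, 942)

fit <- lm(price ~ size)

# R-squared as reported by the model summary
r2_model <- summary(fit)$r.squared

# R-squared as the squared Pearson correlation
r2_cor <- cor(size, price)^2

# The two values agree up to floating-point error
print(c(model = r2_model, correlation = r2_cor))
```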

(d) For size = 1000, we have price = 39.9621 + 0.1376 × 1000 = 177.5621. Since price is in thousands of dollars, the average price of a 1000-square-foot house is about $177,560.
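The arithmetic can be checked in R using the rounded coefficients from the regression output. This is a minimal sketch; the variable names are illustrative, and with the fitted model available one would more typically call predict(), as noted in the comment.

```r
# Rounded coefficients taken from the lm() output above
b0 <- 39.9621   # intercept, in thousands of dollars
b1 <- 0.1376    # slope, thousands of dollars per square foot

size_new <- 1000
pred_thousands <- b0 + b1 * size_new   # predicted price in thousands of dollars
pred_dollars   <- pred_thousands * 1000

print(pred_thousands)
print(pred_dollars)

# Equivalent call when the data frame dat is loaded:
# predict(lm(price ~ size, data = dat), newdata = data.frame(size = 1000))
```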

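(e) The slope is the ratio of the change in average price to the change in size: slope = Δprice / Δsize = 0.1376. If the change in size is 40, then Δprice = 0.1376 × 40 = 5.504. Hence, for an increase in size of 40 square feet, the expected price rises by 5.504 thousand dollars, or about $5,500.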
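The same scaling works for any change in size. A small helper (hypothetical, not part of the original solution) makes this explicit, using the rounded slope from the regression output:

```r
# Expected change in price (thousands of dollars) for a given change in size,
# assuming the fitted slope of 0.1376 thousand dollars per square foot.
price_change <- function(delta_size, slope = 0.1376) {
  slope * delta_size
}

delta_price <- price_change(40)
print(delta_price)          # change in thousands of dollars
print(delta_price * 1000)   # change in dollars
```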

(f) The R command for residual plot and the plot itself would be as below.

> plot(x=dat$size,y=resid(lm(price ~ size, data = dat)))

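(g) The outliers can be seen in the residual plot as the data points whose residuals are below -50. Apart from those 4 points, nearly all the residuals lie between -50 and +50; using the fitted coefficients above, the house with price 365 and size 1900 has a residual of about +64 and could also be flagged.

The data points with residuals below -50 can be found with the R command below.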

> dat[resid(lm(price ~ size, data = dat))< -50,c(1,3)]
# A tibble: 4 x 2
  price  size
  <int> <int>
1   260  2109
2   175  1661
3   191  1834
4   242  1974

The expression "resid(lm(price ~ size, data = dat)) < -50" compares each residual with -50 and returns a logical vector, with TRUE where the residual is below -50 and FALSE otherwise. Inside [ , ], the part before the comma selects the rows for which that vector is TRUE, and the part after the comma, "c(1,3)", selects the 1st and 3rd columns (price and size).
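The same row-and-column slicing can be seen on a smaller scale. The sketch below uses four rows copied from the table and a simpler condition (price below 200) in place of the residual test; the object names houses, keep and cheap are illustrative.

```r
# Four rows from the table, keeping the first three columns
houses <- data.frame(
  price = c(107, 260, 175, 289),
  ac    = c(0, 1, 1, 1),
  size  = c(736, 2109, 1661, 1753)
)

# Logical vector: one TRUE/FALSE per row
keep <- houses$price < 200
print(keep)

# Rows where keep is TRUE, columns 1 and 3 (price and size)
cheap <- houses[keep, c(1, 3)]
print(cheap)
```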

