Question

The file Stat8_prob3.txt contains data on 100 Tarrant County houses (in 1900), with the variables value (VALUE), size in square feet (SIZE), a physical condition index (CONDITION), and a depreciation factor (DEPRECIATION).

(a) Fit the model to predict VALUE using SIZE, CONDITION, and DEPRECIATION as the predictor variables.

(b) Plot the residuals e_i against the fitted values ŷ_i. What departures from the regression model assumptions can you see?

(c) If any of the assumptions in part (b) have been violated, suggest a possible transformation. (Hint: Apply the Box-Cox transformation.)

(d) Fit the new model and report the new fitted regression line.

(e) Calculate the new residuals and fitted values, and use the appropriate plot(s) to comment on whether the assumptions checked in part (b) are now satisfied.
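
Parts (a) and (b) can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the solution's own code: `fit_ols` and `plot_residuals` are hypothetical helper names, the array holds only a handful of rows copied from the data listing so that the block runs on its own, and the commented `np.loadtxt` call assumes the file sits in the working directory. Matplotlib is needed only if the plot helper is actually called.

```python
import numpy as np

# A few rows copied from the listing, in file order:
# VALUE, SIZE, DEPRECIATION, CONDITION.
# For the full data set one could instead use (assuming a local copy of the file):
#   data = np.loadtxt("Stat8_prob3.txt", delimiter=",", skiprows=1)
data = np.array([
    [23974, 1442, 0.40, 0.00],
    [29061, 910, 0.50, 0.18],
    [33624, 1400, 0.45, 0.05],
    [52881, 1635, 0.60, 0.02],
    [57870, 1966, 0.55, 0.02],
    [44919, 1356, 0.55, 0.12],
    [28154, 1392, 0.65, 0.24],
    [43566, 1148, 0.55, 0.32],
    [12372, 840, 0.35, 0.00],
    [224182, 2251, 0.75, 0.04],
    [201597, 2617, 0.90, 0.03],
    [242690, 3581, 0.80, 0.07],
    [296251, 4343, 0.80, 0.04],
    [77797, 1542, 0.65, 0.30],
])


def fit_ols(y, X):
    """Least-squares fit of y on X with an intercept.

    Returns (beta, fitted, residuals); beta[0] is the intercept.
    """
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ beta
    return beta, fitted, y - fitted


def plot_residuals(fitted, resid):
    """Scatter plot of residuals against fitted values (part (b))."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for the plot

    fig, ax = plt.subplots()
    ax.scatter(fitted, resid)
    ax.axhline(0.0, linestyle="--")
    ax.set_xlabel("Fitted values")
    ax.set_ylabel("Residuals")
    return fig


value = data[:, 0]
predictors = data[:, [1, 3, 2]]  # SIZE, CONDITION, DEPRECIATION, as in part (a)
beta, fitted, resid = fit_ols(value, predictors)
print("intercept, SIZE, CONDITION, DEPRECIATION:", np.round(beta, 3))
```

In the residual plot, a funnel shape (spread growing with the fitted value) would point to non-constant variance, and a curved band would point to a misspecified mean. Either pattern is the kind of departure that motivates the transformation in part (c).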

Here is the data Stat8_prob3.txt:

"VALUE","SIZE","DEPRECIATION","CONDITION"
23974,1442,.4,0
24087,1426,.4,0
16781,1632,.5,0
29061,910,.5,.18
37982,972,.55,.18
29433,912,.55,.18
33624,1400,.45,.05
27032,1087,.45,.18
28653,1139,.45,.18
33075,1386,.55,.05
17474,756,.5,.05
33852,1044,.5,.07
29046,1032,.5,0
20715,720,.55,0
19461,734,.5,0
21377,720,.5,0
52881,1635,.6,.02
43889,1381,.55,.02
45134,1372,.55,.02
47655,1349,.6,.02
53088,1599,.6,.02
38923,1171,.5,.02
57870,1966,.55,.02
30489,1504,.45,0
29207,1296,.35,0
44919,1356,.55,.12
48090,1553,.55,.1
40521,1142,.55,.1
43403,1268,.55,.1
38112,1008,.55,.1
27710,1120,.5,0
27621,960,.6,0
22258,920,.35,0
29064,1259,.5,0
12001,783,.4,0
37650,1874,.35,.02
27930,1242,.5,0
16066,772,.4,0
20411,908,.45,0
23672,1155,.45,0
24215,1004,.5,0
22020,958,.45,0
52863,1828,.6,.02
41822,1146,.6,.02
45104,1368,.6,.02
28154,1392,.65,.24
20943,1058,.65,.24
17851,1375,.55,.26
16616,648,.4,.06
38752,1313,.5,0
44377,1780,.55,0
43566,1148,.55,.32
38950,1363,.55,.32
44633,1262,.55,.32
12372,840,.35,0
12148,840,.4,0
19852,839,.5,0
20012,852,.55,0
20314,852,.55,0
22814,974,.55,0
24696,1135,.5,0
23443,1170,.7,.02
35904,960,.5,0
21799,1052,.5,0
28212,1296,.55,0
27553,1282,.55,0
15826,916,.35,0
18660,864,.5,0
21536,1404,.4,0
24147,1676,.4,0
17867,1131,.4,0
21583,1397,.4,0
15482,888,.4,0
24857,1448,.45,0
17716,1022,.45,0
224182,2251,.75,.04
182012,1126,.55,.04
201597,2617,.9,.03
49683,966,.6,.05
60647,1469,.65,.05
49024,1322,.7,.02
52092,1509,.65,.02
55645,1724,.65,.04
51919,1559,.65,.02
55174,2133,.55,0
48760,1233,.55,0
45906,1323,.55,0
52013,1733,.55,0
56612,1357,.6,0
69197,1234,.6,.17
84416,1434,.6,.15
60962,1384,.55,.17
47359,995,.55,.05
56302,1372,.65,.14
88285,1774,.7,.06
91862,1903,.7,.08
242690,3581,.8,.07
296251,4343,.8,.04
107132,1861,.75,.08
77797,1542,.65,.3
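
For parts (c) to (e), one common way to choose a Box-Cox power is to maximise the profile log-likelihood of the regression over a grid of λ values, then refit with the transformed response. The sketch below assumes Python with NumPy. The function names, the grid from -2 to 2, and the small subset of rows are illustrative choices rather than part of the original problem, so the λ it prints is not the answer for the full file.

```python
import numpy as np

# Subset of the listing above: VALUE, SIZE, DEPRECIATION, CONDITION.
data = np.array([
    [23974, 1442, 0.40, 0.00],
    [29061, 910, 0.50, 0.18],
    [33624, 1400, 0.45, 0.05],
    [52881, 1635, 0.60, 0.02],
    [57870, 1966, 0.55, 0.02],
    [44919, 1356, 0.55, 0.12],
    [28154, 1392, 0.65, 0.24],
    [43566, 1148, 0.55, 0.32],
    [12372, 840, 0.35, 0.00],
    [224182, 2251, 0.75, 0.04],
    [201597, 2617, 0.90, 0.03],
    [242690, 3581, 0.80, 0.07],
    [296251, 4343, 0.80, 0.04],
    [77797, 1542, 0.65, 0.30],
])


def boxcox_normalized(y, lam):
    """Box-Cox transform scaled by the geometric mean of y, so that
    residual sums of squares are comparable across values of lam."""
    gm = np.exp(np.mean(np.log(y)))
    if abs(lam) < 1e-12:
        return gm * np.log(y)
    return (y**lam - 1.0) / (lam * gm ** (lam - 1.0))


def profile_loglik(y, X, lam):
    """Profile log-likelihood (up to a constant) of the linear model
    for the transformed response at a given lam."""
    design = np.column_stack([np.ones(len(y)), X])
    z = boxcox_normalized(y, lam)
    beta, *_ = np.linalg.lstsq(design, z, rcond=None)
    rss = np.sum((z - design @ beta) ** 2)
    return -0.5 * len(y) * np.log(rss / len(y))


def best_lambda(y, X, grid=None):
    """Return the grid value of lam with the largest profile log-likelihood,
    together with the log-likelihood at every grid point."""
    if grid is None:
        grid = np.linspace(-2.0, 2.0, 81)
    loglik = np.array([profile_loglik(y, X, lam) for lam in grid])
    return grid[np.argmax(loglik)], loglik


y = data[:, 0]
X = data[:, [1, 3, 2]]  # SIZE, CONDITION, DEPRECIATION
lam_hat, _ = best_lambda(y, X)

# Part (d): refit with the transformed response (log when lam is 0).
y_new = np.log(y) if abs(lam_hat) < 1e-12 else y**lam_hat
design = np.column_stack([np.ones(len(y)), X])
beta_new, *_ = np.linalg.lstsq(design, y_new, rcond=None)

# Part (e): new fitted values and residuals for the diagnostic plot.
fitted_new = design @ beta_new
resid_new = y_new - fitted_new
print("lambda on this subset:", lam_hat)
print("coefficients:", beta_new)
```

In practice λ is usually rounded to a convenient nearby value such as 0 (log), 0.5 (square root), or -1 (reciprocal) before refitting, and the new residuals are plotted against the new fitted values in the same way as in part (b).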

Solutions

Expert Solution

## first R code

## now output

