A problem of interest is the relationship between the appraised value of a property and its sale price. The sale price for any given property will vary depending on the price set by the seller, the strength of appeal of the property to a specific buyer, and the state of the money and real estate markets. We want to examine the relationship between the mean sale price E(y) of a property and three variables: the appraised land value of the property, the appraised value of the improvements on the property, and the neighborhood in which the property is listed. The data were collected from the property appraiser’s office of Hillsborough County, Florida, and saved in the TAMSALES4 file. Four neighborhoods (Hyde Park, Cheval, Hunter’s Green, and Davis Isles), each relatively homogeneous but differing sociologically and in property types and values, were identified within the city and surrounding area.
1. Draw scatter plots of the response y against each quantitative variable.
2. Using the information from the scatter plots and the material in Chapters 4 and 5, propose at least three different models relating the mean sale price E(y) to the variables. Try to build the models so that one is nested in another.
3. Conduct partial F tests repeatedly to find the best of the proposed models.
4. After the final model has been selected, conduct the global F test for the adequacy of the model.
5. Perform individual t tests on any variables you have doubts about.
6. Report the adjusted R-squared and the standard error of the model.
7. Based on your report in step 6, explain in your own words whether the final model is good enough to use or needs improvement.
8. Provide all the R code you used (RStudio).
TAMSALES4 file:
FOLIO SALES LNSALES LAND IMP TOTVAL NBHD
129295710 378.0 5.93489 81.84 243.30 325.14 CHEVAL
129295830 273.0 5.60947 60.48 134.47 194.95 CHEVAL
129453028 321.2 5.77206 115.58 255.42 371.00 CHEVAL
129454156 327.0 5.78996 112.38 185.57 297.96 CHEVAL
129454194 382.5 5.94673 81.23 217.98 299.22 CHEVAL
129454316 242.0 5.48894 61.48 110.28 171.76 CHEVAL
129454374 242.0 5.48894 49.99 109.06 159.06 CHEVAL
129454524 510.0 6.23441 71.28 224.38 295.65 CHEVAL
129454536 285.0 5.65249 63.82 158.83 222.65 CHEVAL
129795122 500.0 6.21461 91.32 296.02 387.34 CHEVAL
129795276 825.0 6.71538 117.01 451.41 568.42 CHEVAL
129795456 515.0 6.24417 117.72 347.87 465.59 CHEVAL
129795468 520.0 6.25383 107.06 329.93 436.99 CHEVAL
129795488 780.0 6.65929 157.29 507.85 665.14 CHEVAL
129795506 825.0 6.71538 131.89 466.27 598.16 CHEVAL
129795510 841.0 6.73459 159.84 443.29 603.13 CHEVAL
129795606 1140.0 7.03878 157.49 563.91 721.41 CHEVAL
129842052 460.0 6.13123 59.36 206.84 266.20 CHEVAL
339720192 285.0 5.65249 39.75 166.61 206.36 HUNTERSGREEN
339721514 350.0 5.85793 54.00 227.43 281.43 HUNTERSGREEN
339721560 314.5 5.75098 48.84 187.13 235.97 HUNTERSGREEN
594030716 183.0 5.20949 34.75 158.05 192.80 HUNTERSGREEN
594030754 200.0 5.29832 29.10 145.80 174.90 HUNTERSGREEN
594030784 775.0 6.65286 142.27 350.43 492.70 HUNTERSGREEN
594030788 505.0 6.22456 99.62 306.43 406.05 HUNTERSGREEN
594030880 425.0 6.05209 47.52 225.74 273.26 HUNTERSGREEN
594030882 335.0 5.81413 49.01 217.35 266.36 HUNTERSGREEN
594031020 760.0 6.63332 96.10 335.59 431.69 HUNTERSGREEN
594031044 410.0 6.01616 75.79 274.22 350.01 HUNTERSGREEN
594031064 335.0 5.81413 67.22 273.66 340.88 HUNTERSGREEN
594033934 325.0 5.78383 47.02 241.87 288.90 HUNTERSGREEN
594033956 380.0 5.94017 47.74 307.20 354.94 HUNTERSGREEN
594033970 640.0 6.46147 45.55 336.44 381.99 HUNTERSGREEN
594034072 262.0 5.56834 51.97 152.96 204.94 HUNTERSGREEN
594034084 190.0 5.24702 51.25 151.68 202.94 HUNTERSGREEN
1176450000 500.0 6.21461 256.13 152.78 408.91 HYDEPARK
1186250000 550.0 6.30992 168.45 310.12 478.56 HYDEPARK
1186260000 205.8 5.32690 168.45 41.60 210.05 HYDEPARK
1692460000 590.0 6.38012 291.09 66.90 357.99 DAVISISLES
1692730000 525.0 6.26340 462.43 114.13 576.56 DAVISISLES
1692990000 340.0 5.82895 199.66 80.43 280.09 DAVISISLES
1693500000 3200.0 8.07091 1004.59 1129.65 2134.23 DAVISISLES
1694120000 500.0 6.21461 358.43 55.89 414.32 DAVISISLES
1694450000 1421.0 7.25912 391.46 550.78 942.24 DAVISISLES
1853130000 858.9 6.75565 242.47 281.62 524.09 HYDEPARK
1853270000 625.0 6.43775 264.00 184.63 448.63 HYDEPARK
1854060000 375.0 5.92693 242.00 33.79 275.79 HYDEPARK
1854250000 1240.0 7.12287 264.00 466.82 730.82 HYDEPARK
1855080000 850.0 6.74524 266.64 333.18 599.82 HYDEPARK
1855250000 2500.0 7.82405 533.28 1221.20 1754.48 HYDEPARK
1855590000 750.0 6.62007 266.64 366.05 632.69 HYDEPARK
1856430000 895.0 6.79682 266.64 358.77 625.41 HYDEPARK
1856800000 460.0 6.13123 264.00 183.54 447.54 HYDEPARK
1856960000 605.0 6.40523 210.00 259.67 469.67 HYDEPARK
1857420000 329.9 5.79879 210.00 105.94 315.94 HYDEPARK
1857460000 462.0 6.13556 210.00 145.19 355.19 HYDEPARK
1858190000 850.0 6.74524 264.00 362.71 626.71 HYDEPARK
1858440000 1525.0 7.32975 335.61 862.34 1197.95 HYDEPARK
1858450000 1400.0 7.24423 350.90 711.34 1062.24 HYDEPARK
1858770000 760.0 6.63332 264.00 257.47 521.47 HYDEPARK
1859190000 520.0 6.25383 250.23 157.75 407.98 HYDEPARK
1859400000 418.0 6.03548 193.20 147.43 340.63 HYDEPARK
1859590000 152.0 5.02388 118.80 106.14 224.94 HYDEPARK
1859620000 330.0 5.79909 118.80 123.98 242.78 HYDEPARK
1860510000 240.0 5.48064 192.01 221.54 413.55 HYDEPARK
1860580000 405.0 6.00389 223.74 122.18 345.92 HYDEPARK
1860890000 650.0 6.47697 203.49 209.11 412.60 HYDEPARK
1860920000 400.0 5.99146 203.49 101.58 305.07 HYDEPARK
1861000000 365.0 5.89990 207.48 112.78 320.26 HYDEPARK
1953960000 745.0 6.61338 291.63 83.22 374.85 DAVISISLES
1954130000 1650.0 7.40853 388.82 1023.36 1412.18 DAVISISLES
1954320000 1250.0 7.13090 648.96 81.01 729.97 DAVISISLES
1954870000 715.0 6.57228 344.12 265.18 609.30 DAVISISLES
1955990000 1215.0 7.10250 341.86 785.66 1127.52 DAVISISLES
1960100000 440.0 6.08677 179.52 58.40 237.92 DAVISISLES
1962380000 690.0 6.53669 357.07 121.25 478.33 DAVISISLES
1964590000 587.5 6.37588 300.93 75.99 376.92 DAVISISLES
1966620000 377.5 5.93357 238.86 43.95 282.81 DAVISISLES
1968140000 810.0 6.69703 195.84 479.00 674.84 DAVISISLES
1969380000 800.0 6.68461 242.47 367.40 609.87 HYDEPARK
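The partial (nested-model) F test in step 3 comes down to one ratio of sums of squared errors. The sketch below is only the arithmetic, written in Python rather than R. The function name `partial_f` and the SSE values are hypothetical and are not computed from TAMSALES4; only n = 80 (the number of records listed above) comes from the data.

```python
def partial_f(sse_reduced, sse_complete, n, k_complete, k_reduced):
    """Nested-model F statistic.

    k_complete and k_reduced count the beta parameters in each model,
    excluding the intercept. The reduced model must be nested in the
    complete model.
    """
    df_num = k_complete - k_reduced      # number of betas being tested
    df_den = n - (k_complete + 1)        # error df of the complete model
    f_stat = ((sse_reduced - sse_complete) / df_num) / (sse_complete / df_den)
    return f_stat, df_num, df_den


# Hypothetical SSE values, for illustration only.
f_stat, df_num, df_den = partial_f(
    sse_reduced=120.0, sse_complete=90.0, n=80, k_complete=5, k_reduced=2
)
print(f"F = {f_stat:.3f} on ({df_num}, {df_den}) df")
```

A large F relative to the F(df_num, df_den) critical value suggests that the extra terms in the complete model contribute to predicting y.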
In: Statistics and Probability
In the following problem, check that it is appropriate to use
the normal approximation to the binomial. Then use the normal
distribution to estimate the requested probabilities.
Do you try to pad an insurance claim to cover your deductible?
About 41% of all U.S. adults will try to pad their insurance
claims! Suppose that you are the director of an insurance
adjustment office. Your office has just received 136 insurance
claims to be processed in the next few days. Find the following
probabilities. (Round your answers to four decimal places.)
(a) half or more of the claims have been padded
(b) fewer than 45 of the claims have been padded
(c) from 40 to 64 of the claims have been padded
(d) more than 80 of the claims have not been padded
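One way to set this up, as a sketch: check the usual rule of thumb (np > 5 and nq > 5, which is an assumption about the criterion the course uses), then apply a continuity correction to each count. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 136, 0.41
q = 1 - p

# Rule of thumb (assumed): the approximation is reasonable if np > 5 and nq > 5.
approx_ok = n * p > 5 and n * q > 5

sigma = sqrt(n * p * q)
padded = NormalDist(mu=n * p, sigma=sigma)        # r = number of padded claims
not_padded = NormalDist(mu=n * q, sigma=sigma)    # number of claims not padded

# Continuity correction: shift each integer boundary by 0.5.
p_a = 1 - padded.cdf(67.5)                  # P(r >= 68), half of 136 or more
p_b = padded.cdf(44.5)                      # P(r <= 44), fewer than 45
p_c = padded.cdf(64.5) - padded.cdf(39.5)   # P(40 <= r <= 64)
p_d = 1 - not_padded.cdf(80.5)              # P(not padded >= 81), more than 80

print(approx_ok)
for label, prob in zip("abcd", (p_a, p_b, p_c, p_d)):
    print(f"({label}) {prob:.4f}")
```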
In: Statistics and Probability
A bank manager would like to know the mean wait time for his customers. He randomly selects 25 customers and records how long each waits to see a teller. The sample mean is 7.25 minutes with a standard deviation of 1.1 minutes. Construct a 99% confidence interval for the mean wait time. Round your answers to 2 decimal places.
Lower Limit =
Upper Limit =
Write a summary sentence for the confidence interval you calculated
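A sketch of the interval, assuming the 1.1 minutes is a sample standard deviation so that a t interval with 24 degrees of freedom applies. The critical value 2.797 is hard-coded from a standard t table because the Python standard library has no t distribution.

```python
from math import sqrt

n, xbar, s = 25, 7.25, 1.1

# Assumed from a t table: upper 0.005 critical value with df = n - 1 = 24.
t_crit = 2.797

margin = t_crit * s / sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(f"Lower Limit = {lower:.2f}, Upper Limit = {upper:.2f}")
```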
In: Statistics and Probability
A bag contains 3 red marbles, 1 green one, 1 lavender one, 2
yellows, and 3 orange marbles.
How many sets of five marbles include at least two red ones?
A bag contains 3 red marbles, 3 green ones, 1 lavender one, 3
yellows, and 2 orange marbles.
How many sets of five marbles include at most one of the yellow
ones?
A bag contains 2 red marbles, 2 green ones, 1 lavender one, 3
yellows, and 3 orange marbles.
How many sets of five marbles include either the lavender one or
exactly one yellow one but not both colors?
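These counts can be checked by brute force. The sketch below assumes marbles of the same color are distinguishable (the usual convention for "sets of marbles") and reads the third question as an exclusive or: the set contains the lavender marble, or exactly one yellow, but not both. The helper `count_sets` is a name made up for this example.

```python
from itertools import combinations


def count_sets(bag, keep):
    """Count 5-marble sets drawn from `bag` that satisfy the predicate `keep`."""
    marbles = [color for color, k in bag.items() for _ in range(k)]
    # combinations() works by position, so same-colored marbles stay distinct.
    return sum(1 for hand in combinations(marbles, 5) if keep(hand))


at_least_two_red = count_sets(
    {"R": 3, "G": 1, "L": 1, "Y": 2, "O": 3},
    lambda h: h.count("R") >= 2,
)
at_most_one_yellow = count_sets(
    {"R": 3, "G": 3, "L": 1, "Y": 3, "O": 2},
    lambda h: h.count("Y") <= 1,
)
lavender_xor_one_yellow = count_sets(
    {"R": 2, "G": 2, "L": 1, "Y": 3, "O": 3},
    lambda h: (h.count("L") == 1) != (h.count("Y") == 1),
)
print(at_least_two_red, at_most_one_yellow, lavender_xor_one_yellow)
```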
In: Statistics and Probability
1. The probability of a Type II error is affected by the choice of test statistic. Why?
Please draw a picture to explain, step by step.
Follow the comment as well.
In: Statistics and Probability
Nerve growth factor (NGF) is a protein that has been shown to play a role in the development and maintenance of peripheral sympathetic neurons. One approach to studying NGF is to deprive the animal of NGF and study the effect of this deprivation on various cell types. In this study, the effect on the total protein content in the dorsal root ganglia of rats is considered. Two groups of rats are compared: those born to NGF-deficient females (in utero) and those born to normal females but nursed by NGF-deficient females (in milk). The data are shown below.
In Milk (M): 0.19, 0.21, 0.21, 0.23, 0.20, 0.22
In Utero (U): 0.12, 0.19, 0.17, 0.20, 0.09, 0.13, 0.21
Researchers are interested in whether the total protein content tends to be higher among rats deprived of NGF in milk.
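The question does not say which test to use. With samples this small, one option is the Wilcoxon-Mann-Whitney rank-sum test, and the sketch below computes only its U statistic (that choice of test is an assumption, not something the problem specifies). U counts the (milk, utero) pairs in which the milk value is larger, with ties counted as one half.

```python
from statistics import mean

milk = [0.19, 0.21, 0.21, 0.23, 0.20, 0.22]
utero = [0.12, 0.19, 0.17, 0.20, 0.09, 0.13, 0.21]

# U for the milk group: 1 per pair with milk > utero, 0.5 per tied pair.
u_milk = sum(
    1.0 if m > u else 0.5 if m == u else 0.0
    for m in milk
    for u in utero
)
max_u = len(milk) * len(utero)

print(f"mean(M) = {mean(milk):.4f}, mean(U) = {mean(utero):.4f}")
print(f"U = {u_milk} out of a possible {max_u}")
```

A value of U close to its maximum points toward higher protein content in the milk group. The statistic would still need to be compared against a rank-sum table or software for a p-value.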
In: Statistics and Probability
A security consultant has observed that attempts to breach the security of the company's computer system occur according to a Poisson process with a mean rate of 3 attempts per day. (The system is on 24 hours per day.)
(a) What is the probability that there will be four breach attempts tomorrow, and two of them will occur during the evening (eight-hour) shift?
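A sketch of two equivalent routes to part (a), using standard properties of a Poisson process. Route 1 takes four attempts in the day and has each one fall in the eight-hour evening shift with probability 1/3. Route 2 treats the evening shift (mean 1) and the other sixteen hours (mean 2) as independent Poisson counts of two each. The helper `pois` is a name made up for this example.

```python
from math import comb, exp, factorial


def pois(k, mean):
    """Poisson probability of exactly k events when the expected count is `mean`."""
    return exp(-mean) * mean**k / factorial(k)


# Route 1: P(4 in the day) * P(exactly 2 of those 4 land in the evening shift).
route1 = pois(4, 3) * comb(4, 2) * (1 / 3) ** 2 * (2 / 3) ** 2

# Route 2: independent counts on disjoint intervals.
route2 = pois(2, 1) * pois(2, 2)

print(f"{route1:.4f} {route2:.4f}")
```

Both routes simplify algebraically to e^(-3).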
In: Statistics and Probability
An insurance company finds that of 629 randomly selected auto
accidents, teenagers were driving the vehicle in 118 of them.
(a) Find the 95% confidence interval for the proportion of auto
accidents with teenaged drivers:
( , ) (Use 4 decimals.)
(b) What does this interval mean?
We are 95% confident that the proportion of all accidents with teenaged drivers is inside the above interval.
We are 95% confident that a randomly chosen accident with a teenaged driver will fall inside the above interval.
We are 95% confident that of the 629 sampled accidents, the proportion with a teenaged driver falls inside the above interval.
We are 95% confident that the percent of accidents with teenaged drivers is 18.8%.
(c) What does the 95% confidence level mean?
We expect that 95% of random samples of size 629 will produce [(A) a sample proportion / (B) a true proportion / (C) confidence intervals] that contain(s) the [(A) sample proportion / (B) confidence interval / (C) true proportion] of accidents that had teenaged drivers.
(d) A politician urging tighter restrictions on drivers' licenses
issued to teens says, "One of every five auto accidents has a
teenaged driver." Does the confidence interval support or
contradict this statement?
The confidence interval supports the assertion of the politician. The figure quoted by the politician is inside the interval.
The confidence interval supports the assertion of the politician. The figure quoted by the politician is outside the interval.
The confidence interval contradicts the assertion of the politician. The figure quoted by the politician is outside the interval.
The confidence interval contradicts the assertion of the politician. The figure quoted by the politician is inside the interval.
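For part (a), a minimal sketch of the usual large-sample (Wald) interval for a proportion. It assumes that this is the interval the course intends. The last line checks where the politician's 1-in-5 figure from part (d) falls.

```python
from math import sqrt
from statistics import NormalDist

x, n = 118, 629
p_hat = x / n

z = NormalDist().inv_cdf(0.975)            # about 1.96 for 95% confidence
margin = z * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"({lower:.4f}, {upper:.4f})")
print("0.20 inside interval:", lower < 0.20 < upper)
```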
In: Statistics and Probability
A marketing research firm wishes to study the relationship
between wine consumption and whether a person likes to watch
professional tennis on television. One hundred randomly selected
people are asked whether they drink wine and whether they watch
tennis. The following results are obtained:
| Watch Tennis | Do Not Watch Tennis | Totals |
Drink Wine | 8 | 42 | 50 |
Do Not Drink Wine | 12 | 38 | 50 |
Totals | 20 | 80 | 100 |
(a) For each row and column total, calculate the corresponding row or column percentage.
Row 1 | % |
Row 2 | % |
Column 1 | % |
Column 2 | % |
(b) For each cell, calculate the corresponding
cell, row, and column percentages. (Round your answers to
the nearest whole number.)
| Watch Tennis | Do Not Watch Tennis |
Drink Wine | Cell= % | Cell= % |
| Row= % | Row= % |
| Column= % | Column= % |
Do Not Drink Wine | Cell= % | Cell= % |
| Row= % | Row= % |
| Column= % | Column= % |
(c) Test the hypothesis that whether people drink wine is independent of whether people watch tennis. Set α = .05. (Round your answer to 3 decimal places.)
χ² =
(Reject / Do not reject) H0. Conclude that whether a person drinks wine and whether a person watches tennis are (independent / dependent) events.
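A sketch of the chi-square computation for part (c). Expected counts are row total times column total over the grand total. Because a 2x2 table has 1 degree of freedom, the critical value and p-value can be obtained from the standard normal distribution (a chi-square with 1 df is a squared standard normal), which avoids needing anything outside the standard library.

```python
from math import sqrt
from statistics import NormalDist

observed = [[8, 42],    # Drink Wine: watch, do not watch
            [12, 38]]   # Do Not Drink Wine

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

# df = (2 - 1)(2 - 1) = 1, so use the squared-normal relationship.
critical = NormalDist().inv_cdf(0.975) ** 2          # about 3.84 at alpha = .05
p_value = 2 * (1 - NormalDist().cdf(sqrt(chi2)))

print(f"chi2 = {chi2:.3f}, critical = {critical:.3f}, p = {p_value:.4f}")
print("Reject H0" if chi2 > critical else "Do not reject H0")
```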
In: Statistics and Probability
Given the data below, what is the estimated bias given a reference value of 24.521?
Data
24.9872
24.2675
25.2027
24.6919
24.9361
25.5176
24.8503
24.1735
24.8481
24.898
24.8747
24.6096
24.6116
24.9186
25
24.519
25.7207
24.9817
25.2276
24.919
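Taking bias in the measurement-systems sense (the mean of the observed values minus the reference value), the estimate can be sketched as follows.

```python
from statistics import mean

reference = 24.521
data = [
    24.9872, 24.2675, 25.2027, 24.6919, 24.9361,
    25.5176, 24.8503, 24.1735, 24.8481, 24.898,
    24.8747, 24.6096, 24.6116, 24.9186, 25,
    24.519, 25.7207, 24.9817, 25.2276, 24.919,
]

# Estimated bias = average measurement - reference value.
bias = mean(data) - reference
print(f"mean = {mean(data):.5f}, bias = {bias:.5f}")
```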
In: Statistics and Probability
In 625 at bats a baseball player got a hit 28% of the time. Find a 95% confidence interval for the true percentage of time he gets a hit.
In: Statistics and Probability
You are given the following hypotheses: Null hypothesis: p = 0.30
Alternative hypothesis: p ≠ 0.30
You decide to take a sample of size 90. Suppose we will reject the null hypothesis if the probability of an outcome as surprising as p̂ occurring is less than 5% (i.e., a “p-value” below .05). What values of p̂ would cause us to reject the null hypothesis? Hint: Your answer should be “if p̂ is anything bigger than ____ or anything smaller than ____.”
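A sketch of the cutoffs, assuming the normal approximation to the sampling distribution of p̂ under the null hypothesis (np = 27 and n(1 - p) = 63 are both comfortably large). The standard error uses the null value p = 0.30.

```python
from math import sqrt
from statistics import NormalDist

p0, n, alpha = 0.30, 90, 0.05

se = sqrt(p0 * (1 - p0) / n)               # standard error under H0
z = NormalDist().inv_cdf(1 - alpha / 2)    # two-sided cutoff, about 1.96

lower = p0 - z * se
upper = p0 + z * se
print(f"Reject H0 if p-hat > {upper:.4f} or p-hat < {lower:.4f}")
```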
In: Statistics and Probability
Dual-energy X-ray absorptiometry (DXA) is a technique for measuring bone health. One of the most common measures is total body bone mineral content (TBBMC). A highly skilled operator is required to take the measurements. Recently, a new DXA machine was purchased by a research lab, and two operators were trained to take the measurements. TBBMC for eight subjects was measured by both operators. The units are grams (g). A comparison of the means for the two operators provides a check on the training they received and allows us to determine if one of the operators is producing measurements that are consistently higher than the other. Here are the data.
Subject     1      2      3      4      5      6      7      8
Operator 1  1.326  1.338  1.077  1.228  0.938  1.008  1.182  1.288
Operator 2  1.323  1.322  1.073  1.233  0.934  1.019  1.184  1.304
(a) Take the difference between the TBBMC recorded for Operator 1 and the TBBMC for Operator 2. (Use Operator 1 minus Operator 2. Round your answers to four decimal places.)
x̄ =
s =
Describe the distribution of these differences using words.
The distribution is right skewed.
The distribution is uniform.
The distribution is Normal.
The distribution is left skewed.
The sample is too small to make judgments about skewness or symmetry.
(b) Use a significance test to examine the null hypothesis that the two operators have the same mean. Give the test statistic. (Round your answer to three decimal places.) t =
Give the degrees of freedom.
Give the P-value. (Round your answer to four decimal places.)
Give your conclusion. (Use the significance level of 5%.)
We can reject H0 based on this sample.
We cannot reject H0 based on this sample.
(c) The sample here is rather small, so we may not have much power to detect differences of interest. Use a 95% confidence interval to provide a range of differences that are compatible with these data. (Round your answers to four decimal places.) ,
(d) The eight subjects used for this comparison were not a random sample. In fact, they were friends of the researchers whose ages and weights were similar to the types of people who would be measured with this DXA machine. Comment on the appropriateness of this procedure for selecting a sample, and discuss any consequences regarding the interpretation of the significance-testing and confidence interval results.
The subjects from this sample, test results, and confidence interval are representative of future subjects.
The subjects from this sample may be representative of future subjects, but the test results and confidence interval are suspect because this is not a random sample.
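Parts (a) through (c) describe a matched-pairs t procedure on the differences. A sketch follows, with the critical value 2.365 (df = 7, 95% confidence) hard-coded from a standard t table because the standard library has no t distribution. For the same reason the P-value is left to a table or software.

```python
from math import sqrt
from statistics import mean, stdev

op1 = [1.326, 1.338, 1.077, 1.228, 0.938, 1.008, 1.182, 1.288]
op2 = [1.323, 1.322, 1.073, 1.233, 0.934, 1.019, 1.184, 1.304]

diffs = [a - b for a, b in zip(op1, op2)]   # Operator 1 minus Operator 2
n = len(diffs)
mean_d = mean(diffs)
s_d = stdev(diffs)                          # sample standard deviation

se = s_d / sqrt(n)
t_stat = mean_d / se                        # tests H0: mean difference = 0
df = n - 1

t_crit = 2.365                              # assumed from a t table, df = 7
lower, upper = mean_d - t_crit * se, mean_d + t_crit * se

print(f"mean = {mean_d:.4f}, s = {s_d:.4f}, t = {t_stat:.3f}, df = {df}")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```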
In: Statistics and Probability
Think of an example of a study where randomization is not feasible. Put the example in the Potential Outcome Model Framework. Include the outcome variable and treatment. State the counterfactuals.
In: Statistics and Probability
A health psychologist seeks to predict compliance with medical prescriptions using a self-report measure of methodicalness (M) and a health knowledge test (HK), the latter scored as pass (1) / fail (0). Out of 120 participants available at the study's outset, only 30 are still involved at the 6-month point, when compliance (C) is assessed. Information on all three measures and their interrelationships is provided below.
                            Time 1 (N = 120)   Time 2 (N = 30)   Correlations/reliabilities*
Measure            Range    Mean     SD        Mean     SD       M     HK    C
Methodicalness     3 – 20   12.00    4.00      15.00    3.30     .82
Health Knowledge   0 – 1    .50      .50       .80      .40      .32   .62
Compliance         6 – 15                      11.00    1.80     .40   .28   .72
*Reliabilities are the diagonal entries (shown in bold in the original).
a) What would be the observed criterion validity of the M scale if the full range of M scores from time 1 was available for correlation with prescription compliance? Also, what explains the drop in SD from T1 to T2? (3 pts.)
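The first part of (a) reads like a range-restriction correction. The sketch below assumes Thorndike's Case II formula for direct restriction on the predictor is what is intended. It uses the restricted M-C correlation (.40), the unrestricted Time 1 SD of M (4.00), and the restricted Time 2 SD (3.30). The variable names are illustrative.

```python
from math import sqrt

r_restricted = 0.40      # observed M-C correlation in the N = 30 sample
sd_full = 4.00           # SD of M at Time 1 (N = 120)
sd_restricted = 3.30     # SD of M at Time 2 (N = 30)

k = sd_full / sd_restricted

# Thorndike Case II (assumed): r_full = r*k / sqrt(1 - r^2 + r^2 * k^2)
r_full = (r_restricted * k) / sqrt(
    1 - r_restricted**2 + r_restricted**2 * k**2
)
print(f"estimated validity with full range of M: {r_full:.3f}")
```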
In: Statistics and Probability