Questions

A problem of interest is the relationship between the appraised value of a property and its sale price. The sale price for any given property will vary depending on the price set by the seller, the strength of appeal of the property to a specific buyer, and the state of the money and real estate markets. We want to examine the relationship between the mean sale price E(y) of a property and the following variables: appraised land value of the property, appraised value of the improvements on the property, and the neighborhood in which the property is listed. The data were collected from the property appraiser’s office of Hillsborough County, Florida, and saved in the TAMSALES4 file. Four neighborhoods (Hyde Park, Cheval, Hunter’s Green, and Davis Isles), each relatively homogeneous but differing sociologically and in property types and values, were identified within the city and surrounding area.

1. Draw scatter plots of the response y against each of the quantitative variables.

2. Using the information you got from the scatter plots and the material in Chapters 4 and 5, propose at least three different models for the relationship between the mean sale price E(y) and the variables. Build the models so that one is nested in another.

3. Conduct partial F tests repeatedly to find the best of the proposed models.

4. After the final model has been selected, conduct the global F test for the adequacy of the model.

5. Perform individual t tests if you have doubts about any of the variables.

6. Report the adjusted R-squared and the standard error of the model.

7. Based on your report in step 6, explain in your own words whether the final model is good enough to use or needs improvement.

8. Provide all the R code you used (in RStudio).

TAMSALES4 file:

FOLIO   SALES   LNSALES   LAND   IMP   TOTVAL   NBHD
129295710   378.0   5.93489   81.84   243.30   325.14   CHEVAL
129295830   273.0   5.60947   60.48   134.47   194.95   CHEVAL
129453028   321.2   5.77206   115.58   255.42   371.00   CHEVAL
129454156   327.0   5.78996   112.38   185.57   297.96   CHEVAL
129454194   382.5   5.94673   81.23   217.98   299.22   CHEVAL
129454316   242.0   5.48894   61.48   110.28   171.76   CHEVAL
129454374   242.0   5.48894   49.99   109.06   159.06   CHEVAL
129454524   510.0   6.23441   71.28   224.38   295.65   CHEVAL
129454536   285.0   5.65249   63.82   158.83   222.65   CHEVAL
129795122   500.0   6.21461   91.32   296.02   387.34   CHEVAL
129795276   825.0   6.71538   117.01   451.41   568.42   CHEVAL
129795456   515.0   6.24417   117.72   347.87   465.59   CHEVAL
129795468   520.0   6.25383   107.06   329.93   436.99   CHEVAL
129795488   780.0   6.65929   157.29   507.85   665.14   CHEVAL
129795506   825.0   6.71538   131.89   466.27   598.16   CHEVAL
129795510   841.0   6.73459   159.84   443.29   603.13   CHEVAL
129795606   1140.0   7.03878   157.49   563.91   721.41   CHEVAL
129842052   460.0   6.13123   59.36   206.84   266.20   CHEVAL
339720192   285.0   5.65249   39.75   166.61   206.36   HUNTERSGREEN
339721514   350.0   5.85793   54.00   227.43   281.43   HUNTERSGREEN
339721560   314.5   5.75098   48.84   187.13   235.97   HUNTERSGREEN
594030716   183.0   5.20949   34.75   158.05   192.80   HUNTERSGREEN
594030754   200.0   5.29832   29.10   145.80   174.90   HUNTERSGREEN
594030784   775.0   6.65286   142.27   350.43   492.70   HUNTERSGREEN
594030788   505.0   6.22456   99.62   306.43   406.05   HUNTERSGREEN
594030880   425.0   6.05209   47.52   225.74   273.26   HUNTERSGREEN
594030882   335.0   5.81413   49.01   217.35   266.36   HUNTERSGREEN
594031020   760.0   6.63332   96.10   335.59   431.69   HUNTERSGREEN
594031044   410.0   6.01616   75.79   274.22   350.01   HUNTERSGREEN
594031064   335.0   5.81413   67.22   273.66   340.88   HUNTERSGREEN
594033934   325.0   5.78383   47.02   241.87   288.90   HUNTERSGREEN
594033956   380.0   5.94017   47.74   307.20   354.94   HUNTERSGREEN
594033970   640.0   6.46147   45.55   336.44   381.99   HUNTERSGREEN
594034072   262.0   5.56834   51.97   152.96   204.94   HUNTERSGREEN
594034084   190.0   5.24702   51.25   151.68   202.94   HUNTERSGREEN
1176450000   500.0   6.21461   256.13   152.78   408.91   HYDEPARK
1186250000   550.0   6.30992   168.45   310.12   478.56   HYDEPARK
1186260000   205.8   5.32690   168.45   41.60   210.05   HYDEPARK
1692460000   590.0   6.38012   291.09   66.90   357.99   DAVISISLES
1692730000   525.0   6.26340   462.43   114.13   576.56   DAVISISLES
1692990000   340.0   5.82895   199.66   80.43   280.09   DAVISISLES
1693500000   3200.0   8.07091   1004.59   1129.65   2134.23   DAVISISLES
1694120000   500.0   6.21461   358.43   55.89   414.32   DAVISISLES
1694450000   1421.0   7.25912   391.46   550.78   942.24   DAVISISLES
1853130000   858.9   6.75565   242.47   281.62   524.09   HYDEPARK
1853270000   625.0   6.43775   264.00   184.63   448.63   HYDEPARK
1854060000   375.0   5.92693   242.00   33.79   275.79   HYDEPARK
1854250000   1240.0   7.12287   264.00   466.82   730.82   HYDEPARK
1855080000   850.0   6.74524   266.64   333.18   599.82   HYDEPARK
1855250000   2500.0   7.82405   533.28   1221.20   1754.48   HYDEPARK
1855590000   750.0   6.62007   266.64   366.05   632.69   HYDEPARK
1856430000   895.0   6.79682   266.64   358.77   625.41   HYDEPARK
1856800000   460.0   6.13123   264.00   183.54   447.54   HYDEPARK
1856960000   605.0   6.40523   210.00   259.67   469.67   HYDEPARK
1857420000   329.9   5.79879   210.00   105.94   315.94   HYDEPARK
1857460000   462.0   6.13556   210.00   145.19   355.19   HYDEPARK
1858190000   850.0   6.74524   264.00   362.71   626.71   HYDEPARK
1858440000   1525.0   7.32975   335.61   862.34   1197.95   HYDEPARK
1858450000   1400.0   7.24423   350.90   711.34   1062.24   HYDEPARK
1858770000   760.0   6.63332   264.00   257.47   521.47   HYDEPARK
1859190000   520.0   6.25383   250.23   157.75   407.98   HYDEPARK
1859400000   418.0   6.03548   193.20   147.43   340.63   HYDEPARK
1859590000   152.0   5.02388   118.80   106.14   224.94   HYDEPARK
1859620000   330.0   5.79909   118.80   123.98   242.78   HYDEPARK
1860510000   240.0   5.48064   192.01   221.54   413.55   HYDEPARK
1860580000   405.0   6.00389   223.74   122.18   345.92   HYDEPARK
1860890000   650.0   6.47697   203.49   209.11   412.60   HYDEPARK
1860920000   400.0   5.99146   203.49   101.58   305.07   HYDEPARK
1861000000   365.0   5.89990   207.48   112.78   320.26   HYDEPARK
1953960000   745.0   6.61338   291.63   83.22   374.85   DAVISISLES
1954130000   1650.0   7.40853   388.82   1023.36   1412.18   DAVISISLES
1954320000   1250.0   7.13090   648.96   81.01   729.97   DAVISISLES
1954870000   715.0   6.57228   344.12   265.18   609.30   DAVISISLES
1955990000   1215.0   7.10250   341.86   785.66   1127.52   DAVISISLES
1960100000   440.0   6.08677   179.52   58.40   237.92   DAVISISLES
1962380000   690.0   6.53669   357.07   121.25   478.33   DAVISISLES
1964590000   587.5   6.37588   300.93   75.99   376.92   DAVISISLES
1966620000   377.5   5.93357   238.86   43.95   282.81   DAVISISLES
1968140000   810.0   6.69703   195.84   479.00   674.84   DAVISISLES
1969380000   800.0   6.68461   242.47   367.40   609.87   HYDEPARK
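
For step 3 above, the partial (nested-model) F statistic compares the SSE of a reduced model with the SSE of the complete model that contains it. The question asks for R, so the following is only a minimal Python sketch of the formula itself. The function name `partial_f` and the SSE values in the example call are hypothetical and are not computed from TAMSALES4; only n = 80 matches the number of data rows listed above.

```python
def partial_f(sse_reduced, sse_complete, k_complete, g_reduced, n):
    """Nested-model F statistic.

    k_complete: number of beta terms (excluding the intercept) in the complete model
    g_reduced:  number of beta terms (excluding the intercept) in the reduced model
    n:          number of observations
    """
    if not 0 <= g_reduced < k_complete:
        raise ValueError("the reduced model must have fewer terms than the complete model")
    df_num = k_complete - g_reduced        # number of betas tested
    df_den = n - (k_complete + 1)          # error df of the complete model
    if df_den <= 0:
        raise ValueError("not enough observations for the complete model")
    return ((sse_reduced - sse_complete) / df_num) / (sse_complete / df_den)


# Hypothetical SSE values for illustration only (NOT fitted to the data above).
f_stat = partial_f(sse_reduced=120.0, sse_complete=100.0,
                   k_complete=5, g_reduced=3, n=80)
print(round(f_stat, 3))
```

The statistic is then compared with an F critical value on (k - g, n - (k + 1)) degrees of freedom; a large value favors the complete model.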

In: Statistics and Probability

In the following problem, check that it is appropriate to use the normal approximation to the binomial. Then use the normal distribution to estimate the requested probabilities.

Do you try to pad an insurance claim to cover your deductible? About 41% of all U.S. adults will try to pad their insurance claims! Suppose that you are the director of an insurance adjustment office. Your office has just received 136 insurance claims to be processed in the next few days. Find the following probabilities. (Round your answers to four decimal places.)

(a) half or more of the claims have been padded


(b) fewer than 45 of the claims have been padded


(c) from 40 to 64 of the claims have been padded


(d) more than 80 of the claims have not been padded
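
One way to set up parts (a) through (d) is sketched below in Python. It assumes the usual rule of thumb (np and nq both at least 5) and a continuity correction of 0.5; textbooks differ on both, so treat the variable names and conventions here as assumptions rather than the required method.

```python
from math import sqrt
from statistics import NormalDist

n, p = 136, 0.41
mean = n * p                      # np
sd = sqrt(n * p * (1 - p))        # sqrt(npq)

# Rule-of-thumb check that the normal approximation is reasonable.
approx_ok = n * p >= 5 and n * (1 - p) >= 5

X = NormalDist(mean, sd)

# (a) half or more padded: P(X >= 68), corrected to P(X > 67.5)
p_a = 1 - X.cdf(67.5)
# (b) fewer than 45 padded: P(X <= 44), corrected to P(X < 44.5)
p_b = X.cdf(44.5)
# (c) from 40 to 64 padded: P(39.5 < X < 64.5)
p_c = X.cdf(64.5) - X.cdf(39.5)
# (d) more than 80 NOT padded means at most 55 padded: P(X < 55.5)
p_d = X.cdf(55.5)

for label, prob in zip("abcd", (p_a, p_b, p_c, p_d)):
    print(f"({label}) {prob:.4f}")
```

Rounding z to two decimals and using a printed table can shift the fourth decimal place slightly compared with this direct computation.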

In: Statistics and Probability

A bank manager would like to know the mean wait time for his customers. He randomly selects 25 customers and records the amount of time each waits to see a teller. The sample mean is 7.25 minutes with a standard deviation of 1.1 minutes. Construct a 99% confidence interval for the mean wait time. Round your answers to 2 decimal places.

Lower Limit =

Upper Limit =

Write a summary sentence for the confidence interval you calculated.
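
A minimal Python sketch of the interval, assuming the 1.1 minutes is the sample standard deviation so that a t interval with 24 degrees of freedom applies. The critical value 2.797 is taken from a standard t table (the Python standard library has no t quantile function), and the variable names are illustrative.

```python
from math import sqrt

n, x_bar, s = 25, 7.25, 1.1

# t critical value for 99% confidence with n - 1 = 24 df, from a t table (assumption).
T_CRIT_99_DF24 = 2.797

margin = T_CRIT_99_DF24 * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(f"Lower Limit = {lower:.2f}, Upper Limit = {upper:.2f}")
```

A summary sentence would then state that we are 99% confident the mean wait time for all customers lies between the two limits.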

In: Statistics and Probability

A bag contains 3 red marbles, 1 green one, 1 lavender one, 2 yellows, and 3 orange marbles.
How many sets of five marbles include at least two red ones?

A bag contains 3 red marbles, 3 green ones, 1 lavender one, 3 yellows, and 2 orange marbles.
How many sets of five marbles include at most one of the yellow ones?

A bag contains 2 red marbles, 2 green ones, 1 lavender one, 3 yellows, and 3 orange marbles.
How many sets of five marbles include either the lavender one or exactly one yellow one but not both colors?
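
The three counts can be checked by brute force. The sketch below assumes every marble is distinguishable (two red marbles count as different objects), which is the usual convention for this kind of problem, and reads "but not both" as an exclusive or. The helper `count_sets` is a hypothetical name.

```python
from itertools import combinations


def count_sets(bag, keep, size=5):
    """Count size-marble sets whose tuple of colors satisfies the predicate `keep`."""
    marbles = [color for color, k in bag.items() for _ in range(k)]
    # combinations() works by position, so same-colored marbles stay distinct.
    return sum(1 for hand in combinations(marbles, size) if keep(hand))


bag1 = {"red": 3, "green": 1, "lavender": 1, "yellow": 2, "orange": 3}
bag2 = {"red": 3, "green": 3, "lavender": 1, "yellow": 3, "orange": 2}
bag3 = {"red": 2, "green": 2, "lavender": 1, "yellow": 3, "orange": 3}

at_least_two_red = count_sets(bag1, lambda h: h.count("red") >= 2)
at_most_one_yellow = count_sets(bag2, lambda h: h.count("yellow") <= 1)
# Exclusive or: lavender present, or exactly one yellow, but not both.
lavender_xor_one_yellow = count_sets(
    bag3, lambda h: ("lavender" in h) != (h.count("yellow") == 1)
)

print(at_least_two_red, at_most_one_yellow, lavender_xor_one_yellow)
```

The same numbers follow from combination formulas, for example C(3,2)·C(7,3) + C(3,3)·C(7,2) for the first question.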

In: Statistics and Probability

1. The probability of a Type II error is affected by the choice of test statistic. Why?

Please draw a picture and explain step by step.

Follow the comment as well.

In: Statistics and Probability

Nerve growth factor (NGF) is a protein that has been shown to play a role in the development and maintenance of peripheral sympathetic neurons. One approach to studying NGF is to deprive the animal of NGF and study the effect of this deprivation on various cell types. In this study, the effect on the total protein content in the dorsal root ganglia of rats is considered. Two groups of rats are compared: those born to NGF-deficient females (in utero) and those born to normal females but nursed by NGF-deficient females (in milk). The data are shown below.

In Milk (M): 0.19, 0.21, 0.21, 0.23, 0.20, 0.22

In Utero (U): 0.12, 0.19, 0.17, 0.20, 0.09, 0.13, 0.21

Researchers are interested in whether the total protein content tends to be higher among rats deprived of NGF in milk.

  1. Let Y be total protein content of a randomly selected rat from the in utero population and X be that of a randomly selected rat from the milk population. What is the appropriate alternative hypothesis?
  2. Find the value of the Wilcoxon rank-sum test statistic.
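
The rank sums for part 2 can be computed as in the Python sketch below, which assigns midranks to tied values. Which group's rank sum is reported as the test statistic depends on the textbook's convention, so both are printed; the helper name `midranks` is illustrative.

```python
def midranks(values):
    """Map each distinct value to the average of the 1-based ranks it occupies."""
    order = sorted(values)
    rank_of = {}
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and order[j] == order[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j; their mean is (i + 1 + j) / 2.
        rank_of[order[i]] = (i + 1 + j) / 2
        i = j
    return rank_of


milk = [0.19, 0.21, 0.21, 0.23, 0.20, 0.22]
utero = [0.12, 0.19, 0.17, 0.20, 0.09, 0.13, 0.21]

ranks = midranks(milk + utero)
w_milk = sum(ranks[v] for v in milk)
w_utero = sum(ranks[v] for v in utero)
print(w_milk, w_utero)
```

As a check, the two rank sums must add up to N(N + 1)/2 = 91 for the N = 13 pooled observations.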

In: Statistics and Probability

A security consultant has observed that attempts to breach the security of the company's computer system occur according to a Poisson process with a mean rate of 3 attempts per day. (The system is on 24 hours per day.)

(a) What is the probability that there will be four breach attempts tomorrow, and two of them will occur during the evening (eight-hour) shift?
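
A Python sketch of part (a), reading it as "exactly four attempts tomorrow, exactly two of them in the eight-hour evening shift" (an interpretation, not stated outright in the question). Counts in disjoint time windows of a Poisson process are independent, so the probability can be computed two equivalent ways.

```python
from math import comb, exp, factorial


def poisson_pmf(k, mu):
    """P(N = k) for a Poisson count with mean mu."""
    return exp(-mu) * mu**k / factorial(k)


rate_per_day = 3.0
mu_evening = rate_per_day * 8 / 24    # expected attempts in the 8-hour shift
mu_rest = rate_per_day * 16 / 24      # expected attempts in the other 16 hours

# Way 1: two in the evening AND two in the rest of the day (independent windows).
p_split = poisson_pmf(2, mu_evening) * poisson_pmf(2, mu_rest)

# Way 2: four in the day, each landing in the evening shift with probability 1/3.
p_thin = poisson_pmf(4, rate_per_day) * comb(4, 2) * (1 / 3) ** 2 * (2 / 3) ** 2

print(round(p_split, 4), round(p_thin, 4))
```

Both routes give the same value, which is a useful sanity check on the setup.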

In: Statistics and Probability

An insurance company finds that of 629 randomly selected auto accidents, teenagers were driving the vehicle in 118 of them.

(a) Find the 95% confidence interval for the proportion of auto accidents with teenaged drivers:
( , ) (Use 4 decimals.)

(b) What does this interval mean?

We are 95% confident that the proportion of all accidents with teenaged drivers is inside the above interval.

We are 95% confident that a randomly chosen accident with a teenaged driver will fall inside the above interval.   

We are 95% confident that of the 629 sampled accidents, the proportion with a teenaged driver falls inside the above interval.

We are 95% confident that the percent of accidents with teenaged drivers is 18.8%.



(c) What does the 95% confidence level mean?
We expect that 95% of random samples of size 629 will produce [(A) a sample proportion / (B) a true proportion / (C) confidence intervals] that contain(s) the [(A) sample proportion / (B) confidence interval / (C) true proportion] of accidents that had teenaged drivers.

(d) A politician urging tighter restrictions on drivers' licenses issued to teens says, "One of every five auto accidents has a teenaged driver." Does the confidence interval support or contradict this statement?

The confidence interval supports the assertion of the politician. The figure quoted by the politician is inside the interval.

The confidence interval supports the assertion of the politician. The figure quoted by the politician is outside the interval.    

The confidence interval contradicts the assertion of the politician. The figure quoted by the politician is outside the interval.

The confidence interval contradicts the assertion of the politician. The figure quoted by the politician is inside the interval.
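
For parts (a) and (d), a minimal Python sketch of the large-sample (Wald) interval for a proportion follows. It assumes the course uses the standard z interval; the function name `wald_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """Large-sample z interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


lower, upper = wald_interval(118, 629)
claim = 1 / 5   # the politician's "one of every five"
print(f"({lower:.4f}, {upper:.4f})  claim inside interval: {lower <= claim <= upper}")
```

Whether the claimed value falls inside the interval is what decides between "supports" and "contradicts" in part (d).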

In: Statistics and Probability

A marketing research firm wishes to study the relationship between wine consumption and whether a person likes to watch professional tennis on television. One hundred randomly selected people are asked whether they drink wine and whether they watch tennis. The following results are obtained:

                    Watch Tennis   Do Not Watch Tennis   Totals
Drink Wine          8              42                    50
Do Not Drink Wine   12             38                    50
Totals              20             80                    100

(a) For each row and column total, calculate the corresponding row or column percentage.

Row 1 %
Row 2 %
Column 1 %
Column 2 %

  
(b) For each cell, calculate the corresponding cell, row, and column percentages. (Round your answers to the nearest whole number.)

                    Watch Tennis   Do Not Watch Tennis
Drink Wine          Cell = %       Cell = %
                    Row = %        Row = %
                    Column = %     Column = %
Do Not Drink Wine   Cell = %       Cell = %
                    Row = %        Row = %
                    Column = %     Column = %

(c) Test the hypothesis that whether people drink wine is independent of whether people watch tennis. Set α = .05. (Round your answer to 3 decimal places.)

χ² =

(Reject / Do not reject) H0. Conclude that whether a person drinks wine and whether a person watches tennis are (independent / dependent) events.
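
Part (c) can be sketched in Python as below. This uses the plain Pearson chi-square statistic without a continuity correction (an assumption about what the course expects). With 1 degree of freedom, the chi-square distribution is the square of a standard normal, so the critical value and p-value can be obtained from `statistics.NormalDist` alone.

```python
from statistics import NormalDist

observed = [[8, 42],    # Drink Wine:        watch, do not watch
            [12, 38]]   # Do Not Drink Wine: watch, do not watch

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / total
        chi2 += (obs - expected) ** 2 / expected

alpha = 0.05
z = NormalDist()
critical = z.inv_cdf(1 - alpha / 2) ** 2     # chi-square(1) upper-alpha point
p_value = 2 * (1 - z.cdf(chi2 ** 0.5))       # valid for df = 1 only
reject = chi2 > critical

print(f"chi2 = {chi2:.3f}, critical = {critical:.3f}, "
      f"p = {p_value:.4f}, reject H0: {reject}")
```

The normal-based shortcut only works for a 2×2 table; larger tables need a chi-square table or a statistics library.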

In: Statistics and Probability

Given the data below, what is the estimated bias given a reference value of 24.521?

Data
24.9872
24.2675
25.2027
24.6919
24.9361
25.5176
24.8503
24.1735
24.8481
24.898
24.8747
24.6096
24.6116
24.9186
25
24.519
25.7207
24.9817
25.2276
24.919
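
Assuming the usual measurement-systems definition (estimated bias = mean of the measurements minus the reference value), the calculation can be sketched in Python as:

```python
from statistics import fmean

reference = 24.521
data = [
    24.9872, 24.2675, 25.2027, 24.6919, 24.9361,
    25.5176, 24.8503, 24.1735, 24.8481, 24.898,
    24.8747, 24.6096, 24.6116, 24.9186, 25,
    24.519, 25.7207, 24.9817, 25.2276, 24.919,
]

# Bias is the average measurement minus the accepted reference value.
bias = fmean(data) - reference
print(round(bias, 4))
```

A positive result means the measurements run high on average relative to the reference.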

In: Statistics and Probability

In 625 at-bats, a baseball player got a hit 28% of the time. Find a 95% confidence interval for the true percentage of the time he gets a hit.

In: Statistics and Probability

You are given the following hypotheses:
Null hypothesis: p = 0.30
Alternative hypothesis: p ≠ 0.30

You decide to take a sample of size 90. Suppose we will reject the null hypothesis if the probability of an outcome as surprising as p̂ occurring is less than 5% (i.e., a “p-value” of .05). What values of p̂ would cause us to reject the null hypothesis? Hint: Your answer should be “if p̂ is anything bigger than ____ or anything smaller than ____.”
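
The two cutoffs can be sketched in Python as follows, assuming the normal approximation to the sampling distribution of p̂ with the standard error computed under the null value p = 0.30. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p0, n, alpha = 0.30, 90, 0.05

se = sqrt(p0 * (1 - p0) / n)               # standard error under H0
z = NormalDist().inv_cdf(1 - alpha / 2)    # two-sided critical value

lower_cut = p0 - z * se
upper_cut = p0 + z * se
print(f"Reject H0 if p-hat < {lower_cut:.4f} or p-hat > {upper_cut:.4f}")
```

Because the count of successes is discrete, the cutoffs on p̂ correspond to specific whole-number counts out of 90; an exact binomial test could give slightly different boundaries.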

In: Statistics and Probability

Dual-energy X-ray absorptiometry (DXA) is a technique for measuring bone health. One of the most common measures is total body bone mineral content (TBBMC). A highly skilled operator is required to take the measurements. Recently, a new DXA machine was purchased by a research lab, and two operators were trained to take the measurements. TBBMC for eight subjects was measured by both operators. The units are grams (g). A comparison of the means for the two operators provides a check on the training they received and allows us to determine if one of the operators is producing measurements that are consistently higher than the other. Here are the data.

           Subject
Operator   1       2       3       4       5       6       7       8
1          1.326   1.338   1.077   1.228   0.938   1.008   1.182   1.288
2          1.323   1.322   1.073   1.233   0.934   1.019   1.184   1.304

(a) Take the difference between the TBBMC recorded for Operator 1 and the TBBMC for Operator 2. (Use Operator 1 minus Operator 2. Round your answers to four decimal places.)

x̄ =

s =

Describe the distribution of these differences using words.

The distribution is right skewed.

The distribution is uniform.

The distribution is Normal.

The distribution is left skewed.

The sample is too small to make judgments about skewness or symmetry.

(b) Use a significance test to examine the null hypothesis that the two operators have the same mean. Give the test statistic. (Round your answer to three decimal places.) t =

Give the degrees of freedom.

Give the P-value. (Round your answer to four decimal places.)

Give your conclusion. (Use the significance level of 5%.)

We can reject H0 based on this sample.

We cannot reject H0 based on this sample.

(c) The sample here is rather small, so we may not have much power to detect differences of interest. Use a 95% confidence interval to provide a range of differences that are compatible with these data. (Round your answers to four decimal places.) ,

(d) The eight subjects used for this comparison were not a random sample. In fact, they were friends of the researchers whose ages and weights were similar to the types of people who would be measured with this DXA machine. Comment on the appropriateness of this procedure for selecting a sample, and discuss any consequences regarding the interpretation of the significance-testing and confidence interval results.

The subjects from this sample, test results, and confidence interval are representative of future subjects.

The subjects from this sample may be representative of future subjects, but the test results and confidence interval are suspect because this is not a random sample.
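
Parts (a) through (c) are a matched-pairs t procedure on the Operator 1 minus Operator 2 differences. A minimal Python sketch follows; the critical value 2.365 for 95% confidence with 7 degrees of freedom comes from a standard t table, and the P-value is left to a t table or statistics software because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

op1 = [1.326, 1.338, 1.077, 1.228, 0.938, 1.008, 1.182, 1.288]
op2 = [1.323, 1.322, 1.073, 1.233, 0.934, 1.019, 1.184, 1.304]

# (a) Differences, Operator 1 minus Operator 2
diffs = [a - b for a, b in zip(op1, op2)]
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)            # sample standard deviation (n - 1 divisor)

# (b) One-sample t statistic on the differences, df = n - 1
se = s_d / sqrt(n)
t_stat = d_bar / se
df = n - 1

# (c) 95% confidence interval; critical value from a t table (assumption)
T_CRIT_95_DF7 = 2.365
ci = (d_bar - T_CRIT_95_DF7 * se, d_bar + T_CRIT_95_DF7 * se)

print(f"mean = {d_bar:.4f}, s = {s_d:.4f}, t = {t_stat:.3f}, df = {df}")
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```

If the interval contains zero, the test at the 5% level will not reject the hypothesis of equal operator means, which ties parts (b) and (c) together.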

In: Statistics and Probability

Think of an example of a study where randomization is not feasible. Put the example in the Potential Outcome Model Framework. Include the outcome variable and treatment. State the counterfactuals.  

In: Statistics and Probability

A health psychologist seeks to predict compliance with medical prescriptions using a self-report measure of methodicalness (M) and a health knowledge test (HK), the latter scored as pass (1) / fail (0). Out of 120 participants available at the study's outset, only 30 are still involved at the 6-month point, when compliance (C) is assessed. Information on all three measures and their interrelationships is provided below.

                               Time 1 (N = 120)   Time 2 (N = 30)   Correlations/reliabilities*
   Measure            Range    Mean     SD        Mean     SD       M      HK     C
   Methodicalness     3 – 20   12.00    4.00      15.00    3.30     .82
   Health Knowledge   0 – 1    .50      .50       .80      .40      .32    .62
   Compliance         6 – 15                      11.00    1.80     .40    .28    .72

   *Reliabilities are the diagonal entries (.82, .62, .72).

a) What would be the observed criterion validity of the M scale if the full range of M scores from time 1 was available for correlation with prescription compliance? Also, what explains the drop in SD from T1 to T2? (3 pts.)
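
One common way to answer the first part of (a) is the correction for direct range restriction (often called Thorndike's Case II), using the observed M–C correlation of .40 and the Time 1 and Time 2 standard deviations of M. Whether this is the formula the course intends is an assumption; the Python sketch below only shows the arithmetic, and the function name is illustrative.

```python
from math import sqrt


def correct_range_restriction(r, sd_unrestricted, sd_restricted):
    """Estimate the correlation in the unrestricted group (direct range restriction)."""
    k = sd_unrestricted / sd_restricted
    return r * k / sqrt(1 - r**2 + (r * k) ** 2)


r_observed = 0.40   # M with Compliance at Time 2 (N = 30)
r_full = correct_range_restriction(r_observed, sd_unrestricted=4.00, sd_restricted=3.30)
print(round(r_full, 2))
```

The corrected value is larger than .40 because the Time 2 sample covers a narrower range of M scores than the original 120 participants.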

In: Statistics and Probability