Question

Consider the following set of observations:

Obs.    input   result
1       1       1
2       2       2
3       3       3
4       4       5
5       5       8
6       6       13
7       7       21
8       8       34
9       9       55
10      10      89
11      11      144
12      12      233
13      13      377
14      14      610

(a) Enter the data in L1 and L2 in your TI calculator, find the regression line, and construct a scatterplot with the regression line included. Does a line appear to be a good model for these data? Be sure to check your residuals plot. (7 points: 2 points for the regression line, 2 points for the scatterplot, 2 points for the residuals plot, 1 point for the comment)

(b) What is r²?

(c) What type of relationship do the data appear to have (linear, logarithmic, exponential, etc.)?

(d) What type of re-expression would work in this case? (1 point)

(e) Find the natural logarithm of the y-values.

(f) Draw a scatterplot of x vs. ln y. Find the regression equation of ln y on x and include it on the graph. Does it appear to be a better fit than the fit in part (a)? Be sure to check your residuals plot. (7 points: 2 points for the regression line, 2 points for the scatterplot, 2 points for the residuals plot, 1 point for the comment)

(g) Write a prediction (regression) equation for your re-expressed data.

(h) Use the regression equation you found in part (f) to predict the value of y when x = 10.5.

(i) Does your answer for part (h) seem reasonable? Why or why not?

(j) Explain the importance of checking the residuals plot before re-expressing data and then again after re-expressing data.

Solutions

Expert Solution

We can solve this with a few lines of R (Excel would also work). It is no different from using the TI calculator, in that none of the calculations have to be done by hand. We will use RStudio for this.

The data inputs would be as below.

> input <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14)
> result <- c(1,2,3,5,8,13,21,34,55,89,144,233,377,610)

The regression equation would be as below.

> lm(result ~ input)

Call:
lm(formula = result ~ input)

Coefficients:
(Intercept)        input  
    -143.69        34.35

That is, the fitted line is $\text{result} = -143.69 + 34.35\,\text{input}$.

The scatter plot with regression line and the residual plot would be as below.

> plot(x=input,y=result)
> abline(lm(result ~ input))

> plot(x=input, y=resid(lm(result ~ input)))

Judging by the scatter plot with the regression line, a line does not appear to be a good model for these data. The line does not match the pattern of the data: the points curve upward in a parabolic or exponential shape, while the regression line is straight. Moreover, the first four data points all lie above the regression line, the next eight lie below it, and the last two lie above it again. The points in the residual plot are likewise not randomly scattered but follow a clear curved pattern.
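The above/below pattern can also be checked numerically rather than by eye. The following is a minimal sketch, not part of the original solution; the names `fit_linear`, `res_sign`, and `runs` are chosen here for illustration only.

```r
# Data from the question.
input  <- 1:14
result <- c(1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610)

# Fit the straight line and extract its residuals.
fit_linear <- lm(result ~ input)
res_linear <- resid(fit_linear)

# Sign of each residual: +1 means the point lies above the line, -1 below.
res_sign <- unname(sign(res_linear))
print(res_sign)

# Run-length encoding collapses consecutive equal signs into runs.
# Long runs of one sign indicate a systematic pattern, not random scatter.
runs <- rle(res_sign)
print(data.frame(sign = runs$values, length = runs$lengths))
```

A well-fitting line would usually give many short runs of alternating sign; a handful of long runs is the numerical counterpart of the curved residual plot.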

The r-squared of the regression would be as below.

> summary(lm(result~input))$r.squared
[1] 0.6385676

Hence, the r-squared is 0.6385676.
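For a simple regression with one predictor, r-squared is the square of the correlation coefficient between x and y, which is the r that a TI calculator reports. A short sketch of that cross-check, with variable names `r` and `r_squared` chosen here for illustration:

```r
# Data from the question.
input  <- 1:14
result <- c(1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610)

# Pearson correlation between input and result.
r <- cor(input, result)

# Squaring it should reproduce the r-squared from summary(lm(...)).
r_squared <- r^2
print(r)
print(r_squared)
```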

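The data seem to have an exponential relationship. The slope keeps increasing as the input grows, and each value is roughly a constant multiple (about 1.6) of the previous one, which is the signature of exponential growth.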
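One quick way to tell exponential growth from other upward-curving shapes is to look at the ratios of successive values: for exponential data they are roughly constant. A minimal sketch, assuming equally spaced inputs as in this data set (the name `ratios` is illustrative):

```r
# Result values from the question.
result <- c(1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610)

# Ratio of each value to the one before it.
# result[-1] drops the first value, result[-length(result)] drops the last.
ratios <- result[-1] / result[-length(result)]
print(round(ratios, 3))
```

For a quadratic pattern these ratios would keep shrinking toward 1, whereas here they settle near a fixed value.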

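Exponential data follow a relationship of the form $y = a e^{bx}$, which becomes a straight line after taking natural logarithms: $\ln y = \ln a + b x$. Re-expressing the data by taking the natural logarithm of the result values would therefore work in this case.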

The natural log of result values would be as below.

> log(result)
 [1] 0.0000000 0.6931472 1.0986123 1.6094379 2.0794415
 [6] 2.5649494 3.0445224 3.5263605 4.0073332 4.4886364
[11] 4.9698133 5.4510385 5.9322452 6.4134590

The regression for the log of y and x would be as below.

> lm(log(result) ~ input)

Call:
lm(formula = log(result) ~ input)

Coefficients:
(Intercept)        input  
    -0.3584       0.4847 

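The regression equation would be $\ln(\text{result}) = -0.3584 + 0.4847\,\text{input}$, or $\text{result} = e^{-0.3584 + 0.4847\,\text{input}}$, or $\text{result} = e^{-0.3584} \cdot e^{0.4847\,\text{input}}$, or $\text{result} \approx 0.6988\, e^{0.4847\,\text{input}}$.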
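The coefficients of the log-scale fit can be converted back to the original scale by exponentiating them. This is a sketch only; the names `fit_log`, `a`, and `growth` are introduced here and do not appear in the original solution.

```r
# Data from the question.
input  <- 1:14
result <- c(1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610)

# Regression of ln(result) on input.
fit_log <- lm(log(result) ~ input)

# exp(intercept) is the multiplier in front of the exponential.
a <- exp(coef(fit_log)[["(Intercept)"]])

# exp(slope) is the factor by which result grows per unit of input.
growth <- exp(coef(fit_log)[["input"]])

print(a)
print(growth)
```

Read this way, the model says that result is multiplied by roughly the same factor each time input increases by one.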

The scatter plot with the regression line and the residual plot would be as below.

> plot(x=input,y=log(result))
> abline(lm(log(result)~input))

> plot(x=input, y=resid(lm(log(result) ~ input)))

Judging by the scatter plot, the line fits the log-transformed data much better than the line in the original linear regression. The residual plot also looks better than before, although there is still a pattern after the third data point.

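The prediction equation for the re-expressed data would be $\ln(\text{result}) = -0.3584 + 0.4847\,\text{input}$, or $\text{result} \approx 0.6988\, e^{0.4847\,\text{input}}$.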

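For an input of 10.5, we have $\ln(\text{result}) = -0.3584 + 0.4847(10.5)$, or $\ln(\text{result}) \approx 4.731$. Using $\text{result} = e^{\ln(\text{result})}$, we have $\text{result} = e^{4.731} \approx 113.4$.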
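The same prediction can be obtained in R without typing the rounded coefficients by hand. A minimal sketch, with `fit_log`, `log_pred`, and `pred` as illustrative names:

```r
# Data from the question.
input  <- 1:14
result <- c(1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610)

# Regression of ln(result) on input.
fit_log <- lm(log(result) ~ input)

# Predict on the log scale for input = 10.5.
log_pred <- predict(fit_log, newdata = data.frame(input = 10.5))

# Undo the logarithm to get back to the original scale.
pred <- exp(log_pred)

print(unname(log_pred))
print(unname(pred))
```

Because `predict()` uses the unrounded coefficients, its answer may differ from the hand calculation in the last digit or two.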

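The input 10.5 lies midway between observations 10 and 11, whose results are 89 and 144. The predicted value of about 113.4 lies between these two results and close to their midpoint of 116.5, so the prediction seems reasonable. It falls slightly below the midpoint, as expected for a curve that bends upward.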

The residual plot shows how well the model fits and whether a re-expression is needed. Before re-expressing, the residual plot has a clear pattern, which indicates a non-linear relationship; without that check, an r-squared of about 64% might not look bad at all. After re-expressing the data, the residuals are more stable and look more random than before, although some pattern remains, which suggests that the fit is much better. The new r-squared is 99.95416%, which is very high and confirms that the re-expression gives a better model.
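To put the two fits side by side, the r-squared values and residual plots can be produced together. This is a sketch under the same data as above; `fit_linear`, `fit_log`, and `r2` are names chosen here for illustration.

```r
# Data from the question.
input  <- 1:14
result <- c(1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610)

# Fit before and after re-expression.
fit_linear <- lm(result ~ input)
fit_log    <- lm(log(result) ~ input)

# r-squared of each model.
r2 <- c(linear = summary(fit_linear)$r.squared,
        log    = summary(fit_log)$r.squared)
print(round(r2, 4))

# Residual plots side by side, each with a reference line at zero.
op <- par(mfrow = c(1, 2))
plot(input, resid(fit_linear), main = "result ~ input", ylab = "residual")
abline(h = 0, lty = 2)
plot(input, resid(fit_log), main = "log(result) ~ input", ylab = "residual")
abline(h = 0, lty = 2)
par(op)
```

Note that the two sets of residuals are on different scales (original units versus log units), so the plots should be compared for their pattern, not for the size of the numbers on the vertical axis.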

