In: Math
3. Load the dataset called ec122a.csv and decide the appropriate regression to run. Write down what transformations, corrections, etc... you make and why.
y1 |
x1 |
5.3478787576716 |
-0.930542577578737 |
-69.4411002445282 |
-14.3360876802962 |
17.6647698924475 |
1.81741420842464 |
98.6511466667161 |
16.8769469917607 |
14.7965900933862 |
1.44147861051093 |
-34.5302655286703 |
-8.00737844994315 |
93.0899709372717 |
15.9601981407006 |
9.21693205816442 |
0.677367144474178 |
82.6007511115692 |
13.940352476942 |
115.798113882096 |
21.2544523041556 |
210.387049747658 |
38.2407928740359 |
25.53810654411 |
2.87106608048978 |
103.832140647001 |
18.1287219709914 |
69.9887102526973 |
11.9894172917371 |
115.53192498448 |
20.8016798770016 |
121.344292025264 |
22.0189019228638 |
92.7341812552436 |
15.9508245127554 |
141.336831165046 |
25.3838968113616 |
43.9676084746945 |
6.62783843142594 |
170.312498248916 |
30.6056891002234 |
100.141722965535 |
18.0744156617512 |
135.127526516403 |
25.1557427275658 |
35.4910615569294 |
5.34067840867235 |
49.0886162426323 |
7.66630180243485 |
183.23305880313 |
33.6747888141339 |
133.899669788226 |
31.0484776835843 |
119.472386558899 |
19.6774321421239 |
158.382012262513 |
38.7948124929967 |
158.265751170527 |
32.3449530783571 |
143.893438668698 |
31.318399069747 |
209.554152576129 |
39.0470592670422 |
269.696741210151 |
47.0260908373683 |
214.277835307116 |
39.4621037661542 |
137.448728114245 |
32.1951502465506 |
207.142331867495 |
30.4353133960267 |
195.530279391204 |
37.8895119687649 |
260.613365801387 |
47.5777648932458 |
193.358564414283 |
32.544387671943 |
214.355032319599 |
35.3968738633248 |
236.246295426679 |
46.8573949752216 |
179.510295035057 |
40.1659721878024 |
212.202997184581 |
42.9660084481672 |
207.263001917022 |
38.3563234239438 |
189.537080855405 |
36.9994190965688 |
293.77520107103 |
52.1838828310905 |
275.816868619373 |
46.9729303988204 |
213.777730095761 |
51.623410535034 |
234.515710668196 |
54.7713564277226 |
305.755164538293 |
58.2281799071355 |
247.574028943277 |
48.2612135595578 |
216.487201880791 |
44.4805702248886 |
298.939398951728 |
59.5098240982159 |
294.087515977881 |
59.7323992058337 |
242.47071086964 |
56.2423387201774 |
314.216664214321 |
52.099463290858 |
198.64183568504 |
45.5524703388986 |
451.501739075897 |
71.7371279051027 |
334.639764748968 |
56.3003387632005 |
325.539711644784 |
61.3753202076771 |
334.360999254428 |
65.6509595487347 |
375.501692057963 |
69.0188962996762 |
279.92394271145 |
50.820033316856 |
391.747159079897 |
84.4387655124175 |
256.755426083081 |
61.8425207887276 |
335.348364682454 |
78.2972291401232 |
326.862654481865 |
67.2701863797509 |
409.199061682728 |
67.0226394402898 |
315.278602307445 |
62.6960929012151 |
389.115799067651 |
67.3988546408951 |
324.558498258645 |
74.4613819502999 |
277.860262868103 |
51.1152020598209 |
348.220952805656 |
73.4318499899927 |
394.101591698092 |
69.6828387504708 |
378.574744964529 |
70.3390300051774 |
345.129291309579 |
65.1443486887627 |
431.388383747861 |
86.5385881418601 |
461.246340384882 |
80.7778216315798 |
393.128587286873 |
79.4875434916298 |
457.413617158369 |
93.3535591485397 |
490.030080973679 |
86.1469790728216 |
445.013611790392 |
88.2858459293727 |
502.433226880918 |
90.1840214865989 |
531.919402102633 |
84.5845337879384 |
459.430685958911 |
101.584476353668 |
524.534588061157 |
93.7218122436017 |
384.831820262549 |
88.3997031202485 |
369.255646051443 |
64.059370789963 |
460.011550161416 |
94.6172825629485 |
581.849448405881 |
100.291462036955 |
487.238487436963 |
86.277129080676 |
554.389543790077 |
106.054358170647 |
476.138213629779 |
77.2218509347784 |
360.434234419891 |
84.2953663204438 |
497.064285198229 |
98.5692172324988 |
559.620017287958 |
104.394884303588 |
570.274724422607 |
113.867023345632 |
526.006391282654 |
110.550311578395 |
668.85329391523 |
118.103935026271 |
567.23894595309 |
105.330310697067 |
551.525236136496 |
104.258750461435 |
Let's use excel:
Simple linear regression using Excel.
Step 1) First enter the given data set in excel columns.
Step 2) Then click on Data >>> Data Analysis >>>Regression >>>>OK
Step 3) Input Y Range: Select the data of column "A"
Input X Range: Select the data of column "B"
Click on Lable
then Click on Ouput Range
Select the box "Residual plot"
Look the following Image
Then Click on OK, so we get following result.
From the above output the Correlation = R = 0979733
Even though the correlation between y1 and x1 is very hight the ordinary least square estimators are misleading. Because the assumption of constant variance of the residual is violated.
From the residual plot, we conclude that as the level of x1 increases the variance of the residuals also increases.
So we need to used transformation.
The transformation here such that the weight of the small values of x is greater than the weight of the larger vaues of X.