Using RStudio, write the R commands for the following.
1. Non-Linear Models
1.1 Load the {ISLR} and {GGally} libraries. Load and attach the College{ISLR} data set.
[For you only]: Open the College data set and its help file and familiarize yourself with the data set and its fields.
1.2 Inspect the data with the ggpairs(){GGally} function, but do not run the ggpairs plots on all variables because it will take a very long time. Only include these variables in your ggpairs plot: “Outstate”,“S.F.Ratio”,“Private”,“PhD”,“Grad.Rate”.
1.3 Briefly answer: if we are interested in predicting out-of-state tuition (Outstate), can you tell from the plots whether any of the other variables have a curvilinear relationship with Outstate? Briefly explain.
1.4 Regardless of your answer, plot Outstate (Y axis) against S.F.Ratio (X axis). Then answer: do you now see a more curvilinear pattern in the relationship?
1.5 Fit a linear model to predict Outstate as a function of the other 4 predictors in your ggpairs plot. Store your model results in an object named fit.linear. Display a summary of your results.
1.6 Briefly answer: Does this seem like a good model fit? Why or why not?
1.7 Now add an interaction term for S.F.Ratio with Private. Store your model results in an object named fit.inter. Display a summary of your results. Then do an anova() test to evaluate if fit.inter has more predictive power than fit.linear.
1.8 Briefly interpret the coefficients of the interaction term and the ANOVA results.
1.9 Now use the poly() function to fit a polynomial of degree 4 for S.F.Ratio. Store your model results in an object named fit.poly. Display the summary results. Then conduct an anova() test to evaluate if fit.poly has more predictive power than fit.linear.
1.10 Briefly interpret your results.
1.3 S.F.Ratio has a curvilinear relationship with Outstate, as seen in the ggpairs plots.
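A minimal sketch of one way to make that curvature easier to judge, assuming the {ISLR} package is installed: draw the Outstate vs. S.F.Ratio scatterplot and overlay a lowess() smoother. The smoother is an illustrative addition; the assignment itself only asks for the plot.

```r
# Assumes the ISLR package is installed; it provides the College data.
library(ISLR)

# Scatterplot of out-of-state tuition (Y) against student/faculty ratio (X)
plot(College$S.F.Ratio, College$Outstate,
     xlab = "S.F.Ratio", ylab = "Outstate")

# lowess() fits a locally weighted smoother; it returns the x values sorted,
# so lines() can draw the fitted curve directly over the points.
sf.smooth <- lowess(College$S.F.Ratio, College$Outstate)
lines(sf.smooth, col = "red", lwd = 2)
```

A bend in the red curve, rather than a straight line, is the visual cue for a curvilinear relationship.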
> install.packages("ISLR")
> library(ISLR)
> install.packages("GGally")
> library(GGally)
Loading required package: ggplot2
> attach(College)
> View(College)
> dim(College)
[1] 777  18
> head(College)
                             Private Apps Accept Enroll Top10perc Top25perc F.Undergrad
Abilene Christian University     Yes 1660   1232    721        23        52        2885
Adelphi University               Yes 2186   1924    512        16        29        2683
Adrian College                   Yes 1428   1097    336        22        50        1036
Agnes Scott College              Yes  417    349    137        60        89         510
Alaska Pacific University        Yes  193    146     55        16        44         249
Albertson College                Yes  587    479    158        38        62         678
                             P.Undergrad Outstate Room.Board Books Personal PhD Terminal
Abilene Christian University         537     7440       3300   450     2200  70       78
Adelphi University                  1227    12280       6450   750     1500  29       30
Adrian College                        99    11250       3750   400     1165  53       66
Agnes Scott College                   63    12960       5450   450      875  92       97
Alaska Pacific University            869     7560       4120   800     1500  76       72
Albertson College                     41    13500       3335   500      675  67       73
                             S.F.Ratio perc.alumni Expend Grad.Rate
Abilene Christian University      18.1          12   7041        60
Adelphi University                12.2          16  10527        56
Adrian College                    12.9          30   8735        54
Agnes Scott College                7.7          37  19016        59
Alaska Pacific University         11.9           2  10922        15
Albertson College                  9.4          11   9727        55
> str(College)
'data.frame': 777 obs. of  18 variables:
 $ Private    : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
 $ Apps       : num  1660 2186 1428 417 193 ...
 $ Accept     : num  1232 1924 1097 349 146 ...
 $ Enroll     : num  721 512 336 137 55 158 103 489 227 172 ...
 $ Top10perc  : num  23 16 22 60 16 38 17 37 30 21 ...
 $ Top25perc  : num  52 29 50 89 44 62 45 68 63 44 ...
 $ F.Undergrad: num  2885 2683 1036 510 249 ...
 $ P.Undergrad: num  537 1227 99 63 869 ...
 $ Outstate   : num  7440 12280 11250 12960 7560 ...
 $ Room.Board : num  3300 6450 3750 5450 4120 ...
 $ Books      : num  450 750 400 450 800 500 500 450 300 660 ...
 $ Personal   : num  2200 1500 1165 875 1500 ...
 $ PhD        : num  70 29 53 92 76 67 90 89 79 40 ...
 $ Terminal   : num  78 30 66 97 72 73 93 100 84 41 ...
 $ S.F.Ratio  : num  18.1 12.2 12.9 7.7 11.9 9.4 11.5 13.7 11.3 11.5 ...
 $ perc.alumni: num  12 16 30 37 2 11 26 37 23 15 ...
 $ Expend     : num  7041 10527 8735 19016 10922 ...
 $ Grad.Rate  : num  60 56 54 59 15 55 63 73 80 52 ...
> ggdf <- College[, c("Private", "Outstate", "PhD", "S.F.Ratio", "Grad.Rate")]
> head(ggdf)
                             Private Outstate PhD S.F.Ratio Grad.Rate
Abilene Christian University     Yes     7440  70      18.1        60
Adelphi University               Yes    12280  29      12.2        56
Adrian College                   Yes    11250  53      12.9        54
Agnes Scott College              Yes    12960  92       7.7        59
Alaska Pacific University        Yes     7560  76      11.9        15
Albertson College                Yes    13500  67       9.4        55
> ggpairs(ggdf, ggplot2::aes(colour = Private))
> plot(College$S.F.Ratio, College$Outstate, xlab = "S.F.Ratio", ylab = "Outstate")
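The commands for 1.5 and 1.7 might look like the following sketch, assuming {ISLR} is installed and that "the other 4 predictors" means S.F.Ratio, Private, PhD and Grad.Rate from the ggpairs plot.

```r
# Assumes the ISLR package is installed; it provides the College data.
library(ISLR)

# 1.5: additive linear model using the four predictors from the ggpairs plot
fit.linear <- lm(Outstate ~ S.F.Ratio + Private + PhD + Grad.Rate,
                 data = College)
summary(fit.linear)

# 1.7: add the S.F.Ratio x Private interaction. In an R formula,
# S.F.Ratio * Private expands to S.F.Ratio + Private + S.F.Ratio:Private.
fit.inter <- lm(Outstate ~ S.F.Ratio * Private + PhD + Grad.Rate,
                data = College)
summary(fit.inter)

# fit.linear is nested in fit.inter (it lacks only the interaction term),
# so anova() runs an F test on that extra term.
anova(fit.linear, fit.inter)
```

Because the two models differ by a single coefficient (S.F.Ratio:PrivateYes), the anova() row for fit.inter has 1 degree of freedom; a small Pr(>F) would suggest the interaction adds explanatory power beyond fit.linear.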
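For 1.9, a sketch under the assumption that Private, PhD and Grad.Rate stay in the model alongside the degree-4 polynomial in S.F.Ratio, so that fit.linear is nested in fit.poly and anova() gives a valid F test. fit.linear is refitted so the block runs on its own.

```r
# Assumes the ISLR package is installed; it provides the College data.
library(ISLR)

# Refit the additive model so this block is self-contained.
fit.linear <- lm(Outstate ~ S.F.Ratio + Private + PhD + Grad.Rate,
                 data = College)

# 1.9: replace the linear S.F.Ratio term with a degree-4 polynomial.
# poly() builds orthogonal polynomial columns by default.
fit.poly <- lm(Outstate ~ poly(S.F.Ratio, 4) + Private + PhD + Grad.Rate,
               data = College)
summary(fit.poly)

# The columns of poly(S.F.Ratio, 4), together with the intercept, span
# S.F.Ratio itself, so fit.linear is nested in fit.poly and the F test
# below has 4 - 1 = 3 numerator degrees of freedom.
anova(fit.linear, fit.poly)
```

Using poly(S.F.Ratio, 4, raw = TRUE) instead would give the same fitted values and the same ANOVA result; only the individual polynomial coefficients change, because raw powers are strongly correlated while orthogonal columns are not.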