Question


Using RStudio, write the R commands for the following.

1. Non-Linear Models

1.1 Load the {ISLR} and {GGally} libraries. Load and attach the College{ISLR} data set.

[For you only]: Open the College data set and its help file and familiarize yourself with the data set and its fields.

1.2 Inspect the data with the ggpairs(){GGally} function, but do not run the ggpairs plots on all variables because it will take a very long time. Only include these variables in your ggpairs plot: “Outstate”, “S.F.Ratio”, “Private”, “PhD”, “Grad.Rate”.

1.3 Briefly answer: if we are interested in predicting out-of-state tuition (Outstate), can you tell from the plots whether any of the other variables have a curvilinear relationship with Outstate? Briefly explain.

1.4 Regardless of your answer, plot Outstate (Y axis) against S.F.Ratio (X axis). Then, please answer, do you now see a more curvilinear pattern in the relationship?

1.5 Fit a linear model to predict Outstate as a function of the other 4 predictors in your ggpairs plot. Store your model results in an object named fit.linear. Display a summary of your results.

1.6 Briefly answer: Does this seem like a good model fit? Why or why not?

1.7 Now add an interaction term for S.F.Ratio with Private. Store your model results in an object named fit.inter. Display a summary of your results. Then do an anova() test to evaluate if fit.inter has more predictive power than fit.linear.

1.8 Briefly interpret the coefficients of the interaction term and the ANOVA results.

1.9 Now use the poly() function to fit a polynomial of degree 4 for S.F.Ratio. Store your model results in an object named fit.poly. Display the summary results. Then conduct an anova() test to evaluate if fit.poly has more predictive power than fit.linear.

1.10 Briefly interpret your results.

Solutions


1.3 In the ggpairs plots, S.F.Ratio appears to have a curvilinear relationship with Outstate.

> install.packages("ISLR")

> library(ISLR)

> install.packages("GGally")

> library(GGally)

Loading required package: ggplot2

> attach(College)

> View(College)

> dim(College)

[1] 777 18

> head(College)

                             Private Apps Accept Enroll Top10perc Top25perc F.Undergrad

Abilene Christian University     Yes 1660   1232    721        23        52        2885

Adelphi University               Yes 2186   1924    512        16        29        2683

Adrian College                   Yes 1428   1097    336        22        50        1036

Agnes Scott College              Yes 417    349    137        60        89         510

Alaska Pacific University        Yes 193    146     55        16        44         249

Albertson College                Yes 587    479    158        38        62         678

                             P.Undergrad Outstate Room.Board Books Personal PhD Terminal

Abilene Christian University         537     7440       3300   450     2200 70       78

Adelphi University                  1227    12280       6450   750     1500 29       30

Adrian College                        99    11250       3750   400     1165 53       66

Agnes Scott College                   63    12960       5450   450      875 92       97

Alaska Pacific University            869     7560       4120   800     1500 76       72

Albertson College                     41    13500       3335   500      675 67       73

                             S.F.Ratio perc.alumni Expend Grad.Rate

Abilene Christian University      18.1          12   7041        60

Adelphi University                12.2          16 10527        56

Adrian College                    12.9          30   8735        54

Agnes Scott College                7.7          37 19016        59

Alaska Pacific University         11.9           2 10922        15

Albertson College                  9.4          11   9727        55

> str(College)

'data.frame': 777 obs. of 18 variables:

$ Private    : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...

$ Apps       : num 1660 2186 1428 417 193 ...

$ Accept     : num 1232 1924 1097 349 146 ...

$ Enroll     : num 721 512 336 137 55 158 103 489 227 172 ...

$ Top10perc : num 23 16 22 60 16 38 17 37 30 21 ...

$ Top25perc : num 52 29 50 89 44 62 45 68 63 44 ...

$ F.Undergrad: num 2885 2683 1036 510 249 ...

$ P.Undergrad: num 537 1227 99 63 869 ...

$ Outstate   : num 7440 12280 11250 12960 7560 ...

$ Room.Board : num 3300 6450 3750 5450 4120 ...

$ Books      : num 450 750 400 450 800 500 500 450 300 660 ...

$ Personal   : num 2200 1500 1165 875 1500 ...

$ PhD        : num 70 29 53 92 76 67 90 89 79 40 ...

$ Terminal   : num 78 30 66 97 72 73 93 100 84 41 ...

$ S.F.Ratio : num 18.1 12.2 12.9 7.7 11.9 9.4 11.5 13.7 11.3 11.5 ...

$ perc.alumni: num 12 16 30 37 2 11 26 37 23 15 ...

$ Expend     : num 7041 10527 8735 19016 10922 ...

$ Grad.Rate : num 60 56 54 59 15 55 63 73 80 52 ...

> ggdf <- College[, c("Private", "Outstate", "PhD", "S.F.Ratio", "Grad.Rate")]

> head(ggdf)

                             Private Outstate PhD S.F.Ratio Grad.Rate

Abilene Christian University     Yes     7440 70      18.1        60

Adelphi University               Yes    12280 29      12.2        56

Adrian College                   Yes    11250 53      12.9        54

Agnes Scott College              Yes    12960 92       7.7        59

Alaska Pacific University        Yes     7560 76      11.9        15

Albertson College                Yes    13500 67       9.4        55

> ggpairs(ggdf, ggplot2::aes(colour=Private))

> plot(College$S.F.Ratio, College$Outstate, xlab="S.F.Ratio", ylab="Outstate")
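
For 1.4, a smoother drawn over the scatterplot can make any curvature easier to judge by eye. The following is only a sketch: it assumes the ISLR package is installed, and the object name `smooth`, the red colour, and the line width are illustrative choices that do not come from the original answer.

```r
# Sketch for 1.4: scatterplot of Outstate against S.F.Ratio with a lowess curve.
# Assumes the ISLR package is installed; "smooth" is an illustrative name.
library(ISLR)

# Points only, so the overall pattern is not hidden by connecting lines
plot(College$S.F.Ratio, College$Outstate,
     xlab = "S.F.Ratio", ylab = "Outstate")

# lowess() returns a list with sorted x values and smoothed y values
smooth <- lowess(College$S.F.Ratio, College$Outstate)

# Overlay the smoothed trend on the existing plot
lines(smooth, col = "red", lwd = 2)
```

If the red curve bends noticeably rather than following a straight line, that supports treating the relationship as curvilinear.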

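
The transcript above stops before 1.5, 1.7 and 1.9. One possible way to write those commands is sketched below. It assumes the ISLR package is installed and uses `data = College` rather than relying on `attach()`. The object names fit.linear, fit.inter and fit.poly are the ones the question asks for. No output is shown because these commands were not run in the original answer.

```r
# Sketch for 1.5, 1.7 and 1.9; assumes the ISLR package is installed.
library(ISLR)

# 1.5 Linear model: Outstate on the other four variables from the ggpairs plot
fit.linear <- lm(Outstate ~ S.F.Ratio + Private + PhD + Grad.Rate,
                 data = College)
summary(fit.linear)

# 1.7 Same model plus an S.F.Ratio-by-Private interaction.
# "S.F.Ratio * Private" expands to both main effects and their interaction.
fit.inter <- lm(Outstate ~ S.F.Ratio * Private + PhD + Grad.Rate,
                data = College)
summary(fit.inter)

# F test comparing the nested models (linear vs. interaction)
anova(fit.linear, fit.inter)

# 1.9 Degree-4 polynomial in S.F.Ratio, other predictors unchanged.
# poly() uses orthogonal polynomials by default.
fit.poly <- lm(Outstate ~ poly(S.F.Ratio, 4) + Private + PhD + Grad.Rate,
               data = College)
summary(fit.poly)

# F test comparing the nested models (linear vs. polynomial)
anova(fit.linear, fit.poly)
```

In each anova() table, a small p-value in the second row suggests that the added terms improve the fit over fit.linear. Passing `raw = TRUE` to poly() would give coefficients on the raw powers of S.F.Ratio with the same fitted values.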


