Question

1. Load the cpus dataset from the MASS package.

Use syct, mmin, mmax, cach, chmin, and chmax as the predictors (independent variables) to predict performance (perf).
Perform best subset selection to choose the best predictors from those listed above.
What is the best model obtained according to Cp, BIC, and adjusted R^2?
Show some plots to provide evidence for your answer, and report the coefficients of the best model obtained for each criterion.
Repeat using forward stepwise selection and also using backward stepwise selection. How does your answer compare to the best subset results?

Solutions

Expert Solution

Considering the given parameters, we show the following:

1. The library function loads a package into your workspace, such as:

library(MASS)

This loads the MASS package. The cpus data frame is already included in this package and can be accessed immediately after loading.
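As a minimal sketch of this loading step, the code below attaches MASS and keeps only the columns named in the question. The names vars and cpu_data are illustrative choices, not part of the original answer.

```r
# Load MASS and keep the six predictors plus the response (perf).
library(MASS)

data(cpus)  # cpus ships with MASS

# 'vars' and 'cpu_data' are illustrative names, not from the original answer.
vars <- c("syct", "mmin", "mmax", "cach", "chmin", "chmax", "perf")
cpu_data <- cpus[, vars]

str(cpu_data)           # column types and first few values
summary(cpu_data$perf)  # distribution of the response
```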

2. Predictive models are extremely helpful in R programming for predicting future performance and estimating parameters that are impractical to calculate. For example, predictive models may be used by data scientists to forecast crop yields based on precipitation and temperature, or to assess if patients with certain characteristics are more likely to respond adversely to a new drug.

Before we talk about linear regression specifically, let's remind ourselves of what a standard data science workflow could look like. Most of the time we start with a question that we want to answer, and then do something like the following:

Gather some information that is important to the issue (more is almost always better).
If necessary, clean, augment, and preprocess the data in a convenient form.
To get a better sense of it, conduct an exploratory analysis of the data.
Build a model of a certain aspect of the data using what you find as a guide.
To answer the question you started with, use the model and validate your findings.
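Applied to the cpus data from the question, the explore and model steps of this workflow might look roughly like the sketch below. It assumes the full six-predictor linear model is a reasonable starting point; cpu_data and full_fit are illustrative names.

```r
library(MASS)

# Illustrative subset: the six predictors and the response from the question.
cpu_data <- cpus[, c("syct", "mmin", "mmax", "cach", "chmin", "chmax", "perf")]

# Explore: pairwise scatterplots and each variable's correlation with perf.
pairs(cpu_data)
round(cor(cpu_data)[, "perf"], 2)

# Model: ordinary least squares using all six predictors.
full_fit <- lm(perf ~ ., data = cpu_data)
summary(full_fit)
```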
3. To explore this data set and learn the basics of linear regression, we will use R. If you're new to the R language, the R Fundamentals and R Programming: Intermediate courses are a suggested starting point. A basic knowledge of statistics also helps: if you know what a mean and a standard deviation are, you will be able to follow along. If you want to practise building the models and visualisations yourself, the following R packages are useful:

datasets: This package includes a large range of data sets for teaching, such as "trees", which is often used to learn about creating linear regression models. The cpus data used in this question comes from MASS instead.
ggplot2: We can build plots of our models using this popular data visualisation package.
GGally: This package extends ggplot2's capabilities. It can be used to construct a plot matrix as part of the initial exploratory data visualisation.
scatterplot3d: This package can be used to visualise more complicated linear regression models with multiple predictors.
4. To choose the best model, use the regsubsets() function (from the leaps package) to perform best subset selection over the candidate predictors, i.e. X, X^2, …, X^10.

If we define 'best' as the point beyond which each added variable gives only a marginal reduction in error, then the shape of every criterion curve indicates that the best fit is provided by 3 variables. The model, with its coefficients, is:

Y = 16.973 + 3.007 X + 0.842 X^2 − 1.986 X^3
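For the cpus predictors named in the question, best subset selection could be run along the following lines. This is a sketch that assumes the leaps package is installed; best_fit, best_sum and the size_* variables are illustrative names, and no output is claimed here.

```r
library(MASS)
library(leaps)  # provides regsubsets(); assumed to be installed

# Exhaustive (best subset) search over the six predictors.
best_fit <- regsubsets(perf ~ syct + mmin + mmax + cach + chmin + chmax,
                       data = cpus, nvmax = 6)
best_sum <- summary(best_fit)

# Model size preferred by each criterion:
# smallest Cp, smallest BIC, largest adjusted R^2.
size_cp    <- which.min(best_sum$cp)
size_bic   <- which.min(best_sum$bic)
size_adjr2 <- which.max(best_sum$adjr2)

# Coefficients of the best model under each criterion.
coef(best_fit, size_cp)
coef(best_fit, size_bic)
coef(best_fit, size_adjr2)
```

The three criteria need not agree on the model size, so it is worth reporting each one separately.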

5. Plots providing evidence for the answer (figures not reproduced here).
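Plots of this kind can be regenerated with a sketch like the one below, which draws Cp, BIC and adjusted R^2 against model size and marks the preferred size on each curve. It assumes the leaps package is installed; the object names are illustrative.

```r
library(MASS)
library(leaps)  # assumed to be installed

best_fit <- regsubsets(perf ~ syct + mmin + mmax + cach + chmin + chmax,
                       data = cpus, nvmax = 6)
best_sum <- summary(best_fit)

op <- par(mfrow = c(1, 3))  # three panels side by side

# Cp: lower is better.
plot(best_sum$cp, type = "b", xlab = "Number of predictors", ylab = "Cp")
points(which.min(best_sum$cp), min(best_sum$cp), col = "red", pch = 19)

# BIC: lower is better.
plot(best_sum$bic, type = "b", xlab = "Number of predictors", ylab = "BIC")
points(which.min(best_sum$bic), min(best_sum$bic), col = "red", pch = 19)

# Adjusted R^2: higher is better.
plot(best_sum$adjr2, type = "b", xlab = "Number of predictors",
     ylab = "Adjusted R^2")
points(which.max(best_sum$adjr2), max(best_sum$adjr2), col = "red", pch = 19)

par(op)  # restore the previous layout

# Built-in display of which variables enter each model, ordered by BIC.
plot(best_fit, scale = "bic")
```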

6. Forward and backward stepwise selection both yield the same recommended model as in part 4:

Y = 16.973 + 3.007 X + 0.842 X^2 − 1.986 X^3
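To check this kind of agreement on the cpus data, forward and backward stepwise selection can be run with the same function by changing its method argument. The sketch below assumes the leaps package is installed and uses BIC to pick the model size for the comparison; that choice, the helper selected(), and the object names are illustrative.

```r
library(MASS)
library(leaps)  # assumed to be installed

best_fit <- regsubsets(perf ~ syct + mmin + mmax + cach + chmin + chmax,
                       data = cpus, nvmax = 6)
fwd_fit  <- regsubsets(perf ~ syct + mmin + mmax + cach + chmin + chmax,
                       data = cpus, nvmax = 6, method = "forward")
bwd_fit  <- regsubsets(perf ~ syct + mmin + mmax + cach + chmin + chmax,
                       data = cpus, nvmax = 6, method = "backward")

# Hypothetical helper: predictor names in the size-k model (intercept dropped).
selected <- function(fit, k) names(coef(fit, k))[-1]

# Model size chosen by BIC under each search strategy.
k_best <- which.min(summary(best_fit)$bic)
k_fwd  <- which.min(summary(fwd_fit)$bic)
k_bwd  <- which.min(summary(bwd_fit)$bic)

# TRUE means the stepwise search picked the same predictors as best subset.
setequal(selected(best_fit, k_best), selected(fwd_fit, k_fwd))
setequal(selected(best_fit, k_best), selected(bwd_fit, k_bwd))

# Coefficients for side-by-side reporting.
coef(fwd_fit, k_fwd)
coef(bwd_fit, k_bwd)
```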


