1. Load the cpus dataset from the MASS package.
Use syct, mmin, mmax, cach, chmin, and chmax as the predictors
(independent variables) to predict performance (perf).
Perform the best subset selection in order to choose the best
predictors from the above predictors.
What is the best model obtained according to Cp, BIC, and adjusted
$R^2$?
Show some plots to provide evidence for your answer, and report the
coefficients of the best model obtained for each criterion.
Repeat using forward stepwise selection and also using backward
stepwise selection. How does your answer compare to the best subset
results?
Considering the given parameters, we show the following:
1. The library() function loads a package into your workspace, for example:
library(MASS)
This loads the MASS package. The cpus data frame needed for this question is included in the package (alongside others, such as Boston) and can be accessed immediately.
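As a quick check of the data, the following minimal sketch loads the package and keeps only the columns named in the question. The object names `vars` and `cpus_sub` are introduced here for illustration and are not part of the original answer.

```r
library(MASS)  # ships with the cpus data frame

# Columns named in the question: six predictors plus the response perf
vars <- c("syct", "mmin", "mmax", "cach", "chmin", "chmax", "perf")

# Keep only those columns (cpus_sub is an illustrative name)
cpus_sub <- cpus[, vars]

# Inspect the structure and basic summaries before modelling
str(cpus_sub)
summary(cpus_sub)
```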
2. Predictive models, which are straightforward to build in R, are useful for forecasting future outcomes and for estimating quantities that are impractical to measure directly. For example, data scientists may use predictive models to forecast crop yields from precipitation and temperature, or to assess whether patients with certain characteristics are more likely to respond adversely to a new drug.
Before we talk about linear regression specifically, let's remind ourselves what a standard data science workflow looks like. Most of the time we start with a question we want to answer, and then do something like the following:
Gather data relevant to the question (more is almost always better).
If necessary, clean, augment, and preprocess the data into a convenient form.
Conduct an exploratory analysis of the data to get a better sense of it.
Build a model of some aspect of the data, using what you found as a guide.
Use the model to answer the question you started with, and validate your findings.
3. To explore the data set and learn the basics of linear
regression, we will use R. If you are new to the R language, we
suggest the R Fundamentals and R Programming: Intermediate courses
from the R Data Analyst route. A very basic knowledge of statistics
also helps: if you know what a mean and a standard deviation are,
you will be able to follow along. If you want to practise building
the models and visualisations yourself, you can use the following R
packages:
datasets: This package includes a wide range of data sets for
teaching. To learn about creating linear regression models, we will
use one of them, "trees".
ggplot2: We will build plots of our models using this popular data
visualisation package.
GGally: This package extends ggplot2's capabilities. We will use it
to construct a plot matrix as part of our initial exploratory data
visualisation.
scatterplot3d: This package will be used to visualise more
complicated linear regression models with multiple predictors.
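The exploratory plot matrix mentioned above can be sketched without any extra packages. This example uses base R's `pairs()` as a stand-in for GGally, and applies it to the cpus variables from the question rather than to "trees"; both choices are assumptions made here for illustration.

```r
library(MASS)  # provides the cpus data frame

# Response and predictors named in the question
vars <- c("perf", "syct", "mmin", "mmax", "cach", "chmin", "chmax")

# Scatterplot matrix: one panel per pair of variables
pairs(cpus[, vars], main = "cpus: perf and candidate predictors")

# Pairwise correlations as a numeric companion to the plot
cor_mat <- cor(cpus[, vars])
round(cor_mat, 2)
```

Panels in the first row and column show how each predictor relates to perf, which helps when judging the models that subset selection later proposes.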
4. To choose the best model, use the regsubsets() function (from
the leaps package) to perform best subset selection over the
candidate predictors, i.e. $X, X^2, \ldots, X^{10}$.
If we define 'best' as the point beyond which adding another variable gives only a marginal reduction in error, then the shapes of the curves for all criteria indicate that the best fit is provided by 3 variables. The fitted model, with its coefficients, is:
$Y = 16.973 + 3.007X + 0.842X^2 - 1.986X^3$
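For the cpus data in the question, best subset selection with regsubsets() might look like the sketch below. It assumes the leaps package is installed; the object names (`fit_best`, `best_sum`, `size_cp`, and so on) are illustrative, and no particular result is implied.

```r
library(MASS)   # provides the cpus data frame
library(leaps)  # provides regsubsets(); assumed to be installed

# Best subset selection over the six predictors named in the question
fit_best <- regsubsets(perf ~ syct + mmin + mmax + cach + chmin + chmax,
                       data = cpus, nvmax = 6)
best_sum <- summary(fit_best)

# Model size preferred by each criterion
size_cp    <- which.min(best_sum$cp)     # smallest Cp
size_bic   <- which.min(best_sum$bic)    # smallest BIC
size_adjr2 <- which.max(best_sum$adjr2)  # largest adjusted R^2

# Coefficients of the model chosen by each criterion
coef(fit_best, size_cp)
coef(fit_best, size_bic)
coef(fit_best, size_adjr2)
```

Setting `nvmax = 6` lets the search consider models of every size up to all six predictors, so each criterion is compared across the full range.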
5. The plots providing evidence for this answer are as follows:
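Plots of this kind can be produced along the following lines. This is a sketch under the same assumptions as the question (cpus data, leaps installed), using base graphics; the layout and the highlighting of the preferred size are choices made here for illustration.

```r
library(MASS)   # provides the cpus data frame
library(leaps)  # provides regsubsets(); assumed to be installed

fit_best <- regsubsets(perf ~ syct + mmin + mmax + cach + chmin + chmax,
                       data = cpus, nvmax = 6)
best_sum <- summary(fit_best)

# One panel per criterion, with the preferred model size marked in red
op <- par(mfrow = c(1, 3))

plot(best_sum$cp, type = "b", xlab = "Number of predictors", ylab = "Cp")
points(which.min(best_sum$cp), min(best_sum$cp), col = "red", pch = 19)

plot(best_sum$bic, type = "b", xlab = "Number of predictors", ylab = "BIC")
points(which.min(best_sum$bic), min(best_sum$bic), col = "red", pch = 19)

plot(best_sum$adjr2, type = "b", xlab = "Number of predictors",
     ylab = "Adjusted R^2")
points(which.max(best_sum$adjr2), max(best_sum$adjr2), col = "red", pch = 19)

par(op)  # restore the previous graphics settings
```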
6. Stepwise selection, both forward and backward, yields the same recommended model as in part (4):
$Y = 16.973 + 3.007X + 0.842X^2 - 1.986X^3$
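To check this kind of agreement on the cpus data from the question, forward and backward stepwise selection can be run through the same regsubsets() function via its `method` argument. The helper `chosen_vars()` is hypothetical, written here only to compare the BIC-preferred models; leaps is assumed to be installed.

```r
library(MASS)   # provides the cpus data frame
library(leaps)  # provides regsubsets(); assumed to be installed

form <- perf ~ syct + mmin + mmax + cach + chmin + chmax

fit_best <- regsubsets(form, data = cpus, nvmax = 6)
fit_fwd  <- regsubsets(form, data = cpus, nvmax = 6, method = "forward")
fit_bwd  <- regsubsets(form, data = cpus, nvmax = 6, method = "backward")

# Hypothetical helper: names of the terms in the BIC-preferred model
chosen_vars <- function(fit) {
  k <- which.min(summary(fit)$bic)
  names(coef(fit, k))
}

# TRUE when a stepwise search picks the same terms as best subset
setequal(chosen_vars(fit_best), chosen_vars(fit_fwd))
setequal(chosen_vars(fit_best), chosen_vars(fit_bwd))

# Coefficients of the stepwise choices, for reporting
coef(fit_fwd, which.min(summary(fit_fwd)$bic))
coef(fit_bwd, which.min(summary(fit_bwd)$bic))
```

The same comparison can be repeated with Cp or adjusted $R^2$ by swapping the criterion inside the helper.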