Question



Using R or other statistical software:

(A) From the summary, which variables seem useful for predicting changes in the dependent variable?

(B) For the purpose of variable selection, does the ANOVA table provide any useful information not already in the summary?

Solution

(A) From the summary, which variables seem useful for predicting changes in the dependent variable?

From the model summary, look at the p-value of each coefficient (the Pr(>|t|) column). Choose a significance level, e.g. alpha = 0.05.

If p < 0.05, the variable is significant and is useful for prediction.

If p > 0.05, the variable is not significant.

Such variables are candidates for removal from the model.
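A minimal R sketch of this rule, assuming the iris model used in the example below (Sepal.Length regressed on all other columns) and alpha = 0.05. The names `fit`, `alpha`, `p_values`, `useful` and `not_useful` are illustrative, not part of the original answer.

```r
# Fit the model: Sepal.Length as the response, all other iris columns as predictors
fit <- lm(Sepal.Length ~ ., data = iris)

alpha <- 0.05  # chosen significance level

# Coefficient table from the summary; the "Pr(>|t|)" column holds the p-values
coef_table <- coef(summary(fit))
p_values <- coef_table[, "Pr(>|t|)"]

# Drop the intercept: it is not a predictor
p_values <- p_values[names(p_values) != "(Intercept)"]

# Terms with p < alpha are treated as significant (useful); the rest are removal candidates
useful <- names(p_values)[p_values < alpha]
not_useful <- names(p_values)[p_values >= alpha]

print(useful)
print(not_useful)
```

For a factor such as Species, the summary reports one row per dummy variable, so this rule judges each level separately rather than the factor as a whole.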

(B) For the purpose of variable selection, does the ANOVA table provide any useful information not already in the summary?

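On its own, the ANOVA table is not a variable-selection procedure, and for predictors with a single coefficient it adds little to the summary.

Whether the model as a whole is useful is answered by the overall F-statistic and its p-value, which are printed at the bottom of the summary.

If p < 0.05, the model is significant.

Then we can use the model for predicting the dependent variable.

What the ANOVA table does add is one F-test per term: a factor such as Species, which the summary splits into separate dummy coefficients, gets a single test. Note that anova() on an lm fit is sequential, so each row tests a term after the terms listed above it.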
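The overall F-test can be read from summary(), or written as an ANOVA comparison of an intercept-only model against the full model; both give the same F. A sketch, assuming the iris model from the example below; `fit`, `null_fit`, `fstat`, `overall_p` and `cmp` are illustrative names.

```r
fit <- lm(Sepal.Length ~ ., data = iris)

# Overall F-test as reported at the bottom of summary(fit)
fstat <- summary(fit)$fstatistic   # named vector: value, numdf, dendf
overall_p <- pf(fstat["value"], fstat["numdf"], fstat["dendf"], lower.tail = FALSE)

# The same test as an ANOVA comparison: intercept-only model vs full model
null_fit <- lm(Sepal.Length ~ 1, data = iris)
cmp <- anova(null_fit, fit)

print(fstat)
print(overall_p)
print(cmp)
```

The F in the second row of `cmp` matches `fstat["value"]`, which is why this particular test adds nothing beyond the summary.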

Example: using the built-in iris dataset, we build a regression model with Sepal.Length as the dependent variable and all the other columns as independent variables.

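R code:

# Sepal.Length is the response; "." means all other columns of iris
regmod <- lm(Sepal.Length ~ ., data = iris)
summary(regmod)  # coefficient t-tests and overall F-test
anova(regmod)    # sequential (Type I) ANOVA table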

Output from summary(regmod):

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        2.17127    0.27979   7.760 1.43e-12 ***
Sepal.Width        0.49589    0.08607   5.761 4.87e-08 ***
Petal.Length       0.82924    0.06853  12.101  < 2e-16 ***
Petal.Width       -0.31516    0.15120  -2.084  0.03889 *
Speciesversicolor -0.72356    0.24017  -3.013  0.00306 **
Speciesvirginica  -1.02350    0.33373  -3.067  0.00258 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3068 on 144 degrees of freedom
Multiple R-squared: 0.8673, Adjusted R-squared: 0.8627
F-statistic: 188.3 on 5 and 144 DF, p-value: < 2.2e-16

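Here:

Sepal.Width: p = 4.87e-08 < 0.05, so Sepal.Width is significant.

Petal.Length: p < 2e-16, well below 0.05, so Petal.Length is significant and can be used for prediction.

Petal.Width: p = 0.03889 < 0.05, so Petal.Width is significant and can be used to predict Sepal.Length.

Speciesversicolor: p = 0.00306 < 0.05, so it is significant.

Speciesvirginica: p = 0.00258 < 0.05, so it is significant.

Speciesversicolor and Speciesvirginica are the dummy variables for the Species factor, each measured relative to the baseline level setosa.

Since every predictor has p < 0.05, all of them (Sepal.Width, Petal.Length, Petal.Width and Species) seem useful for predicting Sepal.Length.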
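To judge Species as one variable instead of two dummy rows, one option is a partial F-test comparing the model with and without Species. A sketch, assuming the same iris model; `fit_full`, `fit_no_species` and `species_test` are illustrative names. Because Species is the last term in the formula, this F should equal the Species row of the sequential anova() table.

```r
fit_full <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
               data = iris)
fit_no_species <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
                     data = iris)

# Baseline (reference) level of the factor is its first level
print(levels(iris$Species)[1])

# Partial F-test: does Species (2 df) improve the fit given the other predictors?
species_test <- anova(fit_no_species, fit_full)
print(species_test)
```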

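For the ANOVA table:

anova(regmod)

Analysis of Variance Table

Response: Sepal.Length
              Df Sum Sq Mean Sq  F value    Pr(>F)
Sepal.Width    1  1.412   1.412  15.0011 0.0001625 ***
Petal.Length   1 84.427  84.427 896.8059 < 2.2e-16 ***
Petal.Width    1  1.883   1.883  20.0055 1.556e-05 ***
Species        2  0.889   0.444   4.7212 0.0103288 *
Residuals    144 13.556   0.094

Every p-value is below 0.05, so each term significantly improves the fit when it is added after the terms listed above it. The Species row (2 df, p = 0.0103) is a single test for the whole factor, which the summary does not give directly; the other rows are sequential and depend on the order of the predictors.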
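Because anova() on a single lm fit is sequential (Type I), reordering the predictors changes the table. For variable selection, a test of each term adjusted for all the others is arguably closer to what is needed; stats::drop1() with test = "F" gives that. A sketch, assuming the same iris model; `fit_a`, `fit_b`, `tab_a`, `tab_b`, `dropped` and `t_vals` are illustrative names.

```r
# Same predictors, two different orders
fit_a <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data = iris)
fit_b <- lm(Sepal.Length ~ Petal.Length + Sepal.Width + Petal.Width + Species, data = iris)

tab_a <- anova(fit_a)
tab_b <- anova(fit_b)

# The sequential sum of squares for Sepal.Width depends on where it enters
print(tab_a["Sepal.Width", "Sum Sq"])
print(tab_b["Sepal.Width", "Sum Sq"])

# Each term tested after all the others (order does not matter here)
dropped <- drop1(fit_a, test = "F")
print(dropped)

# For one-coefficient terms, drop1's F equals the squared t value from summary()
terms_1df <- c("Sepal.Width", "Petal.Length", "Petal.Width")
t_vals <- coef(summary(fit_a))[, "t value"]
print(cbind(F = dropped[terms_1df, "F value"], t_squared = t_vals[terms_1df]^2))
```

Since F = t² for the single-coefficient terms, only the Species row of drop1() carries information beyond the summary's t-tests.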

