Question

The dataset "swiss" can be accessed in R by typing head(swiss), which shows its first six rows:

             Fertility Agriculture Examination Education Catholic Infant.Mortality
Courtelary        80.2        17.0          15        12     9.96             22.2
Delemont          83.1        45.1           6         9    84.84             22.2
Franches-Mnt      92.5        39.7           5         5    93.40             20.2
Moutier           85.8        36.5          12         7    33.77             20.3
Neuveville        76.9        43.5          17        15     5.16             20.6
Porrentruy        76.1        35.3           9         7    90.57             26.6
I.
Let X1, ..., X6 be random variables for the 6 different columns.
Let
$$\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$$
and
$$s_j^2 = \frac{1}{n}\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2$$

be estimators for the mean and variance of each of the variables.
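
As a minimal sketch of these two estimators, assuming the data are the built-in swiss data frame used in the solution, the column means and the 1/n variances can be computed directly. The names col_means, centered and var_n are illustrative only.

```r
# swiss ships with base R (datasets package); all six columns are numeric.
n <- nrow(swiss)

# Sample mean of each column: sum over rows divided by n.
col_means <- colSums(swiss) / n

# Variance with the 1/n divisor used in the question.
# sweep() subtracts each column's mean from that column.
centered <- sweep(as.matrix(swiss), 2, col_means)
var_n <- colSums(centered^2) / n

print(round(cbind(mean = col_means, var_n = var_n), 3))
```

Note that R's built-in var() divides by n - 1, so its result differs from var_n by the factor (n - 1) / n.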
Let
$$\mathrm{Cov}(X_j, X_k) = E[(X_j - E(X_j))(X_k - E(X_k))]$$
be the covariance between any two of the columns. Estimate all the pairwise covariances and comment on them.
Do any of the variables appear to be independent? Why or why not?

Solutions

Expert Solution

All R commands are shown in bold.

The first six rows of the data are:

> head(swiss)
             Fertility Agriculture Examination Education Catholic Infant.Mortality
Courtelary        80.2        17.0          15        12     9.96             22.2
Delemont          83.1        45.1           6         9    84.84             22.2
Franches-Mnt      92.5        39.7           5         5    93.40             20.2
Moutier           85.8        36.5          12         7    33.77             20.3
Neuveville        76.9        43.5          17        15     5.16             20.6
Porrentruy        76.1        35.3           9         7    90.57             26.6

All the pairwise covariances can be found as follows:

> cov(swiss, use = "pairwise")
                    Fertility    Agriculture    Examination      Education      Catholic Infant.Mortality
Fertility        156.04249769  100.169148936  -64.366928770  -79.729509713  241.56320305     15.156193340
Agriculture      100.16914894  515.799417206 -124.392830712 -139.657400555  379.90437558     -4.025851064
Examination      -64.36692877 -124.392830712   63.646623497   53.575855689 -190.56061055     -2.649537465
Education        -79.72950971 -139.657400555   53.575855689   92.456059204  -61.69882979     -2.781683626
Catholic         241.56320305  379.904375578 -190.560610546  -61.698829787 1739.29453719     21.318116096
Infant.Mortality  15.15619334   -4.025851064   -2.649537465   -2.781683626   21.31811610      8.483802035
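
R's cov() divides by n - 1, whereas the estimator defined in the question divides by n. The following is a sketch, under the assumption that the 1/n convention is wanted for the covariances as well, of how the two versions relate. The names S_unbiased, Xc and S_n are illustrative only.

```r
n <- nrow(swiss)
X <- as.matrix(swiss)

# cov() uses the unbiased n - 1 divisor.
S_unbiased <- cov(X)

# Covariance with the 1/n divisor, computed directly from centered data.
Xc <- sweep(X, 2, colMeans(X))
S_n <- crossprod(Xc) / n

# The two matrices differ only by the factor (n - 1) / n.
print(all.equal(S_n, S_unbiased * (n - 1) / n))
print(round(S_n, 3))
```

With 47 observations the factor is 46/47, so the two conventions give similar values and the same signs.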

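If two variables are independent, their covariance is 0, so the sample covariance of an independent pair should be close to 0. The converse does not hold in general: a covariance near 0 only rules out a linear relationship.

Since none of the pairwise sample covariances is 0, no pair of variables appears to be independent.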
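
Zero covariance is necessary for independence but not sufficient. A small sketch with hypothetical data (the vectors x and y below are made up for illustration and are not part of swiss) shows a fully dependent pair whose covariance is essentially zero:

```r
# Hypothetical data, not from the swiss dataset: y is a deterministic function of x.
x <- seq(-1, 1, length.out = 101)
y <- x^2

# The covariance is zero up to floating-point error because x is symmetric about 0 ...
print(cov(x, y))

# ... yet y is completely determined by x, so the two are not independent.
print(all(y == x^2))
```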

The covariances smallest in magnitude are between

Infant.Mortality and Examination: -2.649537465

Infant.Mortality and Education: -2.781683626

Infant.Mortality and Agriculture: -4.025851064

We can say that Infant.Mortality is least (linearly) dependent on Examination, Education and Agriculture. Because covariance depends on the scale of each variable, and Infant.Mortality has the smallest variance (8.48), correlations give a fairer comparison across pairs.
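
Since covariances depend on the units of each column, one way to compare pairs on a common scale is to convert the covariance matrix to correlations. This is a sketch rather than part of the original solution. The cor.test() call is an added illustration, and its p-value assumes roughly bivariate-normal data.

```r
# Convert the covariance matrix to a correlation matrix (unit-free, in [-1, 1]).
R <- cov2cor(cov(swiss))
print(round(R["Infant.Mortality", ], 3))

# Illustration: test of zero correlation for one of the weakly related pairs.
ct <- cor.test(swiss$Infant.Mortality, swiss$Agriculture)
print(ct$estimate)
print(ct$p.value)
```

A correlation near 0 still only indicates the absence of a linear relationship, so it supports the conclusion above without proving independence.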

