Observation | x | y |
1 | -22 | 22 |
2 | -33 | 49 |
3 | 2 | 8 |
4 | 29 | -16 |
5 | -13 | 10 |
6 | 21 | -28 |
7 | -13 | 27 |
8 | -23 | 35 |
9 | 14 | -5 |
10 | 3 | -3 |
11 | -37 | 48 |
12 | 34 | -29 |
13 | 9 | -18 |
14 | -33 | 31 |
15 | 20 | -16 |
16 | -3 | 14 |
17 | -15 | 18 |
18 | 12 | 17 |
19 | -20 | -11 |
20 | -7 | -22 |
Answer the following questions:
a. Calculate the covariance between X and Y. Is the relationship between the two variables positive or negative?
b. Calculate the correlation coefficient between X and Y. Is the relationship positive or negative? Is it strong linear, weak linear, or nonlinear?
c. Use the Y data to calculate the mean, range, standard deviation, and variance.
d. Calculate the Z-score of the first Y value. Is it an outlier?
e. Calculate the 60th percentile of the Y data.
Solution:
Using R:
> Obs=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20);Obs
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> x=c(-22,-33,2,29,-13,21,-13,-23,14,3,-37,34,9,-33,20,-3,-15,12,-20,-7);x
[1] -22 -33 2 29 -13 21 -13 -23 14 3 -37 34 9 -33 20 -3 -15 12 -20
[20] -7
> y=c(22,49,8,-16,10,-28,27,35,-5,-3,48,-29,-18,31,-16,14,18,17,-11,-22);y
[1] 22 49 8 -16 10 -28 27 35 -5 -3 48 -29 -18 31 -16 14 18 17 -11
[20] -22
> length(Obs)
[1] 20
> length(x)
[1] 20
> length(y)
[1] 20
> Table=data.frame(Obs, x, y);Table
Obs x y
1 1 -22 22
2 2 -33 49
3 3 2 8
4 4 29 -16
5 5 -13 10
6 6 21 -28
7 7 -13 27
8 8 -23 35
9 9 14 -5
10 10 3 -3
11 11 -37 48
12 12 34 -29
13 13 9 -18
14 14 -33 31
15 15 20 -16
16 16 -3 14
17 17 -15 18
18 18 12 17
19 19 -20 -11
20 20 -7 -22
> #(a) For covariance
> cov(x,y)
[1] -421.6711
> #Interpretation: The covariance is negative, so the relationship between the two variables is negative.
> #(b)For correlation
> cor(x,y)
[1] -0.8133727
> #Interpretation: r is about -0.81, which indicates a strong negative linear relationship between x and y.
> #(c) For mean, range, variance and standard deviation of variable y
> mean=mean(y);mean
[1] 6.55
> max=max(y);max
[1] 49
> min=min(y);min
[1] -29
> Range=max-min;Range
[1] 78
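> # For variance and standard deviation of y
> # var(y) is the sample variance (divides by n-1); multiplying by (n-1)/n gives the population form
> n=length(y);n
[1] 20
> v=var(y);v
[1] 595.7342
> pop_var=((n-1)/n)*v;pop_var
[1] 565.9475
> pop_sd=sqrt(pop_var);pop_sd
[1] 23.78965
> #(d) Z-score of the first y value
> y1=y[1];y1
[1] 22
> Z=(y1-mean)/pop_sd;Z
[1] 0.6494421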
> # The boundaries for outliers in a box plot are Q1 - 1.5*IQR and Q3 + 1.5*IQR.
> # If a data point lies between them, it is not considered an outlier.
> # Here, we find Q1,Q3,IQR
> y=c(22,49,8,-16,10,-28,27,35,-5,-3,48,-29,-18,31,-16,14,18,17,-11,-22);y
[1] 22 49 8 -16 10 -28 27 35 -5 -3 48 -29 -18 31 -16 14 18 17 -11
[20] -22
> Q1=quantile(y,0.25);Q1
25%
-16
> Q3=quantile(y,0.75);Q3
75%
23.25
> IQR=Q3-Q1;IQR
75%
39.25
> #The lower boundary for outliers = Q1 - 1.5*IQR
> Q1 - 1.5*IQR
25%
-74.875
> #The upper boundary for outliers = Q3 + 1.5*IQR
> Q3 + 1.5*IQR
75%
82.125
> # y=22 lies between -74.875 and 82.125. Therefore, y=22 is not an outlier
> #(e)For 60th percentile
> P60=quantile(y,0.60);P60
60%
15.2
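The 60th percentile in part (e) can also be reproduced by hand. The sketch below assumes R's default quantile method (type 7), which interpolates linearly at position h = (n - 1)p + 1 in the sorted data. The names ys, h, lo, hi and p60_manual are illustrative and do not appear in the original solution.

```r
# Hand computation of the 60th percentile, assuming quantile()'s default type 7
y <- c(22, 49, 8, -16, 10, -28, 27, 35, -5, -3,
       48, -29, -18, 31, -16, 14, 18, 17, -11, -22)

p  <- 0.60
ys <- sort(y)                    # order statistics
h  <- (length(ys) - 1) * p + 1   # fractional position in the sorted data (12.4 here)
lo <- floor(h)                   # index of the order statistic just below h
hi <- ceiling(h)                 # index of the order statistic just above h

# Linear interpolation between the two neighbouring order statistics
p60_manual <- ys[lo] + (h - lo) * (ys[hi] - ys[lo])

print(p60_manual)                # 15.2
print(unname(quantile(y, p)))    # built-in result for comparison
```

Here the 12th and 13th sorted values are 14 and 17, so the result is 14 + 0.4 * 3 = 15.2, which agrees with the quantile() output above.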
R-code:
Obs=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20);Obs
x=c(-22,-33,2,29,-13,21,-13,-23,14,3,-37,34,9,-33,20,-3,-15,12,-20,-7);x
y=c(22,49,8,-16,10,-28,27,35,-5,-3,48,-29,-18,31,-16,14,18,17,-11,-22);y
length(Obs)
length(x)
length(y)
Table=data.frame(Obs, x, y);Table
#(a) For covariance
cov(x,y)
#Interpretation: The covariance is negative, so the relationship between the two variables is negative.
#(b)For correlation
cor(x,y)
#Interpretation: r is about -0.81, which indicates a strong negative linear relationship between x and y.
#(c) For mean, range, variance and standard deviation of variable y
mean=mean(y);mean
max=max(y);max
min=min(y);min
Range=max-min;Range
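# For variance and standard deviation of y
# var(y) is the sample variance (divides by n-1); multiplying by (n-1)/n gives the population form
n=length(y);n
v=var(y);v
pop_var=((n-1)/n)*v;pop_var
pop_sd=sqrt(pop_var);pop_sd
#(d) Z-score of the first y value
y1=y[1];y1
Z=(y1-mean)/pop_sd;Z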
# The boundaries for outliers in a box plot are Q1 - 1.5*IQR and Q3 + 1.5*IQR.
# If a data point lies between them, it is not considered an outlier.
# Here, we find Q1,Q3,IQR
y=c(22,49,8,-16,10,-28,27,35,-5,-3,48,-29,-18,31,-16,14,18,17,-11,-22);y
Q1=quantile(y,0.25);Q1
Q3=quantile(y,0.75);Q3
IQR=Q3-Q1;IQR
#The lower boundary for outliers = Q1 - 1.5*IQR
Q1 - 1.5*IQR
#The upper boundary for outliers = Q3 + 1.5*IQR
Q3 + 1.5*IQR
# Therefore, y=22 is not an outlier
#(e)For 60th percentile
P60=quantile(y,0.60);P60
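
As a cross-check on parts (a) to (d), the same quantities can be recomputed directly from their definitions. This is only a sketch: the variable names (dx, dy, cov_manual, cor_manual, var_sample, var_pop, z_sample, z_pop) are illustrative and not part of the original solution. It shows the Z-score under both the sample (divide by n - 1) and population (divide by n) standard deviation, since the question does not say which one is intended.

```r
# Recompute parts (a)-(d) from the definitions and compare with R's built-ins
x <- c(-22, -33, 2, 29, -13, 21, -13, -23, 14, 3,
       -37, 34, 9, -33, 20, -3, -15, 12, -20, -7)
y <- c(22, 49, 8, -16, 10, -28, 27, 35, -5, -3,
       48, -29, -18, 31, -16, 14, 18, 17, -11, -22)
n <- length(y)

dx <- x - mean(x)   # deviations of x from its mean
dy <- y - mean(y)   # deviations of y from its mean

# (a) sample covariance and (b) Pearson correlation
cov_manual <- sum(dx * dy) / (n - 1)
cor_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# (c) variance of y: sample form (what var() returns) and population form
var_sample <- sum(dy^2) / (n - 1)
var_pop    <- sum(dy^2) / n

# (d) Z-score of the first y value under each convention
z_sample <- (y[1] - mean(y)) / sqrt(var_sample)
z_pop    <- (y[1] - mean(y)) / sqrt(var_pop)

print(c(cov = cov_manual, cor = cor_manual))
print(c(var_sample = var_sample, var_pop = var_pop))
print(c(z_sample = z_sample, z_pop = z_pop))

# Agreement with the built-in functions
print(all.equal(cov_manual, cov(x, y)))
print(all.equal(cor_manual, cor(x, y)))
print(all.equal(var_sample, var(y)))
```

Under either convention the Z-score of the first y value is well below common cutoffs such as 2 or 3 in absolute value, which is consistent with the box plot rule's conclusion that it is not an outlier.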