A study wants to look at the correlation between sugar consumption and the development of cavities.
The table below shows the average daily intake of sugar (g) and the total number of cavities per patient over the one-year study period.
Daily Sugar Intake in g (X)    Number of Cavities (Y)
30                             2
40                             3
150                            3
90                             0
75                             1
25                             1
110                            4
4. What is the sample correlation coefficient given Σ(x - mean(x))^2 = 12821.4, Σ(y - mean(y))^2 = 12, and Σ(x - mean(x))(y - mean(y)) = 130, with each sum taken over i = 1 to 7? a. 0.33 b. 0.70 c. 0.87 d. -0.45
5. What type of correlation does this represent? a. Strong positive b. Strong inverse c. Weak positive d. Weak inverse
The investigator wants to construct a regression equation from his current sample to predict the number of cavities that a patient develops based only on their sugar intake. The standard deviation of the daily sugar intake is 43.25 and the standard deviation of the number of cavities is 1.41.
6. What is the slope of the line (i.e. what is b1)? a. 0.87 b. 0.01 c. 1.41 d. 0.50
7. What is the y-intercept (i.e. what is b0)? a. 1.26 b. 0.50 c. 0.01 d. 1.15
8. What is the predicted number of cavities for someone who consumes on average 45 grams of sugar a day? a. 1.71 b. 1.55 c. 0.67 d. 1.10
1. Correlation: Correlation measures the strength of the linear relationship between two continuous variables.
The correlation value ranges from -1 to 1.
A correlation value near -1 means a strong negative correlation between x and y, i.e. as the value of x increases, the value of y decreases.
A correlation value near 1 means a strong positive correlation between x and y, i.e. as the value of x increases, the value of y increases.
A correlation value near 0 means there is no linear relationship between x and y.
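r = [n(Σxy) - (Σx)(Σy)] / √{[n(Σx^2) - (Σx)^2] × [n(Σy^2) - (Σy)^2]}
Substituting the x and y values from the table (n = 7, Σx = 520, Σy = 14, Σxy = 1170, Σx^2 = 51450, Σy^2 = 40) gives
r = 910 / √(89750 × 84) = 0.3314
The sums given in Question 4 lead to the same value: r = 130 / √(12821.4 × 12) = 0.3314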
Q - 4. a) correlation = 0.33
Q - 5. c) Weak positive
because the correlation is positive and its value is less than 0.5.
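The calculation of r can be checked with a short script. This is only a minimal sketch in plain Python: the variable names are illustrative, and the data are copied from the table in the question.

```python
import math

# Data copied from the table in the question
x = [30, 40, 150, 90, 75, 25, 110]  # daily sugar intake (g)
y = [2, 3, 3, 0, 1, 1, 4]           # cavities per patient

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Computational formula for the sample correlation coefficient
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(r, 4))  # 0.3314
```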
Linear Regression:
Linear regression is used to predict the value of the dependent variable when the value of the independent variable is known.
Here it is a simple linear regression of y on x:
y = target or dependent variable
x = independent or predictor variable
Fitted Regression line for x and y:
Y = b0 + b1 * X
where b0 = intercept or constant
b1 = slope of regression line
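we can compute the value of b0 using,
b0 = mean(y) - b1 * mean(x)
so we need to compute b1 first,
b1 = Σ[(x - mean(x)) * (y - mean(y))] / Σ(x - mean(x))^2
Substituting the sums given in Question 4, with mean(x) = 520 / 7 = 74.29 and mean(y) = 14 / 7 = 2:
b1 = 130 / 12821.4 = 0.0101, which rounds to 0.01
b0 = 2 - 0.01 * 74.29 = 1.26
Q - 6: b) slope of the line = 0.01
Q - 7: a) intercept of the line = 1.26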
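The slope and intercept formulas above can be sketched in plain Python as a check. The variable names here are illustrative, and the data are copied from the table in the question.

```python
# Data copied from the table in the question
x = [30, 40, 150, 90, 75, 25, 110]  # daily sugar intake (g)
y = [2, 3, 3, 0, 1, 1, 4]           # cavities per patient

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of cross-products and squared deviations about the means
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)

b1 = s_xy / s_xx            # slope
b0 = mean_y - b1 * mean_x   # intercept from the unrounded slope

# Intercept obtained when the slope is first rounded to two decimals
b0_rounded_slope = mean_y - round(b1, 2) * mean_x

print(round(b1, 4))                # 0.0101
print(round(b0, 2))                # 1.25
print(round(b0_rounded_slope, 2))  # 1.26
```

The answer choice 1.26 appears to correspond to the slope rounded to 0.01 before the intercept is computed; carrying the unrounded slope through gives about 1.25 instead.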
So the fitted regression line is,
Y = 1.26 + 0.01 * X
Put X = 45:
Y = 1.26 + 0.01 * 45 = 1.71
Q - 8: a) predicted number of cavities = 1.71
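As a quick check, the prediction step can be written as a small function. This is a sketch only: `predict_cavities` is a hypothetical helper name, and its default coefficients are the rounded values from the fitted line above.

```python
def predict_cavities(sugar_g, b0=1.26, b1=0.01):
    """Predicted cavities from the fitted line Y = b0 + b1 * X (rounded coefficients)."""
    return b0 + b1 * sugar_g

prediction = round(predict_cavities(45), 2)
print(prediction)  # 1.71
```

Since the largest sugar intake in the sample is 150 g and the smallest is 25 g, predictions for intakes outside that range would be extrapolations.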