In: Statistics and Probability
Activity 9: Linear Regression and Correlation Analysis
Scenario: A graduate student has administered a pro-inflammatory substance, lipopolysaccharide (LPS), to humans in the form of a pill (several doses – 0mg or placebo, 5mg, 10mg, and 15mg). She then determines the blood concentration of a particular protein that is thought to be upregulated due to LPS (mg) called Inflammotin (pg/ml) using ELISA. Find the linear model and the correlation coefficient of the experimental data (in JMP and Excel) using the data posted in the Activity 9 folder.
DATA:
LPS | Inflammotin |
0 | 4.12 |
0 | 2.02 |
0 | 3.75 |
0 | 2.34 |
0 | 4.2 |
0 | 1.57 |
0 | 5.2 |
0 | 5.23 |
0 | 4.87 |
0 | 4.07 |
5 | 11.01 |
5 | 9.55 |
5 | 8.74 |
5 | 10.02 |
5 | 8.32 |
5 | 7.66 |
5 | 9.01 |
5 | 6.67 |
5 | 11.99 |
5 | 10.09 |
10 | 101.22 |
10 | 78 |
10 | 234.42 |
10 | 81.22 |
10 | 69.22 |
10 | 97.88 |
10 | 139.14 |
10 | 78.22 |
10 | 138.22 |
10 | 178.12 |
15 | 652.32 |
15 | 772.12 |
15 | 672.99 |
15 | 688.12 |
15 | 452.22 |
15 | 690.22 |
15 | 852.12 |
15 | 462.98 |
15 | 581.49 |
15 | 578.9 |
JMP Directions:
|
Task #1: Write the equation for the linear regression. _____________________________________________________________________
Task #2: According to your JMP results, is there a correlation between LPS and Inflammotin concentration? Why or why not? Hint: what is your correlation coefficient?
____________________________________________________________________________________________________________________________________
____________________________________________________________________________________________________________________________________
Task #3: Is this linear model the best fit for the data? Include your plot here. Why or why not? Hint: what is your coefficient of determination?
____________________________________________________________________________________________________________________________________
____________________________________________________________________________________________________________________________________
Excel Directions:
a. Select a linear trendline by clicking the black arrow beside “trendline”. |
Activity 9: Linear Regression and Correlation Analysis
|
Task #4: Write the equation for the linear regression. ______________________________________________________________________
Task #5: According to your Excel results, is there a correlation between LPS and Inflammotin concentration? Why or why not? Hint: what is your correlation coefficient?
____________________________________________________________________________________________________________________________________
____________________________________________________________________________________________________________________________________
Task #6: Is this linear model the best fit for the data? Include your plot here. Why or why not? Hint: what is your coefficient of determination?
____________________________________________________________________________________________________________________________________ ____________________________________________________________________________________________________________________________________
Task #7: Do the answers in JMP and Excel match?
____________________________________________________________________________________________________________________________________
____________________________________________________________________________________________________________________________________
Note : Allow to solve only 4 sub-question in one part. Hence solved and provided a detailed method to do in only excel along with screenshot.
Step 1 : Put the data in excel as shown, highlight it, go to INSERT, and click on a scatterplot
.
Step 2 : The scatter plot will be insert, click on the title and update the name of the plot.
Step 3 : To insert the axes title and the regression line, click on the plus sign and tick Axis titles and trendline.
Update the label and the you will get graph as shown below.
Step 4 : To display the regression equation and
R2, go to the plus sign once again, go to trendline and then into
it to more options.
Step 5: In the menu, click on the highlight
parts.
Step 6: The plot will be produced with the needed details.
To find the correlation
The formula to use along with the value is shown below.
Task #4: Write the equation for the linear
regression.
y = 40.402x - 109.77
According to your Excel results, is there a correlation between LPS and Inflammotin concentration? Why or why not? Hint: what is your correlation coefficient?
Correlation coefficient = 0.8361
Hence we can say that LPS and inflammation have a strong positive
correlation.
Correlation between two variables defines the strength and the
relationship between two variables.
By strength we mean, how strong or weak is the association between
the two variables.
The correlation coefficient takes a value between 0 and 1 and it can have a positive or negative sign depending on the relationship.
Higher the value, stronger is the relationship.
A positive sign indicates that as one variables increase or
decreases, the other variable also increases or decreases in the
same proportion.
A negative sign indicates that as one variable increases the other decreases and vice versa.
Is this linear model the best fit for the data? Include your plot here. Why or why not? Hint: what is your coefficient of determination?
Understanding Rsquare.
Coefficient of determination(rsqaure) = 0.6991
It is the measure of the amount of variability in y explained by x. Its value lies between 0 and 1. Greater the value, better is the model. In this case, it 70%, hence the model is good.
In this case, we can say that LPS explains 70% of the variability in Inflammation.
Additional info.
But still is not a very good model because the standard error of
prediction or RMSE for the model will be very high.
Root MSE or Root Mean square error( which is calculated to be about 150), tells us the standard deviation of the residuals (acutal minus predicted values). In other words, the residuals tell us how far the actual points is from the regression line. RMSE helps us understand the spread of the residuals. If the RMSE value is high it indicates that the residuals are far away from the regression line or if it low, it indicates the actual points are very close to the regression line.