Question

Activity 9: Linear Regression and Correlation Analysis

Scenario: A graduate student has administered a pro-inflammatory substance, lipopolysaccharide (LPS), to humans in the form of a pill (several doses – 0 mg or placebo, 5 mg, 10 mg, and 15 mg). She then uses ELISA to determine the blood concentration (pg/ml) of Inflammotin, a protein that is thought to be upregulated by LPS. Find the linear model and the correlation coefficient of the experimental data (in JMP and Excel) using the data posted in the Activity 9 folder.

DATA:

LPS Inflammotin
0 4.12
0 2.02
0 3.75
0 2.34
0 4.2
0 1.57
0 5.2
0 5.23
0 4.87
0 4.07
5 11.01
5 9.55
5 8.74
5 10.02
5 8.32
5 7.66
5 9.01
5 6.67
5 11.99
5 10.09
10 101.22
10 78
10 234.42
10 81.22
10 69.22
10 97.88
10 139.14
10 78.22
10 138.22
10 178.12
15 652.32
15 772.12
15 672.99
15 688.12
15 452.22
15 690.22
15 852.12
15 462.98
15 581.49
15 578.9

JMP Directions:

  1. Open the JMP file (.jmp) “Activity 9 Data” in JMP.
  2. Click “Analyze” → “Fit Y by X”  
  3. Select “Inflammotin” → “Y, Response” and then “LPS” → “X, Factor”. Click OK.
  4. Click the red arrow next to “Bivariate Fit of Inflammotin By LPS” and select “Fit Line”.
  5. Record the equation underneath the “Linear Fit” header.
  6. Record the R² value from the “Summary of Fit”. Use a calculator to take its square root and find R; R has the same sign as the slope of the fitted line.
  7. Answer the questions below.

Task #1: Write the equation for the linear regression. _____________________________________________________________________

Task #2: According to your JMP results, is there a correlation between LPS and Inflammotin concentration? Why or why not? Hint: what is your correlation coefficient?

____________________________________________________________________________________________________________________________________

____________________________________________________________________________________________________________________________________

Task #3: Is this linear model the best fit for the data? Include your plot here. Why or why not? Hint: what is your coefficient of determination?

____________________________________________________________________________________________________________________________________

____________________________________________________________________________________________________________________________________

Excel Directions:

  1. Open the Excel file “Activity 9 Data” in Excel.
  2. Graph the data using a scatter plot [x-axis = LPS (mg) and y-axis = Inflammotin (pg/ml)].
  3. Label the graph well (axes and title).
  4. Add a trendline to the graph by clicking the “+”.
    1. Select a linear trendline by clicking the black arrow beside “Trendline”.

  5. Add the equation and R² value to the graph by clicking “More Options” (also found by clicking the black arrow by “Trendline”).
    1. On the right side, find the “Format Trendline” box.
    2. Scroll to the bottom and select “Display Equation on chart” and “Display R-squared value on chart”.
  6. In an empty cell on the worksheet, type “=PEARSON(”, highlight the LPS data column (exclude the header), type a comma, highlight the Inflammotin data column (exclude the header), close the parenthesis, and press Enter.
    1. The formula has the form =PEARSON(array1, array2), where array1 is the independent variable and array2 is the dependent variable.
  7. Answer the questions below.

Task #4: Write the equation for the linear regression. ______________________________________________________________________

Task #5: According to your Excel results, is there a correlation between LPS and Inflammotin concentration? Why or why not? Hint: what is your correlation coefficient?

____________________________________________________________________________________________________________________________________

____________________________________________________________________________________________________________________________________

Task #6: Is this linear model the best fit for the data? Include your plot here. Why or why not? Hint: what is your coefficient of determination?

____________________________________________________________________________________________________________________________________ ____________________________________________________________________________________________________________________________________

Task #7: Do the answers in JMP and Excel match?

____________________________________________________________________________________________________________________________________

____________________________________________________________________________________________________________________________________

Solutions

Expert Solution

Note: Only four sub-questions can be answered in one part, so the detailed method is given for Excel only (Tasks #4 to #6).

Step 1: Enter the data in Excel, highlight both columns, go to INSERT, and choose a scatter plot.

Step 2: Once the scatter plot is inserted, click its title and rename the plot.

Step 3: To add the axis titles and the regression line, click the plus sign next to the chart and tick “Axis Titles” and “Trendline”. Then update the axis labels.

Step 4: To display the regression equation and R², click the plus sign again, open the arrow next to “Trendline”, and choose “More Options”.

Step 5: In the “Format Trendline” box, tick “Display Equation on chart” and “Display R-squared value on chart”.

Step 6: The plot now shows the fitted line together with its equation and R².

To find the correlation coefficient, enter =PEARSON(array1, array2) in an empty cell, with the LPS column as array1 and the Inflammotin column as array2.

Task #4: Write the equation for the linear regression.
y = 40.402x - 109.77
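
As a cross-check on the trendline equation, the least-squares slope and intercept can be recomputed outside Excel. The following is a minimal sketch in plain Python, assuming the data are typed in by hand from the DATA table; the variable names (`readings`, `s_xx`, `s_xy`) are illustrative and not part of the original activity.

```python
# Data transcribed from the DATA table: LPS dose (mg) -> Inflammotin readings (pg/ml).
readings = {
    0: [4.12, 2.02, 3.75, 2.34, 4.2, 1.57, 5.2, 5.23, 4.87, 4.07],
    5: [11.01, 9.55, 8.74, 10.02, 8.32, 7.66, 9.01, 6.67, 11.99, 10.09],
    10: [101.22, 78, 234.42, 81.22, 69.22, 97.88, 139.14, 78.22, 138.22, 178.12],
    15: [652.32, 772.12, 672.99, 688.12, 452.22, 690.22, 852.12, 462.98, 581.49, 578.9],
}

# Flatten into paired x (dose) and y (concentration) lists.
x = [dose for dose, ys in readings.items() for _ in ys]
y = [value for ys in readings.values() for value in ys]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products about the means.
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Ordinary least-squares estimates for y = slope * x + intercept.
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

print(f"slope = {slope:.3f}, intercept = {intercept:.2f}")
```

The printed slope and intercept should agree with the coefficients of the Excel trendline equation to the displayed precision.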

Task #5: According to your Excel results, is there a correlation between LPS and Inflammotin concentration? Why or why not? Hint: what is your correlation coefficient?

Correlation coefficient r = 0.8361
Since r is positive and close to 1, we can say that LPS dose and Inflammotin concentration have a strong positive correlation.


Correlation describes the direction and strength of the linear relationship between two variables.
By strength we mean how strong or weak the association between the two variables is.

The correlation coefficient takes a value between -1 and +1. Its sign gives the direction of the relationship and its absolute value gives the strength.

The closer the absolute value is to 1, the stronger the relationship.
A positive sign indicates that as one variable increases, the other tends to increase as well (and as one decreases, the other tends to decrease).

A negative sign indicates that as one variable increases, the other tends to decrease, and vice versa.
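
The value returned by =PEARSON can be reproduced from the definition of the correlation coefficient, r = S_xy / sqrt(S_xx · S_yy). The sketch below is a plain-Python illustration under the assumption that the data are copied by hand from the DATA table; the names used are hypothetical.

```python
import math

# Data transcribed from the DATA table: LPS dose (mg) -> Inflammotin readings (pg/ml).
readings = {
    0: [4.12, 2.02, 3.75, 2.34, 4.2, 1.57, 5.2, 5.23, 4.87, 4.07],
    5: [11.01, 9.55, 8.74, 10.02, 8.32, 7.66, 9.01, 6.67, 11.99, 10.09],
    10: [101.22, 78, 234.42, 81.22, 69.22, 97.88, 139.14, 78.22, 138.22, 178.12],
    15: [652.32, 772.12, 672.99, 688.12, 452.22, 690.22, 852.12, 462.98, 581.49, 578.9],
}
x = [dose for dose, ys in readings.items() for _ in ys]
y = [value for ys in readings.values() for value in ys]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products about the means.
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Pearson correlation coefficient; always lies between -1 and +1.
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"r = {r:.4f}")
```

The result should be close to the 0.8361 reported above, and its positive sign matches the positive slope of the trendline.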

Task #6: Is this linear model the best fit for the data? Include your plot here. Why or why not? Hint: what is your coefficient of determination?

Understanding R-squared.
Coefficient of determination (R²) = 0.6991

It measures the proportion of the variability in y that is explained by x. Its value lies between 0 and 1; the closer it is to 1, the better the line fits the data. In this case it is about 70%, so the linear model is a reasonable fit.

In other words, LPS dose explains about 70% of the variability in Inflammotin concentration.
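
To make the "variability explained" interpretation concrete, R² can be computed as 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals around the fitted line and SS_tot is the sum of squared deviations of y from its mean. This is a minimal plain-Python sketch, assuming the data are typed in from the DATA table; all variable names are illustrative.

```python
# Data transcribed from the DATA table: LPS dose (mg) -> Inflammotin readings (pg/ml).
readings = {
    0: [4.12, 2.02, 3.75, 2.34, 4.2, 1.57, 5.2, 5.23, 4.87, 4.07],
    5: [11.01, 9.55, 8.74, 10.02, 8.32, 7.66, 9.01, 6.67, 11.99, 10.09],
    10: [101.22, 78, 234.42, 81.22, 69.22, 97.88, 139.14, 78.22, 138.22, 178.12],
    15: [652.32, 772.12, 672.99, 688.12, 452.22, 690.22, 852.12, 462.98, 581.49, 578.9],
}
x = [dose for dose, ys in readings.items() for _ in ys]
y = [value for ys in readings.values() for value in ys]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares line y = slope * x + intercept.
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Residual and total sums of squares.
predicted = [intercept + slope * xi for xi in x]
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
ss_tot = sum((yi - mean_y) ** 2 for yi in y)

# Share of the variability in y that the line accounts for.
r_squared = 1 - ss_res / ss_tot

print(f"R^2 = {r_squared:.4f}")
```

For a simple linear regression with one predictor, this quantity equals the square of the Pearson correlation coefficient, which is why squaring 0.8361 gives approximately 0.6991.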

Additional info.
However, this is still not a very good model, because the standard error of prediction (RMSE) is very high. The fitted line also predicts a negative concentration (-109.77 pg/ml) at 0 mg, which is not physically possible.

The root mean square error (RMSE, calculated to be about 150 pg/ml) is the standard deviation of the residuals (actual minus predicted values). The residuals tell us how far the actual points are from the regression line, and the RMSE summarizes their spread. A high RMSE indicates that the actual points lie far from the regression line; a low RMSE indicates that they lie close to it.
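
The RMSE quoted above can be checked with a short calculation. The sketch below is plain Python and makes two assumptions that are not stated in the original solution: the data are typed in from the DATA table, and the RMSE divides the residual sum of squares by n − 2 (the residual degrees of freedom for a straight-line fit). It also prints the mean response at each dose next to the value the line predicts, as one simple way to see where the line misses.

```python
import math

# Data transcribed from the DATA table: LPS dose (mg) -> Inflammotin readings (pg/ml).
readings = {
    0: [4.12, 2.02, 3.75, 2.34, 4.2, 1.57, 5.2, 5.23, 4.87, 4.07],
    5: [11.01, 9.55, 8.74, 10.02, 8.32, 7.66, 9.01, 6.67, 11.99, 10.09],
    10: [101.22, 78, 234.42, 81.22, 69.22, 97.88, 139.14, 78.22, 138.22, 178.12],
    15: [652.32, 772.12, 672.99, 688.12, 452.22, 690.22, 852.12, 462.98, 581.49, 578.9],
}
x = [dose for dose, ys in readings.items() for _ in ys]
y = [value for ys in readings.values() for value in ys]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares line y = slope * x + intercept.
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Residuals are actual minus predicted values.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
ss_res = sum(e ** 2 for e in residuals)

# Assumption: n - 2 in the denominator (two estimated coefficients).
rmse = math.sqrt(ss_res / (n - 2))
print(f"RMSE = {rmse:.1f} pg/ml")

# Compare the observed mean at each dose with the fitted value.
for dose, ys in readings.items():
    dose_mean = sum(ys) / len(ys)
    fitted = intercept + slope * dose
    print(f"{dose:>2} mg: mean = {dose_mean:8.2f}, fitted = {fitted:8.2f}")
```

If the per-dose means sit well below the line at the middle doses and well above it at the highest dose, that pattern suggests curvature that a straight line cannot capture, which is relevant when answering Task #6.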


