Question

Please find one medical dataset that is suitable for correlation, logistic regression and linear regression.

Solutions

Expert Solution

For example, here are two data sets and their graphs.

For the first, I dusted off the elliptical machine in our basement and measured my pulse after one minute of ellipticizing at various speeds:

Speed (kph)   Pulse (bpm)
0             57
1.6           69
3.1           78
4             80
5             85
6             87
6.9           90
7.7           92
8.7           97
12.4          108
15.3          119

Graph of my pulse rate vs. speed on an elliptical exercise machine.

For the second graph, I dusted off some data from McDonald (1989): I collected the amphipod crustacean Platorchestia platensis on a beach near Stony Brook, Long Island, in April, 1987, removed and counted the number of eggs each female was carrying, then freeze-dried and weighed the mothers:

Dry weight (mg)   Eggs
5.38              29
7.36              23
6.13              22
4.75              20
8.10              25
8.62              25
6.30              17
7.44              24
7.26              20
7.17              27
7.78              24
6.23              21
5.42              22
7.87              22
5.25              23
7.37              35
8.01              27
4.92              23
7.03              25
6.45              24
5.06              19
6.72              21
7.00              20
9.39              33
6.49              17
6.34              21
6.16              25
5.74              22

Graph of number of eggs vs. dry weight in the amphipod Platorchestia platensis.

There are three things you can do with this kind of data:

(1) The first is a hypothesis test:

This asks whether there is an association between the two variables; in other words, as the X variable goes up, does the Y variable tend to change (up or down)? For the exercise data, you'd want to know whether pulse rate was significantly higher at higher speeds. The P value is 1.3×10⁻⁸, but the relationship is so obvious from the graph, and so biologically unsurprising (of course my pulse rate goes up when I exercise harder!), that the hypothesis test wouldn't be a very interesting part of the analysis. For the amphipod data, you'd want to know whether bigger females had more eggs or fewer eggs than smaller females, which is neither biologically obvious nor obvious from the graph. It may look like a random scatter of points, but there is a significant relationship (P=0.015).

(2) The second goal is to describe how tightly the two variables are associated:

This is usually expressed with r, which ranges from −1 to 1, or r², which ranges from 0 to 1. For the exercise data, there's a very tight relationship, as shown by the r² of 0.98; this means that if you knew my speed on the elliptical machine, you'd be able to predict my pulse quite accurately. The r² for the amphipod data is a lot lower, at 0.21; this means that even though there's a significant relationship between female weight and number of eggs, knowing the weight of a female wouldn't let you predict the number of eggs she had with very much accuracy.

(3) The final goal is to determine the equation of a line that goes through the cloud of points:

The equation of a line is given in the form Ŷ = a + bX, where Ŷ is the value of Y predicted for a given value of X, a is the Y intercept (the predicted value of Y when X is zero), and b is the slope of the line (the change in Ŷ for a one-unit change in X). For the exercise data, the equation is Ŷ = 63.4 + 3.75X; this predicts that my pulse would be 63.4 bpm when the speed of the elliptical machine is 0 kph, and that it would go up by 3.75 beats per minute for every 1 kph increase in speed. This is probably the most useful part of the analysis for the exercise data; if I wanted to exercise at a particular level of effort, as measured by pulse rate, I could use the equation to predict the speed I should use. For the amphipod data, the equation is Ŷ = 12.7 + 1.60X. For most purposes, just knowing that bigger amphipods have significantly more eggs (the hypothesis test) would be more interesting than knowing the equation of the line, but it depends on the goals of your experiment.


Related Solutions

So, as we look at linear regression and correlation this week, please provide an example of how and when linear regression is used.
What are the similarities and differences between multiple linear regression and logistic regression? Please consider the typical application (when each is used), the assumptions needed, and the data type.
Determine and interpret the linear correlation coefficient, and use linear regression to find a best fit line for a scatter plot of the data and make predictions. Scenario According to the U.S. Geological Survey (USGS), the probability of a magnitude 6.7 or greater earthquake in the Greater Bay Area is 63%, about 2 out of 3, in the next 30 years. In April 2008, scientists and engineers released a new earthquake forecast for the State of California called the Uniform...
If a dependent variable is binary, is it optimal to use linear regression or logistic regression? Explain your answer and include the theoretical and practical concerns associated with each regression model. Provide a business-related example to illustrate your ideas.
Find the linear regression equation (line of best fit), determine the correlation, and then make a prediction. 1. The table below gives the amount of time students in a class studied for a test and their test scores. Graph the data on a scatter plot, find the line of best fit, and write the equation for the line you draw. Hours Studied 1 0 3 1.5 2.75 1 0.5 2 Test Score 78 75 90 89 97 85 81 80...
Please find at least one application of a multiple linear regression model in business analysis and post your comments/thoughts and the web link of the information source.
Find the equation of the least-squares regression line ŷ and the linear correlation coefficient r for the given data. Round the constants, a, b, and r, to the nearest hundredth. {(0, 10.8), (3, 11.3), (5, 11.2), (−4, 10.7), (1, 9.3)}
What approaches are there by which coefficients are estimated for linear and logistic regression? How is the deviance affected when an explanatory term is omitted (I know that it increases, but surely there is more to it?) In what situations would we use beta-binomial regression?
Examine classification using logistic regression. In the R console, type mtcars. The dataset mtcars is a generic dataset in R. This dataset comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles. Using only the variables am (0 = automatic, 1 = manual) and mpg, your task is to fit a logistic regression model. Complete the following steps using R. Create a scatter plot of am vs. mpg. Describe the relationship and explain why a simple...
This question involves the use of simple linear regression on the fat dataset, which can be found in the faraway library. Use the lm() function to perform a simple linear regression with brozek (percent body fat using the reference method) as the response and abdom (abdomen circumference in cm) as the predictor. Print the output of the summary() function and submit it along with your answers to the following questions. Is there a relationship between the predictor and the response? How strong is...