Question

In: Statistics and Probability

Using SAS is preferred but not required. Please provide SAS code and the relevant output if SAS is used.

  • Link to referenced FEV.CSV: https://drive.google.com/open?id=1t1CRIbnTE7xL_OE9Bmajb564RXcg8nDo

For all hypothesis testing problems:

  • state the null and alternative hypotheses,
  • calculate the value of the test statistic,
  • determine if the results are statistically significant (using rejection region or p-value approaches),
  • then state your conclusion in terms of the problem.

1. FEV (forced expiratory volume) is an index of pulmonary function that measures the volume of air expelled after one second of constant effort. The data fev.csv contains determinations of FEV on 654 children ages 3-19 who were seen in the Childhood Respiratory Disease Study in East Boston, Massachusetts. The variables in the data include:

ID:      subject ID number

Age:     age in years

FEV:     FEV in liters

Height:  height in inches

Sex:     Male or Female

Smoker:  non = nonsmoker, Current = current smoker

a. Make a boxplot of the FEV for children with age 3-8 years, 9-12 years, and 13 years or above. Does it appear that the FEV is the same for children from these three age groups?

b. Is FEV the same across the three age groups? Perform a hypothesis test to answer the question. Use alpha = 0.05.

c. Rank the three age groups using the multiple comparison approach.

d. What assumptions are made with regard to the analysis in part b? Check whether these assumptions are violated.

e. Is FEV more strongly related to sex or smoking status? Carry out appropriate statistical analysis to answer the question.

f. The investigator is also interested in how height is associated with age. Construct the scatter plot of height against age. What is the relationship between height and age?

g. Regardless of what you observed in f, fit the regression model with height as the response and age as the independent variable. What is the fitted regression equation?

h. Test whether there is a positive correlation between age and height. Perform the hypothesis test using alpha = 0.05.

i. Is it appropriate to use the above regression model? Why or why not?

Solutions

Expert Solution

a. Make a boxplot of the FEV for children with age 3-8 years, 9-12 years, and 13 years or above. Does it appear that the FEV is the same for children from these three age groups?

I used EXCEL to construct a box plot of FEV for children aged 3-8 years, 9-12 years, and 13 years or above. First, sort the data by age from youngest to oldest. Then divide the data into the three age groups (3-8 years, 9-12 years, and 13 years or above) and choose Insert > Box & Whisker.

No, it doesn't seem that FEV is same for children from these three age groups.

Findings from boxplot:

  • The FEV Age 9-12 group has a few outliers (dots in the boxplot).
  • The FEV Age 3-8 and Age >=13 groups are slightly skewed toward higher values.
  • All three groups look roughly normal in the boxplot, since the data are spread almost evenly around the median.
  • The median differs from group to group.
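The grouping step described above (splitting children into ages 3-8, 9-12, and 13 or above) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `age_group` and `group_fev` are hypothetical helper names, and `sample_rows` holds made-up values, not rows from fev.csv.

```python
from collections import defaultdict
from statistics import median


def age_group(age):
    """Map an age in years to one of the three groups used in part a."""
    if age <= 8:
        return "3-8"
    if age <= 12:
        return "9-12"
    return ">=13"


def group_fev(rows):
    """Collect FEV values by age group from (age, fev) pairs."""
    groups = defaultdict(list)
    for age, fev in rows:
        groups[age_group(age)].append(fev)
    return dict(groups)


# Hypothetical rows for illustration only; these are NOT values from fev.csv.
sample_rows = [(5, 1.5), (8, 2.0), (9, 2.5), (12, 3.0), (13, 3.5), (19, 4.0)]
grouped = group_fev(sample_rows)

for label in ("3-8", "9-12", ">=13"):
    values = grouped[label]
    print(label, "n =", len(values), "median FEV =", median(values))
```

With the real data, the three lists in `grouped` could be handed to any box plot routine (for example matplotlib's `boxplot`, which accepts a list of sequences) to reproduce the chart built in Excel.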

b. Is FEV the same across the three age groups? Perform a hypothesis test to answer the question. Use alpha = 0.05.

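To test for a difference among the three age groups, we performed a one-way ANOVA on the data.

The null and alternative hypotheses:

H0: mean FEV is the same in all three age groups (μ1 = μ2 = μ3)

H1: mean FEV differs for at least one age group

Calculate the value of the test statistic:

EXCEL > DATA > Data Analysis (AddIn) > ANOVA: Single Factor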

Anova: Single Factor

SUMMARY
Groups          Count   Sum       Average    Variance
FEV Age 3-8     215     399.519   1.858228   0.176462
FEV Age 9-12    322     903.802   2.806839   0.410088
FEV Age >=13    117     421.133   3.599427   0.633303

ANOVA
Source of Variation   SS         df    MS         F          P-value    F crit
Between Groups        248.0558   2     124.0279   332.4582   3.2E-100   3.00956
Within Groups         242.8641   651   0.373063
Total                 490.9198   653

F(2,651) = 332.4582, p = 3.2E-100

Determine if the results are statistically significant (using rejection region or p-value approaches):

The p-value (3.2E-100) is far below alpha = 0.05, and the F value (332.4582) is well above the critical value of 3.00956, so the result is statistically significant under either approach.

State your conclusion in terms of the problem:

Because the between-group difference is significant, we reject the null hypothesis and conclude that mean FEV differs among the three age groups.
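As a cross-check, the F statistic can be re-derived from the group counts, means, and variances in the SUMMARY table alone. The sketch below assumes only those published summary values and the critical value printed by Excel; it does not read fev.csv, and the variable names are illustrative.

```python
# (label, n, mean, sample variance) copied from the SUMMARY table
groups = [
    ("3-8", 215, 1.858228, 0.176462),
    ("9-12", 322, 2.806839, 0.410088),
    (">=13", 117, 3.599427, 0.633303),
]

n_total = sum(n for _, n, _, _ in groups)
grand_mean = sum(n * m for _, n, m, _ in groups) / n_total

# Between-group and within-group sums of squares
ss_between = sum(n * (m - grand_mean) ** 2 for _, n, m, _ in groups)
ss_within = sum((n - 1) * v for _, n, _, v in groups)

df_between = len(groups) - 1
df_within = n_total - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)
f_crit = 3.00956  # critical value reported in the Excel ANOVA table

print(f"F({df_between},{df_within}) = {f_stat:.2f}")
print("reject H0" if f_stat > f_crit else "fail to reject H0")
```

Because the inputs are rounded to six decimals, the recomputed statistic agrees with the Excel output only up to rounding.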

c. Rank the three age groups using the multiple comparison approaches.

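EXCEL has no built-in function for post hoc tests such as Tukey's, so I performed a pairwise t-test for each pair of groups instead.

3-8 vs 9-12 p = 8.64E-45

9-12 vs >=13 p = 2.47E-23

3-8 vs >=13 p = 2.57E-50

All three pairwise differences are significant, even against a Bonferroni-adjusted threshold of 0.05/3 ≈ 0.0167. Ranking the groups by mean FEV from lowest to highest gives: age 3-8 (1.86) < age 9-12 (2.81) < age >=13 (3.60).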
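The pairwise comparison and ranking can also be sketched from the summary statistics. The example below is an assumption-laden illustration, not the procedure used above: it computes Welch (unequal-variance) t statistics, applies a Bonferroni adjustment, and uses a normal critical value as a large-sample stand-in for the t critical value. It is not meant to reproduce the p-values listed above.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

# label -> (n, mean, sample variance), copied from the ANOVA SUMMARY table
groups = {
    "3-8": (215, 1.858228, 0.176462),
    "9-12": (322, 2.806839, 0.410088),
    ">=13": (117, 3.599427, 0.633303),
}


def welch_t(a, b):
    """Welch t statistic for mean(b) - mean(a), unequal variances allowed."""
    (n_a, m_a, v_a), (n_b, m_b, v_b) = a, b
    return (m_b - m_a) / sqrt(v_a / n_a + v_b / n_b)


pairs = list(combinations(groups, 2))
alpha_adj = 0.05 / len(pairs)  # Bonferroni adjustment for 3 comparisons
# Assumption: with groups this large, a normal critical value
# is a close approximation to the t critical value.
z_crit = NormalDist().inv_cdf(1 - alpha_adj / 2)

t_stats = {(a, b): welch_t(groups[a], groups[b]) for a, b in pairs}
for (a, b), t in t_stats.items():
    verdict = "significant" if abs(t) > z_crit else "not significant"
    print(f"{a} vs {b}: t = {t:.2f} ({verdict})")

# Rank the groups by mean FEV, lowest first
ranking = sorted(groups, key=lambda g: groups[g][1])
print(" < ".join(ranking))
```

A dedicated post hoc procedure such as Tukey's HSD would control the family-wise error rate more directly; Bonferroni is used here only because it needs nothing beyond the pairwise statistics.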

d. What assumptions are made with regard to the analysis in part b? Check whether these assumptions are violated.

There are three main assumptions, listed here:

  1. The dependent variable is normally distributed in each group being compared in the one-way ANOVA.

  2. Homogeneity of variances: the population variances of the groups are equal.

  3. Independence of observations. This is mostly a study design issue, so you need to judge from the study design whether the observations could plausibly be dependent.

We carried out a descriptive analysis of FEV in the three age groups. The results are as follows:

Statistic                 FEV Age 3-8   FEV Age 9-12   FEV Age >=13
Mean                      1.858228      2.806839       3.59942735
Standard Error            0.028649      0.035687       0.07357204
Median                    1.79          2.756          3.519
Mode                      1.624         2.352          3.297
Standard Deviation        0.420073      0.640381       0.795803284
Sample Variance           0.176462      0.410088       0.633302868
Kurtosis                  -0.0875       0.804245       -0.227353716
Skewness                  0.242282      0.693698       0.411303102
Range                     2.202         3.766          3.595
Minimum                   0.791         1.458          2.198
Maximum                   2.993         5.224          5.793
Sum                       399.519       903.802        421.133
Count                     215           322            117
Confidence Level(95.0%)   0.05647       0.07021        0.145718695

Shapiro-Wilk Test
                          FEV Age 3-8   FEV Age 9-12   FEV Age >=13
W-stat                    0.988854391   0.97202266     0.977644226
p-value                   0.093440645   6.72808E-06    0.047848341
alpha                     0.05          0.05           0.05
normal                    yes           no             no

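We also performed the Shapiro-Wilk test using the RealStats add-in in EXCEL. It shows that two of the age groups (9-12 and >=13) depart from normality at alpha = 0.05, so the normality assumption is violated for those groups.

The sample variances of the three groups also differ (0.176, 0.410, and 0.633), which suggests that the homogeneity-of-variance assumption may not hold either. No formal test of equal variances was run here.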
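The two checks above can be summarized numerically from the reported output. The sketch below is only descriptive and rests on the published variances and Shapiro-Wilk p-values; the ratio of largest to smallest variance is a rough indicator, not a formal test (a test such as Levene's would need the raw data).

```python
# Sample variances and Shapiro-Wilk p-values copied from the tables above
variances = {"3-8": 0.176462, "9-12": 0.410088, ">=13": 0.633303}
shapiro_p = {"3-8": 0.093440645, "9-12": 6.72808e-06, ">=13": 0.047848341}
alpha = 0.05

# Ratio of the largest to the smallest group variance (descriptive only)
var_ratio = max(variances.values()) / min(variances.values())

# Groups where the Shapiro-Wilk test rejects normality at level alpha
non_normal = [g for g, p in shapiro_p.items() if p < alpha]

print(f"largest/smallest variance ratio = {var_ratio:.2f}")
print("groups rejecting normality:", ", ".join(non_normal))
```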

e. Is FEV more strongly related to sex or smoking status? Carry out appropriate statistical analysis to answer the question.

Using EXCEL > AddIn > RealStats > Data Analysis > Corr > Correlation test

A correlation test was carried out between FEV and sex, and between FEV and smoking status.

RESULT of correlation test on FEV and SMOKING

Correlation Coefficients
Pearson    -0.245424571
Spearman   -0.258349236
Kendall    -0.211145277

Pearson's coeff (t test)
Alpha      0.05
Tails      2
corr       -0.245424571
std err    0.037965248
t          -6.464453173
p-value    1.99285E-10
lower      -0.319973478
upper      -0.170875664

Pearson's coeff (Fisher)
Rho        0
Alpha      0.05
Tails      2
corr       -0.245424571
std err    0.039133024
z          -6.392409034
p-value    1.63292E-10
lower      -0.316142406
upper      -0.171994479

RESULT of correlation test on FEV and SEX

Correlation Coefficients
Pearson    -0.20841
Spearman   -0.14364
Kendall    -0.11739

Pearson's coeff (t test)
Alpha      0.05
Tails      2
corr       -0.20841
std err    0.038303
t          -5.44121
p-value    7.5E-08
lower      -0.28363
upper      -0.1332

Pearson's coeff (Fisher)
Rho        0
Alpha      0.05
Tails      2
corr       -0.208414959
std err    0.039133024
z          -5.39671039
p-value    6.78738E-08
lower      -0.28059776
upper      -0.13388797

Conclusion:

  • Both sex and smoking status have a statistically significant but weak negative correlation with FEV (Pearson coefficients between -0.1 and -0.3).
  • Smoking status is more strongly correlated with FEV (Pearson -0.25, Spearman -0.26) than sex is (Pearson -0.21, Spearman -0.14).
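The t statistics in the two "Pearson's coeff (t test)" panels follow from the standard relation t = r·sqrt(n − 2) / sqrt(1 − r²). A minimal sketch, assuming n = 654 observations for both tests and using the reported Pearson coefficients (`corr_t` is a hypothetical helper name):

```python
from math import sqrt


def corr_t(r, n):
    """t statistic for testing H0: rho = 0 given a sample correlation r."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


n = 654  # assumption: all 654 children enter both correlations
pearson = {"smoking": -0.245424571, "sex": -0.208414959}

t_stats = {name: corr_t(r, n) for name, r in pearson.items()}
for name, t in t_stats.items():
    print(f"FEV vs {name}: r = {pearson[name]:.3f}, t = {t:.3f}")

# The variable with the larger absolute correlation is the more strongly related
stronger = max(pearson, key=lambda name: abs(pearson[name]))
print("more strongly related to FEV:", stronger)
```

Because sex and smoking status are both two-level variables, the Pearson coefficient here is a point-biserial correlation, and its sign depends on how the levels were coded numerically.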

f. The investigator is also interested in how height is associated with age. Construct the scatter plot of height against age. What is the relationship between height and age?

The scatter plot shows a positive, approximately linear relationship between height and age, with R square = 0.6272.

g. Regardless of what you observed in f, fit the regression model with height as the response and age as the independent variable. What is the fitted regression equation?

Regression Analysis of height as response and Age as the independent variable

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.791943602
R Square            0.627174669
Adjusted R Square   0.626602851
Standard Error      3.485201717
Observations        654

ANOVA
             df    SS            MS            F          Significance F
Regression   1     13322.52461   13322.52461   1096.808   8.0598E-142
Residual     652   7919.603418   12.14663101
Total        653   21242.12803

            Coefficients   Standard Error   t Stat        P-value    Lower 95%     Upper 95%
Intercept   45.95779737    0.478358112      96.07404205   0          45.01848904   46.89710571
Age         1.529099387    0.046171116      33.1180949    8.1E-142   1.438437365   1.619761409

The fitted regression equation is

height = 45.96 + 1.53*Age

Thus, for every one-year increase in age, predicted height increases by about 1.53 inches. The regression is significant (Significance F = 8.06E-142, p < 0.00000001). The coefficient of determination, R square = 0.627, implies that about 63% of the variance in height is explained by age.
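To make the fitted equation concrete, the sketch below plugs the unrounded coefficients from the SUMMARY OUTPUT into a small prediction function and re-derives R square and the slope's t statistic from the same table. `predict_height` is a hypothetical helper name, and predictions are only meaningful within the observed age range of 3-19 years.

```python
# Values copied from the Excel regression output
intercept = 45.95779737
slope = 1.529099387
slope_se = 0.046171116
ss_regression = 13322.52461
ss_total = 21242.12803


def predict_height(age_years):
    """Predicted height in inches for a given age in years."""
    return intercept + slope * age_years


r_squared = ss_regression / ss_total  # share of variance explained by age
t_slope = slope / slope_se            # t statistic for H0: slope = 0

print(f"predicted height at age 10: {predict_height(10):.2f} inches")
print(f"R square = {r_squared:.4f}, t for slope = {t_slope:.2f}")
```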

h. Test whether there is a positive correlation between age and height. Perform the hypothesis test using alpha = 0.05.

Correlation test on height and age

Correlation Coefficients
Pearson    0.791973295
Spearman   0.818796185
Kendall    0.660161965

Pearson's coeff (t test)
Alpha      0.05
Tails      2
corr       0.791973295
std err    0.023929566
t          33.09601623
p-value    0
lower      0.744984849
upper      0.838961742

Pearson's coeff (Fisher)
Rho        0
Alpha      0.05
Tails      2
corr       0.791973295
std err    0.039163022
z          27.450651
p-value    6.8244E-166
lower      0.761521493
upper      0.818936333

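The Pearson correlation coefficient is 0.79, which indicates a strong positive correlation. For the one-sided test of H0: rho <= 0 against H1: rho > 0, the t statistic is 33.10 and the two-tailed p-value is reported as 0, so the one-tailed p-value is also far below alpha = 0.05. We reject H0 and conclude that age and height are positively correlated.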

