STAT2112 – Project#1
Case: Drug content assessment
Scientists at GlaxoSmithKline Medicines Research Center used high-performance liquid chromatography (HPLC) to determine the amount of drug in a tablet produced by the company (Reported in Analytical Chemistry, Dec 15, 2009).
Drug concentrations (measured as a percentage) from randomly selected tablets are tested for quality control purposes.
In this study, a random sample of 25 tablets was taken from each of two different, independent production sites, Site#1 and Site#2. That is, the two groups together contain a total of 50 randomly sampled tablets.
Drug concentration in each tablet is given in the data table (in a separate Excel file).
Researchers want to know: Is there a significant difference in drug concentration between the two sites?
Your task as statisticians is to conduct statistical analyses to investigate the issues and find answers. Present your analyses and findings as a report.
Project Outline
Your analysis and report must include the following:
1. Descriptive statistics of the variables, that is, a data analysis of the Site#1 and Site#2 data. Use statistical software for this purpose, and explain the major findings.
2. Graphs of the Site#1 and Site#2 data. They must include a histogram with a normal curve superimposed, a stem-and-leaf plot, a boxplot and a QQ plot. Provide brief comments on the histogram, boxplot and QQ plot.
3. Explain the shapes of the data distributions for Site#1 and Site#2.
4. Construct a confidence interval estimate for the difference in mean drug concentration between Site#1 and Site#2. Explain the assumptions and findings.
5. Conduct a hypothesis test for the difference in mean drug concentration between Site#1 and Site#2. State the hypotheses. Explain the assumptions and findings.
Report Format
Title page
[Page numbering is required.]
DATA
HPLC TEST DATA
Site_1 Drug Concentration (%) | Site_2 Drug Concentration (%)
91.28 | 89.35
92.83 | 86.51
89.35 | 89.04
91.9 | 91.82
82.85 | 93.02
94.83 | 88.32
89.83 | 88.76
89 | 89.26
84.62 | 90.36
86.96 | 87.16
88.32 | 91.74
91.17 | 86.12
83.86 | 92.1
89.74 | 83.33
92.24 | 87.61
92.59 | 88.2
84.21 | 92.78
89.36 | 86.35
90.96 | 93.84
92.85 | 91.2
89.39 | 93.44
89.82 | 86.77
89.91 | 83.77
92.16 | 93.19
88.67 | 81.79
1)
Statistic | Site_1 Drug Concentration | Site_2 Drug Concentration
Mean | 89.548 | 89.0332 |
Standard Error | 0.613388947 | 0.667734159 |
Median | 89.82 | 89.04 |
Mode | None (no repeated value) | None (no repeated value)
Standard Deviation | 3.066944734 | 3.338670793 |
Sample Variance | 9.40615 | 11.14672267 |
Kurtosis | 0.068030112 | -0.540643964 |
Skewness | -0.732784427 | -0.396875034 |
Range | 11.98 | 12.05 |
Minimum | 82.85 | 81.79 |
Maximum | 94.83 | 93.84 |
Sum | 2238.7 | 2225.83 |
Count | 25 | 25 |
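The figures in this table can be cross-checked outside Excel. The following is a minimal sketch using only the Python standard library; the function name describe and the dictionary keys are illustrative choices, not part of the project brief. The skewness line uses the adjusted sample formula that Excel's SKEW function is documented to use.

```python
import math
import statistics

# Drug concentrations (%) copied from the HPLC data table.
site_1 = [91.28, 92.83, 89.35, 91.9, 82.85, 94.83, 89.83, 89, 84.62,
          86.96, 88.32, 91.17, 83.86, 89.74, 92.24, 92.59, 84.21, 89.36,
          90.96, 92.85, 89.39, 89.82, 89.91, 92.16, 88.67]
site_2 = [89.35, 86.51, 89.04, 91.82, 93.02, 88.32, 88.76, 89.26, 90.36,
          87.16, 91.74, 86.12, 92.1, 83.33, 87.61, 88.2, 92.78, 86.35,
          93.84, 91.2, 93.44, 86.77, 83.77, 93.19, 81.79]


def describe(values):
    """Return basic descriptive statistics for one sample (hypothetical helper)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    # Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / sd)^3)
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)
    return {
        "count": n,
        "mean": mean,
        "median": statistics.median(values),
        "std_dev": sd,
        "variance": sd ** 2,
        "std_error": sd / math.sqrt(n),
        "skewness": skew,
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
    }


stats_1 = describe(site_1)
stats_2 = describe(site_2)

for name, stats in (("Site 1", stats_1), ("Site 2", stats_2)):
    print(name)
    for key, value in stats.items():
        print(f"  {key}: {value:.4f}")
```

A negative skewness value from this sketch would indicate a longer left tail, which is the feature to comment on when explaining the shape of each distribution.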
2)
The histogram of the Site 2 data looks approximately normal, but the histogram of the Site 1 data does not.
The normality-test p-value for Site 1 is less than 0.05.
When the p-value is less than or equal to the significance level α, the decision is to reject the null hypothesis H0 and conclude that the data do not follow a normal distribution.
The Site 1 data therefore do not follow a normal distribution.
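As a complement to the p-value, the points of a normal QQ plot can be computed directly. The sketch below is not the normality test used in this report (which is not named here); it only builds the QQ plot coordinates and a simple straightness measure. The plotting positions (i - 0.5) / n are one common convention and are an assumption, as are the helper names normal_qq_points and qq_correlation.

```python
import math
from statistics import NormalDist, mean, stdev

# Drug concentrations (%) copied from the HPLC data table.
site_1 = [91.28, 92.83, 89.35, 91.9, 82.85, 94.83, 89.83, 89, 84.62,
          86.96, 88.32, 91.17, 83.86, 89.74, 92.24, 92.59, 84.21, 89.36,
          90.96, 92.85, 89.39, 89.82, 89.91, 92.16, 88.67]
site_2 = [89.35, 86.51, 89.04, 91.82, 93.02, 88.32, 88.76, 89.26, 90.36,
          87.16, 91.74, 86.12, 92.1, 83.33, 87.61, 88.2, 92.78, 86.35,
          93.84, 91.2, 93.44, 86.77, 83.77, 93.19, 81.79]


def normal_qq_points(values):
    """Return (theoretical quantile, observed value) pairs for a normal QQ plot."""
    n = len(values)
    ordered = sorted(values)
    fitted = NormalDist(mean(values), stdev(values))
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def qq_correlation(values):
    """Pearson correlation of the QQ points; values near 1 mean a straighter plot."""
    points = normal_qq_points(values)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r_site_1 = qq_correlation(site_1)
r_site_2 = qq_correlation(site_2)
print(f"Site 1 QQ correlation: {r_site_1:.4f}")
print(f"Site 2 QQ correlation: {r_site_2:.4f}")
```

Points that bend away from a straight line at the low end of the plot would be consistent with a left-skewed sample.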
3)
The Site 1 data do not follow a normal distribution; the negative skewness (-0.73) indicates a longer left tail. The Site 2 data are approximately normally distributed, with milder negative skewness (-0.40).
4)
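One way this interval could be computed is sketched below from the summary statistics in the descriptive statistics table. It is a sketch under stated assumptions, not the required method: it assumes independent samples, roughly normal populations and equal population variances (a pooled two-sample t interval), and it takes the two-sided 95% critical value for 48 degrees of freedom from a t-table rather than computing it.

```python
import math

# Summary statistics taken from the descriptive statistics table.
n1, mean1, var1 = 25, 89.548, 9.40615
n2, mean2, var2 = 25, 89.0332, 11.14672267

# Assumption: equal population variances, so the sample variances are pooled.
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Assumption: two-sided 95% critical value for df = 48, read from a t-table.
T_CRIT_95_DF48 = 2.0106

diff = mean1 - mean2
margin = T_CRIT_95_DF48 * std_error
lower, upper = diff - margin, diff + margin

print(f"Difference in means (Site 1 - Site 2): {diff:.4f}")
print(f"Standard error: {std_error:.4f}")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

Because both samples have 25 tablets, the pooled standard error equals the unpooled one here, so the equal-variance assumption mainly affects the degrees of freedom. An interval that contains zero would indicate no significant difference at the 5% level.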
5)
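A possible setup for this test, assuming a two-sided question, is H0: μ1 = μ2 against Ha: μ1 ≠ μ2, where μ1 and μ2 are the mean drug concentrations at Site 1 and Site 2. The sketch below uses the critical-value approach with the summary statistics from the descriptive statistics table. The pooled-variance t test, the 5% significance level and the t-table critical value for 48 degrees of freedom are all assumptions of this sketch.

```python
import math

# Summary statistics taken from the descriptive statistics table.
n1, mean1, var1 = 25, 89.548, 9.40615
n2, mean2, var2 = 25, 89.0332, 11.14672267

# Assumption: equal population variances (pooled two-sample t test).
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Test statistic for H0: mu1 = mu2 against Ha: mu1 != mu2.
t_stat = (mean1 - mean2) / std_error

# Assumption: alpha = 0.05, two-sided critical value for df = 48 from a t-table.
T_CRIT_95_DF48 = 2.0106
reject_h0 = abs(t_stat) > T_CRIT_95_DF48

print(f"t statistic: {t_stat:.4f} (df = {df})")
print(f"Critical value: +/-{T_CRIT_95_DF48}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

The decision rule is to reject H0 only when the absolute value of the t statistic exceeds the critical value. Statistical software would normally report an exact p-value instead of relying on a table.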