Question

G-Test of Independence

In a study of the relation between blood type and disease, large samples of patients with peptic ulcers, patients with gastric cancer, and control persons (free from any of these diseases) were classified as to blood type (O, A, B, AB). In this problem, the relatively small numbers of AB patients were omitted for simplicity. The observed numbers are as follows:

Blood Type    Peptic Ulcer    Gastric Cancer    Controls
O             983             383               2892
A             679             416               2625
B             134             84                570

Perform the G-test of independence to analyze the dataset. Work the problem by hand and with SAS.

Graph the relative frequencies of the observed data.

Solutions

Expert Solution

Null Hypothesis (H0): Blood type is independent of disease status (Peptic Ulcer, Gastric Cancer, or no disease).

Total persons in the sample = (983+679+134+383+416+84+2892+2625+570) = 8766

% of patients with Peptic Ulcer = (983+679+134)/8766 = 20.49%

% of patients with Gastric Cancer = (383+416+84)/8766 = 10.07%

% of patients with no disease = (2892+2625+570)/8766 = 69.44%
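The column totals and percentages above can be checked with a few lines of Python. This is only a sketch: the variable names (`observed`, `column_totals`, `column_shares`) are illustrative choices and not part of the original solution.

```python
# Observed counts from the table: rows are blood types, columns are disease groups.
observed = {
    "O": {"Peptic Ulcer": 983, "Gastric Cancer": 383, "Controls": 2892},
    "A": {"Peptic Ulcer": 679, "Gastric Cancer": 416, "Controls": 2625},
    "B": {"Peptic Ulcer": 134, "Gastric Cancer": 84, "Controls": 570},
}

# Grand total over all nine cells.
grand_total = sum(sum(row.values()) for row in observed.values())

# Column (disease group) totals, accumulated across blood types.
column_totals = {}
for row in observed.values():
    for group, count in row.items():
        column_totals[group] = column_totals.get(group, 0) + count

# Share of the whole sample that falls in each disease group.
column_shares = {g: t / grand_total for g, t in column_totals.items()}

for group, share in column_shares.items():
    print(f"{group}: {column_totals[group]} / {grand_total} = {share:.2%}")
```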

Calculating expected values

Under H0, the expected count in each cell is the blood type (row) total multiplied by the overall share of the disease group (column). The shares are rounded to four decimal places here, so the results can differ slightly from unrounded values.

E(O, Peptic Ulcer) = 0.2049*(983+383+2892) = 0.2049*4258 = 872.46

E(O, Gastric Cancer) = 0.1007*(983+383+2892) = 0.1007*4258 = 428.78

E(O, no disease) = 0.6944*(983+383+2892) = 0.6944*4258 = 2956.76

E(A, Peptic Ulcer) = 0.2049*(679+416+2625) = 0.2049*3720 = 762.23

E(A, Gastric Cancer) = 0.1007*(679+416+2625) = 0.1007*3720 = 374.60

E(A, no disease) = 0.6944*(679+416+2625) = 0.6944*3720 = 2583.17

E(B, Peptic Ulcer) = 0.2049*(134+84+570) = 0.2049*788 = 161.46

E(B, Gastric Cancer) = 0.1007*(134+84+570) = 0.1007*788 = 79.35

E(B, no disease) = 0.6944*(134+84+570) = 0.6944*788 = 547.19
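As a cross-check, the expected counts can also be computed as (row total * column total) / grand total without rounding the percentages first. The sketch below assumes that formulation. The names are illustrative, and the results may differ from the hand values in the first or second decimal place because no intermediate rounding is applied.

```python
# Observed counts: rows are blood types, columns are disease groups.
observed = {
    "O": {"Peptic Ulcer": 983, "Gastric Cancer": 383, "Controls": 2892},
    "A": {"Peptic Ulcer": 679, "Gastric Cancer": 416, "Controls": 2625},
    "B": {"Peptic Ulcer": 134, "Gastric Cancer": 84, "Controls": 570},
}
groups = ["Peptic Ulcer", "Gastric Cancer", "Controls"]

# Marginal totals.
row_totals = {bt: sum(row.values()) for bt, row in observed.items()}
col_totals = {g: sum(observed[bt][g] for bt in observed) for g in groups}
n = sum(row_totals.values())

# Expected count under independence: row total * column total / grand total.
expected = {
    bt: {g: row_totals[bt] * col_totals[g] / n for g in groups}
    for bt in observed
}

for bt in observed:
    print(bt, {g: round(expected[bt][g], 2) for g in groups})
```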

G = 2*sum(Oi * ln(Oi/Ei)) for i = 1 to 9, where Oi is the observed count and Ei the expected count in cell i

G = 2*[983*ln(983/872.46) + 383*ln(383/428.78) + 2892*ln(2892/2956.76) + 679*ln(679/762.23) + 416*ln(416/374.60) + 2625*ln(2625/2583.17) + 134*ln(134/161.46) + 84*ln(84/79.35) + 570*ln(570/547.19)]

G = 40.6422
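The G statistic can be reproduced with the Python standard library. This is a minimal sketch that assumes unrounded expected counts, so the printed value should land close to the hand result but need not match it to four decimal places. The name `g_stat` is an illustrative choice.

```python
import math

# Observed counts: rows O, A, B; columns Peptic Ulcer, Gastric Cancer, Controls.
observed = [
    [983, 383, 2892],
    [679, 416, 2625],
    [134, 84, 570],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# G = 2 * sum over all cells of O * ln(O / E),
# with E = row total * column total / grand total.
g_stat = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        g_stat += o * math.log(o / e)
g_stat *= 2

print(f"G = {g_stat:.4f}")
```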

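degrees of freedom = (rows - 1)*(columns - 1) = (3-1)*(3-1) = 4

p-value = P(chi-square with 4 df > 40.6422), which is approximately 0.00000003 (about 3 x 10^-8). This is far below alpha = 0.1 => Null Hypothesis rejected. The data indicate that blood type and disease status are not independent.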
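The p-value can be checked without a statistics package, because the chi-square upper-tail probability has a closed form when the degrees of freedom are even: exp(-x/2) * sum of (x/2)^k / k! for k = 0 to df/2 - 1. The sketch below uses that identity. The helper name `chi2_sf_even_df` is hypothetical and only handles even df.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


g_stat = 40.6422             # G statistic from the hand calculation
df = (3 - 1) * (3 - 1)       # (rows - 1) * (columns - 1)
alpha = 0.1

p_value = chi2_sf_even_df(g_stat, df)
print(f"p-value = {p_value:.3g}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

A p-value this far below alpha leads to the same decision at any conventional significance level.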

