Q5. [20] We would like to determine whether any association exists between survival time and the level of water toxicity, the region, and the age of the patients.
Survival time is coded as 1 = less than 1 month, 2 = 1-3 months, and 3 = more than 3 months.
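As a small illustration of this coding, the following R sketch maps raw survival times onto the three ordered categories. The vector `months` and its values are hypothetical (the dataset below already provides the coded variable), and treating exactly 3 months as category 2 is an assumption based on the wording "more than 3 months".

```r
# Hypothetical raw survival times in months (not taken from the dataset)
months <- c(0.5, 2, 7, 1, 3.5)

# 1 = less than 1 month, 2 = 1-3 months, 3 = more than 3 months
# (assumption: exactly 3 months falls in category 2)
code <- ifelse(months < 1, 1, ifelse(months <= 3, 2, 3))

# Store the codes as an ordered factor so that R knows 1 < 2 < 3
survival <- factor(code, levels = 1:3, ordered = TRUE)
print(survival)
```

Storing the response as an ordered factor is what ordinal regression functions in R generally expect.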
Survival | Region | Toxic Level | Age |
1 | 1 | 62.00 | 67 |
1 | 2 | 46.00 | 72 |
2 | 1 | 48.50 | 56 |
3 | 2 | 32.00 | 35 |
2 | 1 | 63.50 | 60 |
1 | 1 | 41.25 | 65 |
2 | 2 | 40.00 | 45 |
3 | 1 | 34.25 | 40 |
2 | 1 | 34.75 | 54 |
1 | 2 | 46.25 | 63 |
2 | 1 | 43.50 | 60 |
2 | 2 | 46.00 | 55 |
3 | 1 | 72.50 | 29 |
1 | 2 | 53.00 | 89 |
1 | 2 | 43.50 | 75 |
1 | 1 | 56.00 | 59 |
2 | 1 | 40.00 | 51 |
3 | 2 | 48.00 | 51 |
2 | 1 | 46.50 | 61 |
2 | 2 | 72.00 | 57 |
3 | 2 | 31.00 | 42 |
1 | 1 | 48.00 | 61 |
2 | 2 | 36.50 | 57 |
2 | 2 | 43.75 | 55 |
2 | 1 | 34.25 | 61 |
2 | 1 | 41.25 | 47 |
3 | 1 | 38.00 | 52 |
2 | 2 | 59.00 | 55 |
2 | 1 | 52.50 | 81 |
3 | 1 | 57.50 | 35 |
Hello,
In the given dataset, the dependent variable, survival, is ordinal (three ordered categories). An appropriate model for the required analysis is therefore the ordinal logistic regression (proportional odds) model. We will use the R programming language to analyze the dataset:
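For reference, the proportional odds model that `polr` from the MASS package fits with its default logistic link can be written as below. The notation ($Y$ for the survival category, $\zeta_j$ for the cutpoints, $\beta_1, \beta_2, \beta_3$ for the coefficients) is introduced here for illustration and does not come from the original answer.

```latex
\operatorname{logit} P(Y \le j \mid x)
  = \zeta_j - \left( \beta_1\,\mathrm{region}
                   + \beta_2\,\mathrm{toxic\_level}
                   + \beta_3\,\mathrm{age} \right),
  \qquad j = 1, 2
```

Under this parameterization a positive coefficient shifts probability toward the higher survival categories, and the same coefficients apply at both cutpoints.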
Note: First copy the dataset into an Excel sheet and save it in comma-delimited format (.csv), with the columns named survival, region, toxic_level, and age so that they match the code.
R code:
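# install.packages("MASS")  # run once if MASS is not already installed
library(MASS)  # provides polr()
data1 <- read.csv(file.choose(), header = TRUE)  # access the dataset from its saved location
# Columns are assumed to be named survival, region, toxic_level, age
data1$surv <- as.ordered(data1$survival)  # treat survival as an ordered factor
model <- polr(surv ~ region + toxic_level + age, data = data1, Hess = TRUE)
ctable <- coef(summary(model))  # coefficients, standard errors, t values
p <- pnorm(abs(ctable[, "t value"]), lower.tail = FALSE) * 2  # two-sided p-values
ctable <- cbind(ctable, "p value" = p)
ctable  # print the coefficient table with p-values
model$deviance  # residual deviance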
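As a possible follow-up to the code above, the sketch below enters the 30 observations directly (so no CSV file is needed), refits the same model, and adds two checks that the original code does not include: odds ratios from the coefficients, and a likelihood-ratio comparison against the intercept-only model. The names `odds_ratios`, `null_dev`, `lr_stat`, and `lr_p` are illustrative, and the null deviance is computed from the category counts rather than by fitting a second model.

```r
library(MASS)  # provides polr()

# The 30 observations from the table, entered column by column
data1 <- data.frame(
  survival = c(1, 1, 2, 3, 2, 1, 2, 3, 2, 1, 2, 2, 3, 1, 1,
               1, 2, 3, 2, 2, 3, 1, 2, 2, 2, 2, 3, 2, 2, 3),
  region = c(1, 2, 1, 2, 1, 1, 2, 1, 1, 2, 1, 2, 1, 2, 2,
             1, 1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 1, 2, 1, 1),
  toxic_level = c(62, 46, 48.5, 32, 63.5, 41.25, 40, 34.25, 34.75, 46.25,
                  43.5, 46, 72.5, 53, 43.5, 56, 40, 48, 46.5, 72,
                  31, 48, 36.5, 43.75, 34.25, 41.25, 38, 59, 52.5, 57.5),
  age = c(67, 72, 56, 35, 60, 65, 45, 40, 54, 63, 60, 55, 29, 89, 75,
          59, 51, 51, 61, 57, 42, 61, 57, 55, 61, 47, 52, 55, 81, 35)
)

data1$surv <- as.ordered(data1$survival)
model <- polr(surv ~ region + toxic_level + age, data = data1, Hess = TRUE)

# Odds ratios: exp(coefficient) is the multiplicative change in the odds of
# being in a higher survival category per one-unit increase in the predictor
odds_ratios <- exp(coef(model))

# Deviance of the intercept-only model, which reproduces the observed
# category proportions exactly: -2 * sum(n_k * log(n_k / n))
counts <- as.numeric(table(data1$surv))
null_dev <- -2 * sum(counts * log(counts / sum(counts)))

# Likelihood-ratio test of the three predictors jointly (3 degrees of freedom)
lr_stat <- null_dev - model$deviance
lr_p <- pchisq(lr_stat, df = length(coef(model)), lower.tail = FALSE)

print(odds_ratios)
print(c(null_deviance = null_dev, residual_deviance = model$deviance,
        lr_stat = lr_stat, lr_p = lr_p))
```

Comparing the residual deviance with the null deviance gives a reference point for judging fit, since a deviance value has no absolute scale of its own.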
Results and Interpretation:
From the fitted model, we obtained a residual deviance of 34.91346. The coefficients for region, toxic_level, and age are -0.07708, -0.01901, and 0.23033, with corresponding p-values of 0.93228, 0.67567, and 0.00065.
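To show how the reported p-values relate to the Wald statistics, the sketch below backs out the absolute t (z) value implied by each p-value, using the same two-sided normal calculation as the R code. The standard errors were not reported, so the implied values are derived only from the numbers quoted here, and the variable names are illustrative.

```r
# Coefficients and p-values as reported in the results
coefs <- c(region = -0.07708, toxic_level = -0.01901, age = 0.23033)
p_values <- c(region = 0.93228, toxic_level = 0.67567, age = 0.00065)

# Invert p = 2 * P(Z > |z|) to recover the implied |z| for each predictor
abs_z <- qnorm(p_values / 2, lower.tail = FALSE)

# Implied standard error = |coefficient| / |z|
implied_se <- abs(coefs) / abs_z

# Round trip: recomputing the p-values from |z| should return the inputs
p_check <- 2 * pnorm(abs_z, lower.tail = FALSE)

print(round(cbind(coefs, abs_z, implied_se, p_values), 4))
```

A large |z| corresponds to a coefficient that is large relative to its standard error, which is what drives a small p-value.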
a] We have used an ordinal logistic regression model. The residual deviance (34.91346) is small, which suggests that the model fits the given dataset reasonably well, although deviance is best judged against the deviance of the intercept-only model.
b] Judging by the p-values, age is the only variable with a statistically significant association with survival time (p = 0.00065 < 0.05). Region (p = 0.93228) and toxic_level (p = 0.67567) show no significant association in this dataset.
Thank You ...