Question

What are the different commands in R to analyze the Mayo Clinic "pbc" dataset?

Solutions

Expert Solution

Importing package

library(survival)

Loading data set

data(pbc)

Checking the head of data

head(pbc)

Structure of PBC data

str(pbc)

Summary of PBC data

summary(pbc)

The pbc (primary biliary cirrhosis) data set contains the main variable time, which is the observed and potentially censored survival time, and the variable status, which indicates whether the observation was censored (0), ended with a transplant (1), or ended with death (2). In addition there are 17 observed covariates, five of which have been found to be important in previous analyses.
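The coding of status can be checked by tabulating it. This is a minimal sketch, assuming the copy of pbc shipped with the survival package and the coding 0 = censored, 1 = transplant, 2 = dead; the names status_labels and status_counts are made up for this illustration.

```r
library(survival)

# Use the packaged copy so that earlier modifications of `pbc` do not matter
status_labels <- c("censored", "transplant", "dead")   # assumed meaning of codes 0, 1, 2
status_counts <- table(factor(survival::pbc$status, levels = 0:2,
                              labels = status_labels))
print(status_counts)
```

The three counts should add up to the number of rows in the data set.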

Plot of the survival times

nr <- 50  # number of patients to display
plot(c(0, pbc$time[1]), c(1, 1), type = "l", ylim = c(0, nr + 1),
     xlim = c(0, max(pbc$time[1:nr]) + 10), ylab = "nr", xlab = "Survival time")
for (i in 2:nr) lines(c(0, pbc$time[i]), c(i, i))
for (i in 1:nr) {
  if (pbc$status[i] == 0)
    points(pbc$time[i], i, col = "red", pch = 20)  # censored
  if (pbc$status[i] == 2)
    points(pbc$time[i], i)  # died (status 1 is a transplant and is left unmarked)
}

The complete dataset has 418 patients, and the status variable shows that 161 died, 232 were censored, and 25 received a transplant. We exclude those 25 transplant cases from the further analysis. Furthermore, we convert the status variable to a logical (TRUE = died, FALSE = censored) for subsequent use.

pbc <- subset(pbc, status != 1)

pbc <- transform(pbc, status = as.logical(status))
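Note that the two lines above overwrite pbc, so running them a second time would compare the logical status with 1 and drop every death. A hedged alternative is to derive a new data frame from the packaged copy, which can be re-run safely. The name pbc2 is made up for this sketch.

```r
library(survival)

# Start from the packaged data each time, so the step is repeatable
pbc2 <- subset(survival::pbc, status != 1)  # drop the transplant cases (status code 1)
pbc2$status <- pbc2$status == 2             # TRUE = died, FALSE = censored

nrow(pbc2)           # number of patients kept
table(pbc2$status)   # censored versus died
```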

-> The survival function can be estimated with the Kaplan-Meier estimator using the survfit function.

pbcSurv <- survfit(Surv(time, status) ~ 1, data = pbc)

plot(pbcSurv, mark.time = FALSE)

The result from survfit is an R object of class survfit. Calling summary on a survfit object gives the non-parametric estimate of the survival function, together with additional information, as a long list. The plot with the added pointwise confidence bands is perhaps more informative. Setting mark.time = FALSE prevents the censored observations from being marked on the plot by a “+”.

The function Surv, applied here to the time and status variables of the PBC data, creates a survival object.
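Instead of printing the full summary, the estimate can be read off at a few chosen time points. The sketch below assumes that time is recorded in days and picks 1, 5 and 10 years arbitrarily; it rebuilds the data from the packaged copy so that it runs on its own, and the names d, fit and s are illustrative.

```r
library(survival)

# Rebuild the analysis data from the packaged copy (transplants removed)
d <- subset(survival::pbc, status != 1)
d$status <- d$status == 2                  # TRUE = died

fit <- survfit(Surv(time, status) ~ 1, data = d)

# Survival estimates with pointwise confidence limits at roughly 1, 5 and 10 years
s <- summary(fit, times = c(365, 1825, 3650))
data.frame(time = s$time, surv = s$surv, lower = s$lower, upper = s$upper)

print(fit)   # also reports the number of events and the median survival time
```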

-> We can also compute the Kaplan-Meier and the Nelson-Aalen estimators directly. First we form the at-risk process, that is, the number of individuals still at risk at each observed time, for the subsequent computations.

risk <- with(pbc, data.frame(Y = nrow(pbc):1, time = sort(time)))
surv <- with(pbc, risk[status[order(time)], ])

plot(Y ~ time, data = risk, type = "l")
points(Y ~ time, data = surv, col = "red") ## uncensored survivals

-> Then we compute the two non-parametric estimators of the survival function.
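In formulas, and with notation introduced here only for illustration: let $t_{(1)} \le \dots \le t_{(n)}$ be the ordered observation times, $\delta_i$ the indicator that the $i$-th ordered observation is a death, and $Y_i = n - i + 1$ the number at risk. Treating each death as a separate step, the two estimators are:

```latex
\hat{S}_{\mathrm{KM}}(t) = \prod_{i:\, t_{(i)} \le t,\ \delta_i = 1} \left(1 - \frac{1}{Y_i}\right),
\qquad
\hat{\Lambda}_{\mathrm{NA}}(t) = \sum_{i:\, t_{(i)} \le t,\ \delta_i = 1} \frac{1}{Y_i},
\qquad
\hat{S}_{\mathrm{NA}}(t) = \exp\!\left(-\hat{\Lambda}_{\mathrm{NA}}(t)\right).
```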

### Kaplan-Meier and Nelson-Aalen estimates
surv <- transform(surv, KM = cumprod(1 - 1/Y), NAa = exp(-cumsum(1/Y)))
plot(KM ~ time, data = surv, type = "s", ylim = c(0, 1), xlim = c(0, 4200))
lines(NAa ~ time, data = surv, type = "s", col = "blue")
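As a sanity check, the hand computation can be compared with survfit. The sketch below is one possible check, not part of the original solution: it works on a fresh copy of the data and places deaths before censorings at tied times, so that the number at risk matches the usual Kaplan-Meier convention. The names d, tt, ev, km_hand and km_fit are made up here.

```r
library(survival)

d <- subset(survival::pbc, status != 1)   # transplants removed
d$dead <- d$status == 2

# Order by time, with deaths placed before censorings at tied times
ord <- order(d$time, !d$dead)
tt  <- d$time[ord]
ev  <- d$dead[ord]
Y   <- length(tt):1                       # number at risk at each ordered observation

km_steps <- cumprod(1 - 1 / Y[ev])        # one factor per death
t_death  <- tt[ev]
last     <- !duplicated(t_death, fromLast = TRUE)  # final step at each distinct death time
km_hand  <- km_steps[last]

fit    <- survfit(Surv(time, dead) ~ 1, data = d)
km_fit <- fit$surv[fit$n.event > 0]       # survfit estimate at the death times

all.equal(km_hand, km_fit)
```

With d tied deaths among n at risk, the d single-death factors multiply to (n - d)/n, which is the Kaplan-Meier factor 1 - d/n, so the two versions agree at each distinct death time.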

-> It is difficult to see any differences. We can also compute and plot the estimated cumulative hazard functions. Then it is possible to observe small differences in the tail.

plot(-log(KM) ~ time, data = surv, type = "s", xlim = c(0, 4200))
lines(-log(NAa) ~ time, data = surv, type = "s", col = "blue")
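The size of the difference can also be inspected numerically. Since -log(1 - x) >= x for 0 <= x < 1, the cumulative hazard implied by Kaplan-Meier is never below the Nelson-Aalen one, and the gap accumulates over time. A minimal sketch on a fresh copy of the data, with illustrative names d, ev, Y, H_na and H_km:

```r
library(survival)

d  <- subset(survival::pbc, status != 1)   # transplants removed
ev <- (d$status == 2)[order(d$time)]       # death indicator in time order
Y  <- (nrow(d):1)[ev]                      # number at risk at each death

H_na <- cumsum(1 / Y)                      # Nelson-Aalen cumulative hazard
H_km <- -log(cumprod(1 - 1 / Y))           # cumulative hazard implied by Kaplan-Meier

range(H_km - H_na)   # smallest and largest gap between the two curves
```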

