Question


R-Studio; Statistics
The data set in the table considers information on the spread of prostate cancer to the lymph nodes for 53 patients.
For a sample of prostate cancer patients, a set of possible predictor variables was measured before surgery to determine whether the lymph nodes were compromised. Each patient then underwent surgery, and the status of his lymph nodes was determined.
The data set contains 53 observations of 7 variables:
id: identifiers for each subject in the study.

ssln: takes the value of 1 if the cancer has spread to the lymph nodes and 0 if not.

age: a numeric vector containing the age of the patient at the time of diagnosis.

acid: a numerical vector that contains the levels of acid phosphatase in the blood (serum acid phosphatase or prostatic acid phosphatase PAP). High PAP levels may be associated with the presence of prostate cancer.

xray: a measure of the seriousness of the cancer obtained from a radiological examination.
A value of 1 represents a more serious case.

size: Size of the tumor determined by palpation. 
A value of 1 identifies a large tumor that can be palpated without problems.

grade: Another measure of tumor seriousness obtained from a pathologist reading a biopsy obtained using a needle prior to surgery.
1 corresponds to a more serious case.
Use R-studio to determine which of the variables taken before surgery are associated with the spread of cancer to the lymph nodes.

Please provide the code you used to solve this problem.

id ssln age acid xray size grade
1 1 0 66 0.48 0 0 0
2 2 0 68 0.56 0 0 0
3 3 0 66 0.5 0 0 0
4 4 0 56 0.52 0 0 0
5 5 0 58 0.5 0 0 0
6 6 0 60 0.49 0 0 0
7 7 0 65 0.46 1 0 0
8 8 0 60 0.62 1 0 0
9 9 1 50 0.56 0 0 1
10 10 0 49 0.55 1 0 0
11 11 0 61 0.62 0 0 0
12 12 0 58 0.71 0 0 0
13 13 0 51 0.65 0 0 0
14 14 1 67 0.67 1 0 1
15 15 0 67 0.47 0 0 1
16 16 0 51 0.49 0 0 0
17 17 0 56 0.5 0 0 1
18 18 0 60 0.78 0 0 0
19 19 0 52 0.83 0 0 0
20 20 0 56 0.98 0 0 0
21 21 0 67 0.52 0 0 0
22 22 0 63 0.75 0 0 0
23 23 1 59 0.99 0 0 1
24 24 0 64 1.87 0 0 0
25 25 1 61 1.36 1 0 0
26 26 1 56 0.82 0 0 0
27 27 0 64 0.4 0 1 1
28 28 0 61 0.5 0 1 0
29 29 0 64 0.5 0 1 1
30 30 0 63 0.4 0 1 0
31 31 0 52 0.55 0 1 1
32 32 0 66 0.59 0 1 1
33 33 1 58 0.48 1 1 0
34 34 1 57 0.51 1 1 1
35 35 1 65 0.49 0 1 0
36 36 0 65 0.48 0 1 1
37 37 0 59 0.63 1 1 1
38 38 0 61 1.02 0 1 0
39 39 0 53 0.76 0 1 0
40 40 0 67 0.95 0 1 0
41 41 0 53 0.66 0 1 1
42 42 1 65 0.84 1 1 1
43 43 1 50 0.81 1 1 1
44 44 1 60 0.76 1 1 1
45 45 1 45 0.7 0 1 1
46 46 1 56 0.78 1 1 1
47 47 1 46 0.7 0 1 0
48 48 1 67 0.67 0 1 0
49 49 1 63 0.82 0 1 0
50 50 1 57 0.67 0 1 1
51 51 1 51 0.72 1 1 0
52 52 1 64 0.89 1 1 0
53 53 1 68 1.26 1 1 1

Solutions

Expert Solution

Using RStudio, we are asked to assess the association between the binary response "ssln" and the predictors age, acid, xray, size and grade. Because the data are observational, the analysis can show association, not a causal relationship. An appropriate model for a binary response is a logistic regression with all five predictors.
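
The code used for the original fit was not preserved, so the following is only a minimal sketch of how such a model could be fitted. It assumes the table from the question is transcribed into a data frame called `prostate` (a name chosen here for illustration) and that all five predictors enter the model as main effects.

```r
# Data transcribed from the table in the question (53 patients).
# The data frame name `prostate` is an assumption made for this sketch.
prostate <- data.frame(
  ssln  = c(0,0,0,0,0,0,0,0,1,0, 0,0,0,1,0,0,0,0,0,0, 0,0,1,0,1,1,0,0,0,0,
            0,0,1,1,1,0,0,0,0,0, 0,1,1,1,1,1,1,1,1,1, 1,1,1),
  age   = c(66,68,66,56,58,60,65,60,50,49, 61,58,51,67,67,51,56,60,52,56,
            67,63,59,64,61,56,64,61,64,63, 52,66,58,57,65,65,59,61,53,67,
            53,65,50,60,45,56,46,67,63,57, 51,64,68),
  acid  = c(0.48,0.56,0.50,0.52,0.50,0.49,0.46,0.62,0.56,0.55,
            0.62,0.71,0.65,0.67,0.47,0.49,0.50,0.78,0.83,0.98,
            0.52,0.75,0.99,1.87,1.36,0.82,0.40,0.50,0.50,0.40,
            0.55,0.59,0.48,0.51,0.49,0.48,0.63,1.02,0.76,0.95,
            0.66,0.84,0.81,0.76,0.70,0.78,0.70,0.67,0.82,0.67,
            0.72,0.89,1.26),
  xray  = c(0,0,0,0,0,0,1,1,0,1, 0,0,0,1,0,0,0,0,0,0, 0,0,0,0,1,0,0,0,0,0,
            0,0,1,1,0,0,1,0,0,0, 0,1,1,1,0,1,0,0,0,0, 1,1,1),
  size  = rep(c(0, 1), times = c(26, 27)),  # patients 1-26 small, 27-53 large
  grade = c(0,0,0,0,0,0,0,0,1,0, 0,0,0,1,1,0,1,0,0,0, 0,0,1,0,0,0,1,0,1,0,
            1,1,0,1,0,1,1,0,0,0, 1,1,1,1,1,1,0,0,0,1, 0,0,1)
)

# Logistic regression of lymph node involvement on all five predictors.
# `id` is only a label, so it is left out of the model.
fit <- glm(ssln ~ age + acid + xray + size + grade,
           data = prostate, family = binomial)

# Coefficient estimates, standard errors, z-values and p-values.
print(summary(fit))
```

The call `family = binomial` uses the logit link by default, which is what makes this a logistic regression.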

Looking at the deviance residuals, we find that the quartiles are approximately equidistant from the median and that the values are centered near zero, which suggests that the residuals are roughly symmetric. For a binary response this is only a coarse check of the fit.
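
Recent versions of R no longer print the deviance residual quartiles in `summary()`, so they may need to be computed directly. The sketch below rebuilds the same assumed data frame `prostate` (an illustrative name) and prints the five-number summary of the deviance residuals.

```r
# Data transcribed from the table in the question; `prostate` is an assumed name.
prostate <- data.frame(
  ssln  = c(0,0,0,0,0,0,0,0,1,0, 0,0,0,1,0,0,0,0,0,0, 0,0,1,0,1,1,0,0,0,0,
            0,0,1,1,1,0,0,0,0,0, 0,1,1,1,1,1,1,1,1,1, 1,1,1),
  age   = c(66,68,66,56,58,60,65,60,50,49, 61,58,51,67,67,51,56,60,52,56,
            67,63,59,64,61,56,64,61,64,63, 52,66,58,57,65,65,59,61,53,67,
            53,65,50,60,45,56,46,67,63,57, 51,64,68),
  acid  = c(0.48,0.56,0.50,0.52,0.50,0.49,0.46,0.62,0.56,0.55,
            0.62,0.71,0.65,0.67,0.47,0.49,0.50,0.78,0.83,0.98,
            0.52,0.75,0.99,1.87,1.36,0.82,0.40,0.50,0.50,0.40,
            0.55,0.59,0.48,0.51,0.49,0.48,0.63,1.02,0.76,0.95,
            0.66,0.84,0.81,0.76,0.70,0.78,0.70,0.67,0.82,0.67,
            0.72,0.89,1.26),
  xray  = c(0,0,0,0,0,0,1,1,0,1, 0,0,0,1,0,0,0,0,0,0, 0,0,0,0,1,0,0,0,0,0,
            0,0,1,1,0,0,1,0,0,0, 0,1,1,1,0,1,0,0,0,0, 1,1,1),
  size  = rep(c(0, 1), times = c(26, 27)),
  grade = c(0,0,0,0,0,0,0,0,1,0, 0,0,0,1,1,0,1,0,0,0, 0,0,1,0,0,0,1,0,1,0,
            1,1,0,1,0,1,1,0,0,0, 1,1,1,1,1,1,0,0,0,1, 0,0,1)
)

fit <- glm(ssln ~ age + acid + xray + size + grade,
           data = prostate, family = binomial)

# One deviance residual per patient.
dev_res <- residuals(fit, type = "deviance")

# Minimum, first quartile, median, third quartile and maximum.
res_quartiles <- quantile(dev_res)
print(res_quartiles)
```

Comparing the distance of the first and third quartiles from the median gives the symmetry check described above.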

From the p-values of the predictors, "xray" (p-value = 0.01177 < 0.05) and "size" (p-value = 0.01380 < 0.05) are the only predictors significant at the 5% level. Of the variables measured before surgery, these two are therefore the ones associated with the spread of cancer to the lymph nodes.
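
One way to pull these p-values out of the fitted model is sketched below, again assuming the illustrative data frame name `prostate`. The likelihood-ratio tests from `drop1()` and the odds ratios are additions that were not part of the original solution; they are included only as a cross-check on the Wald p-values.

```r
# Data transcribed from the table in the question; `prostate` is an assumed name.
prostate <- data.frame(
  ssln  = c(0,0,0,0,0,0,0,0,1,0, 0,0,0,1,0,0,0,0,0,0, 0,0,1,0,1,1,0,0,0,0,
            0,0,1,1,1,0,0,0,0,0, 0,1,1,1,1,1,1,1,1,1, 1,1,1),
  age   = c(66,68,66,56,58,60,65,60,50,49, 61,58,51,67,67,51,56,60,52,56,
            67,63,59,64,61,56,64,61,64,63, 52,66,58,57,65,65,59,61,53,67,
            53,65,50,60,45,56,46,67,63,57, 51,64,68),
  acid  = c(0.48,0.56,0.50,0.52,0.50,0.49,0.46,0.62,0.56,0.55,
            0.62,0.71,0.65,0.67,0.47,0.49,0.50,0.78,0.83,0.98,
            0.52,0.75,0.99,1.87,1.36,0.82,0.40,0.50,0.50,0.40,
            0.55,0.59,0.48,0.51,0.49,0.48,0.63,1.02,0.76,0.95,
            0.66,0.84,0.81,0.76,0.70,0.78,0.70,0.67,0.82,0.67,
            0.72,0.89,1.26),
  xray  = c(0,0,0,0,0,0,1,1,0,1, 0,0,0,1,0,0,0,0,0,0, 0,0,0,0,1,0,0,0,0,0,
            0,0,1,1,0,0,1,0,0,0, 0,1,1,1,0,1,0,0,0,0, 1,1,1),
  size  = rep(c(0, 1), times = c(26, 27)),
  grade = c(0,0,0,0,0,0,0,0,1,0, 0,0,0,1,1,0,1,0,0,0, 0,0,1,0,0,0,1,0,1,0,
            1,1,0,1,0,1,1,0,0,0, 1,1,1,1,1,1,0,0,0,1, 0,0,1)
)

fit <- glm(ssln ~ age + acid + xray + size + grade,
           data = prostate, family = binomial)

# Wald p-values from the coefficient table (first entry is the intercept).
p_values <- coef(summary(fit))[, "Pr(>|z|)"]
print(round(p_values, 5))

# Predictors (intercept excluded) with p-value below the 5% level.
print(names(which(p_values[-1] < 0.05)))

# Odds ratios: multiplicative change in the odds of nodal spread
# per one-unit increase in each predictor.
odds_ratios <- exp(coef(fit))
print(round(odds_ratios, 3))

# Likelihood-ratio test for dropping each predictor in turn.
lr_tests <- drop1(fit, test = "Chisq")
print(lr_tests)
```

With only 53 patients, the likelihood-ratio tests can differ somewhat from the Wald tests, so it is worth checking whether both lead to the same set of predictors.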

