Question

(1) Read in the data and create an R data frame named tennis.dfr 
that has the following names for its columns:  first.name, last.name,
major.match.wins, major.match.losses, overall.match.wins, 
overall.match.losses, major.titles, overall.titles.  (Note that the 
data file has several explanatory lines before the real data begin 
that should be skipped when reading in the data lines.)
NOTE:  For the file name, you must use the following web address (URL): 
"http://people.stat.sc.edu/hitchcock/tennisplayers2018.txt".  
Please do not have your code read in the file from your own personal directory.

(2) Create and add two more columns called major.winning.pct and 
overall.winning.pct (showing winning percentage in the "major" and 
"overall" categories, respectively) to this data frame.
  
Note that "winning percentage" is defined 
as (match wins)/(match wins + match losses).

(3) Sort the data frame by major titles, from most to least.  
Have your program print the sorted data frame.

(4) Perform a nested sort, sorting the data frame first by major
titles (from most to least), and then by major winning percentage 
(from most to least) within major-title levels.
Have your program print this sorted data frame.

(5) Have R extract the subset of the data frame consisting of players
with at least 6 major titles.  Call this new data frame: greatest.dfr
Have your program print this new data frame.

(6)  In the most efficient way possible, have R calculate the sample means 
for each of the numeric variables in the tennis.dfr data set.
(Hint: Extract the appropriate subset of the data frame first.)

(7) Use the write.table() function to write the data set tennis.dfr to an
external file simply called "tennisdata.txt".  Make sure the external file includes the column names.
Also, make sure the players' names are NOT surrounded by quotes in the 
external file.

Solutions

Expert Solution

(1) R-Code:

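# skip = 7 passes over the explanatory lines at the top of the file.
# The assignment calls this data frame tennis.dfr; it is named `data` here.
data <- read.table("http://people.stat.sc.edu/hitchcock/tennisplayers2018.txt",
                   header = FALSE, fill = TRUE, skip = 7)
colnames(data) <- c("first.name", "last.name",
                    "major.match.wins", "major.match.losses",
                    "overall.match.wins", "overall.match.losses",
                    "major.titles", "overall.titles")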
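
To see what skip does without touching the network, here is a minimal sketch that reads from an inline string instead of the URL. The text, the player names in it, and the shortened column list are made up for illustration; only read.table() and its text, skip, header, and col.names arguments come from base R.

```r
# Hypothetical stand-in for the real file: two explanatory lines, then data rows.
raw.text <- "Career records (made-up example)
Columns: first name, last name, wins, losses
Anna Example 10 5
Ben Sample 7 3"

# skip = 2 drops the two explanatory lines; col.names sets the names in the same call.
demo.dfr <- read.table(text = raw.text, skip = 2, header = FALSE,
                       col.names = c("first.name", "last.name", "wins", "losses"),
                       stringsAsFactors = FALSE)
print(demo.dfr)
```

Passing col.names inside read.table() is an alternative to assigning colnames() afterwards; either approach should give the same data frame.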

(2)

# Winning percentage = wins / (wins + losses), stored as a proportion between 0 and 1
data$major.winning.pct <- data$major.match.wins / (data$major.match.wins + data$major.match.losses)
data$overall.winning.pct <- data$overall.match.wins / (data$overall.match.wins + data$overall.match.losses)
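
A quick check of the formula on made-up numbers (the toy data frame and its values are hypothetical, not taken from the tennis file): 10 wins and 5 losses should give 10/15.

```r
# Made-up records, for illustration only
toy <- data.frame(wins = c(10, 7), losses = c(5, 3))

# Same formula as in the solution: wins / (wins + losses)
toy$winning.pct <- toy$wins / (toy$wins + toy$losses)
print(toy)

# Multiply by 100 if a value on the 0-100 scale is wanted instead
toy$winning.pct.100 <- 100 * toy$winning.pct
```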

(3)

# Sort by major titles, most to least, then print the result
data <- data[order(-data$major.titles), ]
print(data)

(4)

# Nested sort: major titles first, ties broken by major winning percentage (both descending)
data <- data[order(-data$major.titles, -data$major.winning.pct), ]
print(data)
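
The minus-sign trick only works for numeric columns. As a sketch on made-up values (toy and its columns are hypothetical), the same nested ordering can also be requested with decreasing = TRUE, which base R's order() applies to every sort key when given a single value:

```r
# Made-up data with ties in the first sort key
toy <- data.frame(titles = c(2, 5, 2, 5),
                  pct    = c(0.6, 0.7, 0.8, 0.9))

by.minus <- order(-toy$titles, -toy$pct)                   # negate each numeric key
by.flag  <- order(toy$titles, toy$pct, decreasing = TRUE)  # one flag for all keys

print(toy[by.minus, ])
```

Within each titles level, the row with the higher pct comes first, which is the behavior step (4) asks for.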

(5)

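library(dplyr)
# Keep only players with at least 6 major titles, then print the subset
greatest.dfr <- data %>% filter(major.titles >= 6)
print(greatest.dfr)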
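
If dplyr is not installed, the same kind of subset can be taken in base R. The sketch below uses a made-up data frame (toy, with hypothetical names and title counts) to show two equivalent forms:

```r
# Made-up data, for illustration only
toy <- data.frame(last.name = c("Alpha", "Beta", "Gamma"),
                  major.titles = c(8, 6, 2))

top.a <- toy[toy$major.titles >= 6, ]     # logical row indexing
top.b <- subset(toy, major.titles >= 6)   # subset() with a condition
print(top.a)
```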

(6)

sapply(data[3:8], mean)
    major.match.wins   major.match.losses   overall.match.wins overall.match.losses
          159.966667            41.700000           700.900000           225.833333
        major.titles       overall.titles
            6.366667            50.700000
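
colMeans() is a common alternative to sapply(..., mean), and picking columns with is.numeric avoids hard-coding positions such as 3:8. A minimal sketch on made-up data (toy and its values are hypothetical):

```r
# Made-up data: one character column, two numeric columns
toy <- data.frame(last.name = c("Alpha", "Beta", "Gamma"),
                  wins = c(10, 7, 4),
                  losses = c(5, 3, 6),
                  stringsAsFactors = FALSE)

numeric.cols <- sapply(toy, is.numeric)   # TRUE for each numeric column
means <- colMeans(toy[, numeric.cols])    # one mean per numeric column
print(means)
```

Applied to the full data frame after step (2), this selection would also pick up the two winning-percentage columns, since they are numeric as well.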

(7)

# quote = FALSE leaves the names unquoted; col.names = TRUE writes the header row
write.table(data, "tennisdata.txt", col.names = TRUE, row.names = FALSE, quote = FALSE)
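
One way to confirm the file looks right is to write a small data frame to a temporary file and read the raw lines back. The data and the temporary file path here are made up for the check:

```r
# Made-up data, for illustration only
toy <- data.frame(first.name = c("Anna", "Ben"),
                  last.name  = c("Example", "Sample"),
                  wins       = c(10, 7))

out.file <- tempfile(fileext = ".txt")
write.table(toy, out.file, col.names = TRUE, row.names = FALSE, quote = FALSE)

# Read the file back as plain text to inspect the header and the quoting
written <- readLines(out.file)
print(written)
```

The first line should hold the column names, and no line should contain a double quote.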

