Question

2. The data set `MLB-TeamBatting-S16.csv` contains 2016 MLB team batting data for selected variables. Load the data set from the given URL using the code below. This data set was obtained from [Baseball Reference](https://www.baseball-reference.com/leagues/MLB/2016-standard-batting.shtml).
* Tm - Team   
* Lg - League: American League (AL), National League (NL)
* BatAge - Batters’ average age
* RPG - Runs Scored Per Game
* G - Games Played or Pitched
* AB - At Bats
* R - Runs Scored/Allowed
* H - Hits/Hits Allowed
* HR - Home Runs Hit/Allowed
* RBI - Runs Batted In
* SO - Strikeouts
* BA - Hits/At Bats
* SH - Sacrifice Hits (Sacrifice Bunts)
* SF - Sacrifice Flies

Using the `mlb16.data` data, do the following:
i) use `filter` to select teams using the following conditions:
a) the Cardinals (`Tm` is `STL`).
b) teams with more than 1400 hits (`H`) in the 2016 season.   
c) teams whose league (`Lg`) is the National League (`NL`).   
ii) use `arrange` to order the teams by decreasing number of home runs (`HR`).
iii) use `arrange` to display the teams in decreasing order of `RBI`.   
iv) use `group_by` to group the teams by league and `summarise` to compute the average `RBI` within each league. Use the pipe operator `%>%` to chain multiple functions.   
  


### Code chunk
```{r}
# load the data set
mlb16.data <- read.csv("https://raw.githubusercontent.com/jpailden/rstatlab/master/data/MLB-TeamBatting-S16.csv")
str(mlb16.data) # check structure
head(mlb16.data) # show first six rows

# last R code line
```

Solutions

Expert Solution

Code and results are given below
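
A minimal sketch of one possible solution with `dplyr`, not necessarily the expert's own code. It assumes that the `dplyr` package is installed, that the CSV has the columns listed in the question (`Tm`, `Lg`, `H`, `HR`, `RBI`), that the Cardinals are stored as `STL` in `Tm`, and that parts a) to c) are three separate filters rather than one combined filter. The object names (`stl`, `many_hits`, and so on) are chosen here for illustration.

```{r}
# dplyr supplies filter(), arrange(), group_by(), summarise() and the pipe %>%
library(dplyr)

# load the data set (same call as in the question's code chunk)
mlb16.data <- read.csv("https://raw.githubusercontent.com/jpailden/rstatlab/master/data/MLB-TeamBatting-S16.csv")

# i-a) the Cardinals row (assumes Tm holds the abbreviation "STL")
stl <- mlb16.data %>% filter(Tm == "STL")

# i-b) teams with more than 1400 hits
many_hits <- mlb16.data %>% filter(H > 1400)

# i-c) National League teams
nl_teams <- mlb16.data %>% filter(Lg == "NL")

# ii) teams ordered by home runs, largest first
by_hr <- mlb16.data %>% arrange(desc(HR))

# iii) teams ordered by runs batted in, largest first
by_rbi <- mlb16.data %>% arrange(desc(RBI))

# iv) average RBI within each league; na.rm = TRUE skips missing values, if any
avg_rbi <- mlb16.data %>%
  group_by(Lg) %>%
  summarise(mean_RBI = mean(RBI, na.rm = TRUE))

# print each result
stl
many_hits
nl_teams
by_hr
by_rbi
avg_rbi
```

If the conditions in part i) are meant to hold at the same time, they can be passed to a single `filter()` call separated by commas, which `dplyr` combines with a logical AND.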

