Question

Write code in R for these questions, will vote!!

Load the Taxi.txt data set into R.

(a) Calculate the mean, median, standard deviation, 30th percentile, and 65th percentile for Mileage and TripTime.

(b) Make a frequency table for PaymentProvider that includes a Sum column. Report the resulting table.

(c) Make a contingency table comparing PaymentType and Airport. Report the resulting table.

(d) Use the cor() function to find the correlation between each pair of the Meter, Tip, Mileage, and TripTime variables. Report these 6 values. Output alone here is not sufficient.

These are the contents of the txt file:

"ID"   "Provider"   "Meter"   "Tip"   "Surcharge"   "Extras"   "Tolls"   "PaymentType"   "PaymentProvider"   "PickUpZip"   "DropOffZip"   "Mileage"   "TripTime"   "Airport"
24163148   "Transco, Inc."   8.65   0   0.25   0   0   "Cash"   "Cash"   20001   20002   1.57   16   "N"
24527857   "DCVIP Cab"   9.73   3   0.25   1   0   "CreditCard"   "VisaCredit"   20004   20003   2.2   67   "N"
24270554   "DCVIP Cab"   6.22   1.62   0.25   0   0   "CreditCard"   "VisaCredit"   20037   20007   0.6   311   "N"
24262083   "Verifone"   5.95   0   0.25   1   0   "Cash"   "Cash"   20006   20001   0.9   261   "N"
24333678   "Hitch"   19.99   0   0.25   0   0   "Cash"   "Cash"   20008   22202   7   77   "N"

Solutions

Expert Solution

a. R code:

> d=read.table('Taxi.txt',header=TRUE,sep='')
> d
       ID.      Provider Meter  Tip Surcharge Extras Tolls PaymentType
1 24163148 Transco, Inc.  8.65 0.00      0.25      0     0        Cash
2 24527857     DCVIP Cab  9.73 3.00      0.25      1     0  CreditCard
3 24270554     DCVIP Cab  6.22 1.62      0.25      0     0  CreditCard
4 24262083      Verifone  5.95 0.00      0.25      1     0        Cash
5 24333678         Hitch 19.99 0.00      0.25      0     0        Cash
  PaymentProvider PickUpZip DropOffZip Mileage TripTime Airport
1            Cash     20001      20002    1.57       16       N
2      VisaCredit     20004      20003    2.20       67       N
3      VisaCredit     20037      20007    0.60      311       N
4            Cash     20006      20001    0.90      261       N
5            Cash     20008      22202    7.00       77       N
> attach(d)
The following objects are masked from d (pos = 3):

Airport, DropOffZip, Extras, ID., Meter, Mileage, PaymentProvider,
PaymentType, PickUpZip, Provider, Surcharge, Tip, Tolls, TripTime

> summary(Mileage)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.600   0.900   1.570   2.454   2.200   7.000
> sd(Mileage)
[1] 2.615546
> quantile(Mileage,c(0.3,0.65))
  30%   65%
1.034 1.948

> summary(TripTime)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   16.0    67.0    77.0   146.4   261.0   311.0
> sd(TripTime)
[1] 130.7203
> quantile(TripTime,c(0.3,0.65))
  30%   65%
 69.0 187.4

Variable   Mean    Median  Standard deviation  30th percentile  65th percentile
Mileage    2.454   1.570   2.615546            1.034            1.948
TripTime   146.4   77.0    130.7203            69.0             187.4
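
The same summary can be built in one step without attach(). The following is only a sketch: it types in the five sample rows shown in the question rather than reading Taxi.txt (the real file presumably has more rows), and the names taxi, describe, and stats are made up for illustration.

```r
# Five sample rows from the question, entered by hand (assumption:
# the full Taxi.txt would be loaded with read.table() instead).
taxi <- data.frame(
  Mileage  = c(1.57, 2.20, 0.60, 0.90, 7.00),
  TripTime = c(16, 67, 311, 261, 77)
)

# Hypothetical helper: the five statistics requested in part (a).
describe <- function(x) {
  c(Mean   = mean(x),
    Median = median(x),
    SD     = sd(x),
    P30    = unname(quantile(x, 0.30)),
    P65    = unname(quantile(x, 0.65)))
}

# One row per variable, one column per statistic.
stats <- t(sapply(taxi[c("Mileage", "TripTime")], describe))
print(round(stats, 4))
```

quantile() uses its default interpolation method here, which is also what the transcript above used, so the percentiles agree with the reported table.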

b. Frequency table

> t=table(PaymentProvider)
> addmargins(t,margin=1)
PaymentProvider
      Cash VisaCredit        Sum
         3          2          5

c. Contingency table:

> c=table(PaymentType, Airport)
> c
            Airport
PaymentType  N
  Cash       3
  CreditCard 2

> addmargins(c) # table with sum
            Airport
PaymentType  N Sum
  Cash       3   3
  CreditCard 2   2
  Sum        5   5
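
In the five sample rows every trip has Airport equal to "N", so the table has a single column. If the full file also contains airport trips, declaring the factor levels explicitly keeps an empty column visible. This is a sketch under that assumption: the level "Y" is a guess at how airport trips are coded, and the data frame below only re-enters the sample rows.

```r
# Sample rows from the question, entered by hand.
taxi <- data.frame(
  PaymentType = c("Cash", "CreditCard", "CreditCard", "Cash", "Cash"),
  Airport     = c("N", "N", "N", "N", "N")
)

# Assumption: "Y" marks airport trips in the full data set.
# It never occurs in these five rows, so its column is all zeros.
taxi$Airport <- factor(taxi$Airport, levels = c("N", "Y"))

# with() avoids attach() and the masking messages it produces.
tab <- with(taxi, table(PaymentType, Airport))
print(addmargins(tab))
```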

d.

> d1=data.matrix(d)
> dim(d1)
[1] 5 14
> d2=d1[,c(3,4,12,13)]
> d2
     Meter  Tip Mileage TripTime
[1,]  8.65 0.00    1.57       16
[2,]  9.73 3.00    2.20       67
[3,]  6.22 1.62    0.60      311
[4,]  5.95 0.00    0.90      261
[5,] 19.99 0.00    7.00       77
> cor(d2)
              Meter         Tip    Mileage    TripTime
Meter     1.0000000 -0.23823509  0.9967267 -0.52607478
Tip      -0.2382351  1.00000000 -0.2654125  0.04012666
Mileage   0.9967267 -0.26541247  1.0000000 -0.48500904
TripTime -0.5260748  0.04012666 -0.4850090  1.00000000
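
Part (d) asks for the 6 pairwise values to be reported, not just the matrix. One possible way to pull them out is to take the upper triangle of the correlation matrix. The sketch below re-enters the five sample rows by hand; the names taxi, r, idx, and pairs are illustrative only.

```r
# Sample rows from the question, entered by hand.
taxi <- data.frame(
  Meter    = c(8.65, 9.73, 6.22, 5.95, 19.99),
  Tip      = c(0, 3, 1.62, 0, 0),
  Mileage  = c(1.57, 2.20, 0.60, 0.90, 7.00),
  TripTime = c(16, 67, 311, 261, 77)
)

r <- cor(taxi)  # 4 x 4 Pearson correlation matrix

# Row/column positions of the entries above the diagonal:
# each unordered pair of variables appears exactly once.
idx <- which(upper.tri(r), arr.ind = TRUE)

pairs <- data.frame(
  Var1        = rownames(r)[idx[, "row"]],
  Var2        = colnames(r)[idx[, "col"]],
  Correlation = r[upper.tri(r)]
)
print(pairs)
```

Read off the matrix above, the six values for the sample rows are: Meter and Tip -0.2382, Meter and Mileage 0.9967, Meter and TripTime -0.5261, Tip and Mileage -0.2654, Tip and TripTime 0.0401, Mileage and TripTime -0.4850.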

