Write R code for the following questions.
Load the Taxi.txt data set into R.
(a) Calculate the mean, median, standard deviation, 30th percentile, and 65th percentile for Mileage and TripTime.
(b) Make a frequency table for PaymentProvider that includes a Sum column. Report the resulting table.
(c) Make a contingency table comparing PaymentType and Airport. Report the resulting table.
(d) Use the cor() function to find the correlation between each pair of the Meter, Tip, Mileage, and TripTime variables. Report these 6 values. Output alone here is not sufficient.
The contents of the .txt file:
"ID" "Provider" "Meter" "Tip" "Surcharge" "Extras" "Tolls" "PaymentType" "PaymentProvider" "PickUpZip" "DropOffZip" "Mileage" "TripTime" "Airport"
24163148 "Transco, Inc." 8.65 0 0.25 0 0 "Cash" "Cash" 20001 20002 1.57 16 "N"
24527857 "DCVIP Cab" 9.73 3 0.25 1 0 "CreditCard" "VisaCredit" 20004 20003 2.2 67 "N"
24270554 "DCVIP Cab" 6.22 1.62 0.25 0 0 "CreditCard" "VisaCredit" 20037 20007 0.6 311 "N"
24262083 "Verifone" 5.95 0 0.25 1 0 "Cash" "Cash" 20006 20001 0.9 261 "N"
24333678 "Hitch" 19.99 0 0.25 0 0 "Cash" "Cash" 20008 22202 7 77 "N"
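If the data are not saved to disk, the rows above can be read straight from a string with `read.table(text = ...)`. This is a minimal sketch, assuming these five rows are the whole of Taxi.txt; `taxi_txt` and `taxi` are illustrative names, not part of the original answer.

```r
# Sketch: rebuild the posted rows without a file on disk,
# assuming they are the complete contents of Taxi.txt.
taxi_txt <- '
"ID" "Provider" "Meter" "Tip" "Surcharge" "Extras" "Tolls" "PaymentType" "PaymentProvider" "PickUpZip" "DropOffZip" "Mileage" "TripTime" "Airport"
24163148 "Transco, Inc." 8.65 0 0.25 0 0 "Cash" "Cash" 20001 20002 1.57 16 "N"
24527857 "DCVIP Cab" 9.73 3 0.25 1 0 "CreditCard" "VisaCredit" 20004 20003 2.2 67 "N"
24270554 "DCVIP Cab" 6.22 1.62 0.25 0 0 "CreditCard" "VisaCredit" 20037 20007 0.6 311 "N"
24262083 "Verifone" 5.95 0 0.25 1 0 "Cash" "Cash" 20006 20001 0.9 261 "N"
24333678 "Hitch" 19.99 0 0.25 0 0 "Cash" "Cash" 20008 22202 7 77 "N"
'
# header = TRUE takes the column names from the first non-blank line; the
# default whitespace separator plus quote handling keeps "Transco, Inc."
# together as a single field.
taxi <- read.table(text = taxi_txt, header = TRUE)
str(taxi)
```

Blank lines are skipped by default, so the leading newline inside the string is harmless.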
a. R code:
> d <- read.table("Taxi.txt", header = TRUE)
> d
ID. Provider Meter Tip Surcharge Extras Tolls PaymentType
1 24163148 Transco, Inc. 8.65 0.00 0.25 0 0 Cash
2 24527857 DCVIP Cab 9.73 3.00 0.25 1 0 CreditCard
3 24270554 DCVIP Cab 6.22 1.62 0.25 0 0 CreditCard
4 24262083 Verifone 5.95 0.00 0.25 1 0 Cash
5 24333678 Hitch 19.99 0.00 0.25 0 0 Cash
PaymentProvider PickUpZip DropOffZip Mileage TripTime Airport
1 Cash 20001 20002 1.57 16 N
2 VisaCredit 20004 20003 2.20 67 N
3 VisaCredit 20037 20007 0.60 311 N
4 Cash 20006 20001 0.90 261 N
5 Cash 20008 22202 7.00 77 N
> attach(d)
> summary(Mileage)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.600 0.900 1.570 2.454 2.200 7.000
> sd(Mileage)
[1] 2.615546
> quantile(Mileage,c(0.3,0.65))
30% 65%
1.034 1.948
> summary(TripTime)
Min. 1st Qu. Median Mean 3rd Qu. Max.
16.0 67.0 77.0 146.4 261.0 311.0
> sd(TripTime)
[1] 130.7203
> quantile(TripTime,c(0.3,0.65))
30% 65%
69.0 187.4
Variable | Mean  | Median | Standard deviation | 30th percentile | 65th percentile |
Mileage  | 2.454 | 1.570  | 2.615546           | 1.034           | 1.948           |
TripTime | 146.4 | 77.0   | 130.7203           | 69.0            | 187.4           |
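The five statistics for part (a) can also be collected in a single step. This is a minimal sketch using the Mileage and TripTime values from the five posted rows; `describe` and `type7` are hypothetical helper names. `type7` only re-derives what `quantile()` already does by default (type 7 interpolation), to show where values such as 1.034 and 187.4 come from.

```r
# Mileage and TripTime columns from the five posted rows
Mileage  <- c(1.57, 2.20, 0.60, 0.90, 7.00)
TripTime <- c(16, 67, 311, 261, 77)

# Hypothetical helper: the five statistics requested in part (a)
describe <- function(x) {
  q <- quantile(x, c(0.30, 0.65))   # default type = 7
  c(Mean = mean(x), Median = median(x), SD = sd(x),
    P30 = unname(q[1]), P65 = unname(q[2]))
}

# One column per variable; transpose so each variable is a row
stats <- sapply(list(Mileage = Mileage, TripTime = TripTime), describe)
round(t(stats), 4)

# Type 7 puts the p-th quantile at position h = (n - 1) * p + 1 of the
# sorted data and interpolates linearly between the two neighbours.
type7 <- function(x, p) {
  x  <- sort(x)
  h  <- (length(x) - 1) * p + 1
  lo <- floor(h)
  hi <- min(lo + 1, length(x))   # guard so p = 1 stays inside the vector
  x[lo] + (h - lo) * (x[hi] - x[lo])
}
type7(Mileage, 0.30)   # 0.9 + 0.2 * (1.57 - 0.9) = 1.034
```

Note that `sd()` divides by n - 1, which is what the standard deviations in the table above use.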
b. Frequency table:
> tab <- table(PaymentProvider)
> addmargins(tab, margin = 1)
PaymentProvider
Cash VisaCredit Sum
3 2 5
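For a one-way table, `addmargins(..., margin = 1)` simply appends the total. This sketch checks that against a hand-built named vector, using the five PaymentProvider values from the posted rows; `with_sum` and `manual` are illustrative names.

```r
# PaymentProvider values from the five posted rows
PaymentProvider <- c("Cash", "VisaCredit", "VisaCredit", "Cash", "Cash")

tab <- table(PaymentProvider)
with_sum <- addmargins(tab, margin = 1)   # appends a "Sum" cell
with_sum

# The same counts as a plain named vector, built without addmargins()
manual <- c(setNames(as.vector(tab), names(tab)), Sum = sum(tab))
manual
```

The `addmargins()` version keeps the table class and the `PaymentProvider` dimension label, which is why it prints with that heading.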
c. Contingency table:
> ct <- table(PaymentType, Airport)
> ct
Airport
PaymentType N
Cash 3
CreditCard 2
> addmargins(ct) # table with sums
Airport
PaymentType N Sum
Cash 3 3
CreditCard 2 2
Sum 5 5
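`xtabs()` builds the same contingency table from a formula and a data frame, which avoids `attach()`. This is a sketch assuming the PaymentType and Airport values from the five posted rows; `trips` and `ct` are illustrative names. Only the `N` level of Airport occurs in these rows, which is why the table has a single column.

```r
# PaymentType and Airport values from the five posted rows
trips <- data.frame(
  PaymentType = c("Cash", "CreditCard", "CreditCard", "Cash", "Cash"),
  Airport     = c("N", "N", "N", "N", "N")
)

# Cross-classify the two variables named on the right-hand side
ct <- xtabs(~ PaymentType + Airport, data = trips)
ct
addmargins(ct)   # adds Sum row and Sum column
```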
d. Correlations:
> d1 <- data.matrix(d)
> dim(d1)
[1] 5 14
> d2 <- d1[, c(3, 4, 12, 13)]  # Meter, Tip, Mileage, TripTime
> d2
Meter Tip Mileage TripTime
[1,] 8.65 0.00 1.57 16
[2,] 9.73 3.00 2.20 67
[3,] 6.22 1.62 0.60 311
[4,] 5.95 0.00 0.90 261
[5,] 19.99 0.00 7.00 77
> cor(d2)
Meter Tip Mileage TripTime
Meter 1.0000000 -0.23823509 0.9967267 -0.52607478
Tip -0.2382351 1.00000000 -0.2654125 0.04012666
Mileage 0.9967267 -0.26541247 1.0000000 -0.48500904
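TripTime -0.5260748 0.04012666 -0.4850090 1.00000000

The six pairwise correlations (rounded to four decimals):
Meter and Tip: -0.2382
Meter and Mileage: 0.9967
Meter and TripTime: -0.5261
Tip and Mileage: -0.2654
Tip and TripTime: 0.0401
Mileage and TripTime: -0.4850

Meter and Mileage are almost perfectly positively correlated, TripTime has a moderate negative correlation with both Meter and Mileage, and Tip is only weakly related to the other three variables. With only five trips, these estimates are very imprecise.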
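Since the question asks for the six values rather than the full matrix, the upper triangle of the correlation matrix can be listed directly. This is a sketch assuming the four columns from the five posted rows; selecting columns by name avoids `data.matrix()` and hard-coded positions. `taxi`, `vars`, `r`, `idx`, and `pairs` are illustrative names.

```r
# The four numeric columns from the five posted rows
taxi <- data.frame(
  Meter    = c(8.65, 9.73, 6.22, 5.95, 19.99),
  Tip      = c(0, 3, 1.62, 0, 0),
  Mileage  = c(1.57, 2.20, 0.60, 0.90, 7.00),
  TripTime = c(16, 67, 311, 261, 77)
)
vars <- c("Meter", "Tip", "Mileage", "TripTime")
r <- cor(taxi[, vars])

# which(..., arr.ind = TRUE) gives the (row, col) index of every
# upper-triangle cell, i.e. each unordered pair exactly once.
idx <- which(upper.tri(r), arr.ind = TRUE)
pairs <- data.frame(
  var1 = rownames(r)[idx[, 1]],
  var2 = colnames(r)[idx[, 2]],
  r    = round(r[idx], 4)
)
pairs
```

The rows come out in column-major order of the matrix, so pairs involving TripTime are listed last.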