R has a number of datasets built in. One such dataset is called mtcars. This data set contains fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models) as reported in a 1974 issue of Motor Trend Magazine.
We do not have to read in these built-in datasets. We can just attach the variables by using the code
attach(mtcars)
Typing mtcars at the prompt prints the entire dataset. We can see the variable names by using a command such as names(mtcars).
The variables are defined as follows:
mpg Miles/(US) gallon
cyl Number of cylinders
disp Displacement (cu.in.)
hp Gross horsepower
drat Rear axle ratio
wt Weight (lb/1000)
qsec 1/4 mile time
vs V/S (“V” engine or “Straight line”) (0 for V, 1 for S)
am Transmission (0 = automatic, 1 = manual)
gear Number of forward gears
carb Number of carburetors
We want to model mpg by some or all of the other 10 variables. Do a complete regression analysis. Be sure to comment on each step you take.
Suppose a prototype for a car is in development. This car has 6 cylinders, a 250 cubic inch engine, 130 horsepower, and a rear axle ratio of 3.8. It weighs 2750 pounds, has a 1/4 mile time of 15.9 seconds, is a V engine type, and has an automatic transmission, 5 forward gears, and 6 carburetors. With 90% confidence, what is an interval estimate for the predicted mpg for this car?
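The interval asked for above can be sketched in R as follows. This is only a minimal illustration, assuming the full model with all 10 predictors is used; a complete analysis would first compare candidate models and check the residual diagnostics. The names fit, prototype and pred are chosen here for illustration. Note that wt is recorded in units of 1000 lb, so 2750 pounds is entered as 2.75.

```r
# Sketch only: full model with all 10 predictors (an assumption, not
# necessarily the model a complete regression analysis would end up with).
fit <- lm(mpg ~ ., data = mtcars)

# Prototype car from the question. wt is in 1000 lb units, vs = 0 means a
# V engine, and am = 0 means an automatic transmission.
prototype <- data.frame(cyl = 6, disp = 250, hp = 130, drat = 3.8,
                        wt = 2.75, qsec = 15.9, vs = 0, am = 0,
                        gear = 5, carb = 6)

# 90% prediction interval for the mpg of one new car
pred <- predict(fit, newdata = prototype,
                interval = "prediction", level = 0.90)
print(pred)  # columns: fit (point prediction), lwr and upr (interval limits)
```

Using interval = "confidence" instead would give an interval for the mean mpg of all cars with these specifications, which is narrower than the interval for a single new car.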
Task 1.
The R code below computes the matrix A, which stores the cosine similarity between every pair of vehicles.
# Task 1
# Function to calculate the cosine similarity between the two rows of mtcars indexed by ix
cos.sim <- function(ix)
{
X = mtcars[ix[1],]
Y = mtcars[ix[2],]
return( sum(X*Y)/sqrt(sum(X^2)*sum(Y^2)) )
}
n <- nrow(mtcars) #Get the rows count of mtcars
cmb <- expand.grid(i=1:n, j=1:n) # Generate all pairs (i, j) to loop over all the elements
A <- matrix(apply(cmb,1,cos.sim),n,n) # Apply the cosine similarity function to every pair
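The same matrix can also be obtained without looping over every pair. The following is a minimal sketch of that alternative, assuming all 11 columns of mtcars are used as they are. The names M, G, norms and A2 are chosen here for illustration, and A2 is kept separate so that A is not overwritten.

```r
M <- as.matrix(mtcars)         # 32 x 11 numeric matrix, one row per car
G <- tcrossprod(M)             # G[i, j] = sum(M[i, ] * M[j, ]), the dot products
norms <- sqrt(diag(G))         # Euclidean length of each row
A2 <- G / outer(norms, norms)  # divide each dot product by both row lengths
round(A2[1:3, 1:3], 4)         # look at a small corner of the matrix
```

Each entry A2[i, j] is the dot product of rows i and j divided by the product of their lengths, which is the same quantity that cos.sim returns for the pair (i, j).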
Task 2.
To get the most similar automobile for each car, we will run the commands below. The comments explain each step.
#Task 2
B <- matrix(nrow = nrow(A), ncol = 2) # Define an empty matrix
# order() ranks each row of A in ascending order, so position nrow(mtcars)-1
# (the 31st) holds the index of the second highest value in that row.
# The highest value in each row is the car's similarity with itself.
sim <- apply(A, 1, order)[nrow(mtcars)-1,]
for (i in 1:nrow(mtcars)) {
B[i,1] = row.names(mtcars)[i]
B[i,2] = row.names(mtcars)[sim[i]]
}
I obtained the following matrix B:
[,1] [,2]
[1,] "Mazda RX4" "Mazda RX4 Wag"
[2,] "Mazda RX4 Wag" "Mazda RX4"
[3,] "Datsun 710" "Toyota Corona"
[4,] "Hornet 4 Drive" "Valiant"
[5,] "Hornet Sportabout" "AMC Javelin"
[6,] "Valiant" "Hornet 4 Drive"
[7,] "Duster 360" "Camaro Z28"
[8,] "Merc 240D" "Hornet 4 Drive"
[9,] "Merc 230" "Mazda RX4 Wag"
[10,] "Merc 280" "Merc 280C"
[11,] "Merc 280C" "Merc 280"
[12,] "Merc 450SE" "Merc 450SL"
[13,] "Merc 450SL" "Merc 450SE"
[14,] "Merc 450SLC" "Merc 450SE"
[15,] "Cadillac Fleetwood" "Pontiac Firebird"
[16,] "Lincoln Continental" "Cadillac Fleetwood"
[17,] "Chrysler Imperial" "AMC Javelin"
[18,] "Fiat 128" "Fiat X1-9"
[19,] "Honda Civic" "Fiat 128"
[20,] "Toyota Corolla" "Fiat 128"
[21,] "Toyota Corona" "Datsun 710"
[22,] "Dodge Challenger" "Hornet Sportabout"
[23,] "AMC Javelin" "Hornet Sportabout"
[24,] "Camaro Z28" "Duster 360"
[25,] "Pontiac Firebird" "Cadillac Fleetwood"
[26,] "Fiat X1-9" "Fiat 128"
[27,] "Porsche 914-2" "Toyota Corona"
[28,] "Lotus Europa" "Ferrari Dino"
[29,] "Ford Pantera L" "Camaro Z28"
[30,] "Ferrari Dino" "Maserati Bora"
[31,] "Maserati Bora" "Ferrari Dino"
[32,] "Volvo 142E" "Datsun 710"
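The lookup in Task 2 relies on each car's similarity with itself being the largest value in its row. An alternative is to exclude the diagonal explicitly and take the largest remaining value. This is a minimal sketch of that idea, which rebuilds the similarity matrix so that it can be run on its own. The names S, nearest and B2 are chosen here for illustration.

```r
M <- as.matrix(mtcars)
norms <- sqrt(rowSums(M^2))               # Euclidean length of each row
S <- tcrossprod(M) / outer(norms, norms)  # cosine similarity matrix
diag(S) <- -Inf                           # exclude each car's match with itself

# For each row, the column with the largest remaining similarity
nearest <- rownames(S)[apply(S, 1, which.max)]
B2 <- cbind(car = rownames(S), most_similar = nearest)
head(B2)
```

Setting the diagonal to -Inf means that which.max can never return the car itself, so no assumption about the position of the self-match in the sorted row is needed.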