Question



R has a number of datasets built in. One such dataset is called mtcars. This data set contains fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models) as reported in a 1974 issue of Motor Trend Magazine.

We do not have to read in these built-in datasets. We can just attach the variables by using the code

        attach(mtcars)

Typing mtcars at the console prints the entire dataset, and the variable names can be listed with the command

        names(mtcars)

The variables are defined as follows:

mpg   Miles/(US) gallon
cyl   Number of cylinders
disp  Displacement (cu.in.)
hp    Gross horsepower
drat  Rear axle ratio
wt    Weight (lb/1000)
qsec  1/4 mile time
vs    V/S (“V” engine or “Straight line”) (0 for V, 1 for S)
am    Transmission (0 = automatic, 1 = manual)
gear  Number of forward gears
carb  Number of carburetors
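Before modeling, it can help to confirm this description against the data itself. The short base-R sketch below checks that there are 32 cars and 11 columns (mpg plus the 10 design variables) and that vs and am are stored as numeric 0/1 codes rather than factors, which affects how a regression model uses them.

```r
data(mtcars)              # load the built-in dataset; nothing to read from a file
dim(mtcars)               # rows = cars, columns = variables
names(mtcars)             # the 11 variable names listed above
sapply(mtcars, class)     # every column is stored as numeric
table(mtcars$vs)          # engine shape: 0 = V, 1 = straight
table(mtcars$am)          # transmission: 0 = automatic, 1 = manual
```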

We want to model mpg using some or all of the other 10 variables. Do a complete regression analysis, and be sure to comment on each step.
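One possible outline of such an analysis is sketched below in base R. It assumes ordinary least squares with all 10 predictors as the starting model and uses backward selection by AIC (`step()`) to look for a smaller model; both are choices made for this sketch, not requirements of the question, and treating cyl, gear, or carb as factors would be an equally reasonable alternative. The names `full` and `reduced` are illustrative.

```r
data(mtcars)

# Full model: mpg regressed on all 10 other variables
full <- lm(mpg ~ ., data = mtcars)
summary(full)            # coefficients, t-tests, R-squared, overall F-test

# Correlations among the predictors, to spot collinearity (e.g. cyl, disp, hp, wt)
round(cor(mtcars[, -1]), 2)

# Backward elimination by AIC; trace = 0 suppresses the step-by-step log
reduced <- step(full, direction = "backward", trace = 0)
summary(reduced)

# Partial F-test: do the dropped predictors matter jointly?
anova(reduced, full)

# Residual diagnostics for the reduced model (residuals vs fitted, Q-Q, scale-location, leverage)
par(mfrow = c(2, 2))
plot(reduced)
par(mfrow = c(1, 1))
```

The summary() tables give the t-tests and R-squared, the correlation matrix flags collinear predictors, anova() tests whether the dropped terms matter jointly, and the four diagnostic plots check linearity, normality, constant variance, and influential points.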

Suppose a prototype car is in development. This car has 6 cylinders, a 250 cubic inch engine, 130 horsepower, a rear axle ratio of 3.8, weighs 2750 pounds, has a 1/4 mile time of 15.9 seconds, a V engine, an automatic transmission, 5 forward gears, and 6 carburetors. With 90% confidence, what is an interval estimate for the predicted mpg of this car?
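A minimal sketch for the prototype, assuming the question wants a 90% prediction interval for one new car (not a confidence interval for the mean mpg of all such cars) and assuming the full 10-predictor model; if a reduced model is chosen, pass that model to predict() instead. The prototype's specifications are converted to the dataset's units and codes: wt = 2.75 (lb/1000), vs = 0 (V engine), am = 0 (automatic). The names `full` and `prototype` are illustrative.

```r
data(mtcars)
full <- lm(mpg ~ ., data = mtcars)   # assumed model: all 10 predictors

# Prototype car, coded in the same units as mtcars
prototype <- data.frame(
  cyl = 6, disp = 250, hp = 130, drat = 3.8,
  wt = 2.75,      # 2750 lb / 1000
  qsec = 15.9,
  vs = 0,         # V engine
  am = 0,         # automatic transmission
  gear = 5, carb = 6
)

# 90% prediction interval for a single new car's mpg
predict(full, newdata = prototype, interval = "prediction", level = 0.90)

# For comparison: 90% confidence interval for the MEAN mpg of cars with these specs
predict(full, newdata = prototype, interval = "confidence", level = 0.90)
```

The prediction interval is wider than the confidence interval because it also includes the scatter of individual cars around the fitted regression line.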

Solutions

Expert Solution

Task 1.

Below is R code that computes the matrix A, which stores the cosine similarity between every pair of vehicles.

# Task 1
M <- as.matrix(mtcars)       # Numeric matrix: one row per car, one column per variable
cos.sim <- function(ix)      # Cosine similarity between rows ix[1] and ix[2] of M
{
  X <- M[ix[1], ]
  Y <- M[ix[2], ]
  return( sum(X*Y)/sqrt(sum(X^2)*sum(Y^2)) )
}
n <- nrow(M)                                 # Number of cars (rows of mtcars)
cmb <- expand.grid(i = 1:n, j = 1:n)         # Generate every (i, j) pair of rows; i varies fastest
A <- matrix(apply(cmb, 1, cos.sim), n, n)    # Apply cos.sim to each pair; column-major fill puts pair (i, j) at A[i, j]
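The pairwise apply() loop can also be expressed as one matrix computation: the cosine similarity of rows i and j is their dot product divided by the product of their lengths, so all 32 x 32 values follow from a single matrix product. The base-R sketch below (using `tcrossprod()` and `outer()`) computes A both ways and checks that they agree; `A_loop` and `A_vec` are illustrative names.

```r
M <- as.matrix(mtcars)                     # one row per car

# Loop version (same logic as Task 1)
cos.sim <- function(ix) {
  X <- M[ix[1], ]
  Y <- M[ix[2], ]
  sum(X * Y) / sqrt(sum(X^2) * sum(Y^2))
}
n <- nrow(M)
cmb <- expand.grid(i = 1:n, j = 1:n)
A_loop <- matrix(apply(cmb, 1, cos.sim), n, n)

# Vectorized version: all row dot products divided by the products of row lengths
norms <- sqrt(rowSums(M^2))
A_vec <- tcrossprod(M) / outer(norms, norms)

# Same values; only the row/column names differ
all.equal(A_loop, A_vec, check.attributes = FALSE)
```

The loop makes 1,024 R-level calls to cos.sim(); with 32 cars either form is instant, but the matrix form scales much better to larger datasets.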

Task 2.

To find the most similar automobile for each car, run the commands below; the comments in the code explain each step.

#Task 2
n <- nrow(A)                       # Number of cars
B <- matrix(nrow = n, ncol = 2)    # Empty n x 2 matrix for (car, most similar car) pairs
sim <- apply(A, 1, order)[n-1,]    # Index of the second-highest similarity in each row of A
# order() sorts each row in ascending order, so position n is the car itself (similarity 1)
# and position n-1 (the 31st) is the most similar other car
for (i in 1:n) {
  B[i,1] <- row.names(mtcars)[i]
  B[i,2] <- row.names(mtcars)[sim[i]]
}

The resulting matrix B is:

[,1] [,2]
[1,] "Mazda RX4" "Mazda RX4 Wag"
[2,] "Mazda RX4 Wag" "Mazda RX4"
[3,] "Datsun 710" "Toyota Corona"
[4,] "Hornet 4 Drive" "Valiant"
[5,] "Hornet Sportabout" "AMC Javelin"
[6,] "Valiant" "Hornet 4 Drive"
[7,] "Duster 360" "Camaro Z28"
[8,] "Merc 240D" "Hornet 4 Drive"
[9,] "Merc 230" "Mazda RX4 Wag"
[10,] "Merc 280" "Merc 280C"
[11,] "Merc 280C" "Merc 280"
[12,] "Merc 450SE" "Merc 450SL"
[13,] "Merc 450SL" "Merc 450SE"
[14,] "Merc 450SLC" "Merc 450SE"
[15,] "Cadillac Fleetwood" "Pontiac Firebird"
[16,] "Lincoln Continental" "Cadillac Fleetwood"
[17,] "Chrysler Imperial" "AMC Javelin"
[18,] "Fiat 128" "Fiat X1-9"
[19,] "Honda Civic" "Fiat 128"
[20,] "Toyota Corolla" "Fiat 128"
[21,] "Toyota Corona" "Datsun 710"
[22,] "Dodge Challenger" "Hornet Sportabout"
[23,] "AMC Javelin" "Hornet Sportabout"
[24,] "Camaro Z28" "Duster 360"
[25,] "Pontiac Firebird" "Cadillac Fleetwood"
[26,] "Fiat X1-9" "Fiat 128"
[27,] "Porsche 914-2" "Toyota Corona"
[28,] "Lotus Europa" "Ferrari Dino"
[29,] "Ford Pantera L" "Camaro Z28"
[30,] "Ferrari Dino" "Maserati Bora"
[31,] "Maserati Bora" "Ferrari Dino"
[32,] "Volvo 142E" "Datsun 710"
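Instead of sorting every row with order(), the most similar other car can also be found by setting the diagonal of A (each car's similarity with itself) to -Inf and taking which.max() of each row. The sketch below recomputes A in vectorized form so that it runs on its own, then builds a data frame rather than a character matrix; `A_off`, `nearest`, and `B_df` are illustrative names.

```r
M <- as.matrix(mtcars)
norms <- sqrt(rowSums(M^2))
A <- tcrossprod(M) / outer(norms, norms)   # cosine similarity matrix, as in Task 1

A_off <- A
diag(A_off) <- -Inf                        # exclude each car's match with itself
nearest <- apply(A_off, 1, which.max)      # column index of the most similar other car

B_df <- data.frame(car = rownames(mtcars),
                   most_similar = rownames(mtcars)[nearest])
B_df
```

Setting the diagonal to -Inf makes the exclusion of the car itself explicit rather than relying on its similarity of 1 sorting last.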

