Question


Assignment:

Install and load the ggplot2 package.

Load the "diamonds" dataset.

R code:

install.packages("ggplot2")
library(ggplot2)
?diamonds

1. Explore the dataset & state insights

2. Create plots for dataset

3. Provide a summary of descriptive stats

4. Run the regressions, research, investigate & comment on R^2 & on regression plots (1 line each).

#===========================================
# DV = Price, IV or IVs = your choice
# Can we create and compare models to predict "Price"?
# Question: Investigate & comment on R^2 & on plots
# Compare regression models & discuss R^2: any improvement?
# Based on your understanding of regression models, select the best model
# to predict the price of diamonds based on the dataset

# Name your R file as LastNameFirstInitial.R and include your full name in the first line of the script.

diamonds {ggplot2} R Documentation
Prices of over 50,000 round cut diamonds

Description

A dataset containing the prices and other attributes of almost 54,000 diamonds. The variables are as follows:

Usage

diamonds

Format

A data frame with 53940 rows and 10 variables:

price
price in US dollars ($326–$18,823)

carat
weight of the diamond (0.2–5.01)

cut
quality of the cut (Fair, Good, Very Good, Premium, Ideal)

color
diamond colour, from J (worst) to D (best)

clarity
a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))

x
length in mm (0–10.74)

y
width in mm (0–58.9)

z
depth in mm (0–31.8)

depth
total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43–79)

table
width of top of diamond relative to widest point (43–95)
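
The depth definition above can be checked against the data. The following is a minimal sketch, not part of the original documentation: it assumes the formula is multiplied by 100 to give a percentage (the stated range of 43–79 suggests this), and it drops rows where x + y is zero, since a zero dimension cannot be used in the ratio. The names `d`, `depth_check`, and `agree_share` are illustrative.

```r
library(ggplot2)  # provides the diamonds dataset

# Keep rows where the denominator x + y is positive
# (assumption: zero dimensions are unusable measurements).
d <- subset(diamonds, x + y > 0)

# Recompute total depth percentage from the raw dimensions:
# depth = 100 * 2 * z / (x + y)
depth_check <- 100 * 2 * d$z / (d$x + d$y)

# Share of rows where the recomputed value is within one
# percentage point of the recorded depth column
agree_share <- mean(abs(depth_check - d$depth) < 1)
print(agree_share)
```

Rows where the two values disagree are worth a closer look during exploration, since they may point to data-entry problems in x, y, or z.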

Solutions

Expert Solution

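library(ggplot2)
# Columns are referenced through the data argument below, so attach() is not needed
str(diamonds)
if (interactive()) View(diamonds)  # spreadsheet-style viewer, interactive sessions only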
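
For items 1 and 3 of the assignment (explore the dataset, summarise descriptive stats), something like the following could be used. This is a sketch using base R only; the choice of statistics and the names `price_stats` and `median_by_cut` are illustrative, not taken from the original answer.

```r
library(ggplot2)  # provides the diamonds dataset

# Structure of the data: column types and the first few values
str(diamonds)

# Per-column summaries: quartiles and mean for numeric columns,
# level counts for the factors cut, color, and clarity
summary(diamonds)

# Centre and spread of the dependent variable
price_stats <- c(mean   = mean(diamonds$price),
                 median = median(diamonds$price),
                 sd     = sd(diamonds$price))
print(price_stats)

# Median price within each cut grade
median_by_cut <- aggregate(price ~ cut, data = diamonds, FUN = median)
print(median_by_cut)
```

Comparing the mean with the median of price is a quick way to state an insight about skewness in one line.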

# Histogram of price in US$500 bins; price is looked up in the data argument
ggplot(data = diamonds, aes(x = price)) +
  geom_histogram(binwidth = 500) +
  ggtitle("Diamond Price Distribution") +
  xlab("Diamond Price (US$)") + ylab("Frequency") +
  theme_minimal()
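
The histogram covers price alone. For item 2 of the assignment, plots that relate price to candidate predictors may also help. The sketch below is one possible choice, not the original answer's: a scatter plot of price against carat on log scales and a box plot of price by cut. The object names `p_scatter` and `p_box` are illustrative.

```r
library(ggplot2)

# Price against carat on log-log axes; a low alpha reduces overplotting
p_scatter <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1) +
  scale_x_log10() +
  scale_y_log10() +
  labs(title = "Diamond Price vs. Carat (log scales)",
       x = "Carat", y = "Price (US$)") +
  theme_minimal()

# Distribution of price within each cut grade
p_box <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot() +
  labs(title = "Diamond Price by Cut", x = "Cut", y = "Price (US$)") +
  theme_minimal()

print(p_scatter)
print(p_box)
```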

Run the regressions, research, Investigate & comment on R^2 & on regression plots - 1 line each.

Model is

R-squared = 0.919, so about 92% of the variation in price (y) is explained by this model.
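
The answer does not show which model produced this R-squared, so the comparison asked for in item 4 is sketched here under assumptions: the three formulas below are illustrative choices of independent variables, not the original author's. Because the models are nested, R-squared cannot decrease from one to the next; adjusted R-squared is reported as well because it penalises extra predictors.

```r
library(ggplot2)  # provides the diamonds dataset

# Model 1: price on carat only
m1 <- lm(price ~ carat, data = diamonds)
# Model 2: add the three quality grades
m2 <- lm(price ~ carat + cut + color + clarity, data = diamonds)
# Model 3: every other column as a predictor
m3 <- lm(price ~ ., data = diamonds)

# Collect R^2 and adjusted R^2 side by side
models <- list(m1, m2, m3)
r2 <- data.frame(
  model         = c("carat", "carat + cut + color + clarity", "all variables"),
  r_squared     = sapply(models, function(m) summary(m)$r.squared),
  adj_r_squared = sapply(models, function(m) summary(m)$adj.r.squared)
)
print(r2)

# Regression plots for the largest model:
# residuals vs. fitted values, and a normal Q-Q plot of the residuals
par(mfrow = c(1, 2))
plot(m3, which = 1:2)
par(mfrow = c(1, 1))
```

A curved or fan-shaped residuals-vs-fitted plot would suggest that a linear model on raw price is misspecified even when R-squared is high, which is the kind of one-line comment the assignment asks for.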

