Question

In: Statistics and Probability

Assignment: Install and load the ggplot2 package. load the "diamonds" dataset RCode: install.packages("ggplot2") library(ggplot2) ?diamonds 1....

Assignment:

Install and load the ggplot2 package.

Load the "diamonds" dataset.

RCode:

install.packages("ggplot2")

library(ggplot2)

?diamonds

1. Explore the dataset & state insights

2. Create plots for dataset

3. Provide a summary of descriptive stats

4. Run the regressions, research, Investigate & comment on R^2 & on regression plots - 1 line each.

#===========================================

# DV = Price, IV or IVs = your choice

# Can we create and compare models to predict "Price"?

# Question- Investigate & comment on R^2 & on plots

#Compare regression models & discuss R^2 -any improvement?

# Based on your understanding of regression models, select the best model

#to predict the price of diamonds based on the dataset

#Name your R file as LastNameFirstInitial.R and include your full name in the first line of the script.

diamonds {ggplot2} R Documentation

Prices of over 50,000 round cut diamonds

Description

A dataset containing the prices and other attributes of almost 54,000 diamonds. The variables are as follows:

Usage

diamonds

Format

A data frame with 53940 rows and 10 variables:

price

price in US dollars ($326–$18,823)

carat

weight of the diamond (0.2–5.01)

cut

quality of the cut (Fair, Good, Very Good, Premium, Ideal)

color

diamond colour, from J (worst) to D (best)

clarity

a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))

x

length in mm (0–10.74)

y

width in mm (0–58.9)

z

depth in mm (0–31.8)

depth

total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43–79)

table

width of top of diamond relative to widest point (43–95)
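The depth formula in the list above can be checked against the stored column. This is a minimal sketch, assuming the stored `depth` is a percentage (so the ratio is scaled by 100) and that rows with a zero dimension should be skipped; the names `d` and `depth_calc` are illustrative.

```r
library(ggplot2)  # provides the diamonds data frame

# Keep only rows where all three dimensions are positive;
# the documented ranges for x, y and z start at 0, which would
# give a meaningless or undefined ratio.
d <- subset(diamonds, x > 0 & y > 0 & z > 0)

# Total depth percentage = 2 * z / (x + y), scaled to percent (assumption).
depth_calc <- with(d, 100 * 2 * z / (x + y))

# Absolute difference between the recomputed and the stored depth.
depth_gap <- abs(depth_calc - d$depth)
summary(depth_gap)
```

A small typical gap suggests the column follows the formula up to rounding; large gaps point to rows with implausible dimensions that may deserve a closer look before modelling.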

Expert Solution

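library(ggplot2)

# Open the data in the interactive viewer (RStudio); attach() is not needed.
View(diamonds)

# Histogram of diamond prices in $500 bins.
# Refer to the column by name inside aes() rather than via diamonds$price.
ggplot(data = diamonds, aes(x = price)) +
  geom_histogram(binwidth = 500) +
  ggtitle("Diamond Price Distribution") +
  xlab("Diamond Price (US$)") + ylab("Frequency") +
  theme_minimal()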
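The histogram covers only part of steps 1 to 3 of the assignment. A minimal sketch of the remaining exploration, descriptive statistics, and one more plot might look like this. The grouping by cut and the price-against-carat scatter plot are illustrative choices, not part of the original answer, and `price_by_cut` and `p_scatter` are hypothetical names.

```r
library(ggplot2)  # provides the diamonds data frame

# Step 1: structure of the data (53,940 rows, 10 variables).
str(diamonds)

# Step 3: descriptive statistics for every column.
summary(diamonds)

# Mean and median price for each cut grade.
price_by_cut <- aggregate(
  price ~ cut,
  data = diamonds,
  FUN = function(p) c(mean = mean(p), median = median(p))
)
print(price_by_cut)

# Step 2: price against carat, colored by clarity.
p_scatter <- ggplot(diamonds, aes(x = carat, y = price, color = clarity)) +
  geom_point(alpha = 0.2) +
  labs(title = "Price vs. Carat", x = "Carat", y = "Price (US$)") +
  theme_minimal()
print(p_scatter)
```

Comparing the mean and median per cut is a quick way to see how skewed the price distribution is within each group, which the histogram already hints at for the data as a whole.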

Run the regressions, research, Investigate & comment on R^2 & on regression plots - 1 line each.

Model is

R-squared = 0.919, so about 92% of the variation in y (price) is explained by this model.
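The answer does not state which formula produced this R-squared. As a hedged sketch of how the comparison asked for in the assignment could be run, the three nested models below are illustrative choices of independent variables, not the original author's; `m1`, `m2`, `m3` and `fit_stats` are hypothetical names.

```r
library(ggplot2)  # provides the diamonds data frame

# Candidate models with price as the dependent variable.
# cut, color and clarity are ordered factors, so lm() encodes them
# with polynomial contrasts by default.
m1 <- lm(price ~ carat, data = diamonds)
m2 <- lm(price ~ carat + cut + color + clarity, data = diamonds)
m3 <- lm(price ~ ., data = diamonds)  # all remaining columns as predictors

models <- list(carat_only = m1, carat_cut_color_clarity = m2, all_vars = m3)

# R-squared and adjusted R-squared for each model, side by side.
fit_stats <- t(sapply(models, function(m) {
  s <- summary(m)
  c(r2 = s$r.squared, adj_r2 = s$adj.r.squared)
}))
print(round(fit_stats, 4))

# Standard regression diagnostic plots for the largest model.
par(mfrow = c(2, 2))
plot(m3)
par(mfrow = c(1, 1))
```

Because the models are nested, R-squared can only stay the same or rise as predictors are added, so adjusted R-squared and the residual plots are the more informative basis for choosing between them.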

orchestra answered 2 years ago
