Enter your answers in the empty code chunks.
Don't change anything in the chunk below, and make sure you run it before attempting any of the problems:
```{r message=FALSE, warning=TRUE}
library(tidyverse)
library(ggpubr)
set.seed(2018) # Belgium won 3rd place in the World Cup
```
# Basics
Calculate $\frac{(2+2)\times (3^2 + 5)}{(6/4)}$:
```{r basics1}
# your code here
```
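As a rough illustration of how R evaluates an expression like this, the sketch below uses a different, made-up expression rather than the one in the exercise. `^` is evaluated before `*` and `/`, and parentheses group the numerator and the denominator.

```r
# Illustrative expression (not the exercise's): (1 + 3) * (2^3 - 2) / (8 / 4)
# `^` binds tighter than `*` and `/`; parentheses force the grouping.
result <- (1 + 3) * (2^3 - 2) / (8 / 4)
result  # 4 * 6 / 2 = 12
```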
Create a vector called "x" with the following values: 10, 15, 18, 20.
```{r basics2}
# your code here
```
Calculate the mean (`mean()`), median (`median()`) and standard
deviation (`sd()`) of `x`.
```{r basics3}
# your code here
```
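A minimal sketch of the same three summary functions on a hypothetical toy vector `y` (not the exercise's `x`), chosen so the results can be checked by hand. Note that `sd()` returns the sample standard deviation, which divides by `n - 1`.

```r
# Hypothetical toy vector, chosen so the results are easy to verify by hand
y <- c(2, 4, 4, 6)
mean(y)    # arithmetic mean: (2 + 4 + 4 + 6) / 4 = 4
median(y)  # middle value of the sorted data: 4
sd(y)      # sample standard deviation (n - 1 denominator): sqrt(8 / 3)
```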
Draw two numbers from a standard normal distribution (`rnorm()`):
```{r basics4}
# your code here
```
Draw 100 numbers from a normal distribution with a mean of **7** and standard deviation of **4**, and store the output in an object called "x1":
```{r basics5}
# your code here
```
Draw 100 numbers from a normal distribution with a mean of **5** and standard deviation of **2**, and store the output in an object called "x2":
```{r basics6}
# your code here
```
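The following sketch shows how the arguments of `rnorm()` fit together, using hypothetical parameter values and object names that differ from the exercises. The third argument is the standard deviation, not the variance.

```r
# rnorm(n, mean = 0, sd = 1): n draws; the defaults give the standard normal
z <- rnorm(3)                         # three standard normal draws
draws <- rnorm(5, mean = 10, sd = 3)  # hypothetical parameters, not the exercise's
length(draws)                         # 5
```

Every call to `rnorm()` advances the random number stream, so running extra draws like these before the exercises changes the values that follow `set.seed(2018)`.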
Use `data.frame()` to combine `x1` and `x2` into a data frame called "df":
```{r basics7}
# your code here
```
Use the `$` indexing method to calculate:
* the mean of the first column of `df`:
```{r basics8}
# your code here
```
* the standard deviation of the second column of `df`:
```{r basics9}
# your code here
```
Use `head()` to print the first **3** rows of `df`:
```{r basics10}
# your code here
```
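The data frame steps above can be sketched on a small, hypothetical data frame called `toy` (the name and values are made up for illustration). When `data.frame()` is given unnamed vectors such as `x1` and `x2`, the column names are taken from the object names.

```r
# Hypothetical toy data frame with two named columns
toy <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))
mean(toy$a)       # `$` extracts column `a` as a vector; the mean is 2.5
sd(toy$b)         # standard deviation of column `b`
head(toy, n = 2)  # first two rows; head() defaults to n = 6
```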
# `dplyr`
Let's work with the data set `diamonds`:
```{r data}
data(diamonds)
head(diamonds)
```
Calculate the average price of a diamond:
```{r dplyr1}
# your code here
```
Use `group_by()` to group diamonds by **color**, then use `summarise()` to calculate the average price *and* the standard deviation in price **by color**:
```{r dplyr2}
# your code here
```
Use `group_by()` to group diamonds by **cut**, then use `summarise()` to count the number of observations **by cut**:
```{r dplyr3}
# your code here
```
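The `group_by()` and `summarise()` pattern can be sketched on the built-in `mtcars` data, which stands in for `diamonds` here. The summary column names (`mean_mpg`, `sd_mpg`, `n_cars`) are made up for illustration.

```r
library(dplyr)

# mtcars ships with R; it is used here in place of diamonds
by_cyl <- mtcars %>%
  group_by(cyl) %>%          # one group per number of cylinders
  summarise(
    mean_mpg = mean(mpg),    # average within each group
    sd_mpg   = sd(mpg),      # standard deviation within each group
    n_cars   = n()           # number of rows in each group
  )
by_cyl
```

Several summaries can be computed in one `summarise()` call, and `n()` counts the rows in each group.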
Use `filter()` to remove observations with a depth greater than 62, then use `group_by()` to group diamonds by **clarity**, then use `summarise()` to find the maximum price of a diamond **by clarity**:
```{r dplyr4}
# your code here
```
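One point worth keeping in mind is that `filter()` keeps the rows where the condition is `TRUE`, so removing observations means writing the opposite condition. A minimal sketch on the built-in `mtcars` data, with a made-up cutoff and column name:

```r
library(dplyr)

# filter() keeps rows where the condition is TRUE, so "remove cars with
# wt greater than 3" is written as "keep cars with wt <= 3"
light_max <- mtcars %>%
  filter(wt <= 3) %>%
  group_by(gear) %>%
  summarise(max_mpg = max(mpg))  # largest mpg within each gear group
light_max
```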
Use `mutate()` and `log()` to add a new variable to the data called "log_price":
```{r dplyr5}
# your code here
```
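A short sketch of `mutate()` with `log()` on the built-in `mtcars` data; the object name `cars_log` and the column name `log_hp` are hypothetical. `mutate()` returns a new data frame, so the result has to be assigned if the new column is needed later.

```r
library(dplyr)

# mutate() does not modify its input; assign the result to keep the column
cars_log <- mtcars %>%
  mutate(log_hp = log(hp))  # log() is the natural logarithm by default
head(cars_log$log_hp)
```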
# `ggplot2`
Continue using `diamonds`.
Use `geom_histogram()` to plot a histogram of prices:
```{r ggplot1}
# your code here
```
Use `geom_density()` to plot the density of *log prices* (the variable you added to the data frame):
```{r ggplot2}
# your code here
```
Use `geom_point()` to plot carats against log prices (i.e. carats on the x-axis, log prices on the y-axis):
```{r ggplot3}
# your code here
```
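The three geoms above follow the same pattern: map variables with `aes()`, then add a layer. A minimal sketch on the built-in `mtcars` data, with made-up object names:

```r
library(ggplot2)

# One continuous variable: histogram and density
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)  # bins is a choice; the default is 30
p_dens <- ggplot(mtcars, aes(x = mpg)) +
  geom_density()

# Two continuous variables: scatter plot with x and y mapped explicitly
p_point <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
```

Printing any of these objects (for example, typing `p_point`) draws the plot.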
Use `stat_summary()` to make a bar plot of **average** cut:
Same as above but change the theme to `theme_classic()`:
```{r ggplot4}
# your code here
```
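The exercise does not name the variable being averaged, so the sketch below assumes a numeric variable on the y-axis and a grouping variable on the x-axis, using the built-in `mtcars` data. The `fun` argument assumes ggplot2 3.3.0 or later; older versions used `fun.y`.

```r
library(ggplot2)

# stat_summary() collapses y to one value per x before drawing the bars
p_bar <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  stat_summary(fun = mean, geom = "bar") +  # bar height = group mean of mpg
  theme_classic()                           # swaps the default grey theme
p_bar
```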
Finally,
* create a bar plot for **average** color and assign it to the
object "plot_color";
* create a scatter plot for depth against log prices (depth on
x-axis, log prices on y-axis) and assign it to the object
"plot_depth";
* use `ggarrange` from `ggpubr` to combine `plot_color` and
`plot_depth` into a single plot with automatic labels.
```{r ggplot5}
# your code here
```
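A minimal sketch of combining two plots with `ggarrange()`, using the built-in `mtcars` data and hypothetical object names (`p1`, `p2`, `combined`). Setting `labels = "AUTO"` asks `ggarrange()` to label the panels automatically.

```r
library(ggplot2)
library(ggpubr)

p1 <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  stat_summary(fun = mean, geom = "bar")
p2 <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# labels = "AUTO" labels the panels A, B, ...; "auto" gives a, b, ...
combined <- ggarrange(p1, p2, ncol = 2, labels = "AUTO")
combined
```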
# Inference
Use `t.test()` to test the following hypothesis on *log price*:
$$
\begin{aligned}
H_0 &: \mu = 8 \\
H_A &: \mu \neq 8
\end{aligned}
$$
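A one-sample, two-sided test of this form can be sketched as follows. The sample `values` and the null value of 5 are hypothetical and chosen only for illustration. The `mu` argument sets the value under $H_0$, and the two-sided alternative is the default.

```r
# Hypothetical sample and null value, chosen only for illustration
values <- c(4.8, 5.1, 5.3, 4.9, 5.2, 5.0)
res <- t.test(values, mu = 5)  # alternative = "two.sided" is the default
res$statistic  # t statistic
res$p.value    # p-value for H0: mu = 5
res$conf.int   # 95% confidence interval for the mean
```

The null hypothesis is rejected at the 5% level when the p-value is below 0.05 or, equivalently, when the 95% confidence interval excludes the null value.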