(In R / RStudio)
I'm not sure how to share my data set, but below are its name and its 12 columns.
Please answer as best you can, whether it's pseudocode, a partial answer, or just a suggestion on how I can go about answering the question. Thanks.
#----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
The dataset in covid_sd_20201001.RData contains several variables related to infections of covid-19 for each zip code in San Diego County as of October 1, 2020.
colnames(covid):
## [1] "zipcode" "X"
## [3] "Y" "population"
## [5] "median_age" "median_income"
## [7] "income_ineq" "case_count_proportion"
## [9] "children_proportion" "uninsured_employed_proportion"
## [11] "uninsured_unemployed_proportion" "uninsured_notworking_proportion"
#----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
(1) [5 pts] Use the hist() function in R to make 3 histograms of the proportion of the total population diagnosed with covid-19 in each zip code: one for each of the Sturges, Scott, and Freedman-Diaconis rules for the number of break points. How many bins does each respective rule use? Which do you think provides the best visual representation of the distribution of the data? Why?
## kde2d() from the package MASS is used for two-dimensional kernel density estimation with an axis-aligned bivariate normal kernel, evaluated on a square grid.
library(MASS)
kde2d(x, y, h, n, lims = c(range(x), range(y)))
where
x = x coordinates of the data
y = y coordinates of the data
h = vector of bandwidths for the x and y directions
n = number of grid points in each direction
lims = the limits of the rectangle covered by the grid
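To make the arguments above concrete, here is a minimal sketch of a kde2d() call and the object it returns. The original data is not available here, so the vectors age and income are simulated stand-ins (an assumption, not the real median_age and median_income columns), and n = 50 is an arbitrary choice.

```r
library(MASS)  # provides kde2d()

set.seed(1)  # reproducible synthetic data
# Hypothetical stand-ins for median_age and median_income
age    <- rnorm(100, mean = 38, sd = 6)
income <- rnorm(100, mean = 80000, sd = 20000)

# h is omitted here, so kde2d() uses its default bandwidths
dens <- kde2d(age, income, n = 50,
              lims = c(range(age), range(income)))

names(dens)  # components "x", "y", "z"
dim(dens$z)  # 50 x 50 grid of density estimates
```

The result is a list: x and y hold the grid coordinates and z holds the estimated density at each grid point, so it can be passed to image(dens) or contour(dens) for plotting.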
## By default, h is computed with bandwidth.nrd(), the normal reference distribution rule of thumb for choosing the bandwidth of a Gaussian kernel density estimator. Alternatively, width.SJ() selects the bandwidth using the method of Sheather & Jones (1991).
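As a rough illustration of the two bandwidth selectors, the sketch below computes both on one simulated vector. The vector v is hypothetical (it is not a column of the covid data), and the two values will generally differ.

```r
library(MASS)  # provides bandwidth.nrd() and width.SJ()

set.seed(2)
v <- rnorm(200, mean = 38, sd = 6)  # hypothetical sample, not the real data

h_nrd <- bandwidth.nrd(v)  # normal reference rule of thumb
h_sj  <- width.SJ(v)       # Sheather-Jones selector (default method "ste")

c(nrd = h_nrd, SJ = h_sj)  # compare the two bandwidths side by side
```

Either value can be supplied as one element of the h vector passed to kde2d(), one element per axis.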
## For a two-dimensional kernel density estimate of the variables median_age and median_income, we can use:
h <- c(width.SJ(dataset$median_age),
       width.SJ(dataset$median_income))
To apply 2D KDE we can write:
library(MASS)
n <- 50  # number of grid points per axis (assumed value)
# dataset is the loaded data frame (named covid in the question)
h <- c(width.SJ(dataset$median_age),
       width.SJ(dataset$median_income))
kde2d(dataset$median_age, dataset$median_income, h, n,
      lims = c(range(dataset$median_age), range(dataset$median_income)))
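For the histogram part of the question, a minimal sketch along the following lines may help. It assumes the proportion lives in covid$case_count_proportion; since the data is not available here, the vector p is simulated as a stand-in, so the bin counts it produces are not the answer for the real data.

```r
set.seed(3)
# Hypothetical stand-in for covid$case_count_proportion
p <- rbeta(100, shape1 = 2, shape2 = 60)

rules <- c("Sturges", "Scott", "FD")  # "FD" = Freedman-Diaconis
op <- par(mfrow = c(1, 3))            # three panels side by side
bins <- sapply(rules, function(r) {
  h <- hist(p, breaks = r, main = r, xlab = "case_count_proportion")
  length(h$breaks) - 1                # number of bins actually drawn
})
par(op)                               # restore the plotting layout
bins

# Bin counts each rule suggests before hist() rounds to "pretty" breaks
c(Sturges = nclass.Sturges(p), Scott = nclass.scott(p), FD = nclass.FD(p))
```

Note that hist() treats the rule's result as a suggestion and picks nearby "pretty" break points, so the number of bins drawn can differ from what the nclass.*() functions report. Replacing p with covid$case_count_proportion gives the counts for the actual data.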