Question

(In R / RStudio)

I'm not sure how to share my data set, but below are the name of my data set and its 12 columns.

Please answer as best you can, whether it's pseudocode, partial answers, or just a suggestion on how I can begin to answer the question. Thanks.

#----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The dataset in covid_sd_20201001.RData contains several variables related to infections of covid-19 for each zip code in San Diego County as of October 1, 2020. colnames(covid):

## [1] "zipcode" "X"
## [3] "Y" "population"
## [5] "median_age" "median_income"
## [7] "income_ineq" "case_count_proportion"
## [9] "children_proportion" "uninsured_employed_proportion"
## [11] "uninsured_unemployed_proportion" "uninsured_notworking_proportion"

#----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

(1) [5 pts] Use the hist() function in R to make 3 histograms of the proportion of the total population diagnosed with covid-19 in each zip code: one for each of the Sturges, Scott, and Freedman-Diaconis rules for the number of break points. How many bins does each respective rule use? Which do you think provides the best visual representation of the distribution of the data? Why?

Solutions

Expert Solution

## kde2d() from the package MASS is used for two-dimensional kernel density estimation with an axis-aligned bivariate normal kernel, evaluated on a square grid.

library(MASS)
kde2d(x, y, h, n = 25, lims = c(range(x), range(y)))
where,

x = x coordinate of dataset

y = y coordinate of dataset

n = Number of grid points in each direction

lims = The limits of the rectangle covered by the grid

h = vector of bandwidths for the x and y directions.
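To see what a call with these arguments returns, here is a minimal sketch on simulated data. The vectors x and y below are made up for illustration and are not the covid data; h is left out so that kde2d() uses its default bandwidths.

```r
library(MASS)

set.seed(1)                                  # simulated data, for illustration only
x <- rnorm(200, mean = 40, sd = 5)           # hypothetical x coordinates
y <- rnorm(200, mean = 70000, sd = 15000)    # hypothetical y coordinates

# h is omitted, so kde2d() falls back to bandwidth.nrd(x) and bandwidth.nrd(y)
dens <- kde2d(x, y, n = 50)

# The result is a list: x and y hold the grid coordinates,
# z is an n-by-n matrix of estimated density values
str(dens)

# One way to look at the estimate
contour(dens, xlab = "x", ylab = "y")
```

The z matrix can also be passed to image() or persp() if a filled or 3D view is preferred.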

## For h, the default is the normal reference bandwidth computed by bandwidth.nrd(). This is a rule of thumb for choosing the bandwidth of a Gaussian kernel density estimator. We can also use width.SJ(), which applies the method of Sheather & Jones (1991) to select the bandwidth of a Gaussian kernel density estimator.
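As a quick comparison of the two bandwidth selectors, the sketch below applies both to the same simulated vector. The vector v is an assumption made for illustration; with the real data it would be a column such as median_age.

```r
library(MASS)

set.seed(1)                          # simulated data, for illustration only
v <- rnorm(200, mean = 40, sd = 5)   # hypothetical stand-in for one variable

h_nrd <- bandwidth.nrd(v)   # normal reference rule of thumb
h_sj  <- width.SJ(v)        # Sheather & Jones (1991) selector

# Print both bandwidths side by side
c(nrd = h_nrd, sj = h_sj)
```

The two values usually differ, so it can be worth plotting the density estimate under each before settling on one.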

## For a two-dimensional kernel density estimate of the variables median_age and median_income, we can use:

h = c(width.SJ(dataset$median_age), width.SJ(dataset$median_income))
To apply 2D KDE we can write:

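library(MASS)
# dataset is the data frame holding the columns listed in the question
n <- 50  # number of grid points in each direction (assumed value)
# Sheather-Jones bandwidths for the x and y directions
h <- c(width.SJ(dataset$median_age), width.SJ(dataset$median_income))
dens <- kde2d(dataset$median_age, dataset$median_income, h, n, lims = c(range(dataset$median_age), range(dataset$median_income)))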
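The question itself asks about hist() and its break rules, so a rough sketch of that part may also help. It uses a simulated vector prop as a stand-in; with the real data it would presumably be covid$case_count_proportion, which is an assumption about which column holds the proportion diagnosed.

```r
set.seed(1)
prop <- rbeta(100, 2, 50)   # simulated stand-in, NOT the real covid data

op <- par(mfrow = c(1, 3))  # three plots side by side
h_sturges <- hist(prop, breaks = "Sturges", main = "Sturges")
h_scott   <- hist(prop, breaks = "Scott", main = "Scott")
h_fd      <- hist(prop, breaks = "FD", main = "Freedman-Diaconis")
par(op)                     # restore the previous plot layout

# Bins actually drawn = number of break points minus one
bins <- sapply(list(Sturges = h_sturges, Scott = h_scott, FD = h_fd),
               function(h) length(h$breaks) - 1)
bins

# Bin counts suggested by each rule before hist() rounds to "pretty" breaks
c(Sturges = nclass.Sturges(prop), Scott = nclass.scott(prop), FD = nclass.FD(prop))
```

hist() treats the rule's result as a suggestion and picks "pretty" break points, so the number of bins drawn can differ from the number the rule returns. Which plot is best is a judgment call: too few bins can hide skewness or multiple modes, while too many make the shape noisy.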
