Question

To find the dataset needed for this problem, you’ll first need to open the “swiss” dataset that is contained in R by running the following line:

> data('swiss')

Now you can rename the “swiss” dataset and use it to answer the question below. Name the data frame with your UT EID:

> my_variable <- swiss

This dataset contains socio-economic indicators for the French-speaking provinces of Switzerland in the year 1888. Among the variables, “Agriculture” is the percentage of the workforce that were farmers and “Education” is the percentage of the population that were formally educated.

  a) Create an appropriate display for the distribution of “Education” and describe it with the appropriate statistics.

  b) You want to see if there was a difference in education in provinces that were majority farmers. Use the code below to add a new variable to the dataset called “Farmers” that defines if more than half the workforce in the province were farmers. You’ll need to include a logical statement within each set of brackets below to correctly define the new variable:

> my_variable$Farmers[ ] <- 'More than half'

> my_variable$Farmers[ ] <- 'Less than half'

  c) How many provinces had a majority of farmers in the workforce? How many did not?

  d) Create a graph to compare the percentage of the population that were formally educated for provinces that were majority farmers versus those that were not. Using appropriate statistical language, compare the distributions based on your graph.

Solutions

Expert Solution

Note: Replace ut1234 wherever it appears in the following code (multiple places) with your actual UT EID. The name should start with a letter and contain no spaces.

R code with comments (all lines starting with # are comments)

a) We will use a box plot and a histogram to describe the distribution of Education.

# Load the built-in swiss dataset
data('swiss')
# Assign it to a data frame named with your UT EID (replace ut1234 with your actual UT EID)
ut1234 <- swiss

# Stack the two plots in one column
par(mfrow = c(2, 1))
# a) Box plot
boxplot(ut1234$Education, main = "Distribution of Education", xlab = "Education (% formally educated)", horizontal = TRUE)
# Print the minimum, quartiles, median, mean, and maximum
summary(ut1234$Education)
# Plot a histogram
hist(ut1234$Education, main = "Distribution of Education", xlab = "Education (% formally educated)")
# Find the mean
xbar <- mean(ut1234$Education)
# Get the sample standard deviation
s <- sd(ut1234$Education)
sprintf('Mean value of %% formally educated is %.2f%%', xbar)
sprintf('Sample standard deviation of %% formally educated is %.2f%%', s)

Running this code produces the box plot and the histogram, followed by the summary statistics (plot and output not reproduced here).

The box plot and the histogram show that the distribution of Education is skewed to the right (longer right tail). The median is 8%, meaning that about half of the provinces had 8% or less of their population formally educated. The mean is higher than the median, which is consistent with the right skew. The box plot also flags outliers on the high end (values more than 1.5 times the IQR above Q3).
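Because the distribution is skewed, the median and IQR are more resistant summaries than the mean and standard deviation. The following is a minimal sketch of how those statistics and the 1.5 × IQR fences could be computed. The names edu, q, lower_fence, upper_fence, and outliers are illustrative and not part of the original solution. Note that boxplot() works from hinges, which in general can differ slightly from the quartiles returned by quantile().

```r
# Load the built-in dataset (part of base R's datasets package)
data("swiss")
edu <- swiss$Education   # illustrative name

# Resistant measures of center and spread
med <- median(edu)
q   <- quantile(edu, c(0.25, 0.75))
iqr <- IQR(edu)

# Conventional 1.5 * IQR fences for flagging outliers
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[2]] + 1.5 * iqr

# Names of the provinces that fall outside the fences
outliers <- rownames(swiss)[edu < lower_fence | edu > upper_fence]

print(c(median = med, IQR = iqr, mean = mean(edu)))
print(outliers)
```

Reporting the median with the IQR, and listing the flagged provinces by name, describes a skewed distribution more faithfully than the mean and standard deviation alone.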

b & c) Create the variable and get the counts

R code:

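# part b) "more than half" means strictly greater than 50%
ut1234$Farmers[ut1234$Agriculture > 50] <- 'More than half'
# everything else (50% or less) goes in the other group
ut1234$Farmers[ut1234$Agriculture <= 50] <- 'Less than half'
# part c) get the counts
table(ut1234$Farmers)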

26 provinces had a majority of farmers in the workforce, and 21 provinces did not.
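As an alternative to the two bracketed assignments, the same variable could be created in one vectorized step with ifelse(). This is only a sketch: farm_df is an illustrative stand-in for the data frame named with your UT EID, and counts is likewise a hypothetical name.

```r
data("swiss")
farm_df <- swiss   # illustrative stand-in for the UT EID data frame

# Strictly more than 50% of the workforce counts as a majority
farm_df$Farmers <- ifelse(farm_df$Agriculture > 50,
                          "More than half", "Less than half")

# Count the provinces in each group
counts <- table(farm_df$Farmers)
print(counts)

# Check whether any province sits exactly at 50%,
# the only case where > 50 and >= 50 would disagree
print(sum(farm_df$Agriculture == 50))
```

The final line is a quick sanity check on the boundary: if it prints zero, the choice between > and >= does not change the counts for this dataset.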

d) R code

# part d) compare Education between the two groups
par(mfrow = c(1, 1))
boxplot(Education ~ Farmers, data = ut1234, main = "Education by share of farmers in the workforce", ylab = "Proportion of farmers in the workforce", xlab = "Education (% formally educated)", horizontal = TRUE)

The code produces box plots of Education for the two groups (plot not reproduced here).

The box plot shows that the median percentage of the population that was formally educated is lower in provinces where farmers made up the majority of the workforce than in provinces where they made up less than half. The group means, which a box plot does not display, follow the same ordering.

The distribution of Education is also skewed to the right for provinces that were not majority farmers, whereas for majority-farmer provinces it is far more symmetrical about the median.

Finally, the distribution of Education is more spread out (larger IQR and higher variance) in provinces where farmers made up less than half the workforce than in provinces where they were the majority.
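The visual comparison can be backed up with per-group statistics. The sketch below assumes the same grouping rule as in part b); grp_df and group_stats are illustrative names, not part of the original solution.

```r
data("swiss")
grp_df <- swiss   # illustrative stand-in for the UT EID data frame
grp_df$Farmers <- ifelse(grp_df$Agriculture > 50,
                         "More than half", "Less than half")

# Center and spread of Education within each group;
# the result is a matrix with one column per group
group_stats <- sapply(split(grp_df$Education, grp_df$Farmers), function(x) {
  c(n = length(x), mean = mean(x), median = median(x),
    IQR = IQR(x), sd = sd(x))
})

print(round(group_stats, 2))
```

Quoting the medians and IQRs from this table alongside the graph makes the comparison of center and spread concrete rather than purely visual.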

