Question

Using the data from the CSV file, answer the following questions in RStudio.

# number_children - The number of children in the home

# internet - Does the home have internet access?

# mode - The way the household took the survey

# own - Do the residents own (with or without a mortgage) or rent?

# language - The primary language spoken in the home

# decade_built - The decade the home was built

1) In how many households is the wife’s income over 100K?

2) How many households have a total income greater than 150K?

3) What is the average age of wives living in a home that is “Owned free and clear”?

4) How many different modes were available to participate in this survey?

5) How many households have both husband and wife younger than 40 years old?

6) How many households do not have internet access?

7) How many homes have 4 bedrooms?

8) How many homes have either electricity or gas monthly cost higher than 100?

9) In households with no children, do husbands tend to be older than their wives?

10) Do households in houses built in the 1960s or earlier spend more on electricity than those built in the 1970s or later? By how much?

# household - A unique ID number for each household

# age_husband - Age in years of the husband

# age_wife - Age in years of the wife

# income_husband - Total annual income of the husband

# income_wife - Total annual income of the wife

# bedrooms - Number of bedrooms in the home

# electricity - Monthly cost of electricity

# gas - Monthly cost of gas

csv file

https://docs.google.com/spreadsheets/d/1w3OCJ-ARXJ7mS_D9aKMjGmHJ3gHO2IEVpoW3tOUoCZs/edit?usp=sharing

Solutions

Expert Solution

Below are the R commands used to answer each question, along with their output.

data <- read.csv ("D:/practice-dataset.csv")

# 1) Households where the wife's income is over 100K
dim(subset(data, income_wife > 100000))[1]
[1] 257

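# 2) Households with a total income greater than 150K
# ([1] picks the row count out of dim(); without it both dimensions are printed)
dim(subset(data, (income_husband + income_wife) > 150000))[1]
[1] 887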

# 3) Mean age of wives in homes that are owned free and clear
mean(subset(data, own == "Owned free and clear")[["age_wife"]])
[1] 63.25527

# 4) Number of distinct survey modes
length(unique(data[["mode"]]))
[1] 3

# 5) Households where both husband and wife are younger than 40
dim(subset(data, (age_husband < 40) & (age_wife < 40)))[1]
[1] 1427

# 6) Households without internet access
dim(subset(data, internet == "No"))[1]
[1] 642

# 7) Homes with 4 bedrooms
dim(subset(data, bedrooms == 4))[1]
[1] 1652

# 8) Homes where either the electricity or the gas monthly cost is higher than 100
dim(subset(data, (electricity > 100) | (gas > 100)))[1]
[1] 4441
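
The counting queries above take the first element of dim() on a subset. As an alternative sketch, the same kind of count can be obtained by summing a logical vector. The data frame `toy` and all of its values are hypothetical, made up only to show the idiom; they are not rows from the survey file.

```r
# Hypothetical toy data: column names follow the question, values are invented.
toy <- data.frame(
  income_husband = c(90000, 120000, 10000, NA),
  income_wife    = c(110000, 20000, 130000, 50000),
  electricity    = c(80, 150, 95, 60),
  gas            = c(120, 40, 30, 170)
)

# A comparison returns a logical vector; sum() counts its TRUE values.
# na.rm = TRUE skips rows where the comparison is NA (missing income, etc.).
n_wife_over_100k  <- sum(toy$income_wife > 100000, na.rm = TRUE)
n_total_over_150k <- sum(toy$income_husband + toy$income_wife > 150000, na.rm = TRUE)
n_high_utility    <- sum(toy$electricity > 100 | toy$gas > 100, na.rm = TRUE)

print(c(n_wife_over_100k, n_total_over_150k, n_high_utility))
```

subset() also drops rows where the condition is NA, so on the same data both approaches should give the same counts; sum() simply avoids building the intermediate data frame.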

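# 9) Households with no children where the husband is older than the wife
dim(subset(data, (number_children == 0) & (age_husband > age_wife)))[1]
[1] 3412
# This is a count only; compare it with the total number of households with no children to judge the tendency.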

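# 10) Mean electricity cost for homes built in the 1960s or earlier, minus the mean for homes built in the 1970s or later
mean(subset(data, decade_built <= 1960)[["electricity"]]) -
  mean(subset(data, decade_built > 1960)[["electricity"]])
[1] -2.224459
# No: homes built in the 1960s or earlier spend about $2.22 less per month on electricity than those built in the 1970s or later.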
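
Question 9 asks about a tendency, which a single count does not fully show. One possible way to look at it, sketched here on a hypothetical data frame `toy` with invented values (not the survey data), is to compute the share of childless households in which the husband is older, together with the average age gap.

```r
# Hypothetical toy data: values are invented for illustration only.
toy <- data.frame(
  number_children = c(0, 0, 0, 2, 0),
  age_husband     = c(45, 30, 62, 38, 53),
  age_wife        = c(42, 33, 60, 36, 51)
)

# Keep only households with no children.
no_kids <- subset(toy, number_children == 0)

# Age gap per household: positive means the husband is older.
age_gap <- no_kids$age_husband - no_kids$age_wife

# Share of childless households where the husband is older.
share_husband_older <- mean(age_gap > 0, na.rm = TRUE)

# Average age gap in years across childless households.
mean_gap <- mean(age_gap, na.rm = TRUE)

print(share_husband_older)
print(mean_gap)
```

A share above 0.5 and a positive mean gap would both point toward husbands tending to be older. If a formal test is wanted, a paired comparison such as t.test(no_kids$age_husband, no_kids$age_wife, paired = TRUE) is one option.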

