Using the data from the CSV file, answer the following questions in RStudio.
# number_children - The number of children in the home
# internet - Does the home have internet access?
# mode - The way the household took the survey
# own - Do the residents own with or without a mortgage or rent?
# language - The primary language spoken in the home
# decade_built - The decade the home was built
1) In how many households is the wife’s income over 100K?
2) How many households have a total income greater than 150K?
3) What is the average age of wives living in a home “Owned free and clear”?
4) How many different modes were available to participate in this survey?
5) How many households have both husband and wife younger than 40 years old?
6) How many households do not have internet access?
7) How many homes have 4 bedrooms?
8) How many homes have either electricity or gas monthly cost higher than 100?
9) In households with no children, do husbands tend to be older than their wives?
10) Do households in houses built in the 1960s or earlier spend more on electricity than those built in the 1970s or later? By how much?
# household - A unique ID number for each household
# age_husband - Age in years of the husband
# age_wife - Age in years of the wife
# income_husband - Total annual income of the husband
# income_wife - Total annual income of the wife
# bedrooms - Number of bedrooms in the home
# electricity - Monthly cost of electricity
# gas - Monthly cost of gas
CSV file:
https://docs.google.com/spreadsheets/d/1w3OCJ-ARXJ7mS_D9aKMjGmHJ3gHO2IEVpoW3tOUoCZs/edit?usp=sharing
Below are the R commands used to answer each question, along with their output.
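# Load the survey data (adjust the path to wherever the CSV file is saved)
data <- read.csv("D:/practice-dataset.csv")
# 1) Households where the wife's income is over 100K
dim(subset(data, income_wife > 100000))[1]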
[1] 257
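The counting pattern used throughout these answers, dim(subset(...))[1], can be written in a couple of equivalent ways. The sketch below uses a small made-up data frame (the name toy and its values are hypothetical, not taken from the survey file) to show that nrow() and sum() on a logical condition give the same count, assuming missing values should be excluded.

```r
# Toy data frame with hypothetical values (not from the survey CSV)
toy <- data.frame(
  household   = 1:5,
  income_wife = c(120000, 80000, NA, 150000, 100000)
)

# Idiom used in the answers: number of rows in the filtered data frame
count_dim <- dim(subset(toy, income_wife > 100000))[1]

# Equivalent: nrow() reads more directly than dim(...)[1]
count_nrow <- nrow(subset(toy, income_wife > 100000))

# Equivalent: sum a logical vector; na.rm = TRUE matches subset(),
# which drops rows where the condition evaluates to NA
count_sum <- sum(toy$income_wife > 100000, na.rm = TRUE)

print(c(count_dim, count_nrow, count_sum))
```

Without na.rm = TRUE, sum() would return NA as soon as one income is missing, so the choice matters if the real data has gaps.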
# 2) Households with a total income greater than 150K
dim(subset(data, (income_husband + income_wife) > 150000))[1]
[1] 887
# 3) Average age of wives in homes "Owned free and clear"
mean(subset(data, own == "Owned free and clear")[["age_wife"]])
[1] 63.25527
# 4) Number of distinct survey modes
length(unique(data[["mode"]]))
[1] 3
# 5) Households where both husband and wife are younger than 40
dim(subset(data, (age_husband < 40) & (age_wife < 40)))[1]
[1] 1427
# 6) Households without internet access
dim(subset(data, internet == "No"))[1]
[1] 642
# 7) Homes with 4 bedrooms
dim(subset(data, bedrooms == 4))[1]
[1] 1652
# 8) Homes where the monthly electricity or gas cost is higher than 100
dim(subset(data, (electricity > 100) | (gas > 100)))[1]
[1] 4441
# 9) Childless households in which the husband is older than the wife
dim(subset(data, (number_children == 0) & (age_husband > age_wife)))[1]
[1] 3412
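A raw count on its own does not show a tendency, because it is not compared against the total number of childless households. One possible way to address question 9 more directly is to look at the share of childless households where the husband is older and at the mean age gap. The sketch below runs on a made-up data frame (toy and its values are hypothetical); the same steps could be applied to data.

```r
# Toy data frame with hypothetical values (not from the survey CSV)
toy <- data.frame(
  number_children = c(0, 0, 0, 0, 2, 1),
  age_husband     = c(45, 38, 60, 29, 50, 33),
  age_wife        = c(42, 40, 55, 29, 48, 35)
)

# Keep only households with no children
childless <- subset(toy, number_children == 0)

# Husband's age minus wife's age in each childless household
age_gap <- childless$age_husband - childless$age_wife

# Share of childless households where the husband is strictly older
share_older <- mean(age_gap > 0)

# Average age gap; a positive value means husbands are older on average
mean_gap <- mean(age_gap)

print(share_older)
print(mean_gap)
```

A share well above 0.5 together with a positive mean gap would support the claim that husbands tend to be older.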
# 10) Mean electricity cost: built in the 1960s or earlier minus built in the 1970s or later
mean(subset(data, decade_built <= 1960)[["electricity"]]) -
  mean(subset(data, decade_built > 1960)[["electricity"]])
[1] -2.224459
# Houses built in the 1960s or earlier spend about $2.22 less per month on electricity
# than those built in the 1970s or later.
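The comparison for question 10 can also be laid out as a grouped mean, which shows both group averages as well as their difference. This is only a sketch on made-up data: toy, era, and the values are hypothetical, and it assumes decade_built is stored as a number such as 1960 rather than as text such as "1960s".

```r
# Toy data frame with hypothetical values (not from the survey CSV)
# Assumption: decade_built is numeric (e.g. 1960), not a string like "1960s"
toy <- data.frame(
  decade_built = c(1950, 1960, 1970, 1980, 1990),
  electricity  = c(100, 120, 130, 110, 150)
)

# Label each home by era so the two groups are explicit
toy$era <- ifelse(toy$decade_built <= 1960,
                  "1960s or earlier", "1970s or later")

# Mean monthly electricity cost per era
era_means <- tapply(toy$electricity, toy$era, mean)
print(era_means)

# Older homes minus newer homes; a negative value means older homes spend less
diff_means <- era_means[["1960s or earlier"]] - era_means[["1970s or later"]]
print(diff_means)
```

If decade_built turns out to be text in the real file, the numeric comparison would need a conversion step first.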