Scenario: A 28-year-old with a bachelor's degree. They have no children and are a new homeowner in the state of Maryland. As head of the household, they must determine a household budgeting plan.
Use Table 1 to report the variables selected for this assignment. Note: The information for the required variable, “Income,” has already been completed and can be used as a guide for completing information on the remaining variables.
Variable Name in Dataset | Description | Type of Variable (Qualitative or Quantitative) |
Income | annual household income in USD | Quantitative |
Marital Status | ||
Age | ||
Family Size | ||
Housing | ||
Reason(s) for selecting the variable and expected Outcome(s)
1) Income
2) Marital Status
3) Age
4) Family Size
5) Housing
Data Set Description:
Proposed Data Analysis:
Measures of Central Tendency and Dispersion
Complete Table 2. Numerical Summaries of the Selected Variables and briefly explain why you chose those measurements. Note: The information for the required variable, “Income,” has already been completed and can be used as a guide for completing information on the remaining variables.
Table 2. Numerical Summaries of the Selected Variables
Variable Name | Measures of Central Tendency and Dispersion | Rationale for Why Appropriate |
Variable 1: “Income” | Number of Observations; Median; Sample Standard Deviation | I am using median for two reasons: 1. If there are any outliers or the data is not normally distributed, the median is the best measure of central tendency. 2. The variable is quantitative. I am using sample standard deviation for three reasons: 1. The data is a sample from a larger data set. 2. It is the most commonly used measure of dispersion. 3. The variable is quantitative. |
Marital Status | ||
Age | ||
Family Size | ||
Housing | ||
Graphs and/or Tables
Complete Table 3. Type of Graphs and/or Tables for Selected Variables and briefly explain why you chose those graphs and/or tables. Note: The information for the required variable, “Income,” has already been completed and can be used as a guide for completing information on the remaining variables.
Table 3. Type of Graphs and/or Tables for Selected Variables
Variable Name | Graph and/or Table | Rationale for Why Appropriate |
Variable 1: “Income” | Graph: I will use a histogram to show the distribution of the data. | A histogram is one of the best plots for showing the distribution of quantitative data, including whether it is approximately normal. |
Marital Status | ||
Age | ||
Family Size | ||
Housing | ||
The data is a random sample from the US Department of Labor’s 2016 Consumer Expenditure Surveys (CE) and provides information about the composition of households and their annual expenditures (https://www.bls.gov/cex/). It contains information from 30 households, each represented by a survey respondent who provided the requested information; all of it is self-reported. This dataset contains four socioeconomic variables (whose names start with SE) and four expenditure variables (whose names start with USD).
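Variable name in dataset | Description | Type of variable |
Income | annual household income in USD | Quantitative |
Marital Status | marital status of the head of household (scenario value: married) | Qualitative |
Age | age of the head of household in years (scenario value: 28) | Quantitative |
Family Size | number of people in the household (scenario value: 2) | Quantitative |
Housing | housing status of the household (scenario value: owned house) | Qualitative |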
Table 2. Numerical Summaries of the Selected Variables
Variable name in dataset | Measures of Central Tendency and Dispersion | Rationale for Why Appropriate |
Income | Number of Observations; Median; Sample Standard Deviation | I am using median for two reasons: 1. If there are any outliers or the data is not normally distributed, the median is the best measure of central tendency. 2. The variable is quantitative. I am using sample standard deviation for three reasons: 1. The data is a sample from a larger data set. 2. It is the most commonly used measure of dispersion. 3. The variable is quantitative. |
Marital Status | Mode | I am using the mode because the variable is qualitative, so the mode is the only measure of central tendency that applies. |
Age | Number of Observations; Mean; Sample Standard Deviation | I am using mean for two reasons: 1. It is a simple, commonly used measure of central tendency. 2. The variable is quantitative. I am using sample standard deviation for three reasons: 1. The data is a sample from a larger data set. 2. It is the most commonly used measure of dispersion. 3. The variable is quantitative. |
Family Size | Number of Observations; Mean; Sample Standard Deviation | I am using mean for two reasons: 1. It is a simple, commonly used measure of central tendency. 2. The variable is quantitative. I am using sample standard deviation for three reasons: 1. The data is a sample from a larger data set. 2. It is the most commonly used measure of dispersion. 3. The variable is quantitative. |
Housing | Mode | I am using the mode because the variable is qualitative, so the mode is the only measure of central tendency that applies. |
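The summaries named in Table 2 can be computed with Python's standard `statistics` module. The sketch below is only an illustration: the lists `income`, `age`, and `marital_status` and the helper `summarize_quantitative` are hypothetical and are not taken from the survey data. It also shows why the median is preferred for income when an outlier is present.

```python
import statistics

# Hypothetical values for illustration only; these are NOT the survey data.
income = [52000, 61000, 58000, 75000, 49000, 250000]  # USD, with one outlier
age = [28, 34, 45, 51, 39, 62]                         # years
marital_status = ["Married", "Not Married", "Married",
                  "Married", "Not Married", "Married"]


def summarize_quantitative(values, center="mean"):
    """Return n, the chosen measure of center, and the sample standard deviation."""
    center_fn = statistics.median if center == "median" else statistics.mean
    return {
        "n": len(values),
        center: center_fn(values),
        "sample_sd": statistics.stdev(values),  # sample SD: divides by n - 1
    }


income_summary = summarize_quantitative(income, center="median")
age_summary = summarize_quantitative(age, center="mean")
marital_mode = statistics.mode(marital_status)  # most frequent category

print(income_summary)
print(age_summary)
print("Marital Status mode:", marital_mode)
# The outlier pulls the mean of income well above the median.
print("Income mean vs. median:", statistics.mean(income), income_summary["median"])
```

With these made-up numbers, the single large income raises the mean far above the median, which is the situation the rationale for Income describes.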
Table 3. Type of Graphs and/or Tables for Selected Variables
Variable name in dataset | Graph and/or Table | Rationale for Why Appropriate |
Income | Graph: I will use a histogram to show the distribution of the data. | A histogram is one of the best plots for showing the distribution of quantitative data, including whether it is approximately normal. |
Marital Status | Pie chart or bar chart | Effectively displays the relative frequencies of a small number of groups of a qualitative variable. |
Age | Histogram | A histogram is one of the best plots for showing whether the data follow a normal distribution. |
Family Size | Histogram | A histogram is one of the best plots for showing whether the data follow a normal distribution. |
Housing | Pie chart or bar chart | Effectively displays the relative frequencies of a small number of groups of a qualitative variable. |
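The quantities behind the graphs in Table 3 can be sketched without a plotting library: a pie or bar chart displays relative frequencies per category, and a histogram displays counts per equal-width interval. The data and the helper names `relative_frequencies` and `histogram_counts` below are hypothetical and serve only as an illustration, not as the survey's actual values.

```python
from collections import Counter

# Hypothetical values for illustration only; these are NOT the survey data.
housing = ["Owned", "Rented", "Owned", "Owned", "Rented", "Owned", "Owned", "Rented"]
age = [28, 34, 45, 51, 39, 62, 30, 47]


def relative_frequencies(categories):
    """Share of observations in each category (what a pie or bar chart displays)."""
    counts = Counter(categories)
    total = len(categories)
    return {label: count / total for label, count in counts.items()}


def histogram_counts(values, start, width, bins):
    """Count values falling in each interval [start + i*width, start + (i+1)*width)."""
    counts = [0] * bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < bins:  # values outside the covered range are ignored
            counts[index] += 1
    return counts


housing_freq = relative_frequencies(housing)
age_bins = histogram_counts(age, start=20, width=10, bins=5)

# Text rendering of the bar chart (shares) and the histogram (counts).
for label, share in housing_freq.items():
    print(f"{label:<7} {share:.1%}")
for i, count in enumerate(age_bins):
    low = 20 + 10 * i
    print(f"{low}-{low + 9}: {'#' * count}")
```

The bin start and width are choices made for this example; different choices can change how symmetric or skewed the histogram appears.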