Assumptions guide (be sure to explain how your data meets
these assumptions and perform the applicable
tests).
Applicable Tests: T-tests / Chi-Square / Correlation / Regression / ANOVA / Mann-Whitney U / Wilcoxon Signed Rank / Kruskal-Wallis
Create a document explaining the data you have decided to
use and the statistical test you intend to use. Be sure to use the
decision tree provided in this activity. Analyze your data to
ensure that it meets the assumptions of the test (for example,
parametric vs. non-parametric). Please evaluate and use these
police shooting death totals.
Shot to death by the police, 2017-18
RACE | 2017 | 2018 | 2017 MALE | 2018 MALE |
WHITE | 457 | 211 | | |
BLACK | 223 | 102 | | |
HISPANIC | 179 | 68 | | |
OTHER | 44 | 18 | | |
UNKNOWN | 84 | 120 | | |
TOTAL | 987 | 519 | 940 | 492 |
For the data on people shot to death by the police in 2017 and 2018, we may be interested in finding out whether the distribution of deaths across racial categories is associated with the year. The Chi-square test of independence is used to test the association between race and year.
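The Chi-square test is a non-parametric test which rests on the following assumptions,
1) no particular distribution is assumed for the data (i.e. the test is non-parametric),
2) the data values are measured at a nominal or ordinal level (nominal in this case) and
3) the sample sizes of the groups that are studied do not need to be equal.
Here both variables (race and year) are categorical, and the yearly totals differ (987 vs. 519), so the data are suited to this test.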
The Chi-Square test of independence is used to determine if there is a significant relationship between two factors (RACE and YEAR). The Chi-Square test of independence is performed in the following steps,
Step 1: The hypotheses are defined as,
Null hypothesis: There is no association between the two variables (race and year).
Alternative hypothesis: There is an association between the two variables (race and year).
Step 2: The significance level for the test is,
Step 3: The Chi-Square test statistic is obtained as follows.
The observed values are,
Observed Values | |||
RACE | 2017 | 2018 | TOTAL |
WHITE | 457 | 211 | 668 |
BLACK | 223 | 102 | 325 |
HISPANIC | 179 | 68 | 247 |
OTHER | 44 | 18 | 62 |
UNKNOWN | 84 | 120 | 204 |
TOTAL | 987 | 519 | 1506 |
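The row and column totals in the observed table can be double-checked with a few lines of Python. This is only a sketch: the variable names (`observed`, `row_totals`, `col_totals`, `grand_total`) are illustrative and not part of the original answer, and the counts are copied from the table above.

```python
# Observed counts copied from the table above: race -> (2017, 2018).
observed = {
    "WHITE": (457, 211),
    "BLACK": (223, 102),
    "HISPANIC": (179, 68),
    "OTHER": (44, 18),
    "UNKNOWN": (84, 120),
}

# Row totals: one total per race, summed across the two years.
row_totals = {race: sum(counts) for race, counts in observed.items()}

# Column totals: one total per year, summed across the races.
col_totals = tuple(sum(col) for col in zip(*observed.values()))

# Grand total of all cells.
grand_total = sum(col_totals)

for race, total in row_totals.items():
    print(race, total)
print("TOTAL", col_totals, grand_total)
```

The printed totals should agree with the TOTAL row and TOTAL column of the observed table.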
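The expected values are obtained using the formula,
Expected value = (row total × column total) / grand total
For example, the expected value for WHITE in 2017 is (668 × 987) / 1506 = 437.79.
The expected values are,
Expected Values | |||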
RACE | 2017 | 2018 | TOTAL |
WHITE | 437.79 | 230.21 | 668 |
BLACK | 213 | 112 | 325 |
HISPANIC | 161.88 | 85.12 | 247 |
OTHER | 40.63 | 21.37 | 62 |
UNKNOWN | 133.7 | 70.3 | 204 |
TOTAL | 987 | 519 | 1506 |
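The expected values in the table above can be reproduced with a short Python sketch. It assumes the usual rule for a test of independence, expected = row total × column total / grand total; the function name `expected_counts` is hypothetical and chosen only for this illustration.

```python
# Observed counts copied from the observed-values table: race -> (2017, 2018).
observed = {
    "WHITE": (457, 211),
    "BLACK": (223, 102),
    "HISPANIC": (179, 68),
    "OTHER": (44, 18),
    "UNKNOWN": (84, 120),
}


def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    col_totals = [sum(col) for col in zip(*table.values())]
    grand_total = sum(col_totals)
    return {
        label: tuple(sum(row) * col_total / grand_total for col_total in col_totals)
        for label, row in table.items()
    }


expected = expected_counts(observed)
for label, row in expected.items():
    print(label, [round(value, 2) for value in row])
```

Each expected row sums to the same row total as the observed row, which is a quick consistency check on the table.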
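Now the Chi-Square Value is obtained using the formula,
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where O is the observed value and E is the expected value of a cell, and the sum runs over all cells of the table.
Step 4: The P-value for the Chi-Square statistic is obtained using the chi-square distribution table with (rows − 1) × (columns − 1) = (5 − 1) × (2 − 1) = 4 degrees of freedom.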
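As a sketch of Steps 3 and 4 for the race-by-year table, the statistic (the sum of (O − E)² / E over all cells) and its P-value can be computed with the Python standard library alone. The function names are hypothetical. The P-value uses a closed form of the chi-square upper tail that holds only for an even number of degrees of freedom, which covers the 4 degrees of freedom here; it stands in for the table lookup described in the text.

```python
import math

# Observed counts (2017, 2018), copied from the observed-values table.
observed = [
    (457, 211),  # WHITE
    (223, 102),  # BLACK
    (179, 68),   # HISPANIC
    (44, 18),    # OTHER
    (84, 120),   # UNKNOWN
]


def chi_square_statistic(rows):
    """Sum of (O - E)^2 / E over all cells, with E = row total * column total / N."""
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for obs, col_total in zip(row, col_totals):
            exp = row_total * col_total / grand_total
            stat += (obs - exp) ** 2 / exp
    return stat


def chi_square_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


stat = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = chi_square_upper_tail_even_df(stat, df)
print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p_value:.3g}")
```

Run as written, this gives a statistic of roughly 63.5 on 4 degrees of freedom and a P-value far below 0.001. For comparison, the 5% critical value for 4 degrees of freedom is 9.488; the 5% level is an assumption here, since the text does not state the significance level.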
Step 5: The null hypothesis is rejected, so the two variables are dependent. There is a statistically significant association between the two variables, race and year.
Similarly, the test is performed for the two variables, Gender (Male, Female) and Year (2017 and 2018).
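From the data values provided, the female counts are obtained by subtracting the male counts from the yearly totals (987 − 940 = 47 and 519 − 492 = 27),
GENDER | 2017 | 2018 |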
MALE | 940 | 492 |
FEMALE | 47 | 27 |
TOTAL | 987 | 519 |
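The gender table can be built from the yearly totals and the male counts in a few lines. This sketch follows the text in assuming that every death not counted as male is counted as female; the dictionary names are illustrative only.

```python
# Yearly totals and male counts, copied from the source table.
totals = {"2017": 987, "2018": 519}
males = {"2017": 940, "2018": 492}

# Assumption made by the text: everyone not counted as male is counted as female.
females = {year: totals[year] - males[year] for year in totals}

gender_table = {"MALE": males, "FEMALE": females}
for gender, counts in gender_table.items():
    print(gender, counts["2017"], counts["2018"])
```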
The Chi-Square test of independence is performed in the following steps,
Step 1: The hypotheses are defined as,
Null hypothesis: There is no association between the two variables (gender and year).
Alternative hypothesis: There is an association between the two variables (gender and year).
Step 2: The significance level for the test is,
Step 3: The Chi-Square test statistic is obtained as follows.
The observed values are,
Observed Values | |||
GENDER | 2017 | 2018 | TOTAL |
MALE | 940 | 492 | 1432 |
FEMALE | 47 | 27 | 74 |
TOTAL | 987 | 519 | 1506 |
The expected values are,
Expected Values | |||
GENDER | 2017 | 2018 | TOTAL |
MALE | 938.5 | 493.5 | 1432 |
FEMALE | 48.5 | 25.5 | 74 |
TOTAL | 987 | 519 | 1506 |
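Now the Chi-Square Value is obtained using the formula,
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where O is the observed value and E is the expected value of a cell, and the sum runs over all cells of the table.
Step 4: The P-value for the Chi-Square statistic is obtained using the chi-square distribution table with (rows − 1) × (columns − 1) = (2 − 1) × (2 − 1) = 1 degree of freedom.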
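For the gender-by-year table, the same calculation can be sketched in Python. With 1 degree of freedom the upper-tail probability of the chi-square distribution equals erfc(sqrt(x / 2)), so `math.erfc` from the standard library can replace the table lookup. The function name is hypothetical, and no continuity correction is applied, which matches the plain formula used in the text.

```python
import math

# Observed counts (2017, 2018), copied from the gender table.
observed = [
    (940, 492),  # MALE
    (47, 27),    # FEMALE
]


def chi_square_statistic(rows):
    """Sum of (O - E)^2 / E over all cells, with E = row total * column total / N."""
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for obs, col_total in zip(row, col_totals):
            exp = row_total * col_total / grand_total
            stat += (obs - exp) ** 2 / exp
    return stat


stat = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper tail for df = 1: P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(stat / 2))
print(f"chi-square = {stat:.4f}, df = {df}, p-value = {p_value:.3f}")
```

Run as written, this gives a statistic of about 0.14 on 1 degree of freedom and a P-value of about 0.71, well above any conventional significance level and consistent with the conclusion in Step 5.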
Step 5: We fail to reject the null hypothesis. The data give no evidence of dependence, so the two variables can be treated as independent. There is no statistically significant association between the two variables, gender and year.