FiscalNote is a startup founded by a Washington, DC entrepreneur and funded by a Singapore sovereign wealth fund, the Winklevoss twins of Facebook fame, and others. It uses machine learning and data mining techniques to predict for its clients whether legislation in the US Congress and in US state legislatures will pass or not. The company reports 94% accuracy. (Washington Post, November 21, 2014, “Capital Business”) Considering just bills introduced in the US Congress, do a bit of internet research to learn about the numbers of bills introduced and passage rates. Identify the possible types of misclassifications, and comment on the use of overall accuracy as a metric. Include a discussion of other possible metrics and the potential role of propensities.
There are two types of misclassification:
Null hypothesis: the bill does not pass
Alternative hypothesis: the bill passes
1) Type 1 error (false positive): the bill does not actually pass, but the model predicts that it will pass.
2) Type 2 error (false negative): the bill actually passes, but the model predicts that it will not pass.
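Overall accuracy alone does not distinguish between these two types of error: a model can score high accuracy while making many errors of one kind. Accuracy is most informative when the classes are roughly balanced (similar numbers of positive and negative cases). Bill passage is highly imbalanced, since only a small fraction of the bills introduced in the US Congress, on the order of a few percent, are enacted. A model that simply predicted "not passed" for every bill would therefore achieve very high accuracy while never identifying a single bill that passes, so the reported 94% cannot be judged without knowing the passage rate. Other metrics, such as the following, should be used as well: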
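To see why the passage rate matters, the short sketch below computes the accuracy of a trivial rule that predicts "not passed" for every bill. The counts are hypothetical round numbers chosen for illustration (10,000 bills, an assumed 3% pass rate), not actual Congressional data.

```python
# Hypothetical round numbers for illustration only; not actual Congress data.
n_bills = 10_000
pass_rate = 0.03  # assumed share of introduced bills that pass

n_pass = round(n_bills * pass_rate)  # bills that actually pass
n_fail = n_bills - n_pass            # bills that do not pass

# A trivial "model" that predicts "not passed" for every bill is right on
# every failing bill and wrong on every passing one (all Type 2 errors).
baseline_accuracy = n_fail / n_bills

print(f"Always-predict-fail accuracy: {baseline_accuracy:.1%}")
```

Under this assumed pass rate the trivial rule already exceeds 94% accuracy while having zero sensitivity, which is why a reported accuracy should always be compared against this naive baseline.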
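a) sensitivity (recall): the proportion of bills that actually pass that the model predicts will pass
b) specificity: the proportion of bills that do not pass that the model predicts will not pass
c) precision: the proportion of bills predicted to pass that actually pass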
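The metrics listed above can all be read off the confusion matrix. The sketch below, assuming the coding 1 = bill passed (positive) and 0 = not passed, counts the two error types and computes accuracy, sensitivity, specificity and precision. The function name `classification_metrics` and the ten example bills are hypothetical, chosen only for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics with 1 = bill passed (positive), 0 = not passed."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly predicted to pass
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly predicted to fail
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # Type 1 error
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # Type 2 error

    def ratio(num, den):
        # Return NaN rather than raising when a denominator is zero.
        return num / den if den else float("nan")

    return {
        "type1_errors": fp,
        "type2_errors": fn,
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


# Hypothetical outcomes for ten bills (1 = passed) and a model's predictions.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

for name, value in classification_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

In this made-up example accuracy is 0.80, which sounds respectable, yet sensitivity and precision are both 0.50: half of the passing bills are missed, and half of the bills flagged as passing do not pass.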
In some applications one type of misclassification is much costlier than the other. For example, in a criminal court, convicting an innocent defendant is considered worse than acquitting a guilty one. In such cases we need to frame the problem accordingly and choose, from metrics (a) to (c) above, the one that tracks the costlier error.
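This is where propensities come in. Most classifiers output a propensity (an estimated probability that a bill passes), and the predicted class depends on the cutoff applied to it. Lowering the cutoff catches more passing bills (higher sensitivity, fewer Type 2 errors) at the price of more false alarms (lower specificity, more Type 1 errors), so the cutoff can be set according to which error is costlier for a given client. Propensities can also be used to rank bills so that a client focuses on those most likely to pass. The sketch below uses hypothetical propensities and outcomes for ten bills and assumes both outcomes occur in the data, so no denominator is zero; `sens_spec_at_cutoff` is an illustrative name.

```python
# Hypothetical propensities (estimated probability that each bill passes)
# and the bills' actual outcomes (1 = passed). Illustrative values only.
propensities = [0.92, 0.40, 0.35, 0.15, 0.12, 0.08, 0.05, 0.28, 0.02, 0.01]
actual = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]


def sens_spec_at_cutoff(propensities, actual, cutoff):
    """Predict 'pass' when propensity >= cutoff; return (sensitivity, specificity)."""
    predicted = [1 if p >= cutoff else 0 for p in propensities]
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # Type 2 errors
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # Type 1 errors
    # Assumes at least one passing and one failing bill in `actual`.
    return tp / (tp + fn), tn / (tn + fp)


# Lowering the cutoff trades specificity for sensitivity.
for cutoff in (0.5, 0.3, 0.1):
    sens, spec = sens_spec_at_cutoff(propensities, actual, cutoff)
    print(f"cutoff={cutoff:.1f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

With these made-up values, moving the cutoff from 0.5 to 0.1 raises sensitivity from 0.33 to 1.00 while specificity falls from 1.00 to 0.57, which illustrates the trade-off a client would weigh against the relative cost of each error.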