You download two sets of posts from an online forum. Set One is a collection of posts by pro-Hong Kong Protestors (HKP) students. Set Two is a collection of posts by pro-Chinese Government (CG) students. (Let's say you get these two collections by searching for students who are members of either a pro-HKP group or a pro-CG group.)

You compute the probabilities of the different words they use, and focus on a set of six "key" words of interest. For each set, you compute the probability that, given that a poster uses one of these six words, it is that particular word. (You could do this by counting up each of those words in each set and dividing by the total count of those six words in that set.)

words = {"legal", "democracy", "violence", "legitimate", "calm", "foreign"}
pHKP = {0.2, 0.2, 0.3, 0.2, 0.05, 0.05}
pCG = {0.1, 0.05, 0.3, 0.05, 0.1, 0.4}
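The counting procedure described above could be sketched roughly as follows. This is only an illustration: the function name `key_word_distribution`, the whitespace tokenization, and the toy posts are all assumptions made up for the example, not part of the problem.

```python
from collections import Counter

KEY_WORDS = ["legal", "democracy", "violence", "legitimate", "calm", "foreign"]

def key_word_distribution(posts, key_words=KEY_WORDS):
    """Estimate P(word | word is one of key_words) from raw counts.

    Hypothetical helper: splits on whitespace and strips basic punctuation.
    """
    counts = Counter()
    for post in posts:
        for token in post.lower().split():
            token = token.strip('.,;:!?"\'()')
            if token in key_words:
                counts[token] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no key words found in the posts")
    # Divide each key word's count by the total count of all key words.
    return {w: counts[w] / total for w in key_words}

# Made-up toy posts, purely to show the mechanics (not real forum data).
toy_posts = [
    "Democracy is legal and legitimate.",
    "Violence is not legitimate; stay calm.",
]
dist = key_word_distribution(toy_posts)
```

Running this once per set of posts would give one distribution per group, analogous to pHKP and pCG.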
The government tells you that they think about 10% of the
posters on the mainland are pro-HKP, and they just want to have a
conversation with these people about things.
You encounter a post. The poster uses the word "democracy" twice,
the word "violence" once, and the word "foreign" once. Assuming
that he is either pro-HKP, and follows the pHKP distribution, or
pro-CG, and follows the pCG distribution...
Q: Given the government's prior, what is the probability that the poster is pro-HKP? (i.e., follows the pHKP distribution rather than the pCG distribution)
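One way to set this up is with Bayes' rule. The sketch below assumes that each key word in the post is drawn independently from the poster's distribution, that the four key-word occurrences listed are the only evidence, and that everyone who is not pro-HKP is pro-CG (so the prior for CG is 90%). The multinomial coefficient for the word ordering is the same under both hypotheses, so it cancels and is left out. Variable names are chosen for illustration.

```python
from fractions import Fraction

WORDS = ["legal", "democracy", "violence", "legitimate", "calm", "foreign"]
# Exact fractions avoid floating-point rounding in the arithmetic.
P_HKP = dict(zip(WORDS, map(Fraction, ["0.2", "0.2", "0.3", "0.2", "0.05", "0.05"])))
P_CG = dict(zip(WORDS, map(Fraction, ["0.1", "0.05", "0.3", "0.05", "0.1", "0.4"])))

PRIOR_HKP = Fraction("0.1")   # government's estimate
PRIOR_CG = 1 - PRIOR_HKP      # assumption: only two kinds of poster

# Key-word counts in the observed post.
observed = {"democracy": 2, "violence": 1, "foreign": 1}

def likelihood(dist, counts):
    """P(observed key words | dist), assuming independent draws."""
    result = Fraction(1)
    for word, n in counts.items():
        result *= dist[word] ** n
    return result

like_hkp = likelihood(P_HKP, observed)  # 0.2^2 * 0.3 * 0.05
like_cg = likelihood(P_CG, observed)    # 0.05^2 * 0.3 * 0.4

# Bayes' rule: prior times likelihood, normalized over both hypotheses.
posterior_hkp = (PRIOR_HKP * like_hkp) / (PRIOR_HKP * like_hkp + PRIOR_CG * like_cg)
print(posterior_hkp)
```

Under these assumptions the likelihoods are 0.0006 for pHKP and 0.0003 for pCG, and the posterior comes out to 2/11, or roughly 18%. The evidence favors pHKP by a factor of two, but the low prior keeps the posterior well below one half.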