In: Computer Science
1 Question 1
Suppose a tweet is represented as a tuple (tweet id, Boolean,
Boolean). The second Boolean element will be True if the tweet
matches Twitter REST API Query otherwise, it will be False. The
third Boolean element will be True if the tweet content is positive
(relevant to the topic of interest); Otherwise, it will be
False.
The whole twitter space is represented as the set of tweets:
Suppose the crawled tweets using Twitter REST API Query is
represented as the set of tweets: M = {(4, True, False), (7, True,
False), (8, True, False), (10, True, True), (11, True, True), (13,
True, True), (17, True, False)} Suppose the set of tweets from
randomly sampled users is: D0= {(3, False, False), (4, True,
False), (5, True, True), (8, True, False), (11, True, True), (12,
False, False), (13, True, True) , (14, True, False), (15, False,
True)}
Give the above sampled tweets, calculate the approximate values of
the three metrics, including API recall, quality recall, and
quality precision.
2 Question 2
Suppose a tweet is represented as a tuple (tweet id, Boolean,
Boolean). The second Boolean element will be True if the tweet
matches Twitter REST API Query otherwise, it will be False. The
third Boolean element will be True if the tweet content is positive
(relevant to the topic of interest); Otherwise, it will be
False.
The whole twitter space is represented as the set of tweets:
Suppose the crawled tweets using Twitter REST API Query is
represented as the set of tweets: M = {(2, True, False), (3, True,
True), (6, True, True), (9, True, False), (14, True, False)}
Suppose the set of tweets from randomly sampled users is: D0 = {(1,
False, False), (4, True, True), (6, True, True), (9, True, False),
(10, False, True), (12, True, False), (13, True, False), (14, True,
False), (15, True, True)}
Give the above sampled tweets, calculate the approximate values of
the three metrics, including API recall, quality recall, and
quality precision.
Answer:
Data Mining Question:
As we can see that we have whole set of Twitter space is
represented from the mentioned 2 sets of tweets.
Total there are the following unique tweet IDs from the 2 sets.
3, 4, 5, 7, 8, 10, 11, 12, 13, 14, 15, 17
API recall is the fraction of the tweets that are successfully retrieved by API.
The tweets from the whole set is(only showing the ids): 3, 4, 5, 7, 8, 10, 11, 12, 13, 14, 15, 17
Out of this API got the tweets with ids: 4, 7, 8, 10, 11, 13, 17
Therefore API recall is: 7/12
Quality recall is the fraction of the relevant(quality)
tweets of the relevant total tweet set.
The relevant tweet from the whole set is(only showing the ids): 5,
10, 11, 13, 15
Out of this API got the tweets with ids: 10, 11, 13
Therefore API recall is: 3/5
Quality precision is the fraction of the retrieved(quality) tweets that are successfully retrieved by API.
Total tweet retrieved by API(only showing the ids): 4, 7, 8, 10, 11, 13, 17
The relevant tweet got by API: 10, 11, 13
Therefore Quality precision is: 3/7
The difference between precision and recall is that:
precision counts the fraction of relevance among the retrieved
documents whereas
recall counts the fraction of relevance among the whole set of
documents.
Note: i write the question num1 only
i hope you can understand this answer.