Question

ID Documents
1 I love data mining
2 The seven dwarves love mining
3 Data science is a hot new career
4 I don't love my major or career

Use the corpus of documents shown in the above table to answer the quiz questions below.

What is the inverse document frequency (IDF) of the term "love"? (Round your answer to 2 decimal places).

What is the TF-IDF value (importance) of the term "data" to document 1? (Round your answer to 2 decimal places)

Can you show me the calculations? Thank you!

Solutions

Expert Solution

After stemming and removing irrelevant (stop) words, a term frequency matrix is created from the corpus. The two questions only need the counts for the terms "love" and "data".

a) What is the inverse document frequency (IDF) of the term "love"? (Round your answer to 2 decimal places).

Inverse Document Frequency can be calculated using the following equation.

IDF(t) = log((1 + |d|) / |d_t|)

where d is the document collection, d_t is the set of documents containing term t, and log is the natural logarithm.

Here |d|, the number of documents, is 4.

|d_t|, the number of documents containing the term "love", is 3 (documents 1, 2 and 4 contain the term "love").

IDF(love) = log((1+4)/3)

= log(5/3) = 0.51
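The IDF arithmetic above can be checked with a short Python sketch. It assumes the variant IDF(t) = ln((1 + |d|) / |d_t|) used in this solution and a naive lowercase whitespace tokenizer. The names `docs` and `idf` are illustrative and not part of the original answer.

```python
import math

# Corpus from the question, keyed by document ID.
docs = {
    1: "I love data mining",
    2: "The seven dwarves love mining",
    3: "Data science is a hot new career",
    4: "I don't love my major or career",
}

def idf(term, docs):
    """IDF(t) = ln((1 + |d|) / |d_t|), the variant used in this solution.

    Assumes the term occurs in at least one document; otherwise the
    division below raises ZeroDivisionError.
    """
    n_docs = len(docs)
    # Count documents whose (lowercased, whitespace-split) tokens include the term.
    n_containing = sum(1 for text in docs.values() if term in text.lower().split())
    return math.log((1 + n_docs) / n_containing)

print(round(idf("love", docs), 2))  # 0.51
```

Other IDF variants (different smoothing or a different log base) would give different numbers, so the result only matches when the same formula is used.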

b) What is the TF-IDF value (importance) of the term "data" to document 1? (Round your answer to 2 decimal places)

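TF-IDF(d,t) = TF(d,t) x IDF(t)

TF(d,t) can be calculated using the following equation.

TF(d,t) = 0 if freq(d,t) = 0, otherwise 1 + log(1 + log(freq(d,t)))

where freq(d,t) is the number of times term t appears in document d.

TF(d1,data) = 1 + log(1 + log(1)) [In document 1, the term "data" appears once.]

= 1 + log(1 + 0)

= 1 + log(1) = 1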
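As a quick check of the term-frequency step, here is a minimal Python sketch of the damped TF used above, with natural logarithms. The function name `tf` and the zero-frequency branch are assumptions added for illustration; the worked solution only shows the case where the term appears once.

```python
import math

def tf(freq):
    """Damped term frequency: 1 + ln(1 + ln(freq)) for freq > 0.

    Returning 0 for an absent term is an assumption; the worked
    solution only evaluates freq = 1.
    """
    if freq == 0:
        return 0.0
    return 1 + math.log(1 + math.log(freq))

print(tf(1))  # 1.0
```

Because ln(1) = 0, a term that appears exactly once always gets TF = 1 under this formula.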

Inverse Document Frequency is calculated with the same equation as in part (a).

IDF(t) = log((1 + |d|) / |d_t|)

where d is the document collection, d_t is the set of documents containing term t, and log is the natural logarithm.

Here |d|, the number of documents, is 4.

|d_t|, the number of documents containing the term "data", is 2 (documents 1 and 3 contain the term "data").

IDF(data) = log((1+4)/2)

= log(5/2) = 0.92

TF-IDF(d1,data) = TF(d1,data) x IDF(data)

= 1 x 0.92

= 0.92
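The whole calculation for part (b) can be reproduced end to end with the sketch below. It assumes natural logarithms, the damped TF and smoothed IDF variants used in this solution, and a naive lowercase whitespace tokenizer with no stemming or stop-word removal. All function names are illustrative.

```python
import math

# Corpus from the question, keyed by document ID.
docs = {
    1: "I love data mining",
    2: "The seven dwarves love mining",
    3: "Data science is a hot new career",
    4: "I don't love my major or career",
}

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    return text.lower().split()

def tf(term, text):
    # Damped term frequency: 0 if absent, else 1 + ln(1 + ln(freq)).
    freq = tokenize(text).count(term)
    return 0.0 if freq == 0 else 1 + math.log(1 + math.log(freq))

def idf(term, docs):
    # IDF(t) = ln((1 + |d|) / |d_t|); assumes the term occurs somewhere.
    n_containing = sum(1 for text in docs.values() if term in tokenize(text))
    return math.log((1 + len(docs)) / n_containing)

def tf_idf(term, doc_id, docs):
    # TF-IDF(d, t) = TF(d, t) x IDF(t)
    return tf(term, docs[doc_id]) * idf(term, docs)

print(round(tf_idf("data", 1, docs), 2))  # 0.92
```

Calling `tf_idf("data", 2, docs)` returns 0.0, since "data" does not appear in document 2.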

