Question

Write a simple Java document retrieval program that first asks the user to enter a single-term query, then goes through two documents named doc1.txt and doc2.txt (provided with assignment7) to find which document is more relevant by calculating the frequency of the query term in each document. Output the conclusion to a file named asmnt7output.txt in the following format (shown here for the queries “java” and “problem”). The percentages in parentheses are the respective supporting frequencies.

java: doc1(6.37%) is more relevant than doc2(0.00%)

problem: doc2(0.41%) is more relevant than doc1(0.17%)


Your code should keep asking for a query until the user enters -1. Matching of the query term against the document terms should be case insensitive. In particular, if the query term is a prefix of a document term, ignoring case, it should also count as a match (e.g. “Computer” matches “computers”, “Network” matches “netWorking”, etc).
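
The matching rule above can be checked in isolation. The following is a minimal sketch, not part of the assignment: the class name PrefixMatchDemo and the method matches are hypothetical, and it assumes that "prefix" means the query must appear at the very start of the document term.

```java
// Hypothetical helper for illustration only; the names are not from the assignment.
class PrefixMatchDemo {

    // Returns true when `term` starts with `query`, ignoring case.
    // regionMatches returns false when `term` is shorter than `query`.
    static boolean matches(String query, String term) {
        return term.regionMatches(true, 0, query, 0, query.length());
    }

    public static void main(String[] args) {
        System.out.println(matches("Computer", "computers"));  // query is a prefix of the term
        System.out.println(matches("Network", "netWorking"));  // case differences are ignored
        System.out.println(matches("java", "ja"));             // term shorter than the query
    }
}
```

Under this reading, the first two calls report a match and the third does not, because a term shorter than the query cannot contain it as a prefix.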

Solutions

Expert Solution

The Java document retrieval program is given below, in a file named DocQuery.java. Each method is commented to explain what it does. Build it any way you like, but note that the input files "doc1.txt" and "doc2.txt" are opened by relative path, so they must be in the directory from which the program is run. If you have any related queries, please leave a comment and I will get back to you.

DocQuery.java

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.io.BufferedReader;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.Scanner;

//The DocQuery class contains the methods that read the files, split their
//content into terms and calculate the supporting frequency of a query term
public class DocQuery {

    //This method takes a filename as argument, reads the content of the file
    //line by line, and returns a string holding the entire text of the file.
    public String readFile(String filename) {
        //We use StringBuilder to build the entire string;
        //this is cheaper than repeated string concatenation
        StringBuilder content = new StringBuilder();
        //try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(new File(filename))))) {
            String line;
            //Read until br returns null (EOF)
            while((line = br.readLine()) != null) {
                content.append(line).append("\n");
            }
        } catch(Exception ex) {
            //On failure (e.g. file not found) report it; the text read so far is returned
            ex.printStackTrace();
        }
        //Return the string created by calling the StringBuilder's toString() method
        return content.toString();
    }

    //Takes a string as input and returns all the document terms (valid words) in the string
    //as an ArrayList of String
    public ArrayList<String> splitIntoTerms(String line) {
        ArrayList<String> termList = new ArrayList<>();
        //We use a regex to find the terms.
        //The regex \w+ matches any run of one or more characters that are digits 0-9,
        //letters A-Z or a-z, or the underscore character
        Pattern regExp = Pattern.compile("\\w+");
        //Create a matcher that scans the line for matching terms
        Matcher terms = regExp.matcher(line);
        //Add each term found to termList
        while(terms.find()) {
            termList.add(terms.group());
        }
        return termList;
    }

    //Calculates the supporting frequency percentage, given the text of a file and the query term
    public double getSuportingFrequency(String content, String query) {
        //Split the text into lines
        String lines[] = content.split("\n");
        //Lowercase the query once so that matching is case insensitive
        String lowerQuery = query.toLowerCase();
        //Initialize the counts of total and supporting terms to zero
        int totalTerms = 0;
        int supportingTerms = 0;
        //Go through each line
        for(int i = 0; i < lines.length; i++) {
            //Get the terms in the line
            ArrayList<String> terms = splitIntoTerms(lines[i]);
            //Add to the totalTerms count
            totalTerms += terms.size();
            //Traverse each term in the line
            for(int j = 0; j < terms.size(); j++) {
                //A term supports the query when, ignoring case, the query is a prefix of the term.
                //This covers both an exact match and a prefix match;
                //a term shorter than the query never matches
                if(terms.get(j).toLowerCase().startsWith(lowerQuery))
                    supportingTerms++;
            }
        }
        //A document with no terms would give 0/0 (NaN), so report 0 instead
        if(totalTerms == 0)
            return 0.0;
        //Calculate the percentage. Cast to double to keep the fractional part
        double per = (double)supportingTerms/(double)totalTerms;
        per *= 100;
        return per;
    }

    public static void main(String args[]) {
        DocQuery dq = new DocQuery();
        //Read the text from files doc1 and doc2
        String doc1 = dq.readFile("doc1.txt");
        String doc2 = dq.readFile("doc2.txt");
        //try-with-resources closes the Scanner and the output file even if an error occurs
        try (Scanner in = new Scanner(System.in);
             java.io.PrintWriter out = new java.io.PrintWriter("asmnt7output.txt")) {
            //Keep asking for a query until the user enters -1 (or the input ends)
            while(true) {
                System.out.println("Enter the query term (-1 to quit): ");
                if(!in.hasNext())
                    break;
                String query = in.next();
                if(query.equals("-1"))
                    break;
                //Calculate the supporting frequency for the term in both the files
                double support1 = dq.getSuportingFrequency(doc1, query);
                double support2 = dq.getSuportingFrequency(doc2, query);
                //Get a string representation of the percentages, rounded
                //to two places after the decimal
                String s1_str = String.format("%.2f",support1);
                String s2_str = String.format("%.2f",support2);
                //Build the conclusion line in the required format
                String result;
                if(support1 > support2)
                    result = query+": doc1("+s1_str+"%) is more relevant than doc2("+s2_str+"%)";
                else if(support1 < support2)
                    result = query+": doc2("+s2_str+"%) is more relevant than doc1("+s1_str+"%)";
                else
                    result = query+": doc1("+s1_str+"%) and doc2("+s2_str+"%) are equally relevant";
                //Write the conclusion to the output file and echo it to the console
                out.println(result);
                System.out.println(result);
            }
        } catch(java.io.FileNotFoundException ex) {
            //Thrown when asmnt7output.txt cannot be created or opened for writing
            ex.printStackTrace();
        }
    }

}
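
To check the frequency calculation without the input files, the same idea can be run on an in-memory string. This is a minimal sketch under stated assumptions rather than the posted solution: the class FrequencyDemo, the method frequency and the sample sentence are all made up for illustration, and terms are assumed to be runs of word characters (the \w+ pattern used above).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names and the sample text are illustrative assumptions.
class FrequencyDemo {

    // Percentage of terms in `text` that start with `query`, ignoring case.
    // Returns 0.0 for text without any terms to avoid a 0/0 result.
    static double frequency(String text, String query) {
        Matcher m = Pattern.compile("\\w+").matcher(text);
        int total = 0;
        int hits = 0;
        while (m.find()) {
            total++;
            if (m.group().regionMatches(true, 0, query, 0, query.length())) {
                hits++;
            }
        }
        return total == 0 ? 0.0 : 100.0 * hits / total;
    }

    public static void main(String[] args) {
        // 7 terms in total; "Java", "JavaScript" and "java" start with the query
        String text = "Java is fun. JavaScript is not java.";
        System.out.println(frequency(text, "java"));
    }
}
```

In the sample sentence, 3 of the 7 terms match the query "java", so the frequency is 3/7, or about 42.86%.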

Sample Input Files (doc1.txt and doc2.txt)

These files were used to test the program
