Question

Can you compare 2 search engines based on information retrieval models like Probability model or boolean model?

Solutions

Expert Solution

Search Engine

A search engine is one approach to organizing and locating information on the Web. It is a program that searches documents for specified keywords and returns a list of the documents in which those keywords are found. Search engine technology addresses the rapid growth of Web data by helping Web users find the information they want. A number of commercial search engines are available online, such as Yahoo!, Google, AltaVista and Baidu.

Search engines can be divided into two types: general-purpose and specific-purpose. A general-purpose search engine, such as Google, tries to retrieve as many Web pages as possible that are relevant to the user's query. The returned pages are ranked according to their relevance weights for the query. User satisfaction with the search results depends on how quickly and how accurately the desired information can be found. Specific-purpose search engines, on the other hand, search Web pages for a specific task or an identified community. Google Scholar and the Digital Bibliography & Library Project (DBLP) are two examples. They are designed for the research community; DBLP, for example, provides information on conferences and journals in the computer science domain.

Whatever its type, each search engine mainly performs the following tasks:

(i) The user submits a query through the user interface. The query consists of words or phrases describing the information the user is interested in.

(ii) The search engine searches its repository for the given query.

(iii) The search engine returns all URLs that match the given query.

(iv) The returned list places the best-matching URLs at the top. The returned URLs may point to other Web pages, textual data, images, audio, video, etc.

Architecture of Search Engine

The general architecture of a search engine is shown in figure 3.2. It consists of the following modules: (a) User interface (b) Query processor (c) Searcher (d) Evaluator (e) Web Crawler (f) Indexer (g) Repository.

A model is an idealization or abstraction of an actual process. Information retrieval models describe the computational process, for example how documents are ranked. How documents or indexes are stored is an implementation detail rather than part of the model.

The goal of information retrieval (IR) is to provide users with the documents that satisfy their information need. Retrieval models can also attempt to describe the human process, such as the information need and the user's interaction with the system.

Retrieval models can be categorized as

1. Boolean retrieval model

2. Vector space model

3. Probabilistic model

4. Belief network model

The Boolean model of information retrieval is a classical IR model. It was the first model and for a long time the most widely adopted one, and many commercial IR systems still support Boolean queries.

Exact vs Best match

In exact match, a query specifies precise criteria. Each document either matches or fails to match the query. The result is an unranked set of documents.

In best match, a query describes good or best matching documents. In this case the result is a ranked list of documents. The Boolean model discussed here is the most common exact match model.
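The difference between the two result types can be sketched in a few lines of Python. This is only an illustration: the three toy documents, the function names `exact_match` and `best_match`, and the term-count score used for ranking are assumptions made for this example, not a retrieval model described in the answer.

```python
# Toy collection (hypothetical documents, used only for this illustration).
docs = {
    "d1": "everest is in nepal",
    "d2": "nepal travel guide",
    "d3": "everest everest climbing",
}


def exact_match(required_terms, docs):
    """Exact match: return the unranked set of documents containing every term."""
    return {
        doc_id
        for doc_id, text in docs.items()
        if all(term in text.split() for term in required_terms)
    }


def best_match(query_terms, docs):
    """Best match: return document ids ranked by a simple term-count score.

    The score (number of query-term occurrences) is an assumption chosen
    for brevity; real systems use more elaborate weighting.
    """
    scores = {
        doc_id: sum(text.split().count(term) for term in query_terms)
        for doc_id, text in docs.items()
    }
    # Keep only documents with a positive score; break ties by document id.
    return sorted(
        (doc_id for doc_id, score in scores.items() if score > 0),
        key=lambda doc_id: (-scores[doc_id], doc_id),
    )


print(exact_match(["everest", "nepal"], docs))  # a set: {'d1'}
print(best_match(["everest", "nepal"], docs))   # a ranked list: ['d1', 'd3', 'd2']
```

Exact match returns only d1, because it is the only document containing both terms. Best match also returns the partially matching documents d3 and d2, in ranked order.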

Basic Assumption of Boolean Model

1. An index term is either present (1) or absent (0) in a document.

2. All index terms provide equal evidence with respect to the information need.

3. Queries are Boolean combinations of index terms.

o X AND Y: documents that contain both X and Y

o X OR Y: documents that contain X, Y, or both

o NOT X: documents that do not contain X
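The three operators map directly onto set operations. The sketch below assumes each term is represented by the set of document ids that contain it; the ids, the terms `x` and `y`, and the function names are hypothetical.

```python
# Hypothetical collection of five documents, identified by number.
all_docs = {1, 2, 3, 4, 5}

# Assumed postings: the set of documents in which each term occurs.
postings = {
    "x": {1, 2, 4},
    "y": {2, 3, 4},
}


def boolean_and(a, b):
    """X AND Y: documents containing both terms (set intersection)."""
    return a & b


def boolean_or(a, b):
    """X OR Y: documents containing at least one of the terms (set union)."""
    return a | b


def boolean_not(a, universe):
    """NOT X: documents in the collection that do not contain the term."""
    return universe - a


print(boolean_and(postings["x"], postings["y"]))  # {2, 4}
print(boolean_or(postings["x"], postings["y"]))   # {1, 2, 3, 4}
print(boolean_not(postings["x"], all_docs))       # {3, 5}
```

NOT needs the whole collection as a second argument, because "documents that do not contain X" is only defined relative to the set of all documents.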

Boolean Queries Example

User information need: Interested to know about Everest and Nepal

User Boolean query: Everest AND Nepal

Implementation Part

Example of Input collection

Doc1 = English tutorial and fast track

Doc2 = learning latent semantic indexing

Doc3 = Book on semantic indexing

Doc4 = Advance in structure and semantic indexing

Doc5 = Analysis of latent structures

Query: advance AND structure AND NOT analysis

Boolean Model Index Construction

First we build the term-document incidence matrix, which lists all the distinct terms and records their presence in each document. Each row is the incidence vector of one term: the entry is 1 if the document contains the term and 0 otherwise.

Terms/doc   Doc1  Doc2  Doc3  Doc4  Doc5
English     1     0     0     0     0
Tutorial    1     0     0     0     0
Fast        1     0     0     0     0
Track       1     0     0     0     0
Book        0     0     1     0     0
Semantic    0     1     1     1     0
Analysis    0     0     0     0     1
Learning    0     1     0     0     0
Latent      0     1     0     0     1
Indexing    0     1     1     1     0
Advance     0     0     0     1     0
Structure   0     0     0     1     1

Stop words (and, on, in, of) are left out, and "structure" and "structures" are treated as the same term.

We now have a 0/1 vector for each term. To answer the query we take the vectors for advance, structure and analysis, complement the last, and then do a bitwise AND.

                        Doc1  Doc2  Doc3  Doc4  Doc5
advance                 0     0     0     1     0
structure               0     0     0     1     1
advance AND structure   0     0     0     1     0
NOT analysis            1     1     1     1     0
Result (AND)            0     0     0     1     0

The only document with a 1 in the result vector is Doc4, so the query returns Doc4.
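The whole procedure can be reproduced with a short Python sketch that builds the incidence vectors from the five documents listed above and evaluates the query. The stop-word list, the mapping of "structures" to "structure", and all function and variable names are assumptions made for this illustration.

```python
docs = {
    "Doc1": "English tutorial and fast track",
    "Doc2": "learning latent semantic indexing",
    "Doc3": "Book on semantic indexing",
    "Doc4": "Advance in structure and semantic indexing",
    "Doc5": "Analysis of latent structures",
}

# Assumptions for this sketch: a tiny stop-word list and one alias so that
# "structures" and "structure" are indexed as the same term.
STOP_WORDS = {"and", "on", "in", "of"}
ALIASES = {"structures": "structure"}


def terms(text):
    """Lower-case, apply aliases, and drop stop words."""
    tokens = (ALIASES.get(tok, tok) for tok in text.lower().split())
    return {tok for tok in tokens if tok not in STOP_WORDS}


doc_ids = list(docs)  # column order: Doc1 .. Doc5

# Term-document incidence matrix: term -> list of 0/1, one entry per document.
incidence = {}
for col, doc_id in enumerate(doc_ids):
    for term in terms(docs[doc_id]):
        incidence.setdefault(term, [0] * len(doc_ids))[col] = 1


def vec(term):
    """Incidence vector of a term (all zeros if the term is unknown)."""
    return incidence.get(term, [0] * len(doc_ids))


def v_and(a, b):
    """Element-wise AND of two 0/1 vectors."""
    return [x & y for x, y in zip(a, b)]


def v_not(a):
    """Element-wise complement of a 0/1 vector."""
    return [1 - x for x in a]


# advance AND structure AND NOT analysis
result = v_and(v_and(vec("advance"), vec("structure")), v_not(vec("analysis")))
matches = [doc_id for doc_id, bit in zip(doc_ids, result) if bit]

print(result)   # [0, 0, 0, 1, 0]
print(matches)  # ['Doc4']
```

Storing each vector as a list keeps the sketch readable. A real system would more likely use an inverted index, since the incidence matrix is mostly zeros for any large collection.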


