Outline a method to disambiguate homographs (two words that are spelled the same but have different meanings) based on search engine logs. For example, how can we distinguish a financial institution bank from a river bank based on these logs? Reference book: Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining. Please provide details and context.
Homographs are words that are spelled the same but have different meanings.
To distinguish and disambiguate them:
A search engine can use several signals recorded in its logs.
They are: 1. Search history 2. Geography (user location) 3. Context/syntax of the query
History:
The search engine keeps your search history: what you search for and what kind of content you view day to day. By analyzing this history, it can infer which meaning you most likely intend and return related information.
For example: if you have been searching for accounting, deposits, net banking, taxes, etc.,
the system will return banking information.
If your search history is mostly about rivers, swimming places, etc., the system will suggest the river bank.
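The history idea above can be sketched as a simple cue-word count over a user's past queries. This is only a minimal illustration: the cue-word sets and the function name `sense_from_history` are assumptions made for the example, and a real search engine would learn such cues from its logs rather than hard-code them.

```python
from collections import Counter

# Hypothetical cue words for each sense of "bank" (illustrative only).
SENSE_CUES = {
    "financial": {"accounting", "deposit", "deposits", "banking", "taxes", "loan"},
    "river": {"river", "rivers", "swimming", "fishing", "kayak"},
}

def sense_from_history(history):
    """Return the sense whose cue words occur most often in past queries.

    Returns None when the history has no cue words or the top senses tie.
    """
    counts = Counter()
    for query in history:
        for word in query.lower().split():
            for sense, cues in SENSE_CUES.items():
                if word in cues:
                    counts[sense] += 1
    ranked = counts.most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # evidence is tied, so history alone cannot decide
    return ranked[0][0]
```

Returning None on missing or tied evidence lets the other signals (location, query context) decide instead.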
User Location:
The search engine also records your geographical location, using GPS/NavIC or your IP address.
Based on this location, the system can suggest the topics most relevant to where you are.
For example:
If you are near a river, it may show river bank details. If you are in an area with many financial banks, it is more likely to show financial bank results.
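One way to turn location into a usable signal is to estimate, from past clicks, how often users in each region chose each sense of "bank". The sketch below assumes a hypothetical click log of (region, clicked sense) pairs; the region names, the rows, and the function name `region_priors` are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical log rows: (coarse user region, sense of the result clicked for "bank").
CLICK_LOG = [
    ("downtown", "financial"), ("downtown", "financial"), ("downtown", "river"),
    ("riverside", "river"), ("riverside", "river"), ("riverside", "financial"),
]

def region_priors(click_log):
    """Estimate P(sense | region) as the fraction of clicks per sense in each region."""
    counts = defaultdict(Counter)
    for region, sense in click_log:
        counts[region][sense] += 1
    return {
        region: {sense: n / sum(c.values()) for sense, n in c.items()}
        for region, c in counts.items()
    }

priors = region_priors(CLICK_LOG)
# Pick the most likely sense for a user located in "riverside".
likely_sense = max(priors["riverside"], key=priors["riverside"].get)
```

These per-region fractions act as a prior: they nudge the ranking toward one sense but should not override strong evidence in the query itself.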
Syntax/Context of search:
The search engine looks at the syntax and context of the other words you used in the query.
Ex:
If you search for "bank near me to deposit money", the other words (deposit, money) tell the search engine that the query is about a financial bank.
If you search for "nearest bank for fishing", the context word "fishing" tells the search engine to return river bank results.
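The context idea can be sketched as a small Naive Bayes classifier trained on logged queries, where each query is labeled with the sense implied by the result the user clicked. This is a swapped-in illustration, not a method stated in the answer: the toy log and the names `train` and `classify` are assumptions.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy log: (query, sense implied by the clicked result).
LOG = [
    ("bank near me to deposit money", "financial"),
    ("bank account interest rate", "financial"),
    ("open savings account bank", "financial"),
    ("nearest bank for fishing", "river"),
    ("river bank camping spots", "river"),
    ("bank erosion river flood", "river"),
]

def train(log):
    """Count how often each word co-occurs with each sense in the log."""
    word_counts = defaultdict(Counter)
    sense_counts = Counter()
    for query, sense in log:
        sense_counts[sense] += 1
        word_counts[sense].update(query.lower().split())
    return word_counts, sense_counts

def classify(query, word_counts, sense_counts):
    """Return the sense with the highest Naive Bayes log-score for the query."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense in sense_counts:
        n_words = sum(word_counts[sense].values())
        score = math.log(sense_counts[sense] / total)  # log prior of the sense
        for word in query.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[sense][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

word_counts, sense_counts = train(LOG)
```

With this toy log, `classify("bank to deposit a check", word_counts, sense_counts)` favors the financial sense, because "deposit" and "to" co-occur with financial clicks more often than with river clicks.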
Note:
The search engine combines all of these signals (history, location, and query context) before deciding, and only then returns the most relevant results.
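Combining the signals can be sketched as a weighted sum of per-signal scores for each sense. The weights, the example scores, and the function name `combine` are assumptions chosen for illustration; a real system would tune the weights on click logs.

```python
# Hypothetical weights for each signal (illustrative, not tuned).
WEIGHTS = {"history": 0.3, "location": 0.2, "context": 0.5}

def combine(signal_scores, weights=WEIGHTS):
    """Combine per-signal scores into one score per sense.

    signal_scores maps a signal name to {sense: score between 0 and 1}.
    Returns the best sense and the combined scores.
    """
    totals = {}
    for signal, scores in signal_scores.items():
        for sense, score in scores.items():
            totals[sense] = totals.get(sense, 0.0) + weights[signal] * score
    return max(totals, key=totals.get), totals

# Example: history leans financial, but location and query context lean river.
signals = {
    "history": {"financial": 0.8, "river": 0.2},
    "location": {"financial": 0.4, "river": 0.6},
    "context": {"financial": 0.1, "river": 0.9},
}
best, totals = combine(signals)
```

Giving query context the largest weight reflects the idea that the words in the current query are usually the most direct evidence of intent, but that choice is itself an assumption.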