Suppose you are working as a data mining consultant for an Internet Search Engine company. Describe how data mining can help the company. Give examples of how techniques such as (1) clustering, (2) classification, (3) association rule mining, and (4) anomaly detection can be applied.
Answer:
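A search engine has to find, in a huge database of indexed pages and logged searches, the exact information a user is looking for. Data mining helps the company by discovering useful structure in that data: grouping similar items, assigning items to known categories, linking phrases that are searched for together, and filtering out results that do not fit with the rest.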
Clustering:
Clustering groups together items that are used in similar ways, without any predefined categories. As input, we provide a list of variables that describe the main idea of each item (for example, the terms on a page or the records a drug appears in); items whose variables have similar values end up in the same group as other topics that share those values.
Clustering can be applied in many settings, such as searching through databases or websites with information about drugs. For example, the search engine can find combinations of drugs that tend to be used together, based on previous observations.
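The drug-combination idea can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's method: the observations and the drug names X, Y, Z, P and Q are hypothetical, each drug is described by the set of past records it appears in, and drugs are grouped by single-linkage clustering on Jaccard similarity, which is only one of many possible clustering techniques.

```python
from itertools import combinations

# Hypothetical observations: each set lists the drugs seen together in one
# past record (for example, one prescription or one search session).
OBSERVATIONS = [
    {"X", "Y"},
    {"X", "Y", "Z"},
    {"Y", "Z"},
    {"P", "Q"},
    {"P", "Q"},
    {"Q"},
]


def jaccard(a: set, b: set) -> float:
    """Similarity of two sets: size of intersection / size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_drugs(observations, threshold=0.3):
    """Single-linkage clustering: two drugs end up in the same cluster if a
    chain of pairs with Jaccard similarity >= threshold connects them."""
    # Describe each drug by the indices of the records it appears in.
    appears_in = {}
    for i, record in enumerate(observations):
        for drug in record:
            appears_in.setdefault(drug, set()).add(i)

    # Union-find: every drug starts in its own cluster.
    parent = {drug: drug for drug in appears_in}

    def find(drug):
        while parent[drug] != drug:
            parent[drug] = parent[parent[drug]]  # path halving
            drug = parent[drug]
        return drug

    # Merge the clusters of every sufficiently similar pair of drugs.
    for a, b in combinations(sorted(appears_in), 2):
        if jaccard(appears_in[a], appears_in[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for drug in appears_in:
        clusters.setdefault(find(drug), set()).add(drug)
    return sorted(sorted(members) for members in clusters.values())


print(cluster_drugs(OBSERVATIONS))  # [['P', 'Q'], ['X', 'Y', 'Z']]
```

X, Y and Z land in one cluster because they keep appearing in the same records, while P and Q form a second one. Raising `threshold` makes the clusters stricter; at 0.9 every drug stays on its own.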
Classification:
Classification assigns each item to one of a set of predefined, discrete classes, using a model learned from examples whose classes are already known. Continuing the pharmacology search engine example, drugs X, Y, and Z may be similar drugs that help lower blood pressure. Each variable can be a drug that is used in conjunction with other variables (drugs). Not all drugs are prescribed by a doctor: some are sold over the counter, others are vitamins or even nutritional components found in foods. For classification, these characteristics can be stored as binary (discrete) variables, set to 1 (enabled) or 0 (disabled), which help the search algorithm place each drug in the right class, such as prescription drug, over-the-counter drug, vitamin, or food component.
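To make the binary-variable idea concrete, here is a minimal Python sketch. The feature names, class labels and labelled training examples are all hypothetical, and 1-nearest-neighbour on Hamming distance is used only because it is one of the simplest classifiers that works directly on 0/1 vectors; it is not presented as the method a real search engine uses.

```python
# Hypothetical binary features (1 = enabled/present, 0 = disabled/absent).
FEATURES = ("needs_prescription", "lowers_blood_pressure",
            "is_supplement", "found_in_food")

# Hypothetical labelled examples: (feature vector, known class).
TRAINING = [
    ((1, 1, 0, 0), "prescription drug"),
    ((1, 0, 0, 0), "prescription drug"),
    ((0, 1, 0, 0), "over-the-counter drug"),
    ((0, 0, 0, 0), "over-the-counter drug"),
    ((0, 0, 1, 0), "vitamin"),
    ((0, 0, 1, 1), "vitamin"),
    ((0, 1, 0, 1), "food component"),
    ((0, 0, 0, 1), "food component"),
]


def hamming(a, b):
    """Number of positions where two 0/1 vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(vector, training=TRAINING):
    """1-nearest-neighbour: return the class of the labelled example with the
    smallest Hamming distance (ties go to the earliest example)."""
    if len(vector) != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} features")
    _, label = min(training, key=lambda example: hamming(example[0], vector))
    return label


print(classify((1, 1, 1, 0)))  # prescription drug
```

The vector (1, 1, 1, 0) is classified as a prescription drug because the closest labelled example, (1, 1, 0, 0), differs from it in only one feature.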
Association Rule Mining:
This method of data mining discovers rules of the form A → B, meaning that when A appears in a record, B tends to appear as well. It finds items that frequently occur together in the database and creates a strong link that associates the two. In a search engine, it can take a sequence of words, i.e. a sentence or short phrase, and compare it to searches that have been performed in the past.
Association rule mining can be applied, for example, to searches for points of interest. If a user is searching for Pismo Beach, CA, other associated phrases can be suggested to help the user find more information about that point of interest. To do this, the database keeps a log of all previous searches performed by other users. For example, if earlier users searched for the famous cinnamon rolls made in Pismo Beach, CA by typing “Pismo Beach Cinnamon Rolls”, then when the next user types only a portion of that phrase, “Pismo Beach, CA,” the search engine can suggest that cinnamon rolls are made at the same point of interest.
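The Pismo Beach example can be sketched as a brute-force rule miner over a search log. This is a simplified illustration, not the full Apriori algorithm: the log is hypothetical, phrases are assumed to be lowercased already, and only rules with a single phrase on each side are considered. A rule A → B is kept when enough sessions contain both phrases (support) and when B accompanies A often enough (confidence).

```python
from itertools import permutations

# Hypothetical search log: each set holds the (lowercased) phrases one user
# searched for in a single session.
SEARCH_LOG = [
    {"pismo beach, ca", "pismo beach cinnamon rolls"},
    {"pismo beach, ca", "pismo beach cinnamon rolls", "pismo beach hotels"},
    {"pismo beach, ca", "pismo beach cinnamon rolls"},
    {"pismo beach, ca", "pismo beach hotels"},
    {"san luis obispo, ca"},
]


def mine_rules(sessions, min_support=0.3, min_confidence=0.6):
    """Brute-force mining of one-phrase rules A -> B.

    support    = fraction of sessions containing both A and B
    confidence = sessions with A and B / sessions with A
    """
    n = len(sessions)
    phrases = sorted(set().union(*sessions))
    rules = []
    for a, b in permutations(phrases, 2):
        count_a = sum(1 for s in sessions if a in s)
        count_ab = sum(1 for s in sessions if a in s and b in s)
        if count_a == 0:
            continue
        support, confidence = count_ab / n, count_ab / count_a
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


def suggest(query, rules):
    """Phrases implied by rules whose left-hand side is the typed query,
    highest confidence first."""
    matches = [(confidence, b) for a, b, _, confidence in rules if a == query]
    matches.sort(key=lambda m: m[0], reverse=True)
    return [b for _, b in matches]


rules = mine_rules(SEARCH_LOG)
print(suggest("pismo beach, ca", rules))  # ['pismo beach cinnamon rolls']
```

With the default thresholds, typing "pismo beach, ca" suggests the cinnamon rolls search (confidence 3/4), while the hotels search (confidence 2/4) is filtered out unless `min_confidence` is lowered to 0.5.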
Anomaly Detection:
Anomaly detection identifies inputs or events that are very dissimilar from the other records contained in the database. This is a helpful tool to ensure that only pertinent information is included in search results.
Anomaly detection can be useful in the first pharmacology example to ensure that the information relayed to the user concerns only related drugs, rather than drugs used to treat unrelated symptoms.
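The pharmacology filter can be sketched as a simple distance-based outlier test: a result whose terms overlap very little with the other results is flagged and dropped. The result pages (including the unrelated "drug W"), their terms, and the 0.2 cutoff are hypothetical, and average Jaccard similarity is just one possible dissimilarity measure.

```python
# Hypothetical candidate results for a blood-pressure drug search, each
# described by the symptom/condition terms found on the page.
RESULTS = {
    "drug X page": {"blood pressure", "hypertension", "dizziness"},
    "drug Y page": {"blood pressure", "hypertension", "heart"},
    "drug Z page": {"blood pressure", "heart", "dizziness"},
    "drug W page": {"acne", "skin", "rash"},
}


def jaccard(a, b):
    """Similarity of two term sets: size of intersection / size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag_anomalies(results, min_avg_similarity=0.2):
    """Flag results whose average Jaccard similarity to all the other
    results falls below min_avg_similarity."""
    if len(results) < 2:
        return []  # nothing to compare against
    flagged = []
    for name, terms in results.items():
        others = [t for other, t in results.items() if other != name]
        avg = sum(jaccard(terms, t) for t in others) / len(others)
        if avg < min_avg_similarity:
            flagged.append(name)
    return flagged


anomalies = flag_anomalies(RESULTS)
shown = [name for name in RESULTS if name not in anomalies]
print(anomalies)  # ['drug W page']
print(shown)      # ['drug X page', 'drug Y page', 'drug Z page']
```

Each blood-pressure drug page has an average similarity of 1/3 to the others, while the acne page shares no terms with them, so it is the only result flagged and left out of what the user sees.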