How do stemming and lemmatization affect precision and recall (does each increase or decrease)? Explain with examples.
Stemming and lemmatization are text-normalization techniques from information retrieval, an area of computer science concerned with finding the relevant documents in the very large and constantly growing collections of data produced every day.
Recall is the fraction of all relevant documents that a search actually retrieves, and precision is the fraction of the retrieved documents that are actually relevant.
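To make the two measures concrete, here is a minimal sketch that computes them for a single query. The function name `precision_recall` and the document IDs are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: IDs of the documents the search returned
    relevant:  IDs of the documents that are truly relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 3 documents truly relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d5"]
p, r = precision_recall(retrieved, relevant)
# p is 2 of 4 returned documents, r is 2 of 3 relevant documents
```

Returning more documents can only keep recall the same or raise it, but it lowers precision whenever the extra documents are not relevant.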
Stemming is the process of reducing a word to its stem by stripping inflectional endings such as "-s", "-ed", or "-ing". This tends to increase recall, because a query term then also matches the other inflected forms of the same word when documents are retrieved. For example, the Porter stemmer is a widely used stemming algorithm in applications that involve natural language processing.
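When stemming is applied, the number of documents retrieved goes up, because the search matches every document that contains any form of the query words. For example, if one searches for "green apple", both the query and the documents are stemmed, so documents containing "green apples" are retrieved as well. This increases the number of relevant documents returned and thereby recall. Precision can fall, however, because a stemmer works by rule and sometimes reduces unrelated words to the same stem.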
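The effect on recall can be sketched with a toy search over three made-up documents. The `toy_stem` function below is a deliberately crude, hypothetical stand-in that only strips a plural "s"; it is not the Porter algorithm, and the documents and relevance labels are assumptions chosen for the example.

```python
def toy_stem(word):
    # Hypothetical, crude stemmer: strips a trailing plural "s" only.
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def search(query, docs, normalize=lambda w: w):
    """Return IDs of documents that contain every (normalized) query term."""
    query_terms = {normalize(w) for w in query.lower().split()}
    return {
        doc_id
        for doc_id, text in docs.items()
        if query_terms <= {normalize(w) for w in text.lower().split()}
    }


docs = {
    "d1": "green apples are sour",
    "d2": "she ate a green apple",
    "d3": "red cars are fast",
}
relevant = {"d1", "d2"}  # assumed relevance judgments for this query

exact = search("green apple", docs)              # {"d2"}
stemmed = search("green apple", docs, toy_stem)  # {"d1", "d2"}

recall_exact = len(exact & relevant) / len(relevant)      # 0.5
recall_stemmed = len(stemmed & relevant) / len(relevant)  # 1.0
```

Without stemming, "apples" in d1 does not match the query term "apple", so one relevant document is missed. With stemming, both forms reduce to "apple" and recall rises from 0.5 to 1.0.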
Lemmatization is also a technique for removing inflection, but it does so by mapping each word to its lemma, a dictionary form that actually exists in the language. For this it relies on a lexical knowledge base, which helps precision. WordNet is one such knowledge base, storing a large number of words and the relations between them. When a query is executed, lemmatization maps each query word to its lemma rather than to a truncated stem, so words with different meanings are less likely to be merged. This tends to give higher precision than stemming. For example, lemmatization converts the word "feelings" to "feeling" in the search query to improve the search results.
When lemmatization is applied, only documents that contain a form of the same dictionary word as each query term are retrieved. This keeps precision high. For example, when one searches for "green apple", documents containing "green apple" or "green apples" are retrieved, while documents whose words merely look similar to the query terms are not.
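The precision difference can be sketched by running the same query through a crude stemmer and through a dictionary lookup. Everything here is an assumption made for illustration: `toy_stem` is a hypothetical suffix stripper (not the Porter algorithm), `LEMMAS` is a tiny hand-written table standing in for a lexical resource such as WordNet, and the documents and relevance labels are invented.

```python
def toy_stem(word):
    # Hypothetical suffix stripper; keeps at least 3 characters of the word.
    for suffix in ("ities", "ity", "es", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Tiny hand-written lemma table standing in for a real lexical database.
LEMMAS = {"universities": "university", "universes": "universe"}


def toy_lemmatize(word):
    # Words not in the table are assumed to already be in dictionary form.
    return LEMMAS.get(word, word)


def search(query, docs, normalize):
    """Return IDs of documents that contain every normalized query term."""
    query_terms = {normalize(w) for w in query.lower().split()}
    return {
        doc_id
        for doc_id, text in docs.items()
        if query_terms <= {normalize(w) for w in text.lower().split()}
    }


docs = {
    "d1": "universities raise tuition",
    "d2": "the university library",
    "d3": "the universe is expanding",
}
relevant = {"d1", "d2"}  # assumed relevance judgments for "university"

stemmed = search("university", docs, toy_stem)          # all three documents
lemmatized = search("university", docs, toy_lemmatize)  # d1 and d2 only

precision_stemmed = len(stemmed & relevant) / len(stemmed)           # 2 of 3
precision_lemmatized = len(lemmatized & relevant) / len(lemmatized)  # 1.0
```

In this toy setup both approaches reach full recall, but the stemmer reduces "university" and "universe" to the same stem "univers" and so returns the irrelevant d3, while the lemma lookup keeps the two words apart. Real stemmers and lemmatizers behave differently in detail, so the numbers only illustrate the direction of the effect.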