(a) What is the value of performing text analysis? How do companies benefit from this exercise?
(b) What are three challenges to performing text analysis?
(c) In your own words, discuss the text analysis steps (i.e., parsing, search and retrieval, and text mining).
(d) What are three major takeaways from text analysis?
Need 600 words
(a) Text analysis (also called text mining or text data mining) is a method for extracting useful, structured information from large amounts of unstructured text by identifying and exploring the patterns in it.
Companies apply text mining techniques such as categorization, entity extraction, sentiment analysis, and natural language processing to transform text into data that can be used for further analysis. Applied to a corpus, or body of information, text mining makes large quantities of unstructured data accessible and useful: it extracts the information and knowledge hidden in text content and reveals patterns, trends, and insights.
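As a rough illustration of turning text into data, the sketch below counts terms in two made-up customer reviews. The reviews and the function name term_counts are assumptions for the example, and plain word counting is only the simplest possible stand-in for the techniques named above.

```python
import re
from collections import Counter

def term_counts(document: str) -> Counter:
    """Lowercase the text, pull out the words, and count them."""
    words = re.findall(r"[a-z']+", document.lower())
    return Counter(words)

# Hypothetical customer reviews standing in for a real corpus.
reviews = [
    "Great battery life, great screen.",
    "The battery died after a week.",
]

# One row of structured data (term -> count) per unstructured review.
rows = [term_counts(review) for review in reviews]

# Corpus-wide totals show which terms recur across documents.
totals = sum(rows, Counter())
```

Here totals["battery"] is 2, which hints that battery is a recurring theme across the reviews.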
(b) The three major challenges are:
1. Establish a Contextualizing Data Structure
2. Achieve Semantic Disambiguation and Decoding of Textual Content
3. Promote Data Quality and Veracity
(c) Seven basic steps are involved in preparing an unstructured text document for deeper analysis.
Each step is achieved on a spectrum between pure machine learning and pure software rules. Let’s review each step in order, and discuss the contributions of machine learning and rules-based NLP.
1. Language Identification
The first step in text analytics is identifying what language the text is written in. Spanish? Singlish? Arabic? Each language has its own idiosyncrasies, so it’s important to know what we’re dealing with.
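One simple way to guess a language is to count how many of its very common words appear in the text. The sketch below assumes three tiny, hand-picked stopword lists; real systems use much larger language profiles, so treat this as an illustration only.

```python
import re

# Tiny illustrative stopword lists (assumed for this example).
STOPWORDS = {
    "english": {"the", "and", "is", "of", "to", "in", "it"},
    "spanish": {"el", "la", "y", "es", "de", "que", "en"},
    "french": {"le", "la", "et", "est", "de", "que", "les"},
}

def guess_language(text: str) -> str:
    """Return the language whose stopwords occur most often in the text."""
    words = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(1 for word in words if word in stops)
        for lang, stops in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # If no stopword matched at all, do not pretend to know the language.
    return best if scores[best] > 0 else "unknown"
```

For example, guess_language("El gato es de la casa y el perro es de la calle") returns "spanish", because Spanish stopwords match more often than French ones.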
2. Tokenization
Now that we know what language the text is in, we can break it up into pieces. Tokenization is the process of breaking a piece of text apart into pieces that a machine can understand.
We use the term “tokens”, and not “words”, because as well as being words, tokens can also be things like punctuation marks, hyperlinks, and possessive markers (apostrophes).
Tokenization is language-specific, and each language has its own tokenization requirements. English, for example, uses white space and punctuation to denote tokens, and is relatively simple to tokenize.
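For English, a regular expression is enough for a first pass. The sketch below is one possible tokenizer, not a standard one: the pattern and the choice to keep hyperlinks and contractions in one piece are assumptions made for the example.

```python
import re

TOKEN_PATTERN = re.compile(
    r"https?://\S+"      # a hyperlink stays in one piece
    r"|\w+(?:'\w+)*"     # a word, including contractions and possessives
    r"|[^\w\s]"          # any single punctuation mark
)

def tokenize(text: str) -> list[str]:
    """Split English text into word, hyperlink, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)
```

With this pattern, tokenize("Don't miss it!") gives ["Don't", "miss", "it", "!"]. A language without white space between words would need a different approach entirely.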
3. Sentence Breaking
Before you can run deeper text analytics functions (such as syntax parsing), you must be able to tell where the boundaries of a sentence are. Sometimes it’s a simple process, and other times… not so much.
Certain communication channels <cough> Twitter <cough> are particularly complicated to break down. We have ways of sentence breaking for social media, but we’ll leave that aside for now.
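To see why it is not always simple, consider a period after an abbreviation such as "Dr.", which does not end a sentence. The sketch below is a rule-based splitter for ordinary prose. The abbreviation list is a small assumed sample, and this is not a method for social media text.

```python
import re

# Assumed sample of abbreviations; a period after these is not a boundary.
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "inc", "etc"}

def split_sentences(text: str) -> list[str]:
    """Split text at ., ! or ? followed by white space, skipping abbreviations."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]+\s+", text):
        preceding = text[start:match.start()].split()
        last_word = preceding[-1].lower() if preceding else ""
        mark = match.group().strip()
        if mark == "." and last_word in ABBREVIATIONS:
            continue  # e.g. "Dr." does not end the sentence
        sentences.append(text[start:match.start()] + mark)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)  # whatever follows the last boundary
    return sentences
```

Given "Dr. Smith arrived late. Was he tired? He said no!", this returns three sentences and keeps "Dr. Smith" together.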
4. Part of Speech Tagging
Part of Speech tagging (or PoS tagging) is the process of determining the part of speech of every token in a document, and then tagging it as such.
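A toy tagger makes the idea concrete. The lexicon, the tag names, and the suffix rules below are all assumptions for illustration; production taggers are usually trained statistical models rather than lookup tables.

```python
# Toy lexicon (assumed for this example, not a real tag set or model).
LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "cat": "NOUN", "park": "NOUN",
    "chased": "VERB", "sleeps": "VERB",
    "quick": "ADJ", "lazy": "ADJ",
    "in": "ADP",
}

def pos_tag(tokens):
    """Pair every token with a part-of-speech tag."""
    tagged = []
    for token in tokens:
        word = token.lower()
        if word in LEXICON:
            tag = LEXICON[word]          # known word: look it up
        elif word.endswith("ly"):
            tag = "ADV"                  # crude suffix rule for adverbs
        elif word.endswith("ing") or word.endswith("ed"):
            tag = "VERB"                 # crude suffix rule for verbs
        elif not word.isalnum():
            tag = "PUNCT"
        else:
            tag = "NOUN"                 # fallback guess for unknown words
        tagged.append((token, tag))
    return tagged
```

Calling pos_tag(["The", "quick", "dog", "chased", "a", "cat", "."]) tags "The" as DET, "quick" as ADJ, "dog" as NOUN, and so on.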
5. Chunking
Let’s move on to the text analytics function known as Chunking (a few people call it light parsing, but we don’t). Chunking refers to a range of sentence-breaking systems that splinter a sentence into its component phrases (noun phrases, verb phrases, and so on).
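Before we move forward, I want to draw a quick distinction between Chunking and Part of Speech tagging in text analytics: PoS tagging assigns a part of speech to each individual token, while Chunking groups those tagged tokens into phrases.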
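The sketch below shows chunking at work on tokens that already carry part-of-speech tags. It only finds noun phrases, using an assumed pattern of an optional determiner, any adjectives, then one or more nouns. The tag names are illustrative.

```python
def noun_phrase_chunks(tagged):
    """Group (token, tag) pairs into noun phrases: DET? ADJ* NOUN+."""
    chunks = []
    i = 0
    n = len(tagged)
    while i < n:
        j = i
        if j < n and tagged[j][1] == "DET":
            j += 1
        while j < n and tagged[j][1] == "ADJ":
            j += 1
        noun_start = j
        while j < n and tagged[j][1] == "NOUN":
            j += 1
        if j > noun_start:
            # At least one noun was found, so tokens i..j-1 form a phrase.
            chunks.append(" ".join(token for token, _ in tagged[i:j]))
            i = j
        else:
            i += 1  # no noun phrase starts here; move on
    return chunks

# Hypothetical tagged sentence used as input.
tagged_sentence = [
    ("The", "DET"), ("quick", "ADJ"), ("dog", "NOUN"),
    ("chased", "VERB"), ("a", "DET"), ("cat", "NOUN"), (".", "PUNCT"),
]
phrases = noun_phrase_chunks(tagged_sentence)
```

The result is ["The quick dog", "a cat"]: the tags are the input, and the phrases are the output.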
6. Syntax Parsing
The syntax parsing sub-function is a way to determine the structure of a sentence. In truth, syntax parsing is really just fancy talk for sentence diagramming. But it’s a critical preparatory step in sentiment analysis and other natural language processing features.
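To make "sentence diagramming" concrete, here is a minimal recursive-descent parser over tagged tokens. The grammar (a sentence is a noun phrase followed by a verb phrase) and the tag names are assumptions chosen to keep the example short; real parsers handle far more structure than this.

```python
def parse_sentence(tagged):
    """Parse [(token, tag), ...] with a toy grammar:
    S -> NP VP,  NP -> DET? ADJ* NOUN,  VP -> VERB NP?
    Returns a nested tuple tree, or None if the sentence does not fit.
    """
    pos = 0

    def parse_np():
        nonlocal pos
        start = pos
        words = []
        if pos < len(tagged) and tagged[pos][1] == "DET":
            words.append(tagged[pos][0])
            pos += 1
        while pos < len(tagged) and tagged[pos][1] == "ADJ":
            words.append(tagged[pos][0])
            pos += 1
        if pos < len(tagged) and tagged[pos][1] == "NOUN":
            words.append(tagged[pos][0])
            pos += 1
            return ("NP", *words)
        pos = start  # backtrack: no noun phrase here
        return None

    def parse_vp():
        nonlocal pos
        if pos < len(tagged) and tagged[pos][1] == "VERB":
            verb = tagged[pos][0]
            pos += 1
            obj = parse_np()  # the object is optional
            return ("VP", verb, obj) if obj else ("VP", verb)
        return None

    subject = parse_np()
    predicate = parse_vp() if subject else None
    if subject is None or predicate is None or pos != len(tagged):
        return None  # outside the toy grammar
    return ("S", subject, predicate)

# Hypothetical tagged sentence used as input.
tree = parse_sentence([
    ("The", "DET"), ("dog", "NOUN"), ("chased", "VERB"),
    ("a", "DET"), ("lazy", "ADJ"), ("cat", "NOUN"),
])
```

The resulting tree is ("S", ("NP", "The", "dog"), ("VP", "chased", ("NP", "a", "lazy", "cat"))), which shows who did what to whom.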
7. Sentence Chaining
The final step in preparing unstructured text for deeper analysis is sentence chaining, sometimes known as sentence relation.
Lexalytics utilizes a technique called “lexical chaining” to connect related sentences. Lexical chaining links individual sentences by each sentence’s strength of association to an overall topic.
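The sketch below is a loose illustration of linking sentences by their association with a topic. It is not Lexalytics' implementation: the scoring rule (share of a sentence's words found in a topic vocabulary), the threshold, and the example data are all assumptions.

```python
import re

def association(sentence, topic_terms):
    """Fraction of the sentence's words that belong to the topic's term set."""
    words = re.findall(r"[a-z]+", sentence.lower())
    if not words:
        return 0.0
    return sum(1 for word in words if word in topic_terms) / len(words)

def lexical_chain(sentences, topic_terms, threshold=0.15):
    """Keep, in order, the sentences associated strongly enough with the topic."""
    return [s for s in sentences if association(s, topic_terms) >= threshold]

# Hypothetical topic vocabulary and sentences.
battery_terms = {"battery", "charge", "charger", "power"}
sentences = [
    "The battery drains quickly.",
    "I bought it on a Tuesday.",
    "A full charge takes two hours.",
]
chain = lexical_chain(sentences, battery_terms)
```

The first and third sentences end up in the same chain even though an unrelated sentence sits between them.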
(d) Three major takeaways are:
1. Many marketers use gut instincts, not data.
Forty percent of marketers say that they use intuition to make decisions. That means a lot of businesses are rolling the dice on their marketing plans. Guessing wrong wastes time and resources, and many of these marketers aren’t measuring the outcomes of their decisions.
2. Analytics help marketers calculate the actual value of their efforts.
One of the biggest complaints that digital marketers have is how difficult it is to prove that their campaigns are successful. Putting a dollar value on social media campaigns or content downloads can be a challenge.
3. Analytics help reveal who customers are and what they want.
Marketers can use Google Analytics to better understand their customers' behavior. Tracking customer paths and running behavior reports helps you learn more about what people are doing on your website.