Question

In: Computer Science

in python: (can use ntlk) Given the following documents: • Can we go to Disney??!!!!!! Let's...

in python: (can use ntlk)

Given the following documents:

• Can we go to Disney??!!!!!! Let's go on a plane!

• The New England Patriots won the Super Bowl..

• I HATE going to school so early

• When will I be considered an adult?

• I want to go to A&M, Baylor, or the University of Texas.

Conduct punctuation removal, stop word removal, casefolding, lemmatization, stemming on the documents.

Solutions

Expert Solution

#!/usr/bin/env python
# coding: utf-8

# In[4]:


from nltk.tokenize import word_tokenize
text = "Can we go to Disney Land???!!! Let's go on a plane!"
tokens = word_tokenize(text)
tokens = [x.lower() for x in tokens]
tokens


# In[5]:


import re
regex = re.compile(r'[\W\d]')
post_punctuation = []
for word in tokens:
    word = re.sub(regex,'',word)
    if len(word)>0:
        post_punctuation.append(word)
post_punctuation


# In[6]:


from nltk.corpus import stopwords
no_stop_words = []
for word in post_punctuation:
    if word not in stopwords.words('english'):
        no_stop_words.append(word)
no_stop_words


# In[7]:


from nltk.stem import WordNetLemmatizer
word_lem = WordNetLemmatizer()
for word in no_stop_words:
    word = word_lem.lemmatize(word)
no_stop_words


Related Solutions

how can we use cryptography libraries in Python
how can we use cryptography libraries in Python
Write a Python program for the following: A given author will use roughly use the same...
Write a Python program for the following: A given author will use roughly use the same proportion of, say, four-letter words in something she writes this year as she did in whatever she wrote last year. The same holds true for words of any length. BUT, the proportion of four-letter words that Author A consistently uses will very likely be different than the proportion of four-letter words that Author B uses. Theoretically, then, authorship controversies can sometimes be resolved by...
Accounting involves the use of “Source Documents”.  Which of the following are not examples of source documents...
Accounting involves the use of “Source Documents”.  Which of the following are not examples of source documents for financial accounting purposes:    A. Invoices B. Sales Receipts C. Month End Inventory listing D. Daily Deposits E. Payroll check stubs F. Storeroom Acquisitions
Find out how can we use cryptography libraries in Python. Write down the steps to install...
Find out how can we use cryptography libraries in Python. Write down the steps to install cryptography library in Python.
Let's say for a chlorophyll solution we found the absorbance, and use d that with Beer's...
Let's say for a chlorophyll solution we found the absorbance, and use d that with Beer's law to find the concentration. How come that concentration is different if we find the concentration of the same chlorophyll solution using the fluorescence?
Let's say that we use 16 mice in a study that seeks a relationship between the...
Let's say that we use 16 mice in a study that seeks a relationship between the concentration (x) of a toxic substance and the mortality time of mice (y), and the findings given in the table below are obtained. Is there a relationship between concentration and death time? What is the direction and strength of the relationship? Find it with correlation analysis. Concentration (microgram) Death Time (minutes) 20 20 25 18 30 18 35 16 40 17 45 16 50...
Please use Python 21. Rock, Paper, Scissors Game Write a program that let's the user play...
Please use Python 21. Rock, Paper, Scissors Game Write a program that let's the user play the game of rock, paper, scissors against the computer. The program should work as follows: 1. When the program begins, a random number in the range of 1 through 3 is generated. If the number is 1, then the computer has chosen rock. If the number is 2, then the computer has chosen paper. If the number is 3, then the computer has chosen...
Regular expressions are used in Python for describing and identifying specific patterns. In Python, we use...
Regular expressions are used in Python for describing and identifying specific patterns. In Python, we use “re” or “Regex” to denote regular expressions. Write the Python Code to return matching analytics word in the following given text. Write the Python Code to return how many times analytics word is provided in this text. Definitions are useful to the extent they serve a purpose. So, is defining analytics important? Yes and no. It’s not likely that we’ll ever arrive at a...
Let's see if we can apply some of this research stuff to our lives. For this...
Let's see if we can apply some of this research stuff to our lives. For this post, check your original Introduction post, take one piece of personal information (e.g., favorite show, hobbies, current/future employment), and create a scientific procedure by following the criteria/questions below: 1. What is your research question? For example, How does exercising improve happiness, or how does exercising improves work efficiency. Pick something related to the examples. 2. Write your hypothesis. Here is where you are taking...
Let's use the Bohr model equations to explore some properties of the hydrogen atom. We will...
Let's use the Bohr model equations to explore some properties of the hydrogen atom. We will determine the kinetic, potential, and total energies of the hydrogen atom in the n=2 state, and find the wavelength of the photon emitted in the transition n=2?n=1. Find the wavelength for the transition n=5 ? n=4 for singly ionized helium, which has one electron and a nuclear charge of 2e. (Note that the value of the Rydberg constant is four times as great as...
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT