Put theory into practice by performing stopwords removal and text processing in Python using the popular NLTK library.
1: Obtain the number of lines (sentences) in the following text, and store each one as a separate element of an array.
This is my first program.
We are the students of Master’s in Data Science, Hello! How are you?
We are here to learn the python script, this is it now.
To count the sentences and store each one as a separate array element, we can use sentence tokenization, which NLTK provides as the sent_tokenize function in the nltk.tokenize module.
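To make the idea of sentence tokenization concrete, here is a minimal sketch that uses only Python's standard library. It is not NLTK's algorithm: the helper name naive_sent_split and the splitting rule (break after '.', '!' or '?' when whitespace and a capital letter follow) are assumptions made purely for illustration.

```python
import re

def naive_sent_split(text):
    """Hypothetical helper: split after '.', '!' or '?' when the next
    non-space character is an uppercase ASCII letter."""
    parts = re.split(r'(?<=[.!?])\s+(?=[A-Z])', text.strip())
    return [p for p in parts if p]

text = ("This is my first program. We are the students of Master’s in Data Science, "
        "Hello! How are you? We are here to learn the python script, this is it now.")

sentences = naive_sent_split(text)   # each sentence becomes one list element
print(sentences)
print(len(sentences))                # number of sentences found
```

On the sample text this rule finds four sentences, but it is easy to fool: an abbreviation such as "Dr. Smith" would be split in two. A trained tokenizer like sent_tokenize is designed to handle such cases, which is why it is generally the safer choice.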
The code for counting the number of sentences in the given string is shown below.
If you are using NLTK tokenization for the first time, run this script to download the Punkt tokenizer data before executing the program:
import nltk
nltk.download('punkt')
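This downloads the tokenizer data once and saves it locally for later runs. On recent NLTK releases the resource may be named 'punkt_tab' instead; if sent_tokenize raises a LookupError that mentions it, download that name in the same way.
The code for tokenization is: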
from nltk.tokenize import sent_tokenize
text = "This is my first program. We are the students of Master’s in Data Science, Hello! How are you? We are here to learn the python script, this is it now."
sentences = sent_tokenize(text)  # Store each sentence as a list element
num_sentences = len(sentences)  # Count the number of sentences
print(sentences)
print(num_sentences)
OUTPUT -