a. What is the main cue for splitting English documents into sentences? When and how might it go wrong? (Python coding required.)
b. What is the main cue in English for splitting sentences into tokens? When and how might it go wrong? (Python coding required.)
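a.) The main cue for splitting English text into sentences is sentence-final punctuation, chiefly the period (also "?" and "!"). The Python program below splits a text file on the newline character "\n" instead, which only works when each sentence sits on its own line: if the file contains no "\n", the whole text comes back as a single item. Splitting on "." is closer to the real cue, but it goes wrong whenever a period does not end a sentence, for example in abbreviations ("Dr.", "e.g.") or decimal numbers ("3.50"). To see which case applies, read the file and print the result.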
# Suppose we have a file called filenew.txt:
with open("filenew.txt") as f:  # the file is closed automatically
    text = f.read()  # read the whole file into one string
sentences = text.split("\n")  # split on newlines; alternatively use text.split(".")
print(sentences)  # list of the pieces found between newline characters
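To make the failure case concrete, here is a minimal sketch of a punctuation-based splitter. It is not a complete solution: the function name `split_sentences` and the sample string are made up for illustration, and the rule (break after ".", "?" or "!" when whitespace follows) is only one simple way to encode the cue.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '?' or '!' when whitespace follows."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]

# Hypothetical sample text, chosen to contain an abbreviation and a decimal.
sample = "Dr. Smith paid $3.50 for tea. It was cold! Did he mind?"
for s in split_sentences(sample):
    print(s)
```

The decimal "3.50" survives because no whitespace follows its period, but the abbreviation "Dr." is wrongly treated as a sentence of its own. Handling that case would need something like a list of known abbreviations, which this sketch leaves out.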
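b.) The main cue for splitting English sentences into tokens is whitespace. It goes wrong when the file contains no spaces, for example when the words are separated only by commas, and punctuation stays attached to the neighbouring word ("end." rather than "end"). Printing the file contents first shows which separator the text actually uses.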
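# Suppose we have a file called filenew.txt:
with open("filenew.txt") as f:  # the file is closed automatically
    text = f.read()  # read the whole file into one string
words = text.split()  # split on any run of whitespace (spaces, tabs, newlines)
print(words)  # list of tokens; punctuation is still attached to the words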
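The sketch below compares plain whitespace splitting with a simple regular-expression tokenizer that also separates punctuation. The name `tokenize`, the pattern, and the sample sentence are assumptions chosen for illustration. Real tokenizers handle many more cases (hyphens, URLs, numbers with commas).

```python
import re

def tokenize(sentence):
    """Return words (apostrophes allowed inside) and single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", sentence)

# Hypothetical sample sentence with a contraction, a comma, and a final period.
sample = "Don't split on spaces only, please."
print(sample.split())    # whitespace only: "only," and "please." keep their punctuation
print(tokenize(sample))  # punctuation marks become separate tokens
```

Whether "Don't" should stay one token or be split further is a design choice; this pattern keeps it whole.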