In C++
For this assignment, you will write a program to count the number of times the words in an input text file occur.
The WordCount Structure
Define a C++ struct called WordCount that contains the following data members:
Functions
Write the following functions:
int main(int argc, char* argv[])
This function should declare an array of 200 WordCount objects and an integer numWords to track the number of array elements that are in use.
If no input file name is specified as a program argument when the program is run, then argc will be equal to 1. If so, print an error message similar to the following and exit the program:
Usage: assign1 [file-name]
Call the function countWords() passing argv[1] as the file name parameter. Store the value returned by the function in numWords.
Call the function sortWords() to sort the array.
Print a header line as shown in the sample output, then call printWords() to print the words and their counts.
int countWords(const char* fileName, WordCount wordArray[])
Parameters: 1) A C string that will not be changed by the function and that contains the name of an input file; 2) an array of WordCount objects.
Returns: The number of distinct words that the function stored in the array (i.e., the number of array elements filled with valid data).
This function should declare a file stream variable and open it for the file name passed in as the first parameter. If the file fails to open successfully, print an error message and exit the program.
Declare an integer numWords to keep track of the number of distinct words stored in the array of WordCount objects. This variable should be initialized to 0.
The function should then read words from the file as C strings using the >> operator until end-of-file is reached. For each word read, the function should do the following:
Once all words have been read from the file, the file should be closed and numWords should be returned.
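The open-check and read-until-end-of-file steps described above can be sketched as follows. This is only an illustration under assumptions: the function name countRawWords and the constant MAX_WORD_LEN are hypothetical, the buffer size of 31 is an assumed maximum word length, and the per-word processing is reduced to a simple counter.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <iostream>

const int MAX_WORD_LEN = 31;   // assumed capacity: 30 characters plus the null terminator

// Hypothetical helper: opens fileName, reads whitespace-separated words as
// C strings with >>, and returns how many words were read.
int countRawWords(const char* fileName)
{
    assert(fileName != nullptr);

    std::ifstream inFile(fileName);
    if (!inFile)
    {
        std::cerr << "Error: unable to open " << fileName << '\n';
        std::exit(1);
    }

    char word[MAX_WORD_LEN];
    int numRead = 0;

    inFile.width(MAX_WORD_LEN);        // limit >> to MAX_WORD_LEN - 1 characters
    while (inFile >> word)             // stops when no further word can be read
    {
        ++numRead;                     // the real function would process the word here
        inFile.width(MAX_WORD_LEN);    // the width resets after every extraction
    }

    inFile.close();
    return numRead;
}
```

Testing the extraction itself in the loop condition means the last word is still processed when the file does not end with a newline. With the width limit in place, a word longer than 30 characters is read in several pieces instead of overflowing the buffer.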
void stripPunctuation(char* s)
Parameters: 1) A C string that contains a word to be stripped of punctuation.
Returns: Nothing.
This function should remove any punctuation characters at the beginning and end of the C string s. For example:
It is possible (although rare) for a string to contain nothing but punctuation characters. In that case, the result of executing this function should be an empty string.
There are a number of valid approaches to solving this problem.
You will need to be able to distinguish between punctuation characters and non-punctuation (or alphanumeric) characters; the C library functions isalnum() and ispunct() can help you do that.
Performing the required modifications to the string "in place" may be difficult, so feel free to use a local temporary character array to make your changes and then copy the final result back into s at the end of the function.
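One of the valid approaches can be sketched as follows. This sketch is an assumption about how the function might look, not the required solution: it works in place by locating the first and last non-punctuation characters and shifting the surviving characters to the front with memmove(), instead of using a temporary array.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Removes punctuation from the beginning and end of s; characters in the
// interior of the word (such as the hyphens in "end-of-file") are kept.
void stripPunctuation(char* s)
{
    assert(s != nullptr);

    std::size_t len = std::strlen(s);
    std::size_t start = 0;

    // Skip over leading punctuation.
    while (start < len && std::ispunct(static_cast<unsigned char>(s[start])))
        ++start;

    // Back up over trailing punctuation.
    std::size_t end = len;
    while (end > start && std::ispunct(static_cast<unsigned char>(s[end - 1])))
        --end;

    // Shift the surviving characters to the front; memmove allows overlap.
    std::memmove(s, s + start, end - start);
    s[end - start] = '\0';
}
```

A string that consists only of punctuation ends up with start equal to end, so the result is the empty string.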
void stringToUpper(char* s)
Parameters: 1) A C string that contains a word to be converted to uppercase.
Returns: Nothing.
This function should loop through the characters of the C string s and convert them to uppercase using the C library function toupper().
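A minimal sketch of that loop is shown below. The casts to unsigned char are an addition of this sketch; they keep the call to toupper() well defined for characters with negative values.

```cpp
#include <cassert>
#include <cctype>

// Converts every character of the C string s to uppercase in place.
void stringToUpper(char* s)
{
    assert(s != nullptr);

    for (int i = 0; s[i] != '\0'; ++i)
        s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
}
```

Characters that have no uppercase form, such as digits and punctuation, are returned unchanged by toupper().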
int searchForWord(const char* word, const WordCount wordArray[], int numWords)
Parameters: 1) A C string that will not be changed by this function and that contains a word to search for; 2) an array of WordCount objects to search that will not be changed by this function; 3) the number of elements in the array filled with valid data.
Returns: The index of the array element that contains the word if the search succeeds, or -1 if the search fails.
This function should use the linear search algorithm to search for the C string word in wordArray.
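The linear search might be sketched as follows. The WordCount layout used here (a 31-character array named word and an integer named count) is an assumption made so the example is self-contained.

```cpp
#include <cassert>
#include <cstring>

// Assumed layout of the structure, for illustration only.
struct WordCount
{
    char word[31];
    int count;
};

// Returns the index of the first element whose word equals the search
// word, or -1 if none of the first numWords elements matches.
int searchForWord(const char* word, const WordCount wordArray[], int numWords)
{
    assert(word != nullptr);

    for (int i = 0; i < numWords; ++i)
    {
        if (std::strcmp(wordArray[i].word, word) == 0)
            return i;
    }
    return -1;
}
```

Only the first numWords elements are examined, so array elements that have not been filled with valid data are never compared.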
void sortWords(WordCount wordArray[], int numWords)
Parameters: 1) An array of WordCount objects to sort; 2) the number of elements in the array filled with valid data.
Returns: Nothing.
This function should sort the array of WordCount objects in ascending order by word using the selection sort algorithm.
The sort code linked to above sorts an array of integers called numbers of size size. You will need to make a number of changes to that code to make it work in this program:
Change the parameters for the function to those described above.
In the function body, change the data type of temp to WordCount. This temporary storage will be used to swap elements of the array of WordCount objects.
In the function body, change any occurrence of numbers to the name of your array of WordCount objects and size to numWords (or whatever you called the variable that tracks the number of array elements filled with valid data).
The comparison of numbers[j] and numbers[min] will need to use the C string library function strcmp() to perform the comparison. The final version of the if condition should look something like this:
if (strcmp(wordArray[j].word, wordArray[min].word) < 0) ...
It is legal to assign one WordCount object to another; you don't need to write code to copy individual data members.
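Putting those changes together, a selection sort over the array might look like the sketch below. The referenced integer sort code is not reproduced here, so the loop structure and the variable names i, j, and min are assumptions; the WordCount layout is also assumed for the sake of a self-contained example.

```cpp
#include <cassert>
#include <cstring>

// Assumed layout of the structure, for illustration only.
struct WordCount
{
    char word[31];
    int count;
};

// Sorts the first numWords elements in ascending order by word.
void sortWords(WordCount wordArray[], int numWords)
{
    assert(numWords >= 0);

    for (int i = 0; i < numWords - 1; ++i)
    {
        // Find the smallest word in the unsorted part of the array.
        int min = i;
        for (int j = i + 1; j < numWords; ++j)
        {
            if (std::strcmp(wordArray[j].word, wordArray[min].word) < 0)
                min = j;
        }

        // Swap whole objects so each count stays with its word.
        WordCount temp = wordArray[min];
        wordArray[min] = wordArray[i];
        wordArray[i] = temp;
    }
}
```

Because whole WordCount objects are assigned during the swap, the count of each word moves together with the word itself.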
void printWords(const WordCount wordArray[], int numWords)
Parameters: 1) An array of WordCount objects to print that will not be changed by this function; 2) the number of elements in the array filled with valid data.
Returns: Nothing.
This function should loop through the array and print each word and its corresponding count, neatly formatted into columns similar to the sample output. It should also print the number of words in the file (which is equal to the sum of the counts) and the number of distinct words (equal to numWords).
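The printing step could be sketched as follows. The sample output is not reproduced here, so the column widths and the wording of the two summary lines are assumptions, and totalWords() is a hypothetical helper introduced only to keep the sum easy to check.

```cpp
#include <cassert>
#include <iomanip>
#include <iostream>

// Assumed layout of the structure, for illustration only.
struct WordCount
{
    char word[31];
    int count;
};

// Hypothetical helper: the number of words in the file is the sum of the counts.
int totalWords(const WordCount wordArray[], int numWords)
{
    assert(numWords >= 0);

    int total = 0;
    for (int i = 0; i < numWords; ++i)
        total += wordArray[i].count;
    return total;
}

// Prints each word left-justified and its count right-justified,
// followed by the two summary lines.
void printWords(const WordCount wordArray[], int numWords)
{
    for (int i = 0; i < numWords; ++i)
    {
        std::cout << std::left << std::setw(31) << wordArray[i].word
                  << std::right << std::setw(5) << wordArray[i].count << '\n';
    }

    std::cout << "\nNumber of words in file: " << totalWords(wordArray, numWords) << '\n'
              << "Number of distinct words: " << numWords << '\n';
}
```

Using setw() with left and right justification keeps the columns aligned regardless of word length, which tab characters alone do not guarantee.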
Text File:
Text Files - A Brief Description (Wikipedia, 2019) A text file (sometimes spelled "textfile"; an old alternative name is "flatfile") is a kind of computer file that is structured as a sequence of lines of electronic text. A text file exists stored as data within a computer file system. The end of a text file is often denoted by placing one or more special characters, known as an end-of-file marker, after the last line in a text file. Such markers were required under the CP/M and MS-DOS operating systems. On modern operating systems such as Windows and Unix-like systems, text files do not contain any special EOF character. "Text file" refers to a type of container, while plain text refers to a type of content. Text files can contain plain text, but they are not limited to such. At a generic level of description, there are two kinds of computer files: text files and binary files.
Save the following code in the file assign1.cpp
#include <iostream>
#include <fstream>
#include <cstring>
#include <cctype>
#include <cstdlib>
using namespace std;
//structure definition
struct WordCount
{
//two data members
char word[31];
int count;
};
//Function to convert a string to upper case
void stringToUpper(char *s)
{
    for(int i=0;s[i]!='\0';i++) //repeat until the terminating null character
        s[i]=toupper((unsigned char)s[i]);//convert lower case letters to upper case
}
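//Function stripping punctuation characters
//Keeps only alphabetic characters, so digits and punctuation inside a word are removed as well
void stripPunctuation(char *s)
{
    int pos=0;
    for(char *p=s;*p;++p) //repeat for all characters
        if(isalpha((unsigned char)*p)) //when the character is a letter
            s[pos++]=*p;//then copy it to the next free position in s
    s[pos]='\0';//end with null
}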
//Function search word
int searchForWord(const char* word,const WordCount wordArray[],int numWords)
{
    for(int i=0;i<numWords;i++) //repeat loop
        if(strcmp(wordArray[i].word,word)==0) //when matched
            return i;//return its index
    return -1;//return -1, when fails
}
//Function for counting words
int countWords(const char* fileName,WordCount wordArray[])
{
    ifstream infile(fileName);//open file for reading
    if(!infile) //when the file fails to open
    {
        cout<<"Error: unable to open "<<fileName<<endl;//print error message
        exit(1);//exit the program
    }
    int count=0;//initially count is 0
    char word[31];//take max length word
    infile.width(31);//read at most 30 characters per word
    while(infile>>word) //repeat until no more words can be read
    {
        stringToUpper(word);//convert to upper case
        stripPunctuation(word);//remove punctuation characters
        if(word[0]!='\0') //skip words that are empty after stripping
        {
            int found=searchForWord(word,wordArray,count);//search word
            if(found==-1) //when not found
            {
                strcpy(wordArray[count].word,word);//store as new word
                wordArray[count].count=1;//first count is 1
                count++;//increment count
            }
            else //when found
                wordArray[found].count+=1;//increment existing count
        }
        infile.width(31);//the width resets after each read
    }
    infile.close();//close the file
    return count;//finally return count
}
//Function to sort words (bubble sort)
void sortWords(WordCount wordArray[],int numWords)
{
    for(int i=0;i<numWords-1;i++) //outer loop
        for(int j=0;j<numWords-i-1;j++) //inner loop
        {
            if(strcmp(wordArray[j].word,wordArray[j+1].word)>0) //when the pair is out of order
            {
                //swapping process
                WordCount temp=wordArray[j];
                wordArray[j]=wordArray[j+1];
                wordArray[j+1]=temp;
            }
        }
}
//Function prints all words and their frequency
void printWords(const WordCount wordArray[],int numWords)
{
cout<<"Word\tFrequency"<<endl;
for(int i=0;i<numWords;i++) //repeat loop
{
cout<<wordArray[i].word<<"\t"<<wordArray[i].count<<endl;
}
}
//main with command line arguments
int main(int argc,char* argv[])
{
    WordCount wordcount[200];//declare maximum 200 words
    if(argc!=2) //when count of args is not 2
    {
        cout<<"\nUsage : assign1 [file-name]"<<endl;//print error message
        return 1;//return error code 1
    }
    else //when passed exactly two args
    {
        int total=countWords(argv[1],wordcount);//read to array
        printWords(wordcount,total);//call print function
    }
    return 0;
}
Save the sample text in the file text1.txt:
Text Files - A Brief Description (Wikipedia, 2019)
A text file (sometimes spelled "textfile"; an old alternative name is "flatfile") is a kind of computer file that is structured as a sequence of lines of electronic text. A text file exists stored as data within a computer file system. The end of a text file is often denoted by placing one or more special characters, known as an end-of-file marker, after the last line in a text file. Such markers were required under the CP/M and MS-DOS operating systems. On modern operating systems such as Windows and Unix-like systems, text files do not contain any special EOF character.
"Text file" refers to a type of container, while plain text refers to a type of content. Text files can contain plain text, but they are not limited to such.
At a generic level of description, there are two kinds of
computer files: text files and binary files.
When you run the program with text1.txt as its argument, you will get output like the following.
Word Frequency
TEXT 12
FILES 6
A 11
BRIEF 1
DESCRIPTION 2
WIKIPEDIA 1
FILE 7
SOMETIMES 1
SPELLED 1
TEXTFILE 1
AN 2
OLD 1
ALTERNATIVE 1
NAME 1
IS 4
FLATFILE 1
KIND 1
OF 8
COMPUTER 3
THAT 1
STRUCTURED 1
AS 4
SEQUENCE 1
LINES 1
ELECTRONIC 1
EXISTS 1
STORED 1
DATA 1
WITHIN 1
SYSTEM 1
THE 3
END 1
OFTEN 1
DENOTED 1
BY 1
PLACING 1
ONE 1
OR 1
MORE 1
SPECIAL 2
CHARACTERS 1
KNOWN 1
ENDOFFILE 1
MARKER 1
AFTER 1
LAST 1
LINE 1
IN 1
SUCH 3
MARKERS 1
WERE 1
REQUIRED 1
UNDER 1
CPM 1
AND 3
MSDOS 1
OPERATING 2
SYSTEMS 3
ON 1
MODERN 1
WINDOWS 1
UNIXLIKE 1
DO 1
NOT 2
CONTAIN 2
ANY 1
EOF 1
CHARACTER 1
REFERS 2
TO 3
TYPE 2
CONTAINER 1
WHILE 1
PLAIN 2
CONTENT 1
CAN 1
BUT 1
THEY 1
ARE 2
LIMITED 1
AT 1
GENERIC 1
LEVEL 1
THERE 1
TWO 1
KINDS 1
BINARY 1