This is C++
Create a program that reads an HTML file and converts it to plain text.
Console:
HTML Converter
Grocery List
* Eggs
* Milk
* Butter
Specifications:
The HTML file named groceries.html contains these HTML tags:
<h1>Grocery List</h1>
<ul>
<li>Eggs</li>
<li>Milk</li>
<li>Butter</li>
</ul>
When the program starts, it should read the contents of the file,
remove the HTML tags, remove any spaces to the left of the tags,
add asterisks (*) before the list items, and display the content
on the console as shown above.
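The per-line steps in the specification (drop the tags, drop the spaces to the left of them, put an asterisk before list items) can be sketched as one small function. This is only an illustration under stated assumptions: the name convertLine is hypothetical and not part of the assignment, and it only recognizes a lowercase <li> tag with no attributes.

```cpp
#include <string>

// Hypothetical helper (not part of the assignment): converts one line of
// HTML to plain text by dropping tags and leading spaces.
std::string convertLine(const std::string& line) {
    std::string text;
    bool inTag = false;  // true while we are between '<' and '>'
    for (std::size_t i = 0; i < line.size(); ++i) {
        char c = line[i];
        if (inTag) {
            if (c == '>') inTag = false;  // the tag ends here
        } else if (c == '<') {
            inTag = true;
            // An opening <li> tag becomes the "* " bullet prefix.
            if (line.compare(i, 4, "<li>") == 0) text += "* ";
        } else if (c == ' ' && text.empty()) {
            // Skip spaces that appear before any visible text.
        } else {
            text += c;
        }
    }
    return text;
}
```

With this sketch, a line such as "  <li>Eggs</li>" comes back as "* Eggs", and a line that holds only a tag, such as "<ul>", comes back as an empty string that the caller can choose not to print.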
Note
The groceries.html file ends each line with a carriage return
character ('\r'), not a newline character ('\n'). To account for
that, when you read the file, you can use the third parameter of
the getline() function to specify the end of the line. For more
information, you can search online and check the documentation of
the getline() function.
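As a minimal sketch of what the note describes, the three-argument form of std::getline stops at the delimiter character you pass instead of at '\n'. The function name readCrLines is made up for this example. It takes any std::istream, so it works with an std::ifstream opened on groceries.html as well as with an in-memory stream.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads every '\r'-terminated line from a stream.
std::vector<std::string> readCrLines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    // The third argument tells getline to treat '\r' as the end of a line.
    while (std::getline(in, line, '\r')) {
        lines.push_back(line);
    }
    return lines;
}
```

The delimiter itself is consumed and is not stored in the returned strings, so no extra trimming of '\r' is needed afterwards.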
Here is the solution.
My platform: Windows
IDE: Code::Blocks
Please keep your groceries.html file in the same folder as the program.
The program works with any basic HTML file.
// your code starts here
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main()
{
    const int MAX_CHARS = 5000;   // maximum size of the plain text
    char array[MAX_CHARS];        // array for storing the plain text
    int k = 0;                    // next free index in the array
    string line;

    // open "groceries.html"; keep the file in the same folder as the program
    ifstream myfile("groceries.html");
    if (!myfile) {
        cout << "file cannot open!" << endl;
        return 1;
    }

    cout << "HTML Converter\n";

    // read line by line; the file ends each line with '\r'
    while (getline(myfile, line, '\r'))
    {
        int start = k;   // where this line's text begins in the array
        // stop early if the array is nearly full
        for (size_t i = 0; i < line.length() && k < MAX_CHARS - 3; i++) {
            if (line[i] == '<') {
                // an opening <li> tag (either case) becomes "* "
                if (i + 3 < line.length()
                    && (line[i+1] == 'l' || line[i+1] == 'L')
                    && (line[i+2] == 'i' || line[i+2] == 'I')
                    && line[i+3] == '>') {
                    array[k++] = '*';
                    array[k++] = ' ';
                }
                // skip to the closing '>' of the tag
                while (i < line.length() && line[i] != '>')
                    i++;
            }
            // skip a stray '\n' and any spaces before the line's text
            else if (line[i] != '\n' && !(line[i] == ' ' && k == start)) {
                array[k++] = line[i];
            }
        }
        // end the line only if it produced some text (<ul> lines do not)
        if (k > start)
            array[k++] = '\n';
    }
    array[k] = '\0';

    // display the contents of the array
    cout << array;
    return 0;
}
// code ends here
The code runs successfully and produces the expected output.
My groceries.html file:
Here is the output:
You can see that at the end it also displays the text "this is a header".
I added one more tag at the end of the HTML file (see the screenshot) for testing purposes.
You can use any basic HTML file and it will be converted to plain text.
Also, my maximum array size is 5000 characters.
Thank You.
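P.S. Two things limit which files the program above handles: the fixed 5000-character array and the '\r' line delimiter. One possible way around both, shown only as a sketch, is to keep the text in std::string and std::vector (which grow as needed) and to split on any of "\r\n", '\r', or '\n'. The name splitLines is hypothetical, and the function assumes the whole file has already been read into one string.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits text into lines on "\r\n", '\r', or '\n'.
std::vector<std::string> splitLines(const std::string& text) {
    std::vector<std::string> lines;
    std::string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char c = text[i];
        if (c == '\r' || c == '\n') {
            // Treat "\r\n" as a single line break.
            if (c == '\r' && i + 1 < text.size() && text[i + 1] == '\n') ++i;
            lines.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    // Keep a final line that has no terminator.
    if (!current.empty()) lines.push_back(current);
    return lines;
}
```

Each returned line can then go through the same tag-stripping logic as before, with no fixed upper bound on the size of the file.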