Question

Use Python (hint: Beautiful Soup, Selenium, or Scrapy) to web scrape (clean and parse) an HTML data site. You can also use other modules or libraries to clean and manipulate the data. Identify and explain any inconsistencies in the dataset.

Website link: https://www.iii.org/table-archive/23284

Extract only the 2019 and 2020 wildfire tables from the website.

Solutions

Expert Solution

Structure of my answer:

1. Images of the code

2. Code

3. Explanation

Explanation:

The website divides the data into two sections, current and archives. The current section holds the most recent table (2019, which is actually the previous year's data), and the archives hold the tables for 2010 - 2018.

Steps:

1. First, parse the HTML page using Beautiful Soup.

2. Get the div elements.

3. Split the div elements into current and archives.

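from bs4 import BeautifulSoup
import requests
import pandas as pd
import re

url = "https://www.iii.org/table-archive/23284"

# Download the page and fail early on an HTTP error
wild_fire = requests.get(url)
wild_fire.raise_for_status()

# The "html5lib" parser requires the html5lib package to be installed
site = BeautifulSoup(wild_fire.content, "html5lib")

# The first "view-content" div holds the current table, the second the archives
divs = site.find_all('div', class_="view-content")

current, archives = divs[0], divs[1]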
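
Unpacking divs[0] and divs[1] assumes the page always contains at least two "view-content" blocks. The following is a minimal sketch of a guarded version of that split, run on a made-up HTML snippet instead of the live site. The name split_sections and the sample markup are assumptions for illustration, and the built-in "html.parser" is used here only so the sketch runs without html5lib.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the real page: two sibling "view-content" blocks
SAMPLE_HTML = """
<div class="view-content"><table><tr><td>Alaska</td></tr></table></div>
<div class="view-content"><table><tr><td>Texas</td></tr></table></div>
"""


def split_sections(html):
    """Return the (current, archives) divs, or raise if the layout is unexpected."""
    soup = BeautifulSoup(html, "html.parser")
    divs = soup.find_all("div", class_="view-content")
    if len(divs) < 2:
        raise ValueError(
            f"expected at least 2 'view-content' divs, found {len(divs)}"
        )
    return divs[0], divs[1]


current_div, archives_div = split_sections(SAMPLE_HTML)
print(current_div.td.get_text(), archives_div.td.get_text())
```

If the site layout changes, this fails with a clear message instead of an IndexError or a silently wrong table.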

Extracting the Current table data (2019)

def parse_table(table):
    """Build a DataFrame from a table whose cells come in groups of three."""
    cols = ['State', 'Number of Fires', 'Number of acres burned']

    # Collect the text of every cell in document order
    cells = [td.get_text().strip() for td in table.find_all('td')]

    # Regroup the flat list into rows of three, dropping any incomplete last row
    usable = len(cells) - len(cells) % 3
    data = [cells[i:i + 3] for i in range(0, usable, 3)]

    df = pd.DataFrame(data, columns=cols)
    df.set_index('State', inplace=True)
    return df


# The last table inside the current section holds the state-level data
df = parse_table(current.find_all("table")[-1])
df.head()
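
parse_table returns every cell as text, so the counts still need converting before they can be summed or compared. Here is a minimal sketch of that cleaning step, assuming the cells look like "1,234" (this has not been checked against the live page). The helper to_int is hypothetical. It returns None for anything that is not a plain number, so such cells can be inspected as possible inconsistencies rather than silently coerced.

```python
import re


def to_int(text):
    """Convert a scraped cell such as ' 1,234 ' to an int.

    Returns None when the cell is not a plain number (empty, 'NA',
    or a number followed by a footnote marker), so it can be reviewed by hand.
    """
    cleaned = text.strip().replace(",", "")
    return int(cleaned) if re.fullmatch(r"[0-9]+", cleaned) else None


# Made-up cell values, for illustration only
for cell in [" 1,234 ", "56", "NA", "1,234 (1)"]:
    print(repr(cell), "->", to_int(cell))
```

Applied to the DataFrame, this could look like df['Number of Fires'].map(to_int), after which any None entries point to cells worth explaining.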
    

Extracting the table data from the archives

Any archived table can be extracted by changing the year_toget variable.

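archives = list(archives.children)

year_toget = 2018

for table in archives:

    # Text nodes between elements use str.find(), which returns -1, not a tag
    res = table.find('span')
    if res != -1 and res:

        # Raw string so that \d is not treated as a string escape
        years = re.findall(r"\d{4}", res.get_text())

        # Only accept captions that name exactly one year, and compare it directly
        if len(years) == 1 and int(years[0]) == year_toget:
            df = parse_table(table.find_all('table')[-1])


df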

I extracted the 2019 and 2018 tables because the 2020 table was not yet available. Once it is published, the 2020 table will be in the current section and the 2019 table in the archives, so the extraction process will not change.
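
Because each archived table is identified by the year in its caption, the year-matching step can be checked on its own. Here is a minimal sketch, assuming captions contain the year as a standalone four-digit number. The helper caption_year and the sample captions are hypothetical, and the word boundaries in the pattern are a small tightening of the \d{4} pattern used above.

```python
import re


def caption_year(caption):
    """Return the year in a caption, or None unless exactly one 4-digit number appears."""
    years = re.findall(r"\b\d{4}\b", caption)
    return int(years[0]) if len(years) == 1 else None


# Hypothetical captions, for illustration only
captions = [
    "Wildfires By State, 2019",
    "Wildfires By State, 2018",
    "Top 10 States, 2014-2018",
]
wanted = {2018, 2019}

# Keep only captions that name exactly one of the wanted years
matches = [c for c in captions if caption_year(c) in wanted]
print(matches)
```

Captions that span a range of years are skipped, which mirrors the len(years) == 1 check in the loop.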

If my answer helps, please upvote!

