Question

In: Computer Science

Choose a table on a web page (use your discretion) and scrape the contents to a...

Choose a table on a web page (use your discretion) and scrape the contents to a csv file.

The table selected must have at least 3 numerical columns and at least 15 rows.

You must produce the following results for at least 2 of those columns:

● count - number of values in the column

● mean - average of all values in the column

● std - standard deviation of the column

● min - minimum value in the column

● 25% - percentile value in the column

● 50% - percentile value in the column

● 75% - percentile value in the column

● max - maximum value in the column

Deliverables :

- A file containing your python code.

- The csv file

Solutions

Expert Solution

code-

import requests
import pandas as pd
import csv
import urllib.request
from bs4 import BeautifulSoup
url = 'https://en.wikipedia.org/wiki/List_of_United_States_cities_by_area'
response = requests.get(url)
print(response.status_code)

soup = BeautifulSoup(response.text,"html.parser")
table = soup.findAll('table',{"class":"wikitable"})[0]
tr = table.findAll(['tr'])[0:151]
csvFile = open("cities0fUSA.csv",'wt',newline='', encoding='utf-8')
writer = csv.writer(csvFile)
try:   
for cell in tr:
th = cell.find_all('th')
th_data = [col.text.strip('\n') for col in th]
td = cell.find_all('td')
row = [i.text.replace('\n','') for i in td]
writer.writerow(th_data+row)
  
finally:   
csvFile.close()

df=pd.read_csv("C:\\Users\\Preeti\\Downloads\\cities0fUSA.csv")

df["Population (2010)"]=df["Population (2010)"].str.replace(",","")
df["Land area (km2)"]=df["Land area (km2)"].str.replace(",","")

df["Population (2010)"]=df["Population (2010)"].astype(int)
df["Land area (km2)"]=df["Land area (km2)"].astype(int)

df["Population (2010)"].describe()

df["Land area (km2)"].describe()

Explanation :

import requests
import pandas as pd
import csv
import urllib.request
from bs4 import BeautifulSoup

We are using python libraries BeautifulSoups,Request and pandas . So make sure you have them install in your system.

To Install Beautiful Soup : pip install beautifulsoup4

url = 'https://en.wikipedia.org/wiki/List_of_United_States_cities_by_area'
response = requests.get(url)
print(response.status_code)

Here , we are fetching the URL from wikipedia contatining list of usa states and cities by area.You can visit the page to check the web table.since it is not legal to scrape the data,hence we must always check the response code . if its 200 mean you can go ahead and download it.

soup = BeautifulSoup(response.text,"html.parser")
table = soup.findAll('table',{"class":"wikitable"})[0]

Here we create a BeautifulSoup object and using the findAll() method extract the relevant information,which in our case is the <table> tag.Since there can be many tables in wikipedia page hence we specify the class attribute(which you can get by right clicking and select inspect)

tr = table.findAll(['tr'])[0:151]

Here , we specify the number of rows that we want to extract from that table[0:151]

csvFile = open("cities0fUSA.csv",'wt',newline='', encoding='utf-8')
writer = csv.writer(csvFile)
try:   
for cell in tr:
th = cell.find_all('th')
th_data = [col.text.strip('\n') for col in th]
td = cell.find_all('td')
row = [i.text.replace('\n','') for i in td]
writer.writerow(th_data+row)
  
finally:   
csvFile.close()

We need to store the data in a .csv format. So get the the headers from the table using <th> tag and data in the columns using <td> tag. Then write the fetched data to a file called cities0fUSA.csv with options ‘wt’, newline =’ ‘ is used to avoid any newline between two rows.

The above code will create a csv file in your downloads folder with name cities0fUSA.csv

df=pd.read_csv("C:\\Users\\Preeti\\Downloads\\cities0fUSA.csv")

Above code will read csv file and convert it to dataframe using pandas library . You need to specify the path where your csv file created is stored.In my case , it was in downloads so i have mentioned that path.

you can also use df.head() to see the top 5 rows.

df["Population (2010)"]=df["Population (2010)"].str.replace(",","")
df["Land area (km2)"]=df["Land area (km2)"].str.replace(",","")

since we need to find count,min,max of values in columns .column must have numeric values and without any special characters hence we use replace function to remove all commas

df["Population (2010)"]=df["Population (2010)"].astype(int)
df["Land area (km2)"]=df["Land area (km2)"].astype(int)

we then convert it all the values of the 2 columns Population (2010) and Land area (km2) to integers using astype(int)

df["Population (2010)"].describe()

df["Land area (km2)"].describe()

we finally use describe function on those 2 columns to get count.min,max,median,25th,75th percentile etc


Related Solutions

1 -​Create the code to generate a default web page. The contents of this page should...
1 -​Create the code to generate a default web page. The contents of this page should be Your​​Name, right justified
*tables and figures are listed by page number and are included in the table of contents...
*tables and figures are listed by page number and are included in the table of contents TRUE OR FALSE *WHICH OF THE FOLLOWING IS CORRECT IN TERMS OF APA NUMBER STYLE? a) 1950's b) ten-thousand c) 10 and 20's d) both a and c *when quoting, always provide the author's names, year, complete reference in the reference list, and___________ a) month of publication B) chapter number c)specific page citation d) all the above *which of the following sentences is the...
2 -​Create the code to generate a web page that contains a table. The table should...
2 -​Create the code to generate a web page that contains a table. The table should have 4 ​columns and 4 rows. Each cell should be 100 pixels wide by 100 pixels high. The border ​should be 2 ​pixels.
Choose a software system (or a web page) that you think will benefit the users if...
Choose a software system (or a web page) that you think will benefit the users if it employed the principles of direct manipulation. Illustrate some of the operations under the direct manipulation scheme
You have been asked to design a web page for a local business or organization (choose...
You have been asked to design a web page for a local business or organization (choose one from your local community). Go to www.flickr.com or www.1StopPictures.com and choose a visual to illustrate the website by trying to match the personality of the organization to a visual image. Then, identify the primary categories of information that need to be included on the page. Develop a flow chart or map that shows how a typical user would navigate through the site. What...
use CSS, java and HTML to make a charity admin page. It's a web page after...
use CSS, java and HTML to make a charity admin page. It's a web page after you log in to your account. I can have personal information, the amount of money donated, the children who have received charity, and some blogs. In fact, all the things are what I want. But you are free to do whatever You like, even if you don't say what I say. I just need a charity Admin page for charity User.Don't ask me for...
use CSS, java and HTML to make a charity admin page. It's a web page after...
use CSS, java and HTML to make a charity admin page. It's a web page after you log in to your account. I can have personal information, the amount of money donated, the children who have received charity, and some blogs. In fact, all the things are what I want. But you are free to do whatever You like, even if you don't say what I say. I just need a charity Admin page for charity User.Don't ask me for...
use CSS, java and HTML to make a charity admin page. It's a web page after...
use CSS, java and HTML to make a charity admin page. It's a web page after you log in to your account. I can have personal information, the amount of money donated, the children who have received charity, and some blogs. In fact, all the things are what I want. But you are free to do whatever You like, even if you don't say what I say. I just need a charity Admin page for charity User.Don't ask me for...
use CSS, java and HTML to make a charity admin page. It's a web page after...
use CSS, java and HTML to make a charity admin page. It's a web page after you log in to your account. I can have personal information, the amount of money donated, the children who have received charity, and some blogs. In fact, all the things are what I want. But you are free to do whatever You like, even if you don't say what I say. I just need a charity Admin page for charity User.Don't ask me for...
use CSS, java and HTML to make a charity admin page. It's a web page after...
use CSS, java and HTML to make a charity admin page. It's a web page after you log in to your account. I can have personal information, the amount of money donated, the children who have received charity, and some blogs. In fact, all the things are what I want. But you are free to do whatever You like, even if you don't say what I say. I just need a charity Admin page for charity User.Don't ask me for...
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT