Homework #1
Task: In this homework, you will scrape the requested site and write the results to a .csv file in the format requested. You will create a Python script that exports the data into a single .csv file. The layout of the .csv output is defined below. For submission, you only need to submit your .py file. In honor of the World of Warcraft 15-year anniversary, I have created a test website that contains several mobs (monsters) found within the game.
Submission Requirements:
Python file used to scrape and create the necessary .csv files
Site to scrape: http://drd.ba.ttu.edu/2019c/isqs6339/hw1/index.php
You also need to scrape the associated mobcards, which are available at the URLs linked from each mob listed on the page above.
◦ Your scraper should acquire those URLs dynamically and then scrape them, i.e., if I add another mob, your scraper should still work with no code changes.
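The dynamic link discovery described above might be sketched as follows. This is only an illustration under assumptions: the sample markup and the `mobcard.php?id=…` link shape are hypothetical stand-ins for the real page, and the sketch uses the standard library `html.parser` instead of BeautifulSoup so that it runs offline with no third-party packages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

INDEX_URL = "http://drd.ba.ttu.edu/2019c/isqs6339/hw1/index.php"

# Hypothetical index markup, used only so the example runs without network access.
SAMPLE_INDEX_HTML = """
<div id="mobindex">
  <a href="mobcard.php?id=1">Mob A</a>
  <a href="mobcard.php?id=2">Mob B</a>
</div>
"""


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def collect_card_urls(index_html, index_url):
    """Return absolute URLs for every link found in the index page."""
    collector = LinkCollector()
    collector.feed(index_html)
    # urljoin resolves each relative href against the index page URL.
    return [urljoin(index_url, href) for href in collector.hrefs]


card_urls = collect_card_urls(SAMPLE_INDEX_HTML, INDEX_URL)
```

Because the URLs come from whatever links the index page contains, adding another mob to the page adds another entry to `card_urls` without any change to the code.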
Grading:
You will be graded based upon:
◦ Quality of your code
▪ Note, I am not looking for the most efficient code. I am looking for code that is well documented (i.e. commented) and follows a logical progression. Your goal is to write code that another developer could pick up and know what you are doing.
◦ Adhering to best practices listed in the lectures
▪ Example: Variables that I can change to run your code in my environment. I should not have to look through your code for items to change. These should be listed at the top of the file.
◦ Correctly generating .csv files according to the requested standards
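One possible way to meet the "variables at the top of the file" practice is sketched below. The names `OUTPUT_DIR`, `OUTPUT_FILENAME`, and `REQUEST_TIMEOUT_SECONDS` and their values are hypothetical choices for illustration, not something the assignment prescribes.

```python
import os

# ---- Settings: edit these values to run the script in your environment ----
INDEX_URL = "http://drd.ba.ttu.edu/2019c/isqs6339/hw1/index.php"
OUTPUT_DIR = "."               # hypothetical: folder the .csv is written to
OUTPUT_FILENAME = "mobs.csv"   # hypothetical: name of the .csv file
REQUEST_TIMEOUT_SECONDS = 10   # hypothetical: give up on a request after this long


def output_path():
    """Build the full output path from the settings above."""
    return os.path.join(OUTPUT_DIR, OUTPUT_FILENAME)
```

The rest of the script would then refer only to these names, so a grader changes one block and nothing else.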
CSV Definitions: Your code should produce the following file with fields in this order.
File #1:
id – *hint*, this is stored in the href. There are many ways to get this value, but I suggest that a split (explode) on “=” might be useful once you access the href.
quality
name
hp
level
elite – This should be encoded as 0 if normal, 1 if elite
damage
money_drop
drop_mask
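The field list above could be turned into code along these lines. This is a sketch under assumptions: the href shape `mobcard.php?id=12`, the raw text "normal" for non-elite mobs, and every value in the sample record are hypothetical placeholders, not data taken from the site.

```python
import csv
import io

# Column order exactly as listed in the CSV definition.
FIELD_ORDER = ["id", "quality", "name", "hp", "level",
               "elite", "damage", "money_drop", "drop_mask"]


def mob_id_from_href(href):
    """Return the text after the last '=' in the href (the suggested split)."""
    return href.split("=")[-1]


def encode_elite(raw_value):
    """Encode elite status as 0 for normal mobs and 1 otherwise."""
    return 0 if raw_value.strip().lower() == "normal" else 1


def rows_to_csv(records):
    """Write records as CSV text with the columns in FIELD_ORDER."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELD_ORDER)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical record; every value here is a placeholder.
record = {
    "id": mob_id_from_href("mobcard.php?id=12"),
    "quality": "common",
    "name": "Example Mob",
    "hp": "100",
    "level": "5",
    "elite": encode_elite("normal"),
    "damage": "10",
    "money_drop": "3",
    "drop_mask": "7",
}
csv_text = rows_to_csv([record])
```

Using `csv.DictWriter` with a fixed `fieldnames` list keeps the column order tied to one place, regardless of the order in which the values were scraped.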
*Hint* Your code should only scrape URLs that are visible (displayed on the website).
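How the site hides links is not stated here, so the following is only a sketch of one way the visibility check might look. It assumes a hidden link either has no link text or carries an inline `display:none` / `visibility:hidden` style; the function name `is_visible_link` is hypothetical.

```python
def is_visible_link(text, style=None):
    """Guess whether a link is displayed, from its text and inline style."""
    # Normalize the style so "display: none" and "DISPLAY:NONE" both match.
    normalized = (style or "").replace(" ", "").lower()
    if "display:none" in normalized or "visibility:hidden" in normalized:
        return False
    # A link with no text renders nothing for the user to see or click.
    return text.strip() != ""
```

With BeautifulSoup, a check like this could be called as `is_visible_link(link.text, link.get('style'))` for each anchor before its URL is scraped. Links hidden through an external stylesheet would not be caught by it.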
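import csv
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# ---- Settings: change these to run the script in your environment ----
INDEX_URL = 'http://drd.ba.ttu.edu/2019c/isqs6339/hw1/index.php'
OUTPUT_FILE = 'output.csv'

# Get the HTML contents of the index page using the requests module
page = requests.get(INDEX_URL)

# Create a BS4 object from the HTML content, using the built-in HTML parser
soup = BeautifulSoup(page.text, 'html.parser')

# Collect the mobcard links dynamically. Links with no text are not
# displayed on the page, so they are skipped.
links = []
for link in soup.select('#mobindex a'):
    if link.text != '':
        links.append(urljoin(INDEX_URL, link['href']))

# newline='' lets the csv module control line endings
with open(OUTPUT_FILE, 'w', newline='') as f:
    writer = csv.writer(f)
    # Columns follow the order in the CSV definition. This answer does not
    # read 'quality'; it still has to be located on the page and added after 'id'.
    writer.writerow(['id', 'name', 'hp', 'level', 'elite',
                     'damage', 'money_drop', 'drop_mask'])
    for link in links:
        # The mob id is the part of the href after '='
        mob_id = link.split('=')[-1]
        page = requests.get(link)
        soup = BeautifulSoup(page.text, 'html.parser')
        values = [v.text.strip() for v in soup.select('#mobcard .val')]
        # Order of the card values, as assumed by the header in the original answer
        name, elite, level, hp, drop_mask, money_drop, damage = values[:7]
        # Encode elite as 0 if normal, 1 if elite
        elite = '0' if elite == 'normal' else '1'
        writer.writerow([mob_id, name, hp, level, elite,
                         damage, money_drop, drop_mask])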