Question


Homework #1

Task: In this homework, you will scrape the requested site and output a .csv file in the requested format. You will create a Python script that exports the data into a single .csv file, whose layout is defined below. For submission, you only need to submit your .py file. In honor of the World of Warcraft 15-year anniversary, I have created a test website that contains several mobs (monsters) found within the game.

Submission Requirements:

  •  Python file used to scrape and create the necessary .csv files

  •  Site to scrape: http://drd.ba.ttu.edu/2019c/isqs6339/hw1/index.php

  •  You also need to scrape the associated mobcards, which are available at the URLs related to each mob listed on the URL above.

◦ Your scraper should dynamically acquire those URLs to scrape further, i.e., if I added another mob, your scraper should work with no code changes.
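The dynamic URL requirement can be sketched with the standard library alone. This is a minimal illustration, not the instructor's method: the sample markup, the `mobcard.php?id=...` href pattern, and the `LinkCollector` name are assumptions, and the real index page may be structured differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

INDEX_URL = "http://drd.ba.ttu.edu/2019c/isqs6339/hw1/index.php"


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # urljoin turns a relative href into an absolute URL.
                self.links.append(urljoin(self.base_url, href))


# Hypothetical index markup; a real run would feed the downloaded page text.
sample_html = (
    '<div id="mobindex">'
    '<a href="mobcard.php?id=1">First mob</a>'
    '<a href="mobcard.php?id=2">Second mob</a>'
    "</div>"
)

collector = LinkCollector(INDEX_URL)
collector.feed(sample_html)
print(collector.links)
```

Because the links are read from the page rather than hard-coded, a newly added mob would be picked up without code changes.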

Grading:

You will be graded based upon:

◦ Quality of your code

▪ Note, I am not looking for the most efficient code. I am looking for code that is well documented (i.e. commented) and follows a logical progression. Your goal is to write code that another developer could pick up and know what you are doing.

◦ Adhering to best practices listed in the lectures

▪ Example: Variables that I can change to run your code in my environment. I should not have to look through your code for items to change. These should be listed at the top of the file.
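One way to follow this practice is to gather every environment-specific value in a settings block at the top of the script. The sketch below is only an illustration: apart from the site URL, the names and values (`OUTPUT_DIR`, `OUTPUT_FILE`, `REQUEST_TIMEOUT`, `output_path`) are hypothetical.

```python
import os

# ---- Settings: edit these to run the script in another environment ----
INDEX_URL = "http://drd.ba.ttu.edu/2019c/isqs6339/hw1/index.php"  # site to scrape
OUTPUT_DIR = "."          # hypothetical: folder that receives the .csv file
OUTPUT_FILE = "mobs.csv"  # hypothetical: name of the .csv file
REQUEST_TIMEOUT = 10      # hypothetical: seconds to wait for each page


def output_path():
    """Build the full path of the .csv file from the settings above."""
    return os.path.join(OUTPUT_DIR, OUTPUT_FILE)


print(output_path())
```

The rest of the script then refers only to these names, so a grader changes one block instead of searching the code.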

◦ Correctly generating .csv files by the requested standards

CSV Definitions: Your code should produce the following file, with fields in this order.

 File #1:

  • id – *hint*: this is stored in the href. There are many ways to get this value, but I suggest that an explode (split) on “=” might be useful once you access the href.

  • quality

  • name

  • hp

  • level

  • elite – This should be encoded as 0 if normal, 1 if elite

  • damage

  • money_drop

  • drop_mask
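The field list above can be turned into a row as in the following sketch. It is written under assumptions: the `scraped` record, its values, and the helper names are hypothetical, the href is assumed to look like `mobcard.php?id=7`, and any elite text other than "normal" is treated as elite.

```python
import csv
import io
from urllib.parse import parse_qs, urlparse

# Column order requested by the assignment.
FIELDS = ["id", "quality", "name", "hp", "level", "elite",
          "damage", "money_drop", "drop_mask"]


def mob_id_from_href(href):
    """Return the id query parameter, e.g. '7' from 'mobcard.php?id=7'."""
    return parse_qs(urlparse(href).query)["id"][0]


def encode_elite(text):
    """Return 0 for a normal mob and 1 for anything else (assumed elite)."""
    return 0 if text.strip().lower() == "normal" else 1


# Hypothetical scraped record; real values come from the mobcard pages.
scraped = {
    "href": "mobcard.php?id=7", "quality": "rare", "name": "Example Mob",
    "hp": "1200", "level": "12", "elite": "Elite",
    "damage": "15-20", "money_drop": "35", "drop_mask": "3",
}

row = {field: scraped.get(field) for field in FIELDS}
row["id"] = mob_id_from_href(scraped["href"])
row["elite"] = encode_elite(scraped["elite"])

# Write to an in-memory buffer here; a real script would open a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

Using `csv.DictWriter` with a fixed `fieldnames` list keeps the column order independent of the order in which values are scraped.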

    *Hint* Your code should only scrape URLs that are visible (displayed on the website).
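How the site hides links is not stated, so the following is only a sketch under an assumption: that hidden links carry an inline `display:none` style or the HTML `hidden` attribute. The `is_hidden` helper and the sample attribute dictionaries are hypothetical.

```python
def is_hidden(attrs):
    """Return True if a tag's attributes mark it as not displayed.

    Assumption: hiding is done with an inline style such as
    "display:none" or with the HTML hidden attribute.
    """
    if "hidden" in attrs:
        return True
    style = attrs.get("style") or ""
    # Normalize so "display: none" and "DISPLAY:NONE" both match.
    return "display:none" in style.replace(" ", "").lower()


# Hypothetical attribute dictionaries, one per <a> tag on the index page.
candidates = [
    {"href": "mobcard.php?id=1"},
    {"href": "mobcard.php?id=2", "style": "display: none;"},
    {"href": "mobcard.php?id=3", "hidden": ""},
]

visible = [a["href"] for a in candidates if not is_hidden(a)]
print(visible)
```

With BeautifulSoup, a tag's `attrs` is a dictionary, so `is_hidden(tag.attrs)` could be applied to each link. Links hidden through a stylesheet class would not be caught by this check.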

Solutions

Expert Solution

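import csv

import requests
from bs4 import BeautifulSoup

# ---- Settings: change these to run the script in your environment ----
INDEX_URL = 'http://drd.ba.ttu.edu/2019c/isqs6339/hw1/index.php'
OUTPUT_FILE = 'output.csv'

# Download the index page and parse it with Python's built-in HTML parser.
page = requests.get(INDEX_URL)
soup = BeautifulSoup(page.text, 'html.parser')

# Collect the mobcard URLs dynamically so that newly added mobs are picked up.
# Links with empty text are skipped; this script treats them as not displayed.
base_url = INDEX_URL.rsplit('/', 1)[0]
links = []
for link in soup.select('#mobindex a'):
    if link.text != '':
        links.append(base_url + '/' + link['href'])

# newline='' lets the csv module control the line endings.
with open(OUTPUT_FILE, 'w', newline='') as f:
    writer = csv.writer(f)
    # Header row; the columns after id are written in the order
    # the values are read from each mobcard.
    writer.writerow(['id', 'Name', 'Elite', 'Level', 'HP',
                     'DropMask', 'MoneyDrop', 'Damage'])

    for link in links:
        page = requests.get(link)
        soup = BeautifulSoup(page.text, 'html.parser')

        # Each value on the mobcard sits in an element with class "val".
        values = [v.text.strip() for v in soup.select('#mobcard .val')]

        # Encode elite as 0 for normal mobs and 1 for elite mobs.
        values[1] = '0' if values[1] == 'normal' else '1'

        # The mob id is the part of the href after "=".
        mob_id = link.rsplit('=', 1)[-1]
        writer.writerow([mob_id] + values)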



