Question

In: Computer Science

Using python!!!! 1.     Copy the file web.py from class (or class notes) into your working folder....

Using python!!!!

1.     Copy the file web.py from class (or class notes) into your working folder.

2. Include the following imports at the top of your module (hopefully this is sufficient):


from web import LinkCollector # make sure you did 1
from html.parser import HTMLParser
from urllib.request import urlopen
from urllib.parse import urljoin
from urllib.error import URLError

3.     Implement a class ImageCollector. This will be similar to the LinkCollector, given a string containing the html for a web page, it collects and is able to supply the (absolute) urls of the images on that web page.   They should be collected in a set that can be retrieved with the method getImages (order of images will vary). Sample usage:

>>> ic = ImageCollector('http://www2.warnerbros.com/spacejam/movie/jam.htm')
>>> ic.feed( urlopen('http://www2.warnerbros.com/spacejam/movie/jam.htm').read().decode())
>>> ic.getImages()
{'http://www2.warnerbros.com/spacejam/movie/img/p-sitemap.gif', …, 'http://www2.warnerbros.com/spacejam/movie/img/p-jamcentral.gif'}

>>> ic = ImageCollector('http://www.kli.org/')
>>> ic.feed( urlopen('http://www.kli.org/').read().decode())

>>> ic.getImages()
{'http://www.kli.org/wp-content/uploads/2014/03/KLIbutton.gif', 'http://www.kli.org/wp-content/uploads/2014/03/KLIlogo.gif'}

4. Implement a class ImageCrawler that will inherit from the Crawler developed in amd will both crawl links and collect images. This is very easy by inheriting from and extending the Crawler class. You will need to collect images in a set. Hint: what does it mean to extend? Implementation details:

a. You must inherit from Crawler. Make sure that the module web.py is in your working folder and make sure that you import Crawler from the web module.

b. __init__ - extend’s Crawler’s __init__ by adding an set attribute that will be used to store images

c. Crawl – extends Crawler’s crawl by creating an image collector, opening the url and then collecting any images from the url in the set of images being stored. I recommend that you collect the images before you call the Crawler’s crawl method.

d. getImages – returns the set of images collected


>>> c = ImageCrawler()
>>> c.crawl('http://www2.warnerbros.com/spacejam/movie/jam.htm',1,True)
>>> c.getImages()
{'http://www2.warnerbros.com/spacejam/movie/img/p-lunartunes.gif', … 'http://www2.warnerbros.com/spacejam/movie/cmp/pressbox/img/r-blue.gif'}

>>> c = ImageCrawler()
>>> c.crawl('http://www.pmichaud.com/toast/',1,True)
>>> c.getImages()
{'http://www.pmichaud.com/toast/toast-6a.gif', 'http://www.pmichaud.com/toast/toast-2c.gif', 'http://www.pmichaud.com/toast/toast-4c.gif', 'http://www.pmichaud.com/toast/toast-6c.gif', 'http://www.pmichaud.com/toast/ptart-1c.gif', 'http://www.pmichaud.com/toast/toast-7b.gif', 'http://www.pmichaud.com/toast/krnbo24.gif', 'http://www.pmichaud.com/toast/toast-1b.gif', 'http://www.pmichaud.com/toast/toast-3c.gif', 'http://www.pmichaud.com/toast/toast-5c.gif', 'http://www.pmichaud.com/toast/toast-8a.gif'}

5.     Implement a function scrapeImages:   Given a url, a filename, a depth, and Boolean (relativeOnly)., this function starts at url, crawls to depth, collects images, and then writes an html document containing the images to filename. This is not hard, use the ImageCrawler from the prior step. For example:

>>> scrapeImages('http://www2.warnerbros.com/spacejam/

movie/jam.htm','jam.html',1,True)
>>> open('jam.html').read().count('img')
62

>>> scrapeImages('http://www.pmichaud.com/toast/',
'toast.html',1,True)
>>> open('toast.html').read().count('img')
11

link to web.py https://www.dropbox.com/s/obiyi7lnwc3rw0d/web.py?dl=0

Solutions

Expert Solution

Solution: See the code below:

-----------------------------------------------

#imorts

from web import LinkCollector
from html.parser import HTMLParser
from urllib.request import urlopen
from urllib.parse import urljoin
from urllib.error import URLError

## ImageCollector class
class ImageCollector(HTMLParser):
    ''' when given a url, ImageCollector
collects all the image urls found at that url.
Images will be collected in a set.
You can retrive them through getImages() method.'''


    def __init__(self,url):      
        HTMLParser.__init__(self)
        self.url = url
        self.imageURLs = set()
       
    def handle_starttag(self,tag,attrs):
        if tag=='img':
            for attr,val in attrs:
                if attr=='src': # collect
                    if val[:4]=='http': # absolute
                        self.imageURLs.add( val )                  

    def getImages(self):
        return self.imageURLs


## ImageCrawler class
from web import Crawler
class ImageCrawler(Crawler):
    '''Crawls a web pages for images'''
    def __init__(self):
        Crawler.__init__(self)
        self.imageURLs=set()
  
    def crawl(self,url,depth,relativeOnly):
        ic = ImageCollector(url)
        ic.feed( urlopen(url).read().decode())
        imageURLs=ic.getImages()
        #check signatures with your own method
        Crawler.crawl(imageURLs,depth,relativeOnly)
  
    def getImages(self):
        return self.imageURLs

##ScrapeImages method
def scrapeImages(url, filename, depth, relativeOnly):
    ''' create a local html file named filename
    containing all images found at url'''
    # crawl images
    ic = ImageCrawler()
    ic.crawl(url, depth, relativeOnly)

    # write them to a file
    file = open(filename,'w')
    file.write( '<html><body>\n')
    for image in ic.getImages():
        file.write( '<img src={}><br>\n'.format(image))
    file.write('</body></html>')
    file.close()

-----------------------------------------

Output:

Note: In your web.py, Crawler class code is not given. Hence, its output is not displayed. Modify and use ImageCrawler class accrodingly.


Related Solutions

Copy the following Python fuction discussed in class into your file: from random import * def...
Copy the following Python fuction discussed in class into your file: from random import * def makeRandomList(size, bound): a = [] for i in range(size): a.append(randint(0, bound)) return a a. Add another function that receives a list as a parameter and computes the sum of the elements of the list. The header of the function should be def sumList(a): The function should return the result and not print it. If the list is empty, the function should return 0. Use...
language: python Create a text file in your project folder with at least 20 "quirky sayings"/fortunes...
language: python Create a text file in your project folder with at least 20 "quirky sayings"/fortunes (the only requirement is that they be appropriate for display in class), If I use my own file though, you should handle as many fortunes as I put in. Make each fortune its own line, •in your main function ask the user for the name of the fortunes file.•Create a function which takes the name of the fortunes file as a parameter, open that...
Using python boto3 Check if a folder is in an S3 bucket and if not in...
Using python boto3 Check if a folder is in an S3 bucket and if not in the bucket then it creates a folder in the right directory
You will have to experiment with using chmod on the folder and the file to come...
You will have to experiment with using chmod on the folder and the file to come up with the following: Find the chmod commands for both the folder and the file that you would use to give the minimum permissions for You (the owner) to be able to edit and change the file (and I don’t want to see 777! ? ) : ___________________________________________________________ ___________________________________________________________ ___________________________________________________________
Write a python program to read from a file the names and grades of a class...
Write a python program to read from a file the names and grades of a class of students to calculate the class average, the maximum, and the minimum grades. The program should then write the names and grades on a new file identifying the students who passed and the students who failed. The program should consist of the following functions: a) Develop a gradesInput() function that reads data from a file and stores it and returns it as a dictionary....
Create a class Student (in the separate c# file but in the project’s source files folder)...
Create a class Student (in the separate c# file but in the project’s source files folder) with those attributes (i.e. instance variables): Student Attribute Data type (student) id String firstName String lastName String courses Array of Course objects Student class must have 2 constructors: one default (without parameters), another with 4 parameters (for setting the instance variables listed in the above table) In addition Student class must have setter and getter methods for the 4 instance variables, and getGPA method...
Name your program file warmup.py Submit your working Python code to your CodePost.io account. In this...
Name your program file warmup.py Submit your working Python code to your CodePost.io account. In this challenge, establish if a given integer num is a Curzon number. If 1 plus 2 elevated to num is exactly divisible by 1 plus 2 multiplied by num, then num is a Curzon number. Given a non-negative integer num, implement a function that returns True if num is a Curzon number, or False otherwise. Examples is_curzon(5) ➞ True 2 ** 5 + 1 =...
Part 1 Ask for a file name. The file should exist in the same folder as...
Part 1 Ask for a file name. The file should exist in the same folder as the program. If using input() to receive the file name do not give it a path, just give it a name like “number1.txt” or something similar (without the quotes when you enter the name). Then ask for test scores (0-100) until the user enters a specific value, in this case, -1. Write each value to the file as it has been entered. Make sure...
For the first part of this lab, copy your working ArrayStringList code into the GenericArrayList class.(already...
For the first part of this lab, copy your working ArrayStringList code into the GenericArrayList class.(already in the code) Then, modify the class so that it can store any type someone asks for, instead of only Strings. You shouldn't have to change any of the actual logic in your class to accomplish this, only type declarations (i.e. the types of parameters, return types, etc.) Note: In doing so, you may end up needing to write something like this (where T...
USE BASIC PYTHON Focus on Basic file operations, exception handling. Save a copy of the file...
USE BASIC PYTHON Focus on Basic file operations, exception handling. Save a copy of the file module6data.txt (Below) Write a program that will a. Open the file module6data.txt b. Create a second file named processed. txt c. Read the numbers from the first file one at a time. For each number write into second file, if possible, its I. Square ii. Square root iii. Reciprocal (1/number)use a math module function for the square root. use exception handling to trap possible...
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT