Question

In: Computer Science

Please support with a python response. The prompt was: To locate a link on a webpage,...

Please support with a python response. The prompt was:

To locate a link on a webpage, we can search for the following three things in order:
- First, look for the three character string '<a '
- Next, look for the following to close to the first part '>': Those enclose the URL
- Finally, look for three characters to close the element: '</a': which marks the end of the link text

The anchor has two parts we are interested in: the URL, and the link text.

Write a function that takes a URL to a webpage and finds the **first link** on the page. Your function should return a tuple holding two strings: the URL and the link text.

We already created a program that fetches the contents of a website and pastes them into a list in a prior assignment:

import urllib.request
import sys


def fetch_contents(website):
    "Return the contents of this webpage as a list of lines"
    try:
        res = []

        with urllib.request.urlopen(website) as f:
            text = f.read().decode('utf-8')

            # Break the page into lines
            text = text.split('\n')
            for line in text:
                res.append(line)
    
        return res

    except urllib.error.URLError as e:
        print(e.reason)
        return []

if (len(sys.argv) != 2):
    print(f"Usage: python read_url.py <website>")
else:
    lst = fetch_contents(sys.argv[1])

    # Now display the contents
    for line in lst:
        print(line)

The hard part is done. Once I have all of the text, I now need to parse through it and look for what was prompted above. We are not allowed to use Beautiful Soup or any of the other libraries, which I wasn't aware of when I started working on it. How can I manually walk through the list, find and extract the first URL, and return it as a tuple (url, link text)?

Solutions

Expert Solution

# -*- coding: utf-8 -*-
"""
Created on Sun Oct 25 10:51:55 2020

@author: Unknown
"""

import urllib.request
import sys
import re

#The function Find uses the built in regex function to search the matching pattern
def Find(pat, text):    
    match = re.findall(pat, text)
    if match:        
        foundtext = match
        print('The matching line is ', text)
    else: 
        #print('No match found')
        foundtext = None    
    return foundtext

def fetch_contents(website):
    "Return the contents of this webpage as a list of lines"
    try:
        res = []

        with urllib.request.urlopen(website) as f:
            text = f.read().decode('utf-8')            

            # Break the page into lines
            text = text.split('\n')
            for line in text:
                res.append(line)
    
        return res

    except urllib.error.URLError as e:
        print(e.reason)
        return []
#The function makes a call to the find function to search for the string matching the pattern
def firstlink_tuple(pat,line):
    matchobj = Find(pat, line)
    return matchobj
    
    
if (len(sys.argv) != 2):
    print(f"Usage: python read_url.py <website>")
else:
    lst = fetch_contents(sys.argv[1])
    # Now display the contents
    #Regex pattern to search the string.The'(' and ')' indicates the text to be 
    #returned by Regex.By default the matched content is returned as tuple
    pat = r'<a\shref=(.*)>(.*)</a'  
    for line in lst:
        matchobj = firstlink_tuple(pat,line)                              
        if matchobj:                            
            print('The matching link and text in tuple is : ', matchobj)            
            exit()    
    
    
    

Related Solutions

in python please Q1) Create a Singly link list and write Python Programs for the following...
in python please Q1) Create a Singly link list and write Python Programs for the following tasks: a. Delete the first node/item from the beginning of the link list b. Insert a node/item at the end of the link list c. Delete a node/item from a specific position in the link list Q2) Create a Singly link list and write a Python Program for the following tasks: a. Search a specific item in the linked list and return true if...
1a. Please go to the library link and locate an article by Douglas R. Carmichael titled...
1a. Please go to the library link and locate an article by Douglas R. Carmichael titled "Hocus-Pocus Accounting." Please read the article and then identify and discuss the issues related to revenue recognition. In your responses, give thought to what you have read in your e-book, lesson, and Becker material. Also, what types of tests would the auditor design to uncover some of these issues?
Step by step in python please Write a program this will read a file (prompt for...
Step by step in python please Write a program this will read a file (prompt for name) containing a series of numbers (one number per line), where each number represents the radii of different circles. Have your program output a file (prompt for name) containing a table listing: the number of the circle (the order in the file) the radius of the circle the circumference the area of the circle the diameter of the circle Use different functions to calculate...
*******************In Python please******************* (1) Prompt the user to enter a string of their choosing. Store the...
*******************In Python please******************* (1) Prompt the user to enter a string of their choosing. Store the text in a string. Output the string. (1 pt) Enter a sample text: we'll continue our quest in space. there will be more shuttle flights and more shuttle crews and, yes; more volunteers, more civilians, more teachers in space. nothing ends here; our hopes and our journeys continue! You entered: we'll continue our quest in space. there will be more shuttle flights and more...
Please go to the library link and locate an article by Douglas R. Carmichael titled "Hocus-Pocus...
Please go to the library link and locate an article by Douglas R. Carmichael titled "Hocus-Pocus Accounting." Please read the article and then identify and discuss the issues related to revenue recognition. In your responses, give thought to what you have read in your e-book, lesson, and Becker material. Also, what types of tests would the auditor design to uncover some of these issues?
Include a webpage link showing a Wi-Fi wireless access point that can mount on the wall...
Include a webpage link showing a Wi-Fi wireless access point that can mount on the wall or ceiling. For one option, select a device that can receive its power by PoE from the network cable run to the device. For the other option, select a device that requires an electrical cable to the device as well as a network cable.
Python: How would I write a function that takes a URL of a webpage and finds...
Python: How would I write a function that takes a URL of a webpage and finds the first link on the page? The hint given is the function should return a tuple holding two strings: the URL and the link text.
8.30 LAB*: Program: Authoring assistant. PYTHON PLEASE!! (1) Prompt the user to enter a string of...
8.30 LAB*: Program: Authoring assistant. PYTHON PLEASE!! (1) Prompt the user to enter a string of their choosing. Store the text in a string. Output the string. (1 pt) Ex: Enter a sample text: we'll continue our quest in space. there will be more shuttle flights and more shuttle crews and, yes; more volunteers, more civilians, more teachers in space. nothing ends here; our hopes and our journeys continue! You entered: we'll continue our quest in space. there will be...
what does activate.bat mean when working with python in the command prompt or the anaconda prompt...
what does activate.bat mean when working with python in the command prompt or the anaconda prompt (please be detailed)
1. At the command prompt, use the where command to locate the where.exe file. Copy and...
1. At the command prompt, use the where command to locate the where.exe file. Copy and paste the full path and filename for the where.exe file into the space provided below. 2. At the command prompt, enter the following command and press enter. find "free beer" "c:\Program Files (x86)\Notepad++\*.txt" This command searches for the string "free beer" in all .TXT files in the Notepad++ program folder. According to the output of the find command, which of the files listed below...
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT