Question

Deliverables

There is one deliverable for this assignment

  • hw1.py

Make sure the script obeys all the rules in the Script Requirements page.

What is Special About This Assignment

This homework assignment is going to be different from the other assignments.

You will have to do very little coding for this assignment.

Instead, I will supply you with a function that tests the regular expressions stored in the following variables:

regex_1
regex_2
regex_3
regex_4
regex_5
regex_6
regex_7
regex_8
regex_9
regex_10

Code for This Assignment

You must copy the code below into your script

import re

def test_regular_expression(regex, test_string) :
    pattern = re.compile(r'' + regex )
    match   = pattern.search(test_string)
    if match :
        try :
            return match.group(1)
        except :
            print('Match found but no substring returned')
            return ''
    else:
        print(regex, 'does not match', test_string)
        return ''

line_1 = 'Mar  xxxxx16xxxxxxx 11:58:13 xxxxxxxxxxxxxxx 65.96.149.57 port 60695    Wed'
line_2 = ' 205.236.184.32  09 Feb 2014:00:03:21 +0000 12_class_notes_it117.html HTTP/1.1" 200 56810323'

regex_1  = 
regex_2  = 
regex_3  = 
regex_4  = 
regex_5  = 
regex_6  = 
regex_7  = 
regex_8  = 
regex_9  = 
regex_10 = 

print('regex_1',  regex_1, '\t returned ', test_regular_expression(regex_1, line_1))
print('regex_2',  regex_2, '\t returned ', test_regular_expression(regex_2, line_1))
print('regex_3',  regex_3, '\t returned ', test_regular_expression(regex_3, line_1))
print('regex_4',  regex_4, '\t returned ', test_regular_expression(regex_4, line_1))
print('regex_5',  regex_5, '\t returned ', test_regular_expression(regex_5, line_1))
print('regex_6',  regex_6, '\t returned ', test_regular_expression(regex_6, line_1))
print('regex_7',  regex_7, '\t returned ', test_regular_expression(regex_7, line_2))
print('regex_8',  regex_8, '\t returned ', test_regular_expression(regex_8, line_2))
print('regex_9',  regex_9, '\t returned ', test_regular_expression(regex_9, line_2))
print('regex_10', regex_10,'\t returned ', test_regular_expression(regex_10,line_2))
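Note that the supplied function returns match.group(1), which is the text captured by the first pair of parentheses in the pattern. A pattern with no capturing group therefore ends up in the except branch. The sketch below illustrates the difference; the sample string is made up for illustration and is not part of the assignment.

```python
import re

sample = 'id 42 ok'  # hypothetical test string, not from the assignment

no_group = re.search(r'\d+', sample)      # pattern without parentheses
with_group = re.search(r'(\d+)', sample)  # same pattern with a capturing group

print(no_group.group(0))    # group(0) is always the whole match
print(with_group.group(1))  # group(1) is the text inside the first (...)

try:
    no_group.group(1)       # there is no group 1 in this pattern
except IndexError as err:
    print('no capturing group:', err)
```

Wrapping the part you want returned in parentheses is enough for the supplied function to return it.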

Specification

Define the variables below.

The value of each variable should return the value given below when run on the string listed.

Variable   Value Returned                 String
regex_1    Month name                     line_1
regex_2    Day number                     line_1
regex_3    Hours, minutes, seconds        line_1
regex_4    IP address                     line_1
regex_5    Port number                    line_1
regex_6    Day of the week                line_1
regex_7    Two digit day number           line_2
regex_8    Month                          line_2
regex_9    Year                           line_2
regex_10   The filename with extension    line_2

You MAY NOT use ordinary characters in your regular expression values.

For example you cannot use "html" when matching the filename.

You MAY use the period, . , as an ordinary character.

Don't forget that if you want to use a meta-character as a literal, like . , you must escape it with a \.
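A small sketch of why the escape matters, using a made-up sample string: the unescaped period matches any character, while the escaped one matches only a literal period.

```python
import re

text = '1x5 and 1.5'  # hypothetical sample string

unescaped = re.findall(r'1.5', text)   # . matches any character
escaped = re.findall(r'1\.5', text)    # \. matches only a literal period

print(unescaped)
print(escaped)
```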

Script for this assignment

Open a text editor and create the file hw1.py.

You can use the editor built into IDLE or a program like Sublime.

Suggestions

Write this program in a step-by-step fashion using the technique of incremental development.

In other words, write a bit of code, test it, make whatever changes you need to get it working, and go on to the next step.

Put a # before each print statement in the test code at the bottom of the file, except for the one that prints regex_1.

For each regular expression, don't write the entire expression all at once.

Instead build it up little by little testing as you go.

When you get this regular expression working, remove the # from the next statement and repeat the procedure.
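As an illustration of building a pattern little by little, the sketch below grows a pattern for the IP address one piece at a time and prints what each step captures from line_1. The list name steps and the use of re.search directly (instead of the supplied test function) are choices made for this example only.

```python
import re

line_1 = 'Mar  xxxxx16xxxxxxx 11:58:13 xxxxxxxxxxxxxxx 65.96.149.57 port 60695    Wed'

# Grow the pattern one piece at a time and check what each step captures.
steps = [
    r'(\d+\.)',                # digits followed by a literal period
    r'(\d+\.\d+)',             # ...then more digits
    r'(\d+\.\d+\.\d+)',        # ...three parts
    r'(\d+\.\d+\.\d+\.\d+)',   # ...all four parts of the IP address
]

results = []
for pattern in steps:
    match = re.search(pattern, line_1)
    results.append(match.group(1) if match else None)
    print(pattern, '->', results[-1])
```

If a step stops matching, the piece added most recently is the one to look at.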

Solutions

Expert Solution

To write the regular expressions, we first need to understand some basic notation.

\w - matches any one word character: a letter (lowercase or uppercase), a digit, or an underscore.

\d - matches any one digit.

. - matches any character except a newline.

\ - escapes a character, which is useful when we want to match a character that otherwise has a special meaning.
For example, . on its own matches any character, so to match a literal full stop we have to write
\. i.e. \ followed by .

[] - a set of characters. If any one of the characters inside the square brackets matches, the match is
successful, but only one character from the input is matched.

- - used inside [] to indicate a range of consecutive characters, e.g. a-z means any character from a to z.

+ - one or more occurrences of the preceding regular expression, e.g. \d+ means one or more digits in a row.

{n} - exactly n repetitions of the preceding regular expression, e.g. \d{3} matches 3 digits.

The Python re package provides many more, but for our case these are sufficient.
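The notation listed above can be tried out directly. The sketch below applies each piece to a made-up sample string (not one of the assignment's lines) and collects the first match of each pattern in a dictionary named found, a name chosen only for this example.

```python
import re

sample = 'Room 101, floor 3.'  # hypothetical sample string

patterns = {
    r'\w':     'first word character',
    r'\d':     'first digit',
    r'\d+':    'one or more digits',
    r'\d{2}':  'exactly two digits',
    r'[a-z]+': 'run of lowercase letters',
    r'\d\.':   'digit followed by a literal period',
    r'\d.':    'digit followed by any character',
}

found = {}
for pattern, meaning in patterns.items():
    found[pattern] = re.search(pattern, sample).group(0)
    print(pattern, '(' + meaning + ')', '->', repr(found[pattern]))
```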

With this, let's look at the program. Each regular expression is described in more detail in the comment that follows it.

=============   

import re

def test_regular_expression(regex, test_string) :
    pattern = re.compile(r'' + regex )
    match = pattern.search(test_string)
    if match :
        try :
            return match.group(0)
        except :
            print('Match found but no substring returned')
            return ''
    else:
        print(regex, 'does not match', test_string)
        return ''

line_1 = 'Mar  xxxxx16xxxxxxx 11:58:13 xxxxxxxxxxxxxxx 65.96.149.57 port 60695    Wed'
line_2 = ' 205.236.184.32  09 Feb 2014:00:03:21 +0000 12_class_notes_it117.html HTTP/1.1" 200 56810323'

regex_1 = "^[A-Za-z]{3}"
# We need to look for 3 letters at the beginning of the string. ^ indicates that the regular expression that follows
# has to occur at the beginning; the same expression found in the middle of the string will not match.
# [A-Za-z] matches any character from A-Z or a-z, i.e. any letter. Following it with {3} asks for exactly 3 such
# letters. So altogether the regular expression looks for 3 letters at the beginning of the string, which in our
# case returns -- Mar


regex_2 = r"\d\d"
# We need to match the day of the month here, which is the first pair of digits in the string, so we give \d\d. Each
# \d stands for a digit, so this regular expression matches 2 digits in the string. Since pattern.search() returns
# only the first (leftmost) match, and match.group(0) in test_regular_expression returns the whole of that match,
# we get the first pair of digits, which in our case is the day of the month.


regex_3 = r"\d\d:\d\d:\d\d"
# Here we want the time in hh:mm:ss format, so we specify two digits followed by :, then two more digits and :, and
# two more digits. This returns the time.


regex_4 = r"\d+\.\d+\.\d+\.\d+"
# To match the IP address we use \d+, as each part of an IP address can be 1, 2 or 3 digits. The + indicates one or
# more occurrences of \d, i.e. a digit. Each part is followed by a dot. As mentioned at the beginning, '.' (dot) has
# a special meaning in regular expressions, so to match a literal dot we escape it with \. We repeat this for all
# 4 parts of the IP address.


regex_5 = r"(?<=port )\d+"
# The port number is just 1 or more digits, so we have \d+, but here comes the catch: on its own this would match
# the first number in the line. We know the port number is the number that comes after the word port, so this is
# what we have to specify: first look for "port " and then match the number that follows. We achieve this using a
# lookbehind, (?<=xxxxx), where xxxxx is the prefix to look for. By specifying (?<=port ) before \d+, we ensure the
# match happens only for the number that is preceded by "port ", and it solves our problem. The prefix is only
# used to locate the number and does not form part of the final match, so the returned value is just the
# numeric port number.


regex_6 = r"\w+$"
# As mentioned earlier, \w matches any word character. $ indicates that the match must be followed by the end of
# the line. So this returns one or more word characters at the end of the line. Even though there are many runs of
# word characters in the line, adding $ to the regular expression means only the last one, the day of the week,
# is returned.


regex_7 = r"\d\d(?= \w\w\w)"
# In line 2 the day of the month is two digits, but not just any two digits. To make sure we get the one we want, we
# look for two digits that are followed by a space and 3 word characters (i.e. the month name). This time we specify
# a lookahead, (?=xxx), after our pattern, which ensures we return only the match that is followed by xxx, here a
# space and \w\w\w. This returns the day of the month, even though other pairs of digits occur before it.


regex_8 = "[A-Za-z]{3}"
# This pattern looks for 3 letters in a row, which turns out to be our month name in line 2.

regex_9 = r"\d+(?=:)"
# For the year we look for digits followed by :, so again we use a lookahead, (?=:), to specify that the match must
# be followed by :, and this returns the year we are looking for.


regex_10 = r"\w+\.[A-Za-z]+"
# For the filename we look for a run of word characters followed by . (dot character) and a run of letters for the
# extension. The pattern above matches one or more word characters, which is the filename, then a literal . (dot)
# character due to \., and then one or more letters due to [A-Za-z]+, thus returning the filename with its
# extension.

print('regex_1', regex_1, '\t\t returned ', test_regular_expression(regex_1, line_1))
print('regex_2', regex_2, '\t\t\t returned ', test_regular_expression(regex_2, line_1))
print('regex_3', regex_3, '\t\t returned ', test_regular_expression(regex_3, line_1))
print('regex_4', regex_4, '\t returned ', test_regular_expression(regex_4, line_1))
print('regex_5', regex_5, '\t\t returned ', test_regular_expression(regex_5, line_1))
print('regex_6', regex_6, '\t\t\t returned ', test_regular_expression(regex_6, line_1))
print('regex_7', regex_7, '\t returned ', test_regular_expression(regex_7, line_2))
print('regex_8', regex_8, '\t\t returned ', test_regular_expression(regex_8, line_2))
print('regex_9', regex_9, '\t\t returned ', test_regular_expression(regex_9, line_2))
print('regex_10', regex_10,'\t returned ', test_regular_expression(regex_10,line_2))

======File ends ======
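The patterns for regex_5, regex_7 and regex_9 rely on lookbehind, (?<=...), and lookahead, (?=...), which check the surrounding text without including it in the match. The sketch below shows both on a made-up sample string and compares them with a capturing group, which gives the same result when group(1) is returned instead of group(0).

```python
import re

sample = 'host 10 port 8080:tcp'  # hypothetical sample string

# Lookbehind: digits that come right after 'port '; the prefix is not returned.
behind = re.search(r'(?<=port )\d+', sample).group(0)

# Lookahead: digits immediately followed by ':'; the colon is not returned.
ahead = re.search(r'\d+(?=:)', sample).group(0)

# Capturing group: the context is part of the match, but only (...) is returned.
captured = re.search(r'port (\d+)', sample).group(1)

print(behind, ahead, captured)
```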

Output
=======

regex_1 ^[A-Za-z]{3} returned Mar
regex_2 \d\d returned 16
regex_3 \d\d:\d\d:\d\d returned 11:58:13
regex_4 \d+\.\d+\.\d+\.\d+ returned 65.96.149.57
regex_5 (?<=port )\d+ returned 60695
regex_6 \w+$ returned Wed
regex_7 \d\d(?= \w\w\w) returned 09
regex_8 [A-Za-z]{3} returned Feb
regex_9 \d+(?=:) returned 2014
regex_10 \w+\.[A-Za-z]+ returned 12_class_notes_it117.html

