The objective of this homework assignment is to demonstrate proficiency with reading files, using string methods to slice strings, working with dictionaries, and using step-wise development to complete your program.
Python is an excellent tool for reading text files and parsing (i.e. filtering) the data. Your assignment is to write a Python program in four steps that reads a Linux authentication log file, identifies the user names used in failed password attempts and counts the times each user name is used.
The purpose of step-wise development is to take an otherwise complex problem and break it down into separate, manageable steps. Too often students and novice developers struggle because they try to attack a problem all at once. This leads to multiple problems in the code that prove too difficult to overcome. By breaking the problem up into pieces, the overall solution is much easier to obtain.
You need to complete this assignment in four separate steps and will be submitting four separate programs as described below.
Note that the first three steps use the file auth.log.1b, because the full log file, auth.log.1, can take a long time to run in IDLE on account of the print statements. However, auth.log.1b does not have all of the data you'll need for Step 4, so you need to use auth.log.1 in Step 4.
Step 1 (50 pts):
The objective of this program is simply to open a file, read each line and display it. Your program should:
Save this program as step_1.py.
Notes:
Download auth.log.1b by right-clicking on it and doing a ‘Save as’ to the same directory in which you have your program. Although auth.log.1b is a plain text file, the ‘.1b’ file extension is non-standard and will not be recognized as a particular file type. You can view it, if you like, in any text editor such as Notepad, TextEdit or IDLE, but you may have to select ‘All file types’ to open it.
Potential gotchas:
If your program produces a FileNotFoundError, it is most likely because auth.log.1b is not in the same folder as your program or the name got changed somehow. There may be a hidden file extension, like ‘.txt’.
Check the 'garbage in, garbage out' rule. Windows hides known file type extensions by default. What you're seeing might not be what you're getting, in terms of the actual exact file names. To make absolutely sure that the name of auth.log.1b is correct on your system do the following:
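One way to see the exact names that Python itself sees is to list the folder from a short script. The following is a minimal sketch, not part of the assignment: the helper name matching_names and its default prefix are assumptions made for illustration. Printing each name with repr() makes a hidden extension such as '.txt' visible.

```python
import os

def matching_names(prefix="auth.log", folder="."):
    """Return the exact names of entries in `folder` that start with `prefix`."""
    return sorted(name for name in os.listdir(folder) if name.startswith(prefix))

# Show each name exactly as stored on disk, including any hidden extension.
for name in matching_names():
    print(repr(name))
```

If the printed name is not exactly 'auth.log.1b', either rename the file or pass the printed name to open().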
Step 2 (15 pts):
The objective of this step is to recognize which lines indicate an attack. Start this step by making a copy of your Step 1 program.
Your program should do the following:
Save this program as step_2.py.
Potential gotchas
Step 3 (25 pts):
This step is perhaps the most challenging. The objective here is to slice the user name out of the lines that include “Failed password”. Begin by making a copy of step_2.py.
Strategy:
You will slice the user name, which varies in length and start position within the line, by finding a pattern that always appears immediately before it and another pattern that always appears immediately after it. Use these offsets to compute the starting and ending values of the user name slice.
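The offset arithmetic can be sketched on a single line. The sample line below is hypothetical, written in the usual sshd wording rather than copied from auth.log.1b, and the two patterns are assumptions about what surrounds the user name.

```python
# Hypothetical sample line in the usual sshd format (not taken from the log file).
line = "Mar 29 10:15:02 myhost sshd[4321]: Failed password for root from 10.0.0.5 port 51234 ssh2"

before = "Failed password for "  # pattern assumed to appear immediately before the name
after = " from "                 # pattern assumed to appear immediately after the name

# find() gives the index where the pattern begins, so add its length
# to land on the first character of the user name.
start = line.find(before) + len(before)
# Search for the closing pattern only from `start` onward.
end = line.find(after, start)

username = line[start:end]
print(username)
```

In the full log, lines for unknown accounts typically read "Failed password for invalid user NAME from ...", so the pattern before the name may need to account for the extra "invalid user " text.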
Your program should do the following:
Save your program as step_3.py.
Potential gotchas:
Step 4 (10 pts):
The objective of this step is to use a dictionary to count the number of times each user name appears in attack attempts. Begin by making a copy of step_3.py.
Strategy:
Up until this point, you've been working with an abbreviated log file, auth.log.1b, which only has data for attacks from user root. For this step, you'll need the full log file: auth.log.1. Download it just as you did with auth.log.1b and change your code to open auth.log.1 instead of auth.log.1b. Note that your program could take considerably longer to run using auth.log.1.
Use a dictionary to count the number of times each user name appears in the file. The items in the dictionary will consist of a user name as the key and a count as the value.
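The counting pattern can be sketched on its own, apart from the file handling. The list of names below is made up for illustration; in the real program each name would come from the slice computed in Step 3.

```python
# Made-up user names standing in for the names sliced out of the log.
names = ["root", "admin", "root", "test", "root"]

counts = {}
for name in names:
    # get() returns the current count, or 0 the first time a name is seen.
    counts[name] = counts.get(name, 0) + 1

for name, count in counts.items():
    print(name, count)
```

Using get() with a default of 0 avoids a separate check for whether the key already exists.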
Your program should do the following:
Save your file as hwk6.py
Potential gotchas:
Submit all four of your program files for full credit. Be sure to include header comments in each file!
Note: if you encounter a file-not-found error, check the file's actual extension and pass the full name, including the extension, to open(). Good luck!
step_1.py
# Open the log file and display every line.
with open('auth.log.1b', 'r') as fileread:  # the file is closed automatically
    for line in fileread:
        print(line.strip())  # strip the trailing newline before printing
step_2.py
# Display only the lines that record a failed password attempt.
with open('auth.log.1b', 'r') as fileread:
    for line in fileread:
        if "Failed password" in line:
            print(line.strip())
step_3.py
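# Slice the user name out of each "Failed password" line.
# Assumes the usual sshd wording: "Failed password for [invalid user ]NAME from IP ..."
with open('auth.log.1b', 'r') as fileread:
    for line in fileread:
        if "Failed password" in line:
            # The name follows "invalid user " when present, otherwise "Failed password for ".
            marker = "invalid user " if "invalid user " in line else "Failed password for "
            start = line.find(marker) + len(marker)  # first character of the name
            end = line.find(" from ", start)  # just past the last character of the name
            print(line[start:end])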
hwk6.py
# Count how many times each user name appears in failed password attempts.
# Assumes the usual sshd wording: "Failed password for [invalid user ]NAME from IP ..."
dictionary = {}
with open('auth.log.1', 'r') as fileread:
    for line in fileread:
        if "Failed password" in line:
            # The name follows "invalid user " when present, otherwise "Failed password for ".
            marker = "invalid user " if "invalid user " in line else "Failed password for "
            start = line.find(marker) + len(marker)  # start offset
            end = line.find(" from ", start)  # end offset
            username = line[start:end]
            # Add 1 to the existing count, starting from 0 for a new name.
            dictionary[username] = dictionary.get(username, 0) + 1
for dictkey, dictvalue in dictionary.items():
    print("{} {}".format(dictkey, dictvalue))  # print each user name and its count