Write a script named countmatches that expects at least two arguments on the command line.
Error checking:
The script should check that the first argument is a file name, and that there is at least one other argument after it. If the first argument is not a file name or if it is missing anything after the filename, the script should output to the user
and then exit.
The script is not required to check that the file is in the proper form, or that the strings contain nothing but the letters a, c, g, and t.
The script is not required to check that the DNA file contains a number of bases equal to a multiple of 3.
For each valid argument string, the program will search the DNA string in the file and count how many non-overlapping occurrences of that argument string are in the DNA string. To make sure you understand what non‐overlapping means, the string ata occurs just once in the string atata, not twice, because the two occurrences overlap.
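The non-overlapping rule can be checked on a small string before touching any file. The sketch below assumes that grep's -o option is available; it prints each match on its own line and resumes the search after the end of the previous match, so the matches it reports do not overlap. The helper name count_in_string is hypothetical and is not part of the assignment.

```shell
#!/bin/bash
# count_in_string is a hypothetical helper used only for illustration.
# Usage: count_in_string DNA_STRING PATTERN
count_in_string() {
    # grep -o prints one line per match; wc -l counts those lines.
    # When nothing matches, grep prints nothing and wc -l reports 0.
    printf '%s\n' "$1" | grep -o -- "$2" | wc -l
}

count_in_string atata ata   # one match: the second "ata" would overlap the first
count_in_string aaaa aa     # two matches: positions 1-2 and 3-4
```

Overlapping candidates are skipped, but back-to-back occurrences that do not share any letters are each counted.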
If your script is called correctly, it will output for each argument a line containing the argument string followed by how many times it occurs in the string. If it finds no occurrences, it should output 0 as a count.
For example, if the string aaccgtttgtaaccggaac is in a file named dnafile, then your script should work like this:
$ ./countmatches dnafile ttt
ttt 1
$ countmatches dnafile aac ggg aaccg
aac 3
ggg 0
aaccg 2
Warning: if it is given valid arguments, the script is not
to output anything except the strings and their associated counts.
No fancy messages, no words!
Testing: There are DNA text files in the cs132 course directory,
/data/biocs/b/student.accounts/cs132/data/dna_textfiles
to give to your script as the file argument.
Hint: You can write this script using grep and one filter command that appears in the course material. Although there are many filter commands, you do not need all of them to write the script. You will have to read more about grep to learn how to use it for this task; the one filter command you need is already covered in the course material.
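One detail worth checking when reading about grep: its -c option counts matching lines, not individual matches, so it undercounts when a pattern occurs more than once on a line. The sketch below contrasts -c with -o piped into a line-counting filter. Choosing wc -l as that filter is an assumption here, since the hint does not name the command.

```shell
#!/bin/bash
# Illustrative input only: one line holding two occurrences of "aac".
line='aacaac'

# grep -c reports how many lines contain the pattern.
lines_with_match=$(printf '%s\n' "$line" | grep -c -- 'aac')

# grep -o prints each match on its own line, so counting lines counts matches.
matches=$(printf '%s\n' "$line" | grep -o -- 'aac' | wc -l)

echo "lines containing aac:" $lines_with_match
echo "occurrences of aac:" $matches
```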
Please find below a shell script that counts the occurrences of DNA patterns in the input file.
Note:
I have added screen shots and comments for better understanding.
Program:
#!/bin/bash
#assign the total arguments passed to the script
count=$#
#throw error if the count is less than 2
if [ $count -lt 2 ]
then
echo "Invalid number of argument"
echo "$0 <file name> [arg1] [arg2] .."
exit 1
fi
#assign the $1 to fileName
fileName=$1
#shift skips the file name
shift
#check whether the file exists and is a regular file
if [ ! -f "$fileName" ]
then
echo "File: $fileName not found"
exit 1
fi
#now go through all the dna patterns passed as arguments
#grep -o prints every match on its own line and resumes searching after
#the end of each match, so the matches it reports never overlap
#for example, "ata" is reported once in "atata"
for dnaPattern in "$@"
do
#get the count of pattern matches; -F treats the pattern as a literal string
patternCount=$(grep -oF -- "$dnaPattern" "$fileName" | wc -l)
#print the output to screen; the unquoted count drops any padding from wc
echo "$dnaPattern" $patternCount
done
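The counting step can be checked against the sample from the assignment without the course data files. The sketch below is a self-check under stated assumptions: the temporary file stands in for dnafile, and count_pattern is a hypothetical helper wrapping a grep -o and wc -l pipeline. It also includes one extra case, a file holding aacaac, to confirm that back-to-back occurrences are each counted.

```shell
#!/bin/bash
# Temporary stand-ins for the assignment's dnafile (removed when the shell exits).
dnafile=$(mktemp)
adjacent=$(mktemp)
trap 'rm -f "$dnafile" "$adjacent"' EXIT

printf 'aaccgtttgtaaccggaac\n' > "$dnafile"
printf 'aacaac\n' > "$adjacent"

# count_pattern is a hypothetical helper: count_pattern FILE PATTERN
count_pattern() {
    grep -o -- "$2" "$1" | wc -l
}

# Expected from the assignment text: ttt 1, aac 3, ggg 0, aaccg 2
for pattern in ttt aac ggg aaccg
do
    echo "$pattern" $(count_pattern "$dnafile" "$pattern")
done

# Two adjacent occurrences do not overlap, so both are counted.
echo "aac" $(count_pattern "$adjacent" aac)
```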