In: Computer Science
The Question is about Boyer-Moor-Horspool algorithm.
I cannot set the code of pre-processing: initialization of shift table.
the Pseudocode is
|
so here is my code, but arise the error.|∑|– 1 : how can i express on python
def firstOccurrence(Text, Pattern):
T = len(Text)
P = len(Pattern)
if not isinstance(Text, str):
Text = Text.seq
Text = str(Text)
if not isinstance(Pattern, str):
Pattern = Pattern.seq
Pattern = str(Pattern)
S = {}
for k in range(0, P-1):
S[k] = P
for k in range(0, P-2):
S[Pattern[k]] = P-1-k
i = 0
while i + P <= T:
j = P -1
while Text[i+j] == Pattern[j]:
j -= 1
if j < 0:
return i
i += S[Text[i+P-1]]
return -1
print(firstOccurrence("ATTATTAAA", "AAA"))
>>Traceback (most recent call last): File , line 25, in firstOccurrence KeyError: 'T'
In the algorithm, they assume lookup table containing shifts for all possible (256) characters. This array is initialized to length of pattern . Then we scan the pattern and set the values of for all characters in .
In above algorithm, initialization is done in lines 1 and 2, whereas lines 3 and 4 are changing the values of for all characters in .
In our python implementation, in order to avoid storage space wastage for characters not present in , we shall use a dictionary. In order to avoid ambiguity, we shall name the dictionary as . We skip initialization step, and directly set the values of for the characters that are present in .
While accessing the value for character in , we return
if
exists
in . Or else,
we use default value . In order to avoid key
checking, python dict has a method
get(), Using that, we can simplify the value
access as - shift_table.get(c, m). Where
m is the default value.
Now we shall proceed to code.
-- code start --
def firstOccurrence(Text, Pattern): n = len(Text) m = len(Pattern) # Shift table building phase S = {} # The shift table for k in range(0, m-2): S[Pattern[k]] = m-1-k i = 0 while i + m <= n: j = m-1 while Text[i+j] == Pattern[j]: j -= 1 if j < 0: return i i += S.get(Text[i+m-1], m) return -1 if __name__ == '__main__': text = input("Enter text : ") pattern = input("Enter pattern : ") print(" Text is ", text) print(" Pattern is ", pattern) print( "First occurance is " ,firstOccurrence(text, pattern))
-- code end --
Sample output: