Transcribing Anonymous SEC Tips
Java or Python
* The function is expected to return a STRING_ARRAY.
* The function accepts following parameters:
* 1. STRING_ARRAY inputNames
* 2. STRING_ARRAY secRecords
*/
Problem Statement
Introduction
Imagine you are helping the Securities and Exchange Commission (SEC) respond to anonymous tips. One of the biggest problems the team faces is transcribing the names of the companies reported by callers. You've noticed that sometimes the company name is misheard by the person taking the call, sometimes it is simply mistyped, and sometimes both. These problems make it more difficult to search the SEC records to identify the company.
You have access to the list of transcribed company names and the database of SEC records. We need a way to effectively translate company names based on their transcriptions so we can narrow our search results to the one company we are interested in.
Input
You will receive a string array representing the list of transcribed company names.
Each string in the array takes the following form:
You will also receive a string array representing the database of SEC records.
You may also make the following assumptions about the structure:
Output
For each transcribed company name in the input string array, you want to match it to a company name (the first part of a string) in the SEC database. The second part of the string in the SEC database represents the company's EIN. Your output should also be a string array, this time containing the EINs mapped to the names in the input string array, in the same order. You may assume that every input name will match a name in the SEC records.
Responding to Calls
The Basics
Let's start with the first step: making sure that a perfectly transcribed name matches that company's record in the database right away. This will give you an idea of how to match company names in our system and what the output array should look like. It also shows how the input is structured, in case you want to write your own custom inputs. The input consists of two string arrays, each preceded by a line giving the length of the array. An example is below.
Input
3
Pear Computers
Construct An Ursus
Planetary Technologies
3
Pear Computers;54-1264938
Construct An Ursus;58-1481332
Planetary Technologies;19-3563561
Output
["54-1264938", "58-1481332", "19-3563561"]
Your code should pass test cases 0, 1, and 2 after solving this step.
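As a rough sketch of this first step, the records can be split on the semicolon into a name-to-EIN lookup table and each transcribed name looked up directly. The helper names `parse_records` and `match_exact` are made up for illustration, and returning `None` for a name without an exact match is an assumption, used here only to mark names that need the later steps.

```python
def parse_records(sec_records):
    """Build a {company name: EIN} table from "Name;EIN" strings."""
    table = {}
    for record in sec_records:
        # Split on the last ';' so the EIN is always the final field.
        name, _, ein = record.rpartition(";")
        table[name.strip()] = ein.strip()
    return table


def match_exact(input_names, sec_records):
    """Return the EIN for each exactly matching name, or None otherwise."""
    table = parse_records(sec_records)
    return [table.get(name.strip()) for name in input_names]


records = [
    "Pear Computers;54-1264938",
    "Construct An Ursus;58-1481332",
    "Planetary Technologies;19-3563561",
]
names = ["Pear Computers", "Construct An Ursus", "Planetary Technologies"]
print(match_exact(names, records))
```

A dictionary keeps each exact lookup cheap, and the `None` entries show which names still need fuzzy matching.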
Misspellings
The second thing we want to look for is basic misspellings, where the transcriber hears the company name correctly but misses a keystroke or presses the wrong key. Think "Harveys Steakhouse" turns into "Harfeys Sreakhouse" or "Sugar and Sugar" turns into "Sugra and Sugar". In the first example, the transcriber missed the "v" key and hit "f" instead, and missed "t" and hit "r" instead. In the second, the transcriber accidentally typed "r" before "a". You should pass test cases 3 through 8 after solving this problem. Hint: looking up the phrase "string edit distance" in a search engine should be of some help to you here.
Input
3
Pewar Computers
Consuct A Ursuus
Planteray Techniligies
3
Pear Computers;54-1264938
Construct An Ursus;58-1481332
Planetary Technologies;19-3563561
Output
["54-1264938", "58-1481332", "19-3563561"]
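One way to follow the "string edit distance" hint is the classic Levenshtein distance, sketched below with the standard library only. The function names `edit_distance` and `closest_ein` are illustrative. Picking the record with the smallest distance, rather than applying a fixed cutoff, is an assumption that leans on the guarantee that every input name matches some record.

```python
def edit_distance(a, b):
    """Levenshtein distance: each insert, delete, or substitute costs 1."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep)
            ))
        previous = current
    return previous[-1]


def closest_ein(name, table):
    """Return the EIN of the record whose name is the fewest edits away."""
    best = min(table, key=lambda c: edit_distance(name.lower(), c.lower()))
    return table[best]


table = {
    "Pear Computers": "54-1264938",
    "Construct An Ursus": "58-1481332",
    "Planetary Technologies": "19-3563561",
}
names = ["Pewar Computers", "Consuct A Ursuus", "Planteray Techniligies"]
print([closest_ein(n, table) for n in names])
```

Plain Levenshtein counts a swap such as "Sugra" for "Sugar" as two edits. A variant that treats adjacent transpositions as a single edit (Damerau-Levenshtein) may suit typing errors better. "Consuct A Ursuus" is four edits away from "Construct An Ursus", so a small fixed threshold would miss it while the minimum-distance rule still finds it.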
Metaphones
The last and trickiest instance of transcription comes in the form of arbitrary misspellings resulting from the transcriber either hearing the name correctly and using a different spelling than the one in our database, or mishearing the name in some form. Think "Ashley Antiques" vs. "Ashlee Antiques" vs. "Ashleigh Antiques" or "Rate My Reading" turns into "Great My Treating". This is a purposefully very open-ended and tricky problem, and you are not expected to get all cases. One example is viewable and most are purposefully hidden - try to be creative with your solution, as there are multiple ways you could solve this piece! Test cases 9 through 16 are the ones that relate to this part of the problem; as before, an example is below.
Input
3
Pare Computers
Conduct An Ersis
Palintary Technawlogies
3
Pear Computers;54-1264938
Construct An Ursus;58-1481332
Planetary Technologies;19-3563561
Output
["54-1264938", "58-1481332", "19-3563561"]
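Since this part is open-ended, the following is only one possible sketch and not a true Metaphone implementation. It reduces each word to a Soundex-style string of consonant classes (dropping vowels, h, w, and y), then picks the record whose key is most similar according to `difflib.SequenceMatcher`. The names `phonetic_key` and `closest_by_sound` and the choice of similarity measure are assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Soundex-style consonant classes; any other character is dropped.
_CLASSES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for letter in letters:
        _CLASSES[letter] = str(digit)


def phonetic_key(name):
    """Encode each word as its sequence of consonant classes."""
    words = []
    for word in name.lower().split():
        codes, previous = [], ""
        for ch in word:
            code = _CLASSES.get(ch, "")
            # Skip unmapped letters and directly repeated classes.
            if code and code != previous:
                codes.append(code)
            previous = code
        words.append("".join(codes))
    return " ".join(words)


def closest_by_sound(name, table):
    """Return the EIN of the record whose phonetic key is most similar."""
    target = phonetic_key(name)
    best = max(
        table,
        key=lambda c: SequenceMatcher(None, target, phonetic_key(c)).ratio(),
    )
    return table[best]


table = {
    "Pear Computers": "54-1264938",
    "Construct An Ursus": "58-1481332",
    "Planetary Technologies": "19-3563561",
}
names = ["Pare Computers", "Conduct An Ersis", "Palintary Technawlogies"]
print([closest_by_sound(n, table) for n in names])
```

With this encoding, "Pare Computers" and "Pear Computers" share the same key, while "Conduct An Ersis" is only close to "Construct An Ursus", which is why the keys are compared by similarity instead of equality. Harder cases such as "Great My Treating" would likely need a richer phonetic algorithm or a combined score.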
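CODE: PYTHON
# Third-party packages: pip install jellyfish editdistance
from jellyfish import soundex
import editdistance

def my_func(inputNames, secRecords):
    # secRecords is a dict mapping company name -> EIN
    result = []
    for i in inputNames:
        for key in secRecords.keys():
            # Exact match, or at most two keystroke errors
            if i == key or editdistance.eval(i, key) <= 2:
                result.append(secRecords[key])
                break
            # Otherwise fall back to comparing Soundex codes
            elif editdistance.eval(soundex(i), soundex(key)) <= 2:
                result.append(secRecords[key])
                break
    return result

# Read the transcribed names; the first line gives the count
inputNames = []
for _ in range(int(input())):
    inputNames.append(input())

# Read the SEC records ("Name;EIN") into a dict
secRecords = {}
for _ in range(int(input())):
    value = input().split(";")
    secRecords[value[0]] = value[-1]

print(my_func(inputNames, secRecords))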