Question

In: Computer Science

When you try to automatically process human speech, you frequently try to understand what the person was saying (or trying to say) from very ambiguous sounds. For instance, the phrases “Recognize speech” and “Wreck a nice beach” sound very similar when pronounced by the average American speaker. As humans, when we process speech, we always use the context of a word to judge what the speaker is probably trying to say. Our minds automatically discard options that are very inconsistent with what we have inferred about the sentence so far, and we reevaluate our earlier “guesses” based on new sounds that come in. For example, if your friend tells you “I didn’t recognize you”, you infer pretty safely from “you” that it wasn’t “I didn’t wreck a nice you”, whereas if your friend says “I didn’t wreck a nice car”, you find this more likely than “I didn’t recognize car”.

If we want to get machines to do this kind of reasoning, we can use the following simplistic model. There is a known directed graph $G = (V, E)$. Each node $v \in V$ encodes an initial portion of an (intended) sentence. Each directed edge $e$ has two things associated with it: (1) a probability $p_e \in [0, 1]$, and (2) a label $\lambda_e \in \Sigma$, where $\Sigma$ is an alphabet of phonemes. We assume that for each node $v$ and each label $\ell$, there is at least one edge $e = (v, u)$ out of $v$ labeled by $\lambda_e = \ell$ and at least one edge $e = (u, v)$ into $v$ labeled by $\lambda_e = \ell$. Also, we assume that the sum of the probabilities of all edges out of $v$ is 1, i.e., $\sum_{u : (v,u) \in E} p_{(v,u)} = 1$.

In addition to the graph, you are given a start node $s$ (which corresponds to not having heard anything yet) and a sequence of observed phonemes $L = \ell_1 \ell_2 \ell_3 \cdots \ell_k$ of length $k$. Your goal is to find a directed path $P = (e_1, e_2, \ldots, e_k)$ in $G$ of length $k$, starting from $s$, such that (1) the sequence of labels on $P$ matches the given sequence, i.e., $\lambda_{e_i} = \ell_i$ for all $i$, and (2) subject to this requirement, the probability $\prod_{i=1}^{k} p_{e_i}$ is maximized. Such a path $P$ (and its final node $v$) gives a good guess as to what the speaker was trying to say.

Give (and analyze) a polynomial-time algorithm (polynomial in the size of the graph and the length of the sequence) for finding such a path $P$. Note: The main difficulty arises from the fact that a node $v$ may have multiple outgoing edges with the same label. This corresponds to not being able to tell for sure whether you heard “Recognize” or “Wreck a nice”. Otherwise, this problem would be nearly trivial.

Solutions

Expert Solution

A more technical definition is given by Jurafsky, who defines automatic speech recognition (ASR) as the building of systems for mapping acoustic signals to a string of words. He continues by defining automatic speech understanding (ASU) as extending the goal to producing some sort of understanding of the sentence.

We will consider speaker-independent ASR, i.e., systems that have not been adapted to a single speaker but, in some sense, to all speakers of a particular language.

Humans use more than their ears when listening: they also use the knowledge they have about the speaker and the subject. Words are not arbitrarily sequenced together; there is a grammatical structure and redundancy that humans use to predict words not yet spoken. Furthermore, idioms and how we 'usually' say things make prediction even easier.

In ASR we only have the speech signal. We can of course construct a model for the grammatical structure and use some kind of statistical model to improve prediction, but there is still the problem of how to model world knowledge, knowledge of the speaker, and encyclopedic knowledge. We cannot, of course, model world knowledge exhaustively, but an interesting question is how much of it we actually need in ASR to measure up to human comprehension.

Spoken language has for many years been viewed as just a less complicated version of written language, with the main differences being that spoken language is grammatically less complex and that humans make more performance errors while speaking. However, it has become clear in the last few years that spoken language is essentially different from written language. In ASR, we have to identify and address these differences.

Written communication is usually one-way, but speech is dialogue-oriented. In a dialogue, we give feedback to signal that we understand, we negotiate the meaning of words, we adapt to the receiver, etc.

Another important issue is disfluencies in speech: normal speech is filled with hesitations, repetitions, changes of subject in the middle of an utterance, slips of the tongue, etc.
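
For the path-finding problem stated in the question, one possible approach is a Viterbi-style dynamic program over the observed phonemes. The sketch below is not taken from the solution text above. It assumes the graph is given as a list of `(u, v, probability, label)` tuples, and the function name `most_probable_path` and the toy graph are hypothetical, chosen only for illustration. After reading the first i phonemes, it keeps, for every node reachable by a label-matching path of length i from s, the highest probability of such a path, and it extends these values one phoneme at a time.

```python
from collections import defaultdict


def most_probable_path(edges, start, labels):
    """Return (probability, node_sequence) of a best label-matching path,
    or None if no path from `start` spells out `labels`.

    edges  : iterable of (u, v, prob, label) tuples (assumed input format)
    start  : the start node s
    labels : the observed phoneme sequence l_1, ..., l_k
    """
    # Index outgoing edges by (source node, label) for fast lookup.
    out = defaultdict(list)
    for u, v, p, lab in edges:
        out[(u, lab)].append((v, p))

    # best[v] = max probability of a matching path from start that ends at v.
    best = {start: 1.0}
    back = []  # back[i][v] = predecessor of v on a best path after step i + 1

    for lab in labels:
        nxt, pred = {}, {}
        for u, pu in best.items():
            for v, p in out.get((u, lab), ()):
                cand = pu * p
                if cand > nxt.get(v, -1.0):
                    nxt[v] = cand
                    pred[v] = u
        if not nxt:  # no edge carries this label from any reachable node
            return None
        best = nxt
        back.append(pred)

    # Pick the best final node and walk the predecessor tables backwards.
    end = max(best, key=best.get)
    path = [end]
    for pred in reversed(back):
        path.append(pred[path[-1]])
    path.reverse()
    return best[end], path


# Toy graph: two edges labeled "x" leave s, so the first phoneme is ambiguous.
toy_edges = [
    ("s", "a", 0.6, "x"), ("s", "b", 0.4, "x"),
    ("a", "c", 0.1, "y"), ("a", "a", 0.9, "z"),
    ("b", "c", 1.0, "y"),
]
# Going through b (0.4 * 1.0) beats going through a (0.6 * 0.1).
print(most_probable_path(toy_edges, "s", ["x", "y"]))
```

Each of the k steps examines every edge at most once, so the running time is O(k · |E|), plus O(k · |V|) space for the predecessor tables. Correctness rests on the observation that any prefix of a best path ending at v must itself be a best matching path to its own endpoint. The sketch returns the node sequence rather than the edge list, and for long sequences it would be safer to sum log-probabilities instead of multiplying, to avoid floating-point underflow.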

