In: Computer Science
Give a context-free grammar for {w ∈ {b, c}∗| #c(w) = #b(w)}.
L = {w | w ∈ {b, c} ∗ , #c(w) = #b(w)}, where #i(w) is the number of occurrences of a symbol i in a string w. In other words, L is the language of all strings that has an equal number of b’s and c’s.
Hence, Take w ∈ L. Then either – w = bx, where #b(x) = #c(x) − 1, or – w = cy, where #a(y) = #c(y) + 1. Let A to be a nonterminal that generates LA = {x |#b(x) = #c(x) − 1}, and B be a nonterminal that generates LB = {y |#b(y) = #c(x) + 1}. Then, the grammar for L is S → bA | cB |ε
Let’s deal with A. Take x ∈ LA, i.e. #b(x) = #c(x) − 1. Then, either
In the second case, t ∈ L, and so we will write A → cS. What to do with the first case ? Observation: z must be a concatenation of two strings from LA. Why ? Intuitively – z has 2 less a’s than b’s. At the beginning of z, the difference between the numbers of b’s and c’s is 0, at the end of z the difference is -2. Since this difference may only change by 1 on every additional symbol of z, it had to become -1 before it became -2.
Formally: let f(w) = #b(w) − #c(w), and let wi stand for the prefix of w of length i. Then, for a string w of length n, w ∈ LA implies that f(w0) = 0, while f(wn) = −2. It follows that for some i, f(wi) = −1. Thus, w = wiu, where u ∈ LA, and v ∈ LA. Thus, for LA, we have the following rule A → bAA | cS. Similarly, for LB we obtain B → cBB | bS.