Write the RE for identifiers that can consist of any sequence of letters (l), digits (d) or "_", where the first character must be a letter and the last character cannot be "_".
As stated in the question, we need a Regular Expression (RE) for strings made of any mix of letters (written 'l'), digits (written 'd') and the underscore '_'. Two conditions apply: the first character must be a letter, i.e. 'l' (not a digit 'd' and not '_'), and the last character cannot be '_'.
So what does our language look like? Let us first look at some examples:
L = {l, ld, ll, l_d, l_l, lll, ldd, l_d_d, ...} (the list goes on forever)
(Very important) Every word starts with 'l' and cannot end with '_', so the last character is either 'l' or 'd'. In between, any sequence of letters, digits and underscores is allowed, including a sequence of length 0, so the one-character word 'l' (where the first and last character coincide) is also in the language. For example, 'ld' starts with 'l' and ends with 'd', so it is valid; 'l_d' starts with 'l', has a single '_' in the middle and ends with the digit 'd', so it is valid too. On the other hand, 'd' is not in the language, because it does not start with 'l'.
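To make the rules concrete, here is a minimal Python sketch that checks a word directly against them, without any RE. It assumes words are written over the abstract symbols 'l', 'd' and '_' used in this answer; the function name `is_identifier` is just a hypothetical helper for illustration.

```python
# Direct check of the rules from the question, over the abstract alphabet
# used in this answer: 'l' = letter, 'd' = digit, '_' = underscore.
ALPHABET = {"l", "d", "_"}

def is_identifier(word: str) -> bool:
    """Return True if `word` (over the symbols l, d, _) is a valid identifier."""
    if not word:                               # the empty string has no first letter
        return False
    if any(ch not in ALPHABET for ch in word):
        return False
    return word[0] == "l" and word[-1] != "_"  # first is a letter, last is not '_'

for w in ["l", "ld", "l_d", "l_d_d", "d", "_l", "l_"]:
    print(w, is_identifier(w))
```

Only 'd', '_l' and 'l_' are rejected here, matching the discussion above.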
A quick note on the Kleene star (written '*'): for a set of symbols, it gives the set of all strings of finite length that can be built from those symbols, i.e. strings of any length made from the given set (here {'l', 'd', '_'}). This includes the empty string, of length 0, which we call epsilon (written ε).
Example: a* = {ε, a, aa, aaa, aaaa, ...} (where 'a' is our symbol)
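Since a* is infinite, a program can only list a finite slice of it. The sketch below (a hypothetical helper `kleene_star` with an assumed cut-off `max_len`) lists every string up to that length, with the empty string '' playing the role of ε.

```python
from itertools import product

def kleene_star(symbols, max_len):
    """All strings over `symbols` of length 0..max_len (a finite slice of symbols*)."""
    words = []
    for n in range(max_len + 1):
        # product(..., repeat=0) yields one empty tuple, which gives '' (epsilon)
        for combo in product(symbols, repeat=n):
            words.append("".join(combo))
    return words

print(kleene_star(["a"], 4))                 # ['', 'a', 'aa', 'aaa', 'aaaa']
print(len(kleene_star(["l", "d", "_"], 2)))  # 1 + 3 + 9 = 13
```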
Suppose we have the symbol set {a, b}. The RE (a + b) describes the language {a, b}, i.e. EITHER 'a' OR 'b'. The RE (a . b) describes the language {ab}, i.e. 'a' followed by 'b'. So '+' works like OR (a choice), while '.' (dot) is concatenation: it joins strings in order, so (a . b) gives 'ab' but never 'ba'. That is why it only loosely resembles a logical AND.
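If we treat small finite languages as Python sets, '+' and '.' can be sketched as two short functions. The names `union` and `concat` are illustrative only; the point is that concatenation keeps order, which is where the AND analogy breaks down.

```python
def union(A, B):
    """A + B: every string that is in A or in B."""
    return A | B

def concat(A, B):
    """A . B: every string x followed by y, with x from A and y from B."""
    return {x + y for x in A for y in B}

a, b = {"a"}, {"b"}
print(sorted(union(a, b)))  # ['a', 'b']
print(concat(a, b))         # {'ab'}
print(concat(b, a))         # {'ba'}  (order matters, unlike logical AND)
```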
Now that we know what our language is and how the Kleene star and the other RE operators work, let us build the RE for the language described above. Our RE is:
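l . (ε + (l + d + _)* . (l + d))
or
letter . (ε + (letter + digit + _)* . (letter + digit))
The ε alternative is needed for the one-letter identifier 'l': without it, the RE l . (l + d + _)* . (l + d) always produces at least two characters and misses 'l'.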
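To check an RE like this mechanically, the sketch below writes it with Python's `re` module, treating 'l', 'd' and '_' as literal characters (an assumption for testing; real identifiers would use classes such as [A-Za-z] and [0-9]). It compiles two versions, l . (l + d + _)* . (l + d) and l . (ε + (l + d + _)* . (l + d)), and compares the second against a brute-force check of the rules on every string up to length 6. The optional group `(?:...)?` plays the role of "ε + ...".

```python
import re
from itertools import product

# l . (ε + (l + d + _)* . (l + d)): the optional group stands for "ε + ..."
WITH_EPSILON = re.compile(r"l(?:[ld_]*[ld])?")
# l . (l + d + _)* . (l + d): always at least two characters long
WITHOUT_EPSILON = re.compile(r"l[ld_]*[ld]")

def is_identifier(word):
    # Rules from the question: first symbol 'l', last symbol not '_'.
    return len(word) > 0 and word[0] == "l" and word[-1] != "_"

def all_words(max_len):
    # Every string over {l, d, _} of length 0..max_len.
    for n in range(max_len + 1):
        for combo in product("ld_", repeat=n):
            yield "".join(combo)

# The version with ε agrees with the rule check on every short string.
for w in all_words(6):
    assert bool(WITH_EPSILON.fullmatch(w)) == is_identifier(w), w

print(bool(WITH_EPSILON.fullmatch("l")))     # True
print(bool(WITHOUT_EPSILON.fullmatch("l")))  # False
```

Only the version with the ε alternative accepts the one-letter identifier 'l' from the language L listed earlier. With real character classes, the same shape becomes `[A-Za-z](?:[A-Za-z0-9_]*[A-Za-z0-9])?`.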
Let me now explain exactly what each part means. Our RE starts with 'l', followed by (l + d + _)*, so 'l . (l + d + _)*' means 'l' followed by any string made of (l OR d OR _), where
(l + d + _)* = {ε, l, d, _, ld, dl, l_, d_, _d, _l, ll, dd, __, lll, ___, ddd, ldl, l_d, d_l, ...} (as mentioned, every possible string from length 0 upward). So 'l . (l + d + _)*' = {l, ll, ld, l_, l_l, l_d, ld_, ...}. Some of these strings end in '_' (for example 'l_' and 'ld_'), so the RE cannot stop here.
Finally, (l + d) adds exactly one more character, either 'l' or 'd' (never both), so the last character can never be '_'.
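The argument of the last two paragraphs can be checked on a small slice of the language. This sketch (hypothetical helper `star`, cut-off length 2 chosen arbitrarily) builds l . (l + d + _)*, lists the strings that still end in '_', and then appends one symbol from (l + d) to show that none do afterwards.

```python
from itertools import product

def star(max_len):
    """Strings of (l + d + _)* up to length max_len."""
    return ["".join(c) for n in range(max_len + 1) for c in product("ld_", repeat=n)]

# l . (l + d + _)*  : some of these can still end in '_'
prefix = ["l" + w for w in star(2)]
# l . (l + d + _)* . (l + d)  : exactly one more symbol, 'l' or 'd'
full = [p + last for p in prefix for last in "ld"]

print([p for p in prefix if p.endswith("_")])  # ['l_', 'll_', 'ld_', 'l__']
print(any(w.endswith("_") for w in full))      # False
```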
Hope you got it now. Thank you.