(a) How are rules for lexical analysis written? (b) What are these rules used for?
Lexical analysis is the first phase of compilation. The lexical analyser (scanner) reads the source code of the program: it takes the source code as input and produces a stream of tokens as output.
The source code is converted into a sequence of tokens; white space and comments are discarded during this step, since they do not form tokens.
The sequence of characters in the source that makes up a token is known as a lexeme.
These lexemes are identified by a set of predefined lexical rules. Each rule gives a pattern that describes which character sequences form a particular kind of token, and these patterns are written as regular expressions.
In a programming language, keywords, constants, identifiers, strings, numbers, operators and punctuation symbols are all treated as tokens.
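As an illustration of how such rules might be written, here are typical, simplified regular-expression patterns for a C-like language (illustrative only, not the exact rules of any particular compiler):

keyword          : int | char | float | while | return | ...
identifier       : [A-Za-z_][A-Za-z0-9_]*
integer constant : [0-9]+
operator         : = | == | + | - | * | / | ...
punctuation      : ; | , | ( | ) | { | }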
For example, take a variable declaration in the C language:
int a=10;
Performing lexical analysis on this statement gives the following tokens:
int (keyword), a (identifier), = (operator), 10 (constant) and ; (symbol).
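The sketch below shows roughly how a scanner could produce these tokens by hand. It is a minimal illustration that only handles the token classes needed for this one statement; the function name next_token and the fixed-size lexeme buffer are illustrative choices, and a real lexer would typically be generated from the regular-expression rules instead.

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Read the next lexeme starting at *p, print its token class, and
   return a pointer just past it (NULL at end of input). The branches
   mirror the patterns above: letters start identifiers/keywords,
   digits start constants, anything else is an operator or symbol. */
static const char *next_token(const char *p) {
    while (isspace((unsigned char)*p)) p++;        /* skip white space   */
    if (*p == '\0') return NULL;

    char lexeme[32];
    int n = 0;

    if (isalpha((unsigned char)*p) || *p == '_') { /* identifier/keyword */
        while (isalnum((unsigned char)*p) || *p == '_') lexeme[n++] = *p++;
        lexeme[n] = '\0';
        printf("%-4s -> %s\n", lexeme,
               strcmp(lexeme, "int") == 0 ? "keyword" : "identifier");
    } else if (isdigit((unsigned char)*p)) {       /* integer constant   */
        while (isdigit((unsigned char)*p)) lexeme[n++] = *p++;
        lexeme[n] = '\0';
        printf("%-4s -> constant\n", lexeme);
    } else {                                       /* operator / symbol  */
        printf("%-4c -> %s\n", *p, *p == '=' ? "operator" : "symbol");
        p++;
    }
    return p;
}

int main(void) {
    const char *src = "int a=10;";
    while ((src = next_token(src)) != NULL)
        ;                                          /* scan until end     */
    return 0;
}

Running this prints one line per token (int as a keyword, a as an identifier, = as an operator, 10 as a constant and ; as a symbol), matching the breakdown above.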
(b) These rules drive the lexical analyser itself. The machine cannot understand source code directly, so the phases of the compiler translate it into machine code; the lexical rules let the scanner break the source into tokens that the later phases (such as the parser) work on, which keeps the rest of the compilation process simple and fast.