In: Computer Science
Answer:
Lexical analysis is the first phase of compiler design. It takes the source code, written as a stream of characters, and converts that sequence of characters into a sequence of tokens. Along the way it removes extra whitespace and comments from the source code.
Programs that perform lexical analysis are called lexical analyzers, or lexers. A lexer contains a tokenizer, or scanner. If the lexical analyzer detects an invalid token, it reports an error. The lexer reads character streams from the source code, checks for legal tokens, and passes the data to the syntax analyzer when the syntax analyzer demands it.
Example
How Pleasant Is The Weather?
Look at this example. We can easily recognize that there are five words: How, Pleasant, Is, The, Weather. This comes naturally to us because we can recognize the separators, the blanks, and the punctuation symbol.
HowPl easantIs Th ewe ather?
Now check this example. We can still read it, but it takes some time because the separators are placed in odd places. The words are not something that come to you immediately.
Basic Terminology
What's a lexeme?
A lexeme is a sequence of characters in the source program that matches the pattern for a token. In other words, it is an instance of a token.
What's a token?
A token is a sequence of characters that represents a unit of information in the source program.
What is a pattern?
A pattern is a description of the form that the lexemes of a token may take. In the case of a keyword used as a token, the pattern is simply the fixed sequence of characters that forms the keyword.
Lexical Analyzer Architecture: How tokens are recognized
The main task of the lexical analyzer is to read the input characters of the source code and produce tokens.
The lexical analyzer scans the source code of the program and identifies tokens one by one. Scanners are usually implemented to produce a token only when the parser requests one.
While creating tokens, the lexical analyzer skips whitespace and comments. If an error is present, the lexical analyzer correlates that error with the source file and line number.
Roles of the Lexical analyzer
The lexical analyzer performs the following tasks:
- Reads the input characters of the source program
- Identifies tokens and helps build the symbol table
- Removes whitespace and comments from the source program
- Correlates error messages with the source program (file name and line number)
- Expands macros if they are found in the source program
Example of Lexical Analysis, Tokens, Non-Tokens
Consider the following code that is fed to the lexical analyzer:

#include <stdio.h>

int maximum(int x, int y) {
    // This will compare 2 numbers
    if (x > y)
        return x;
    else {
        return y;
    }
}
Examples of Tokens created
| Lexeme  | Token       |
|---------|-------------|
| int     | Keyword     |
| maximum | Identifier  |
| (       | Punctuation |
| int     | Keyword     |
| x       | Identifier  |
| ,       | Punctuation |
| int     | Keyword     |
| y       | Identifier  |
| )       | Punctuation |
| {       | Punctuation |
| if      | Keyword     |
Examples of Nontokens
| Type                    | Examples                       |
|-------------------------|--------------------------------|
| Comment                 | // This will compare 2 numbers |
| Pre-processor directive | #include <stdio.h>             |
| Pre-processor directive | #define NUMS 8,9               |
| Macro                   | NUMS                           |
| Whitespace              | \n, \t, blank                  |