What is the purpose of a compiler? What are the phases of the compilation process? What does each phase do? What are the inputs and outputs of each phase? How does a compiler differ from an interpreter?
Answer:
Purpose of a compiler:
A compiler is a program that translates source code written by a developer in a high-level programming language into a lower-level form, typically object code (binary machine code) that the processor can execute. This translation process is known as compilation.
The processor executes machine code directly: each instruction is a binary pattern that tells the processor which operation to perform, for example an addition in its arithmetic logic unit (ALU).
Phases of the compilation process:
Lexical Analysis
This is the first phase of the compilation process and is handled
by the lexical analyzer, also called the scanner. Its input is the
source code, which the lexical analyzer reads character by character
and separates into lexical units called tokens, such as keywords,
identifiers, operators, literals, and punctuation.
For example, take the line of code below:
String name = "Saffron";
A lexical analyzer for Java would generate the following 5 tokens. The
string literal, including its quotation marks, forms a single token:
String
name
=
"Saffron"
;
The symbol table is created in this phase and populated with the
identifiers found in the source code (such as name). A symbol table is
typically a data structure, often a hash table, that holds a record
for each identifier; later phases add attributes such as its type and
scope.
The output of this phase is a stream of tokens.
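To make the scanning step concrete, here is a minimal Java sketch of a lexical analyzer for the example line. It is a toy built on assumptions, not how javac works: it only recognizes identifiers, string literals without escape sequences, = and ;, it treats String as an ordinary identifier (in Java, String is a class name rather than a keyword), and the class name Lexer, the token kind names, and the "?" placeholder attribute in the symbol table are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Lexer {
    // A token: its category plus the exact source text (lexeme) it was built from.
    static class Token {
        final String kind;
        final String lexeme;
        Token(String kind, String lexeme) { this.kind = kind; this.lexeme = lexeme; }
        @Override public String toString() { return kind + "(" + lexeme + ")"; }
    }

    // Reads src character by character and groups the characters into tokens.
    // Every identifier seen is also entered into symbolTable (name -> attributes);
    // the attribute is left as "?" because later phases fill in types and scopes.
    static List<Token> tokenize(String src, Map<String, String> symbolTable) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                         // whitespace only separates tokens
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;                               // identifier: letters, digits, '_'
                while (i < src.length()
                        && (Character.isLetterOrDigit(src.charAt(i)) || src.charAt(i) == '_')) {
                    i++;
                }
                String name = src.substring(start, i);
                tokens.add(new Token("IDENT", name));
                symbolTable.putIfAbsent(name, "?");
            } else if (c == '"') {
                int end = src.indexOf('"', i + 1);           // string literal: ONE token, quotes included
                if (end < 0) throw new IllegalArgumentException("unterminated string literal at index " + i);
                tokens.add(new Token("STRING", src.substring(i, end + 1)));
                i = end + 1;
            } else if (c == '=') {
                tokens.add(new Token("ASSIGN", "="));
                i++;
            } else if (c == ';') {
                tokens.add(new Token("SEMI", ";"));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, String> symbolTable = new LinkedHashMap<>();
        System.out.println(tokenize("String name = \"Saffron\";", symbolTable));
        // [IDENT(String), IDENT(name), ASSIGN(=), STRING("Saffron"), SEMI(;)]
        System.out.println(symbolTable.keySet());
        // [String, name]
    }
}
```

A production scanner would also handle keywords, numeric literals, escape sequences, and comments, but the shape is the same: a loop over characters that emits one token per lexical unit.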
Syntax Analysis
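This phase is handled by the syntax analyzer, also called the parser.
Its input is the stream of tokens generated in the lexical analysis
phase, which it checks against the grammar of the particular language
to ensure that the input code follows the language's syntax.
Syntax errors, such as a missing semicolon or an unbalanced
parenthesis, are detected in this phase.
The output of this phase is a parse tree (syntax tree), which
compilers usually represent in condensed form as an abstract syntax
tree (AST).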
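A common way to build a syntax analyzer by hand is recursive descent, with one method per grammar rule. The Java sketch below parses arithmetic expressions using a small hypothetical grammar (given in the comments), builds an AST, and reports a syntax error when a token does not fit the grammar. It contains its own tiny tokenizer so it can run alone; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Recursive-descent parser for a HYPOTHETICAL expression grammar:
//   expr   -> term   (('+' | '-') term)*
//   term   -> factor (('*' | '/') factor)*
//   factor -> NUMBER | IDENT | '(' expr ')'
class Parser {
    // AST node: a leaf (number or identifier) or an operator with two children.
    static class Node {
        final String value;
        final Node left, right;
        Node(String value, Node left, Node right) {
            this.value = value; this.left = left; this.right = right;
        }
        @Override public String toString() {          // prefix form, e.g. (+ b (* c d))
            return left == null ? value : "(" + value + " " + left + " " + right + ")";
        }
    }

    private static final Pattern TOKEN = Pattern.compile("\\d+|[A-Za-z_]\\w*|[-+*/()]|\\S");
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private Parser(String src) {                       // tiny built-in lexer so the example runs alone
        Matcher m = TOKEN.matcher(src);
        while (m.find()) tokens.add(m.group());
    }

    static Node parse(String src) {
        Parser p = new Parser(src);
        Node tree = p.expr();
        if (p.pos != p.tokens.size()) throw p.error(); // leftover tokens: not a valid expression
        return tree;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }

    private IllegalStateException error() {
        String t = peek();
        return new IllegalStateException("syntax error: unexpected "
                + (t.isEmpty() ? "end of input" : "'" + t + "'"));
    }

    private Node expr() {                              // handles + and - (lowest precedence)
        Node node = term();
        while (peek().equals("+") || peek().equals("-")) {
            String op = tokens.get(pos++);
            node = new Node(op, node, term());
        }
        return node;
    }

    private Node term() {                              // handles * and / (higher precedence)
        Node node = factor();
        while (peek().equals("*") || peek().equals("/")) {
            String op = tokens.get(pos++);
            node = new Node(op, node, factor());
        }
        return node;
    }

    private Node factor() {
        String t = peek();
        if (t.equals("(")) {
            pos++;
            Node inner = expr();
            if (!peek().equals(")")) throw error();
            pos++;
            return inner;
        }
        if (t.matches("\\d+|[A-Za-z_]\\w*")) {
            pos++;
            return new Node(t, null, null);
        }
        throw error();
    }

    public static void main(String[] args) {
        System.out.println(parse("b + c * d"));        // (+ b (* c d))
        try {
            parse("b + * d");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());        // syntax error: unexpected '*'
        }
    }
}
```

Because term() is called from inside expr(), multiplication binds more tightly than addition without any explicit precedence table.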
Semantic Analysis
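Semantic analysis is handled by the semantic analyzer. Its input is
the syntax tree together with the symbol table, and its job is to
ensure that the program follows the language's semantic rules, that
is, that it is meaningful and not just grammatically correct.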
Type checking is taken care of in this phase. This ensures that
variables are assigned values that match their declared types.
So if, in a statically typed language such as Java, a variable has
been declared as an integer and is then assigned a float, the error
is trapped by the semantic analyzer.
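This phase also checks that the operands of each operator in the
input code are compatible with that operator, and that every variable
is declared before it is used.
The output of this phase is an annotated syntax tree, in which nodes
carry information such as types, along with the updated symbol table.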
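The type-checking rule above can be sketched in Java. To stay short, the sketch works on a hypothetical toy language with only two types, int and float, and one-line statements written as plain strings ("decl int count" or "count = 2.5") rather than on a real syntax tree. It consults a symbol table of declared types, rejects a float assigned to an int variable, and reports undeclared variables; widening an int into a float is accepted, mirroring Java's rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal type checker for a HYPOTHETICAL toy language with two types, int and float.
// Statements are either "decl <type> <name>" or "<name> = <value>",
// where <value> is a numeric literal or a variable name.
class TypeChecker {
    static List<String> check(List<String> program) {
        Map<String, String> symbolTable = new HashMap<>();  // name -> declared type
        List<String> errors = new ArrayList<>();
        for (String stmt : program) {
            String[] parts = stmt.trim().split("\\s+");
            if (parts[0].equals("decl")) {
                symbolTable.put(parts[2], parts[1]);        // record the declaration
                continue;
            }
            String target = parts[0];
            String targetType = symbolTable.get(target);
            String valueType = typeOf(parts[2], symbolTable);
            if (targetType == null) {
                errors.add("undeclared variable '" + target + "'");
            } else if (valueType == null) {
                errors.add("undeclared variable '" + parts[2] + "'");
            } else if (targetType.equals("int") && valueType.equals("float")) {
                // float -> int could silently lose the fractional part, so reject it.
                errors.add("cannot assign float to int variable '" + target + "'");
            }
            // int -> float (widening) and same-type assignments are accepted.
        }
        return errors;
    }

    // A literal's type follows from its spelling; a variable's type comes from the symbol table.
    static String typeOf(String operand, Map<String, String> symbolTable) {
        if (operand.matches("\\d+")) return "int";
        if (operand.matches("\\d+\\.\\d+")) return "float";
        return symbolTable.get(operand);                    // null if never declared
    }

    public static void main(String[] args) {
        List<String> program = List.of(
                "decl int count", "decl float ratio",
                "ratio = 3",     // accepted: int widens to float
                "count = 2.5",   // rejected: float into int
                "total = 1");    // rejected: total was never declared
        check(program).forEach(System.out::println);
    }
}
```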
Intermediate Code Generation
Intermediate code is a representation of the program that sits
between the source code and the target code. It is generated from the
annotated syntax tree produced by semantic analysis. A good
intermediate representation is easy to produce and easy to translate
into the target program.
A related example is Java: Java programs are compiled into Java
bytecode (.class files) for the Java Virtual Machine (JVM). This
bytecode can run on any operating system that has a JVM.
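One common form of intermediate code is three-address code, which
resembles assembly language: each instruction refers to at most three
addresses (variables, constants, or temporaries), as in t1 = c * d.
The output of this phase is the intermediate code, from which the
final target code is later generated.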
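Three-address code can be produced by a post-order walk of the syntax tree that introduces a fresh temporary for every operator. The Java sketch below assumes a hypothetical minimal tree class (TacGenerator.Node) and temporaries named t1, t2, and so on; it only handles binary operators over variables and constants, and emits the instructions as plain strings for readability.

```java
import java.util.ArrayList;
import java.util.List;

// Generates three-address code (TAC) from an expression tree.
class TacGenerator {
    // HYPOTHETICAL minimal AST: a leaf holds a variable/constant, an inner node an operator.
    static class Node {
        final String value;
        final Node left, right;
        Node(String value) { this(value, null, null); }
        Node(String value, Node left, Node right) {
            this.value = value; this.left = left; this.right = right;
        }
    }

    private final List<String> code = new ArrayList<>();
    private int tempCount = 0;

    // Returns the name (variable, constant or temporary) that holds the node's value,
    // emitting one instruction per operator after its operands (post-order).
    private String gen(Node n) {
        if (n.left == null) return n.value;          // leaf: already addressable, no code needed
        String l = gen(n.left);
        String r = gen(n.right);
        String t = "t" + (++tempCount);              // fresh temporary for this operator's result
        code.add(t + " = " + l + " " + n.value + " " + r);
        return t;
    }

    static List<String> translateAssignment(String target, Node expr) {
        TacGenerator g = new TacGenerator();
        String result = g.gen(expr);
        g.code.add(target + " = " + result);
        return g.code;
    }

    public static void main(String[] args) {
        // AST for: a = b + c * d
        Node expr = new Node("+", new Node("b"), new Node("*", new Node("c"), new Node("d")));
        translateAssignment("a", expr).forEach(System.out::println);
        // t1 = c * d
        // t2 = b + t1
        // a = t2
    }
}
```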
Code Optimization
In code optimization, the intermediate code is transformed to remove
redundant code, use memory more efficiently, and improve the speed of
execution, without changing what the program computes. Common
techniques include constant folding, common subexpression elimination,
and dead code elimination. Because these optimizations work on
machine-independent intermediate code, target code can still be
generated for any machine afterward, which keeps the compiler portable
across different platforms.
The output of this phase is the optimized intermediate code.
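As one concrete optimization, the Java sketch below performs constant folding (computing operations whose operands are known constants at compile time) combined with constant propagation, over straight-line three-address code. The instruction format ("x = y" or "x = y op z" with integer constants) is a hypothetical string form chosen for readability; real compilers work on structured intermediate representations and use data-flow analysis across branches and loops, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Constant folding with constant propagation over straight-line three-address code.
// Assumed (hypothetical) instruction forms: "x = y" or "x = y op z",
// with op one of + - * and integer constants.
class ConstantFolder {
    static List<String> optimize(List<String> code) {
        Map<String, Integer> known = new HashMap<>();  // variables currently holding a known constant
        List<String> out = new ArrayList<>();
        for (String instr : code) {
            String[] p = instr.split(" ");
            String target = p[0];
            if (p.length == 3) {                        // copy: x = y
                String y = subst(p[2], known);
                remember(target, y, known);
                out.add(target + " = " + y);
            } else {                                    // binary operation: x = y op z
                String y = subst(p[2], known);
                String z = subst(p[4], known);
                if (isInt(y) && isInt(z)) {             // both operands known: compute now, not at run time
                    int value = apply(p[3], Integer.parseInt(y), Integer.parseInt(z));
                    known.put(target, value);
                    out.add(target + " = " + value);
                } else {
                    known.remove(target);               // result is no longer a known constant
                    out.add(target + " = " + y + " " + p[3] + " " + z);
                }
            }
        }
        return out;
    }

    // Replaces a variable by its constant value when one is known.
    private static String subst(String operand, Map<String, Integer> known) {
        Integer v = known.get(operand);
        return v != null ? v.toString() : operand;
    }

    private static void remember(String target, String value, Map<String, Integer> known) {
        if (isInt(value)) known.put(target, Integer.parseInt(value));
        else known.remove(target);
    }

    private static boolean isInt(String s) { return s.matches("-?\\d+"); }

    private static int apply(String op, int a, int b) {
        switch (op) {
            case "+": return a + b;
            case "-": return a - b;
            case "*": return a * b;
            default: throw new IllegalArgumentException("unsupported operator " + op);
        }
    }

    public static void main(String[] args) {
        List<String> tac = List.of("t1 = 4 * 60", "t2 = t1 + 10", "x = t2", "y = x + z");
        optimize(tac).forEach(System.out::println);
        // t1 = 240
        // t2 = 250
        // x = 250
        // y = 250 + z
    }
}
```

After folding, nothing later in this snippet reads t1 or t2, so a dead code elimination pass could then delete those two instructions if the temporaries are not used elsewhere.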
Target Code Generation
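Here the target code is generated for the particular platform. The
code generator takes the optimized intermediate code as input and
produces machine instructions (or assembly code). It also decides
where each variable is stored in memory and which values are kept in
processor registers (register allocation).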
The output of this phase is the target code.
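The Java sketch below illustrates target code generation, including a very naive form of register assignment. The instruction set (LOAD, ADD, SUB, MUL, STORE), the register names R0, R1, and so on, and the rule that temporaries named t1, t2, ... stay in registers while named variables live in memory are all assumptions for illustration; real code generators target a concrete instruction set such as x86-64 or ARM and reuse a limited number of registers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Translates three-address code into assembly for a HYPOTHETICAL register machine:
//   LOAD  Rd, x           Rd <- memory[x], or an immediate constant written #5
//   ADD/SUB/MUL Rd, Ra, Rb    Rd <- Ra op Rb
//   STORE x, Rs           memory[x] <- Rs
// Temporaries (t1, t2, ...) are kept in registers; named variables live in memory.
class CodeGenerator {
    private final List<String> asm = new ArrayList<>();
    private final Map<String, String> regOf = new HashMap<>();  // temporary -> register holding it
    private int nextReg = 0;

    // Naive allocation: a fresh register every time (a real allocator reuses a finite set).
    private String newReg() { return "R" + nextReg++; }

    // Returns a register containing the operand, emitting a LOAD if it is not already in one.
    private String use(String operand) {
        String reg = regOf.get(operand);
        if (reg != null) return reg;
        reg = newReg();
        String src = operand.matches("-?\\d+") ? "#" + operand : operand;
        asm.add("LOAD  " + reg + ", " + src);
        return reg;
    }

    private static String opcode(String op) {
        switch (op) {
            case "+": return "ADD";
            case "-": return "SUB";
            case "*": return "MUL";
            default: throw new IllegalArgumentException("unsupported operator " + op);
        }
    }

    static List<String> generate(List<String> tac) {
        CodeGenerator g = new CodeGenerator();
        for (String instr : tac) {
            String[] p = instr.split(" ");              // "x = y" or "x = y op z"
            String target = p[0];
            String result;
            if (p.length == 3) {
                result = g.use(p[2]);
            } else {
                String a = g.use(p[2]);
                String b = g.use(p[4]);
                result = g.newReg();
                g.asm.add(opcode(p[3]) + "   " + result + ", " + a + ", " + b);
            }
            if (target.matches("t\\d+")) g.regOf.put(target, result);  // temporary: stays in a register
            else g.asm.add("STORE " + target + ", " + result);          // variable: written back to memory
        }
        return g.asm;
    }

    public static void main(String[] args) {
        // Three-address code for a = b + c * d
        generate(List.of("t1 = c * d", "t2 = b + t1", "a = t2")).forEach(System.out::println);
    }
}
```

For the three-address code of a = b + c * d, this emits LOAD R0, c; LOAD R1, d; MUL R2, R0, R1; LOAD R3, b; ADD R4, R3, R2; STORE a, R4.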
How does a compiler differ from an interpreter?
Compiler
A compiler is a programming language translator that converts a whole high-level language program into an equivalent lower-level program, such as machine code or an intermediate code like Java bytecode. It reads the complete program and translates all of it before the program is run.
Interpreter
An interpreter is a programming language translator that executes a high-level language program directly instead of first producing a separate machine-code program. It reads the program line by line, or statement by statement, and if a statement is error free, it translates and executes it before moving on to the next one.
Compiler:
->Scans the entire program and translates it as a whole into machine code
->It takes a large amount of time to analyze the source code, but the overall execution time is comparatively faster.
->Generates intermediate object code which further requires linking, hence requires more memory
->It generates error messages only after scanning the whole program. Hence debugging is comparatively hard.
->Programming languages like C and C++ use compilers.
Interpreter:
->Translates the program one statement at a time.
->It takes less time to analyze the source code, but the overall execution time is slower.
->No intermediate object code is generated, hence it is more memory efficient.
->Continues translating the program until the first error is met, at which point it stops. Hence debugging is easier.
->Programming languages like Python and Ruby use interpreters.
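The difference can be sketched in Java by handling one expression both ways. The interpret method walks the syntax tree and computes the result directly, so the tree is examined again on every run. The compile method translates the tree once into instructions for a hypothetical stack machine (the PUSH/OP instruction set is invented for this example and is not JVM bytecode), and run then executes those instructions without ever looking at the tree. The class name CompileVsInterpret is illustrative only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// One expression, two execution strategies (a toy illustration, not a real compiler or interpreter).
class CompileVsInterpret {
    // Minimal AST node: a leaf holds a variable or integer constant, an inner node an operator.
    static class Node {
        final String value;
        final Node left, right;
        Node(String value) { this(value, null, null); }
        Node(String value, Node left, Node right) {
            this.value = value; this.left = left; this.right = right;
        }
    }

    // Interpreter: evaluates the tree directly; the tree is walked again on every run.
    static int interpret(Node n, Map<String, Integer> env) {
        if (n.left == null) return leafValue(n.value, env);
        int a = interpret(n.left, env);
        int b = interpret(n.right, env);
        return apply(n.value, a, b);
    }

    // Compiler: translates the tree ONCE into code for a hypothetical stack machine.
    static List<String> compile(Node n) {
        List<String> code = new ArrayList<>();
        emit(n, code);
        return code;
    }

    private static void emit(Node n, List<String> code) {
        if (n.left == null) {
            code.add("PUSH " + n.value);
            return;
        }
        emit(n.left, code);                    // operands first (post-order)...
        emit(n.right, code);
        code.add("OP " + n.value);             // ...then the operator
    }

    // The stack machine that executes compiled code; it never sees the tree.
    static int run(List<String> code, Map<String, Integer> env) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String instr : code) {
            String arg = instr.substring(instr.indexOf(' ') + 1);
            if (instr.startsWith("PUSH ")) {
                stack.push(leafValue(arg, env));
            } else {
                int b = stack.pop();           // right operand was pushed last
                int a = stack.pop();
                stack.push(apply(arg, a, b));
            }
        }
        return stack.pop();
    }

    private static int leafValue(String v, Map<String, Integer> env) {
        if (v.matches("-?\\d+")) return Integer.parseInt(v);
        Integer value = env.get(v);
        if (value == null) throw new IllegalStateException("undefined variable " + v);
        return value;
    }

    private static int apply(String op, int a, int b) {
        switch (op) {
            case "+": return a + b;
            case "-": return a - b;
            case "*": return a * b;
            default: throw new IllegalArgumentException("unsupported operator " + op);
        }
    }

    public static void main(String[] args) {
        // b + c * d with b = 2, c = 3, d = 4
        Node expr = new Node("+", new Node("b"), new Node("*", new Node("c"), new Node("d")));
        Map<String, Integer> env = Map.of("b", 2, "c", 3, "d", 4);
        System.out.println(interpret(expr, env));   // 14
        List<String> code = compile(expr);          // translate once...
        System.out.println(code);                   // [PUSH b, PUSH c, PUSH d, OP *, OP +]
        System.out.println(run(code, env));         // ...then execute: 14
    }
}
```

Both paths give the same answer; the compiled instruction list can be kept and run many times with different variable values, which is where a compiler's up-front translation cost pays off.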