Prepared by MOHIT KUMAR for and on behalf of Meerut Institute of Engineering and Technology, Meerut.
COMPILER DESIGN
(TCS-502)
COURSE FILE
FOR
Bachelor of Technology
IN
Computer Science and Engineering
Session: 2007-
Department of Computer Science and Engineering
MEERUT INSTITUTE OF ENGINEERING AND TECHNOLOGY
MEERUT
MIET TCS-502 COMPILER DESIGN Course File II
CONTENTS
PREAMBLE
SYLLABUS
LECTURE PLAN
LECTURE NOTES
01 Introduction to Compilers and its Phases
02 Lexical Analysis
03 Basics of Syntax Analysis
04 Top-Down Parsing
05 Basic Bottom-Up Parsing Techniques
06 LR Parsing
07 Syntax-Directed Translation
08 Symbol Tables
09 Run Time Administration
10 Error Detection and Recovery
11 Code Optimization
EXERCISES
Practice Questions
Examination Question Papers
Laboratory Assignments
SYLLABUS
(As laid down by Uttar Pradesh Technical University, Lucknow)
UNIT I:
Introduction to Compiler: Phases and passes, Bootstrapping, Finite state machines and regular expressions and their applications to lexical analysis, Implementation of lexical analyzers, lexical- analyzer generator, LEX-compiler, Formal grammars and their application to syntax analysis, BNF notation, ambiguity, YACC. The syntactic specification of programming languages: Context free grammars, derivation and parse trees, capabilities of CFG.
UNIT II:
Basic Parsing Techniques: Parsers, Shift reduce parsing, operator precedence parsing, top down parsing, predictive parsers Automatic Construction of efficient Parsers: LR parsers, the canonical Collection of LR (O) items, constructing SLR parsing tables, constructing Canonical LR parsing tables, Constructing LALR parsing tables, using ambiguous grammars, an automatic parser generator, implementation of LR parsing tables, constructing LALR sets of items.
UNIT III:
Syntax-directed Translation: Syntax-directed Translation schemes, Implementation of Syntax- directed Translators, Intermediate code, postfix notation, Parse trees & syntax trees, three address code, quadruple & triples, translation of assignment statements, Boolean expressions, statements that alter the flow of control, postfix translation, translation with a top down parser. More about translation: Array references in arithmetic expressions, procedures call, declarations, case statements.
UNIT IV:
Symbol Tables: Data structure for symbols tables, representing scope information. Run-Time Administration: Implementation of simple stack allocation scheme, storage allocation in block structured language. Error Detection & Recovery: Lexical Phase errors, syntactic phase errors semantic errors.
UNIT V:
Introduction to code optimization: Loop optimization, the DAG representation of basic blocks, value numbers and algebraic laws, Global Data-Flow analysis.
TEXTBOOK:
Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman, “Compilers: Principles, Techniques, and Tools”, Addison-Wesley.
LECTURE PLAN
(PERIOD OF STUDY: From August 2007 to November 2007)
TEACHING AIDS (all units): Transparencies on an Overhead Projector, or PowerPoint presentations on an LCD Projector.

UNIT FIRST (AUGUST)
Competencies: Introduction to the subject, its pre-requisites, objectives, content and plan of study; basic concepts of the compiler; overview of the passes, phases, lexical analyzers and CFGs.
Topics: Introduction to Compilers and its Phases; Lexical Analysis (3 hours); Basics of Syntax Analysis (3 hours).

UNIT SECOND (SEPTEMBER)
Competencies: Basic idea about the different types of parsers and their working mechanism.
Topics: Top-Down Parsing (4 hours); Basic Bottom-Up Parsing Techniques (3 hours).

UNIT THIRD (OCTOBER)
Competencies: Internal details about translation; actions to be attached to productions that shall produce the desired code.
Topics: LR Parsing (6 hours); Syntax-Directed Translation.

UNIT FOURTH (OCTOBER)
Competencies: Data structures related with the compiler, the scope of the information stored, and the possible errors that may arise.
Topics: Symbol Tables (1 hour); Run Time Storage Organization; Error Detection and Recovery.

UNIT FIFTH (NOVEMBER)
Competencies: Optimization techniques related with the compilation process.
Topics: Code Optimization.

TOTAL NUMBER OF LECTURE HOURS FOR THE COURSE: 35
- Puts information about identifiers into the symbol table.
- Regular expressions are used to describe tokens (lexical constructs).
- A (Deterministic) Finite State Automaton can be used in the implementation of a lexical analyzer.
1.2.2 Syntax Analyzer
- A Syntax Analyzer creates the syntactic structure (generally a parse tree) of the given program.
- A syntax analyzer is also called a parser.
- A parse tree describes a syntactic structure.
Example: For the line of code newval := oldval + 12, the parse tree will be:

assignment
 ├── identifier (newval)
 ├── :=
 └── expression
      ├── expression
      │    └── identifier (oldval)
      ├── +
      └── expression
           └── number (12)
- The syntax of a language is specified by a context free grammar (CFG).
- The rules in a CFG are mostly recursive.
- A syntax analyzer checks whether a given program satisfies the rules implied by a CFG or not.
- If it does, the syntax analyzer creates a parse tree for the given program.
Example: The CFG used for the above parse tree is:
assignment → identifier := expression
expression → identifier
expression → number
expression → expression + expression
- Depending on how the parse tree is created, there are different parsing techniques.
- These parsing techniques are categorized into two groups:
- Top-Down Parsing,
- Bottom-Up Parsing
- Top-Down Parsing:
- Construction of the parse tree starts at the root, and proceeds towards the leaves.
- Efficient top-down parsers can be easily constructed by hand.
- Recursive Predictive Parsing, Non-Recursive Predictive Parsing (LL Parsing).
- Bottom-Up Parsing:
- Construction of the parse tree starts at the leaves, and proceeds towards the root.
- Normally efficient bottom-up parsers are created with the help of some software tools.
- Bottom-up parsing is also known as shift-reduce parsing.
- Operator-Precedence Parsing – simple, restrictive, easy to implement
- LR Parsing – a much more general form of shift-reduce parsing: LR, SLR, LALR
1.2.3 Semantic Analyzer
- A semantic analyzer checks the source program for semantic errors and collects the type information for the code generation.
- Type-checking is an important part of the semantic analyzer.
- Normally, semantic information cannot be represented by the context-free languages used in syntax analyzers.
- The context-free grammars used in syntax analysis are therefore integrated with attributes (semantic rules). The result is syntax-directed translation and attribute grammars.
Example: In the line of code newval := oldval + 12 , the type of the identifier newval must match with type of the expression (oldval+12).
1.2.4 Intermediate Code Generation
- A compiler may produce explicit intermediate code representing the source program.
- This intermediate code is generally machine-architecture independent, but its level is close to the level of machine code.
Example:
newval := oldval * fact + 1
id1 := id2 * id3 + 1
MULT id2, id3, temp1
ADD temp1, #1, temp2
MOV temp2, id1
The last form is the Intermediate Code (Quadruples).
1.2.5 Code Optimizer
- The code optimizer optimizes the code produced by the intermediate code generator in terms of time and space.
Example: The above piece of intermediate code can be reduced as follows:
MULT id2, id3, temp1
ADD temp1, #1, id1
1.2.6 Code Generator
- Produces the target code for a specific machine architecture.
2. LEXICAL ANALYSIS
The Lexical Analyzer reads the source program character by character to produce tokens. Normally a lexical analyzer does not return a list of tokens in one shot; it returns a token only when the parser asks for one.
2.1 Token
- A token represents a set of strings described by a pattern. For example, an identifier represents the set of strings that start with a letter and continue with letters and digits. The actual string is called a lexeme.
- Since a token can represent more than one lexeme, additional information must be held for that specific lexeme. This additional information is called the attribute of the token.
- For simplicity, a token may have a single attribute which holds the required information for that token. For identifiers, this attribute is a pointer into the symbol table, and the symbol table holds the actual attributes for that token.
- Examples:
- <identifier, attribute> where attribute is pointer to the symbol table
- for tokens such as operators and keywords, no attribute is needed (the token type identifies the lexeme)
- <number, value> where value is the actual value of the number
- Token type and its attribute uniquely identify a lexeme.
- Regular expressions are widely used to specify patterns.
2.2 Languages
2.2.1 Terminology
- Alphabet: a finite set of symbols (e.g. the ASCII characters)
- String: a finite sequence of symbols over an alphabet
- Sentence and word are also used as synonyms for string
- ε is the empty string
- |s| is the length of string s.
- Language: a set of strings over some fixed alphabet
- ∅ the empty set is a language.
- {ε} the set containing empty string is a language
- The set of all possible identifiers is a language.
- Operators on Strings:
- Concatenation: xy represents the concatenation of strings x and y.
- sε = εs = s
- s^n = s s s ... s (n times); s^0 = ε
2.2.2. Operations on Languages
- Concatenation: L 1 L 2 = { s 1 s 2 | s 1 ∈ L 1 and s 2 ∈ L 2 }
- Union: L 1 ∪ L 2 = { s | s ∈ L 1 or s ∈ L 2 }
- Exponentiation: L^0 = {ε}, L^1 = L, L^2 = LL, ..., L^i = L L^(i-1)
- Kleene Closure: L* = L^0 ∪ L^1 ∪ L^2 ∪ ...
- Positive Closure: L+ = L^1 ∪ L^2 ∪ L^3 ∪ ...
Examples:
- L 1 = {a,b,c,d} L 2 = {1,2}
- L 1 L 2 = {a1,a2,b1,b2,c1,c2,d1,d2}
- L 1 ∪ L 2 = {a,b,c,d,1,2}
- L1^3 = all strings of length three over {a,b,c,d}
- L1* = all strings over {a,b,c,d}, including the empty string
- L1+ = all strings over {a,b,c,d}, excluding the empty string
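These operations can be checked directly on the small example languages. Since L* is infinite, the sketch below approximates the Kleene closure up to a bound; the function names are illustrative.

```python
def concat(L1, L2):
    """Concatenation: every s1 s2 with s1 in L1 and s2 in L2."""
    return {s1 + s2 for s1 in L1 for s2 in L2}

def power(L, n):
    """L^n: L^0 = {""} (just the empty string), L^n = L . L^(n-1)."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result

def kleene_star(L, up_to):
    """Approximate L* by the union of L^0 .. L^up_to."""
    star = set()
    for i in range(up_to + 1):
        star |= power(L, i)
    return star

L1, L2 = {"a", "b", "c", "d"}, {"1", "2"}
print(sorted(concat(L1, L2)))     # a1, a2, b1, b2, c1, c2, d1, d2
print(L1 | L2)                    # the union
print(len(power(L1, 3)))          # 64 strings of length three
print("" in kleene_star(L1, 2))   # True: L* contains the empty string
```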
2.3 Regular Expressions and Finite Automata
2.3.1 Regular Expressions
- We use regular expressions to describe tokens of a programming language.
- A regular expression is built up of simpler regular expressions (using defining rules)
- Each regular expression denotes a language.
- A language denoted by a regular expression is called a regular set.
For Regular Expressions over alphabet Σ
Regular Expression    Language it denotes
ε                     {ε}
a ∈ Σ                 {a}
(r1) | (r2)           L(r1) ∪ L(r2)
(r1) (r2)             L(r1) L(r2)
(r)*                  (L(r))*
(r)                   L(r)
- (r)+ = (r)(r)*
- (r)? = (r) | ε
- We may remove parentheses by using precedence rules:
- * highest
- concatenation next
- | lowest
- ab*|c means (a(b*))|(c)
Examples:
- Σ = {0,1}
- 0|1 = {0,1}
- (0|1)(0|1) = {00,01,10,11}
- 0* = {ε, 0, 00, 000, 0000, ...}
- (0|1) *^ = All strings with 0 and 1, including the empty string
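The same examples can be verified with Python's re module, whose |, * and parentheses behave as described above (Python is used here only for illustration):

```python
import re

assert re.fullmatch(r"0|1", "0")
assert re.fullmatch(r"(0|1)(0|1)", "10")
assert not re.fullmatch(r"(0|1)(0|1)", "1")   # length must be exactly two
assert re.fullmatch(r"0*", "")                # 0* includes the empty string
assert re.fullmatch(r"0*", "0000")
assert re.fullmatch(r"(0|1)*", "010011")
assert re.fullmatch(r"ab*|c", "abbb")         # precedence: (a(b*))|(c)
assert re.fullmatch(r"ab*|c", "c")
print("all regular-expression checks passed")
```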
2.3.2 Finite Automata
- A recognizer for a language is a program that takes a string x, and answers “yes” if x is a sentence of that language, and “no” otherwise.
- We call the recognizer of the tokens a finite automaton.
- A finite automaton can be: deterministic (DFA) or non-deterministic (NFA)
- This means that we may use a deterministic or non-deterministic automaton as a lexical analyzer.
- Both deterministic and non-deterministic finite automaton recognize regular sets.
- Which one?
- deterministic – a faster recognizer, but it may take more space
- non-deterministic – slower, but it may take less space
- Deterministic automata are widely used in lexical analyzers.
Example:
The DFA to recognize the language (a|b)*ab is as follows.
(Transition graph omitted; its moves are listed in the transition function below.)
- Start state: 0
- Set of final states: F = {2}
- Alphabet: Σ = {a,b}
- Set of states: S = {0,1,2}
Transition Function:
        a    b
  0     1    0
  1     1    2
  2     1    0
Note that the entries in this function are single values, not sets of values (unlike an NFA).
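A sketch of how this transition table drives a recognizer for (a|b)*ab (Python used for illustration; the names are mine):

```python
# The transition function from the table above, as a dictionary.
TRANSITION = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}
START, FINAL = 0, {2}

def accepts(word):
    state = START
    for ch in word:
        state = TRANSITION[(state, ch)]   # deterministic: exactly one next state
    return state in FINAL

print(accepts("aab"))    # True: the string ends in ab
print(accepts("abab"))   # True
print(accepts("ba"))     # False
```

Because each (state, symbol) pair has a single next state, no backtracking is ever needed; this is the speed advantage of the DFA noted above.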
2.3.5 Converting an RE to an NFA (Thompson's Construction)
- This is one way to convert a regular expression into an NFA.
- There are other (more efficient) ways to do the conversion.
- Thompson's Construction is a simple and systematic method.
- It guarantees that the resulting NFA has exactly one final state and one start state.
- Construction starts from the simplest parts (alphabet symbols).
- To create an NFA for a complex regular expression, the NFAs of its sub-expressions are combined.
- To recognize the empty string ε: an NFA with a single ε-transition from start state i to final state f.
- To recognize a symbol a in the alphabet Σ: an NFA with a single a-transition from start state i to final state f.
- For regular expression r1 | r2: a new start state has ε-transitions into N(r1) and N(r2), and their final states have ε-transitions into a new final state. N(r1) and N(r2) are the NFAs for the regular expressions r1 and r2.
- For regular expression r1 r2: N(r1) and N(r2) are connected in sequence. The start state of N(r1) becomes the start state of N(r1r2), the final state of N(r1) is joined to the start state of N(r2), and the final state of N(r2) becomes the final state of N(r1r2).
- For regular expression r*: a new start state i and a new final state f are added, with ε-transitions from i into N(r) and directly to f, and from the final state of N(r) both back into N(r)'s start state and on to f.
Example: For a RE (a|b) * a, the NFA construction is shown below.
(Figures: the NFAs for a and b; N(a|b) built with the union rule; N((a|b)*) built with the closure rule; and the NFA for (a|b)*a, obtained by concatenating N((a|b)*) with the NFA for a.)
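The construction steps above can be sketched in code. This is a minimal illustrative sketch, not the notes' own algorithm: the function names, the (start, final, transitions) representation, and the small ε-closure matcher are all my choices.

```python
from itertools import count

# Each constructor returns (start, final, transitions) with exactly one start
# and one final state; transitions maps (state, symbol-or-None) to a set of
# successor states, with None marking an epsilon-move.
_ids = count()

def _new():
    return next(_ids)

def _merge(*tables):
    out = {}
    for t in tables:
        for k, v in t.items():
            out.setdefault(k, set()).update(v)
    return out

def symbol(a):                       # base case: recognize the symbol a
    i, f = _new(), _new()
    return i, f, {(i, a): {f}}

def union(n1, n2):                   # N(r1 | r2)
    i, f = _new(), _new()
    trans = _merge(n1[2], n2[2],
                   {(i, None): {n1[0], n2[0]},
                    (n1[1], None): {f}, (n2[1], None): {f}})
    return i, f, trans

def concat(n1, n2):                  # N(r1 r2): final of N(r1) feeds N(r2)
    trans = _merge(n1[2], n2[2], {(n1[1], None): {n2[0]}})
    return n1[0], n2[1], trans

def star(n):                         # N(r*): loop back, and allow skipping N(r)
    i, f = _new(), _new()
    trans = _merge(n[2], {(i, None): {n[0], f},
                          (n[1], None): {n[0], f}})
    return i, f, trans

def _closure(states, trans):
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in trans.get((s, None), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def matches(nfa, word):
    start, final, trans = nfa
    current = _closure({start}, trans)
    for ch in word:
        moved = set().union(*(trans.get((s, ch), set()) for s in current))
        current = _closure(moved, trans)
    return final in current

# NFA for (a|b)* a, built from its sub-expressions:
nfa = concat(star(union(symbol("a"), symbol("b"))), symbol("a"))
print(matches(nfa, "bba"))   # True
print(matches(nfa, "ab"))    # False: the string must end with a
```

By construction every sub-NFA contributes fresh states, so the combined automaton keeps the single-start, single-final property the notes describe.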
(Figures: the generic constructions – N(r*) with ε-transitions around N(r), N(r1|r2) with a new start and final state, and N(r1 r2) with N(r1) and N(r2) in sequence.)
Example:
S0 = ε-closure({0}) = {0,1,2,4,7}; put S0 into DS as an unmarked state

⇓ mark S0
ε-closure(move(S0,a)) = ε-closure({3,8}) = {1,2,3,4,6,7,8} = S1; put S1 into DS
ε-closure(move(S0,b)) = ε-closure({5}) = {1,2,4,5,6,7} = S2; put S2 into DS
transfunc[S0,a] ← S1    transfunc[S0,b] ← S2

⇓ mark S1
ε-closure(move(S1,a)) = ε-closure({3,8}) = {1,2,3,4,6,7,8} = S1
ε-closure(move(S1,b)) = ε-closure({5}) = {1,2,4,5,6,7} = S2
transfunc[S1,a] ← S1    transfunc[S1,b] ← S2

⇓ mark S2
ε-closure(move(S2,a)) = ε-closure({3,8}) = {1,2,3,4,6,7,8} = S1
ε-closure(move(S2,b)) = ε-closure({5}) = {1,2,4,5,6,7} = S2
transfunc[S2,a] ← S1    transfunc[S2,b] ← S2

S0 is the start state of the DFA, since 0 is a member of S0 = {0,1,2,4,7}.
S1 is an accepting state of the DFA, since 8 is a member of S1 = {1,2,3,4,6,7,8}.
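The computation above can be reproduced with a short subset-construction sketch. The NFA transition data below encodes the Thompson NFA for (a|b)*a with states 0 to 8; the variable names are illustrative, not from the notes.

```python
# epsilon-moves and symbol-moves of the Thompson NFA for (a|b)*a, states 0-8
NFA_MOVE = {(2, "a"): {3}, (4, "b"): {5}, (7, "a"): {8}}
EPS = {0: {1, 7}, 1: {2, 4}, 3: {6}, 5: {6}, 6: {1, 7}}
START, NFA_FINAL = 0, 8

def eps_closure(states):
    """All NFA states reachable from `states` via epsilon-moves alone."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in EPS.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(states, symbol):
    return set().union(*(NFA_MOVE.get((s, symbol), set()) for s in states))

# Work-list algorithm: start from S0 = eps-closure({START}) and mark DFA
# states one at a time, exactly as in the trace above.
S0 = eps_closure({START})
dstates, unmarked, transfunc = {S0: "S0"}, [S0], {}
while unmarked:
    S = unmarked.pop(0)                 # "mark" the next unmarked DFA state
    for a in "ab":
        T = eps_closure(move(S, a))
        if T not in dstates:            # a new DFA state: put it into DS
            dstates[T] = f"S{len(dstates)}"
            unmarked.append(T)
        transfunc[(dstates[S], a)] = dstates[T]

print(sorted(S0))                       # [0, 1, 2, 4, 7]
print(transfunc)
print([name for S, name in dstates.items() if NFA_FINAL in S])  # ['S1']
```

The run discovers exactly the three DFA states S0, S1, S2 and the same transfunc entries as the hand computation.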
2.4 Lexical Analyzer Generator
(Residual figures from the previous example: the Thompson NFA for (a|b)*a with states 0 to 8, and the resulting DFA with states S0, S1 and S2.)
(Figure: Regular Expressions → Lexical Analyzer Generator → Lexical Analyzer; Source Program → Lexical Analyzer → Tokens.)
LEX is an example of a Lexical Analyzer Generator.
2.4.1 Input to LEX
- The input to LEX consists primarily of Auxiliary Definitions and Translation Rules.
- Writing the regular expression for some languages can be difficult, because their regular expressions can be quite complex. In those cases, we may use Auxiliary Definitions.
- We can give names to regular expressions, and we can use these names as symbols to define other regular expressions.
- An Auxiliary Definition is a sequence of definitions of the form:
d1 → r1
d2 → r2
...
dn → rn
where each di is a distinct name and each ri is a regular expression over the symbols in Σ ∪ {d1, d2, ..., di-1} (i.e. the basic symbols and the previously defined names).
Example: For identifiers in Pascal:
letter → A | B | ... | Z | a | b | ... | z
digit → 0 | 1 | ... | 9
id → letter ( letter | digit )*
If we try to write the regular expression representing identifiers without using regular definitions, that regular expression will be complex. (A|...|Z|a|...|z) ( (A|...|Z|a|...|z) | (0|...|9) ) *
Example: For unsigned numbers in Pascal:
digit → 0 | 1 | ... | 9
digits → digit+
opt-fraction → ( . digits )?
opt-exponent → ( E (+|-)? digits )?
unsigned-num → digits opt-fraction opt-exponent
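The same layering of definitions can be sketched with Python's re syntax. This is an illustrative sketch, not LEX input syntax; the variable names mirror the definitions above.

```python
import re

# Each named definition is built from the previously defined names, just as
# in the auxiliary definitions above.
digit        = r"[0-9]"
digits       = digit + r"+"
opt_fraction = r"(\." + digits + r")?"
opt_exponent = r"(E[+-]?" + digits + r")?"
unsigned_num = digits + opt_fraction + opt_exponent

letter = r"[A-Za-z]"
ident  = letter + r"(" + letter + r"|" + digit + r")*"

for lexeme in ["42", "3.14", "6E-2", "12.5E+3"]:
    assert re.fullmatch(unsigned_num, lexeme), lexeme
assert not re.fullmatch(unsigned_num, ".5")   # the fraction needs leading digits
assert re.fullmatch(ident, "newval2")
assert not re.fullmatch(ident, "2newval")
print("all definitions behave as expected")
```

Without the names, unsigned-num would be the single opaque expression [0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?, which illustrates why auxiliary definitions help.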
- Translation Rules comprise an ordered list of Regular Expressions and the Program Code to be executed when that Regular Expression is encountered:
R1 P1
R2 P2
...
Rn Pn
- The list is ordered, i.e. the REs are checked in order. If a string matches more than one RE, the RE occurring higher in the list is given preference and its Program Code is executed.
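A sketch of such an ordered rule list, with Python functions standing in for the program code. The token names are illustrative, and real LEX additionally prefers the longest match before applying rule order; this sketch shows only the ordering rule described above.

```python
import re

# Each rule pairs a regular expression with program code. Rules earlier in
# the list win, so the keyword "begin" is listed before the general
# identifier rule.
RULES = [
    (r"begin",                 lambda lex: ("KEYWORD", lex)),
    (r"[A-Za-z][A-Za-z0-9]*",  lambda lex: ("ID", lex)),
    (r"[0-9]+",                lambda lex: ("NUM", int(lex))),
    (r"\s+",                   lambda lex: None),            # ignore whitespace
]

def scan(text):
    pos, out = 0, []
    while pos < len(text):
        for pattern, action in RULES:          # check the REs in order
            m = re.match(pattern, text[pos:])
            if m:
                token = action(m.group())
                if token is not None:
                    out.append(token)
                pos += m.end()
                break
        else:
            raise ValueError(f"lexical error at position {pos}")
    return out

print(scan("begin count 42"))
# -> [('KEYWORD', 'begin'), ('ID', 'count'), ('NUM', 42)]
```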
3. BASICS OF SYNTAX ANALYSIS
- Syntax Analyzer creates the syntactic structure of the given source program.
- This syntactic structure is mostly a parse tree.
- Syntax Analyzer is also known as parser.
- The syntax of a programming language is described by a context-free grammar (CFG). We will use BNF (Backus-Naur Form) notation in the description of CFGs.
- The syntax analyzer (parser) checks whether a given source program satisfies the rules implied by a context-free grammar or not.
- If it does, the parser creates the parse tree of that program.
- Otherwise, the parser gives error messages.
- A context-free grammar:
- gives a precise syntactic specification of a programming language;
- its design is an initial phase of the design of a compiler;
- can be directly converted into a parser by some tools.
3.1 Parser
- Parser works on a stream of tokens.
- The smallest item is a token.
- We categorize the parsers into two groups:
- Top-Down Parser
- the parse tree is created top to bottom, starting from the root.
- Bottom-Up Parser
- the parse tree is created bottom to top, starting from the leaves.
- Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time).
- Efficient top-down and bottom-up parsers can be implemented only for sub-classes of context-free grammars.
- LL for top-down parsing
- LR for bottom-up parsing
3.2 Context Free Grammars
- Inherently recursive structures of a programming language are defined by a context-free grammar.
- In a context-free grammar, we have:
- A finite set of terminals (in our case, this will be the set of tokens)
- A finite set of non-terminals (syntactic-variables)
- A finite set of production rules of the form A → α, where A is a non-terminal and α is a string of terminals and non-terminals (including the empty string)
- A start symbol (one of the non-terminal symbols)
- L(G) is the language of G (the language generated by G) which is a set of sentences.
- A sentence of L(G) is a string of terminal symbols of G.
- If S is the start symbol of G, then ω is a sentence of L(G) iff S ⇒ ω (in one or more derivation steps), where ω is a string of terminals of G.
- If G is a context-free grammar, L(G) is a context-free language.
(Figure: source program → Lexical Analyzer → token → Parser → parse tree; the Parser requests each token with "get next token".)
- Two grammars are equivalent if they produce the same language.
- If S ⇒ α:
- if α contains non-terminals, it is called a sentential form of G;
- if α does not contain non-terminals, it is called a sentence of G.
3.2.1 Derivations
Example: E → E + E | E – E | E * E | E / E | - E | ( E ) | id
- E ⇒ E+E means that E+E derives from E:
- we can replace E by E+E;
- to be able to do this, we must have the production rule E → E+E in our grammar.
- E ⇒ E+E ⇒ id+E ⇒ id+id: such a sequence of replacements of non-terminal symbols is called a derivation of id+id from E.
- In general, a derivation step is αAβ ⇒ αγβ if there is a production rule A → γ in our grammar, where α and β are arbitrary strings of terminal and non-terminal symbols.
α1 ⇒ α2 ⇒ ... ⇒ αn (we say that α1 derives αn)
- At each derivation step, we can choose any of the non-terminals in the sentential form of G for the replacement.
- If we always choose the left-most non-terminal in each derivation step, the derivation is called the left-most derivation.
Example:
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
- If we always choose the right-most non-terminal in each derivation step, the derivation is called the right-most derivation.
Example:
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)
- We will see that the top-down parsers try to find the left-most derivation of the given source program.
- We will see that the bottom-up parsers try to find the right-most derivation of the given source program in the reverse order.
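The two derivations above can be reproduced mechanically. This is a sketch: the rule numbering and the helper function are illustrative choices, using a subset of the example grammar.

```python
# Grammar: E -> E + E | ( E ) | - E | id  (rule indices 0..3)
RULES = {"E": [["E", "+", "E"], ["(", "E", ")"], ["-", "E"], ["id"]]}

def derive(start, choices, leftmost=True):
    """Apply the given rule choices, always expanding the left-most
    (or right-most) non-terminal; return every sentential form."""
    form, history = [start], [[start]]
    for rule_index in choices:
        positions = [i for i, sym in enumerate(form) if sym in RULES]
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + RULES[form[i]][rule_index] + form[i + 1:]
        history.append(list(form))
    return history

# E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
left = derive("E", [2, 1, 0, 3, 3], leftmost=True)
print([" ".join(f) for f in left])

# The right-most derivation uses the same rules but reaches the sentence
# through different sentential forms (-(E+id) instead of -(id+E)).
right = derive("E", [2, 1, 0, 3, 3], leftmost=False)
print(" ".join(right[-1]))   # - ( id + id )
```

Both derivations end in the same sentence, differing only in the order in which the non-terminals are replaced, which is exactly the distinction the two examples above illustrate.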
3.2.2 Parse Tree
- Inner nodes of a parse tree are non-terminal symbols.
- The leaves of a parse tree are terminal symbols.
- A parse tree can be seen as a graphical representation of a derivation.