Compiler Design: LL(1) Parsing and Predictive Parsing — Lecture notes, Compiler Design
Uploaded on 04/28/2019 by vaibhav-verma-1
Prepared by MOHIT KUMAR for and on behalf of Meerut Institute of Engineering and Technology, Meerut.
COMPILER DESIGN
(TCS-502)
COURSE FILE
FOR
Bachelor of Technology
IN
Computer Science and Engineering
Session: 2007-2008
Department of Computer Science and Engineering
MEERUT INSTITUTE OF ENGINEERING AND TECHNOLOGY
MEERUT


MIET TCS-502 COMPILER DESIGN Course File II

CONTENTS

PREAMBLE

SYLLABUS

LECTURE PLAN

LECTURE NOTES

01 Introduction to Compilers and its Phases

02 Lexical Analysis

03 Basics of Syntax Analysis

04 Top-Down Parsing

05 Basic Bottom-Up Parsing Techniques

06 LR Parsing

07 Syntax-Directed Translation

08 Symbol Tables

09 Run Time Administration

10 Error Detection and Recovery

11 Code Optimization

EXERCISES

Practice Questions

Examination Question Papers

Laboratory Assignments


SYLLABUS

(As laid down by Uttar Pradesh Technical University, Lucknow)

UNIT I:

Introduction to Compiler: Phases and passes, Bootstrapping, Finite state machines and regular expressions and their applications to lexical analysis, Implementation of lexical analyzers, lexical-analyzer generator, LEX-compiler, Formal grammars and their application to syntax analysis, BNF notation, ambiguity, YACC. The syntactic specification of programming languages: Context free grammars, derivation and parse trees, capabilities of CFG.

UNIT II:

Basic Parsing Techniques: Parsers, Shift reduce parsing, operator precedence parsing, top down parsing, predictive parsers. Automatic Construction of efficient Parsers: LR parsers, the canonical collection of LR(0) items, constructing SLR parsing tables, constructing canonical LR parsing tables, constructing LALR parsing tables, using ambiguous grammars, an automatic parser generator, implementation of LR parsing tables, constructing LALR sets of items.

UNIT III:

Syntax-directed Translation: Syntax-directed Translation schemes, Implementation of Syntax- directed Translators, Intermediate code, postfix notation, Parse trees & syntax trees, three address code, quadruple & triples, translation of assignment statements, Boolean expressions, statements that alter the flow of control, postfix translation, translation with a top down parser. More about translation: Array references in arithmetic expressions, procedures call, declarations, case statements.

UNIT IV:

Symbol Tables: Data structures for symbol tables, representing scope information. Run-Time Administration: Implementation of simple stack allocation scheme, storage allocation in block structured languages. Error Detection & Recovery: Lexical phase errors, syntactic phase errors, semantic errors.

UNIT V:

Introduction to code optimization: Loop optimization, the DAG representation of basic blocks, value numbers and algebraic laws, Global Data-Flow analysis.

TEXTBOOK:

Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman, “Compilers: Principles, Techniques, and Tools”, Addison-Wesley.


LECTURE PLAN

(PERIOD OF STUDY: From August 2007 to November 2007)

FIRST UNIT (AUGUST)
Competencies: Introduction to the subject, its pre-requisites, objectives, content and plan of study; basic concepts of the compiler; overview of the passes, phases, lexical analyzers and CFG.
Topics: 01 Introduction to Compilers and its Phases; 02 Lexical Analysis (3 hours)

SECOND UNIT (SEPTEMBER)
Topics: 03 Basics of Syntax Analysis; 04 Top-Down Parsing (4 hours); 05 Basic Bottom-Up Parsing Techniques (3 hours)

THIRD UNIT
Competencies: Basic idea about the different types of parsers and their working mechanism.
Topics: 06 LR Parsing (6 hours)

FOURTH UNIT (OCTOBER)
Competencies: Internal details about translation; actions to be attached to productions that shall produce the desired code.
Topics: 07 Syntax-Directed Translation; 08 Symbol Tables (1 hour); 09 Run Time Storage Organization

FIFTH UNIT (NOVEMBER)
Competencies: Data structures related with the compiler, the scope of the information stored, and the possible errors that may arise; optimization techniques related to the compilation process.
Topics: 10 Error Detection and Recovery; 11 Code Optimization

TEACHING AIDS: Transparencies on Overhead Projector or PowerPoint Presentation on LCD Projector.
TOTAL NUMBER OF LECTURE HOURS FOR THE COURSE: 35
1.2.1 Lexical Analyzer

  • Puts information about identifiers into the symbol table.
  • Regular expressions are used to describe tokens (lexical constructs).
  • A (Deterministic) Finite State Automaton can be used in the implementation of a lexical analyzer.

1.2.2 Syntax Analyzer

  • A Syntax Analyzer creates the syntactic structure (generally a parse tree) of the given program.
  • A syntax analyzer is also called a parser.
  • A parse tree describes a syntactic structure.

Example: For the line of code newval := oldval + 12, the parse tree will be:

assignment
├── identifier: newval
├── :=
└── expression
    ├── expression
    │   └── identifier: oldval
    ├── +
    └── expression
        └── number: 12

  • The syntax of a language is specified by a context free grammar (CFG).
  • The rules in a CFG are mostly recursive.
  • A syntax analyzer checks whether a given program satisfies the rules implied by a CFG or not.
    • If it satisfies, the syntax analyzer creates a parse tree for the given program.

Example: The CFG used for the above parse tree is:

assignment → identifier := expression
expression → identifier
expression → number
expression → expression + expression

  • Depending on how the parse tree is created, there are different parsing techniques.
  • These parsing techniques are categorized into two groups:
    • Top-Down Parsing,
    • Bottom-Up Parsing
  • Top-Down Parsing:
  • Construction of the parse tree starts at the root, and proceeds towards the leaves.
  • Efficient top-down parsers can be easily constructed by hand.
  • Recursive Predictive Parsing, Non-Recursive Predictive Parsing (LL Parsing).
  • Bottom-Up Parsing:
  • Construction of the parse tree starts at the leaves, and proceeds towards the root.
  • Normally efficient bottom-up parsers are created with the help of some software tools.
  • Bottom-up parsing is also known as shift-reduce parsing.
  • Operator-Precedence Parsing – simple, restrictive, easy to implement
  • LR Parsing – a more general form of shift-reduce parsing: LR, SLR, LALR

1.2.3 Semantic Analyzer

  • A semantic analyzer checks the source program for semantic errors and collects the type information for code generation.
  • Type-checking is an important part of the semantic analyzer.
  • Normally, semantic information cannot be represented by a context-free grammar alone.
  • Context-free grammars used in syntax analysis are integrated with attributes (semantic rules); the result is syntax-directed translation, and such grammars are called attribute grammars.

Example: In the line of code newval := oldval + 12, the type of the identifier newval must match the type of the expression (oldval+12).

1.2.4 Intermediate Code Generation

  • A compiler may produce explicit intermediate code representing the source program.
  • This intermediate code is generally machine-architecture independent, but its level is close to the level of machine code.

Example:

newval := oldval * fact + 1

id1 := id2 * id3 + 1

MULT id2, id3, temp1
ADD temp1, #1, temp2
MOV temp2, id1

The last form is the Intermediate Code (Quadruples).
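The quadruples above can be held in a simple data structure. The following Python sketch (not part of the course file; the tuple layout is an assumption) stores each quadruple as (operator, arg1, arg2, result):

```python
# A minimal sketch of the quadruple form of newval := oldval * fact + 1.
# Each quadruple is an (operator, arg1, arg2, result) tuple.
quads = [
    ("MULT", "id2",   "id3", "temp1"),  # temp1 := oldval * fact
    ("ADD",  "temp1", "#1",  "temp2"),  # temp2 := temp1 + 1
    ("MOV",  "temp2", None,  "id1"),    # newval := temp2
]

for op, a1, a2, res in quads:
    print(op, a1, a2, res)
```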

1.2.5 Code Optimizer

  • The code optimizer optimizes the code produced by the intermediate code generator in terms of time and space.

Example: The above piece of intermediate code can be reduced as follows:

MULT id2, id3, temp1
ADD temp1, #1, id1

1.2.6 Code Generator

  • Produces the target language in a specific architecture.

2. LEXICAL ANALYSIS

  • The Lexical Analyzer reads the source program character by character to produce tokens.
  • Normally, a lexical analyzer does not return a list of tokens in one shot; it returns a token when the parser asks for one.

2.1 Token

  • A token represents a set of strings described by a pattern. For example, an identifier represents the set of strings that start with a letter and continue with letters and digits. The actual string is called a lexeme.
  • Since a token can represent more than one lexeme, additional information should be held for that specific lexeme. This additional information is called the attribute of the token.
  • For simplicity, a token may have a single attribute which holds the required information for that token. For identifiers, this attribute is a pointer to the symbol table, and the symbol table holds the actual attributes for that token.
  • Examples:
    • <identifier, attribute> where attribute is a pointer to the symbol table
    • no attribute is needed
    • <number, value> where value is the actual value of the number
  • Token type and its attribute uniquely identify a lexeme.
  • Regular expressions are widely used to specify patterns.
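As an illustration of tokens, lexemes, and attributes, here is a hypothetical toy scanner in Python: identifiers carry a symbol-table index as their attribute, numbers carry their value, and the remaining tokens need no attribute. The token names and regular expressions are illustrative, not from the notes:

```python
import re

# Toy scanner (a sketch): returns <token, attribute> pairs as described above.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<id>[A-Za-z]\w*)|(?P<num>\d+)|(?P<assign>:=)|(?P<plus>\+))"
)

def tokenize(source):
    source = source.rstrip()          # avoid a spurious error on trailing blanks
    symtab, tokens, pos = {}, [], 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if not m:
            raise ValueError("lexical error at position %d" % pos)
        pos = m.end()
        if m.lastgroup == "id":
            lexeme = m.group("id")
            idx = symtab.setdefault(lexeme, len(symtab))  # pointer into symbol table
            tokens.append(("identifier", idx))
        elif m.lastgroup == "num":
            tokens.append(("number", int(m.group("num"))))
        else:                         # := and + carry no attribute
            tokens.append((m.lastgroup, None))
    return tokens, symtab

print(tokenize("newval := oldval + 12"))
```

Note that both occurrences of the token type identifier have different attributes (symbol-table entries), since they stand for different lexemes.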

2.2 Languages

2.2.1 Terminology

  • Alphabet : a finite set of symbols (ASCII characters)
  • String : finite sequence of symbols on an alphabet
    • Sentence and word are also used in place of string
    • ε is the empty string
    • |s| is the length of string s.
  • Language: sets of strings over some fixed alphabet
    • ∅ the empty set is a language.
    • {ε} the set containing empty string is a language
    • The set of all possible identifiers is a language.
  • Operators on Strings:
    • Concatenation: xy represents the concatenation of strings x and y; sε = εs = s
    • Exponentiation: s^n = s s ... s (n times), s^0 = ε

2.2.2. Operations on Languages

  • Concatenation: L 1 L 2 = { s 1 s 2 | s 1 ∈ L 1 and s 2 ∈ L 2 }
  • Union: L 1 ∪ L 2 = { s | s ∈ L 1 or s ∈ L 2 }
  • Exponentiation: L^0 = {ε}, L^1 = L, L^2 = LL, ...
  • Kleene Closure: L* = L^0 ∪ L^1 ∪ L^2 ∪ ...
  • Positive Closure: L+ = L^1 ∪ L^2 ∪ L^3 ∪ ...

Examples:

  • L1 = {a,b,c,d}    L2 = {1,2}
  • L1 L2 = {a1,a2,b1,b2,c1,c2,d1,d2}
  • L1 ∪ L2 = {a,b,c,d,1,2}
  • L1^3 = all strings of length three over {a,b,c,d}
  • L1* = all strings over the letters a,b,c,d, including the empty string
  • L1+ = the same set, but without the empty string
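These operations are easy to check by experiment. The sketch below (assuming languages are represented as finite Python sets of strings) implements concatenation and exponentiation and reproduces the examples above:

```python
# Language operations on finite languages, modeled as Python sets of strings.
def concat(L1, L2):
    # L1 L2 = { s1 s2 | s1 in L1 and s2 in L2 }
    return {s1 + s2 for s1 in L1 for s2 in L2}

def power(L, n):
    # L^n = L L ... L (n times), with L^0 = {empty string}
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result

L1 = {"a", "b", "c", "d"}
L2 = {"1", "2"}

print(sorted(concat(L1, L2)))  # the eight strings a1, a2, ..., d2
print(sorted(L1 | L2))         # union
print(len(power(L1, 3)))       # 64 strings of length three
```

Kleene closure of a non-empty language is infinite, so it cannot be materialized as a set; only finite powers can be enumerated this way.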

2.3 Regular Expressions and Finite Automata

2.3.1 Regular Expressions

  • We use regular expressions to describe tokens of a programming language.
  • A regular expression is built up of simpler regular expressions (using defining rules)
  • Each regular expression denotes a language.
  • A language denoted by a regular expression is called a regular set.

For Regular Expressions over alphabet Σ

Regular Expression    Language it denotes
ε                     {ε}
a ∈ Σ                 {a}
(r1) | (r2)           L(r1) ∪ L(r2)
(r1) (r2)             L(r1) L(r2)
(r)*                  (L(r))*
(r)                   L(r)

  • (r)+ = (r)(r)*
  • (r)? = (r) | ε
  • We may remove parentheses by using precedence rules:
    • * highest
    • concatenation next
    • | lowest
  • ab*|c means (a(b)*)|(c)

Examples:

  • Σ = {0,1}
  • 0|1 = {0,1}
  • (0|1)(0|1) = {00,01,10,11}
  • 0* = {ε, 0, 00, 000, 0000, ...}
  • (0|1)* = all strings of 0s and 1s, including the empty string
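The example regular expressions can be verified with Python's re module (a sketch; Python's regex syntax is a superset of the notation used here):

```python
import re

# Checking the example languages over the alphabet {0,1}.
assert re.fullmatch(r"0|1", "0")             # 0|1 denotes {0, 1}
assert re.fullmatch(r"(0|1)(0|1)", "10")     # (0|1)(0|1) denotes {00,01,10,11}
assert not re.fullmatch(r"(0|1)(0|1)", "0")  # length-1 strings are excluded
assert re.fullmatch(r"0*", "")               # 0* includes the empty string
assert re.fullmatch(r"0*", "0000")
assert re.fullmatch(r"(0|1)*", "010011")     # any string of 0s and 1s
print("all regular-expression examples check out")
```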

2.3.2 Finite Automata

  • A recognizer for a language is a program that takes a string x, and answers “yes” if x is a sentence of that language, and “no” otherwise.
  • We call the recognizer of the tokens a finite automaton.
  • A finite automaton can be: deterministic (DFA) or non-deterministic (NFA)
  • This means that we may use a deterministic or non-deterministic automaton as a lexical analyzer.
  • Both deterministic and non-deterministic finite automata recognize regular sets.
  • Which one should we use?
    • deterministic – faster recognizer, but it may take more space
    • non-deterministic – slower, but it may take less space
    • Deterministic automata are widely used in lexical analyzers.

Example:

The DFA to recognize the language (a|b)* ab is as follows.

Transition Graph

0 is the start state s; {2} is the set of final states F; Σ = {a,b}; S = {0,1,2}.

Transition Function:

state   a   b
  0     1   0
  1     1   2
  2     1   0

Note that the entries in this function are single value and not set of values (unlike NFA).
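The transition table above can be encoded directly; the following Python sketch simulates the DFA for (a|b)*ab:

```python
# Direct encoding of the DFA above: start state 0, accepting state 2,
# transition function exactly as in the table.
TRANS = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}

def accepts(s):
    state = 0                       # start state
    for ch in s:
        state = TRANS[(state, ch)]  # single-valued: this is a DFA
    return state == 2               # accept iff we end in a final state

print(accepts("ab"), accepts("aab"), accepts("ba"))
```

Because every (state, symbol) pair maps to exactly one next state, the simulation is a single left-to-right pass over the input.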

2.3.5 Converting RE to NFA (Thompson Construction)

  • This is one way to convert a regular expression into an NFA.
  • There can be other (more efficient) ways for the conversion.
  • Thompson's Construction is a simple and systematic method.
  • It guarantees that the resulting NFA will have exactly one final state and one start state.
  • Construction starts from the simplest parts (alphabet symbols).
  • To create an NFA for a complex regular expression, the NFAs of its sub-expressions are combined.
  • To recognize an empty string ε:
  • To recognize a symbol a in the alphabet Σ:

(Figures: the transition graph of the DFA above; the NFA fragment for ε — states i and f joined by an ε-transition; and the NFA fragment for a symbol a — states i and f joined by an a-transition.)

  • For regular expression r1 | r2:

N(r1) and N(r2) are NFAs for regular expressions r1 and r2.

  • For regular expression r1 r2:

Here, the final state of N(r1) is merged with the start state of N(r2); the start state of N(r1) becomes the start state of N(r1r2), and the final state of N(r2) becomes its final state.

  • For regular expression r*

Example: For a RE (a|b) * a, the NFA construction is shown below.

(Figures: the NFA fragments for r1 | r2, r1 r2, and r*, built from N(r1) and N(r2) with new ε-transitions, followed by the step-by-step NFAs for a, b, (a|b), (a|b)*, and finally (a|b)*a.)

Example:

S0 = ε-closure({0}) = {0,1,2,4,7} — S0 into DS as an unmarked state

⇓ mark S0
ε-closure(move(S0,a)) = ε-closure({3,8}) = {1,2,3,4,6,7,8} = S1 — S1 into DS
ε-closure(move(S0,b)) = ε-closure({5}) = {1,2,4,5,6,7} = S2 — S2 into DS
transfunc[S0,a] ← S1    transfunc[S0,b] ← S2

⇓ mark S1
ε-closure(move(S1,a)) = ε-closure({3,8}) = {1,2,3,4,6,7,8} = S1
ε-closure(move(S1,b)) = ε-closure({5}) = {1,2,4,5,6,7} = S2
transfunc[S1,a] ← S1    transfunc[S1,b] ← S2

⇓ mark S2
ε-closure(move(S2,a)) = ε-closure({3,8}) = {1,2,3,4,6,7,8} = S1
ε-closure(move(S2,b)) = ε-closure({5}) = {1,2,4,5,6,7} = S2
transfunc[S2,a] ← S1    transfunc[S2,b] ← S2

S0 is the start state of the DFA, since 0 is a member of S0 = {0,1,2,4,7}.
S1 is an accepting state of the DFA, since 8 is a member of S1 = {1,2,3,4,6,7,8}.
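The subset construction above can be reproduced in code. The sketch below assumes the Thompson NFA for (a|b)*a with the state numbering 0–8 used in this example (ε-moves and symbol moves as in the figure):

```python
# Subset construction for the Thompson NFA of (a|b)*a.
# State numbering assumed from the example: 0 is the start state, 8 the final state.
EPS = {0: {1, 7}, 1: {2, 4}, 3: {6}, 5: {6}, 6: {1, 7}}   # epsilon-transitions
MOVE = {(2, "a"): {3}, (4, "b"): {5}, (7, "a"): {8}}       # symbol transitions

def eps_closure(states):
    # All NFA states reachable from `states` using only epsilon-transitions.
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in EPS.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(states, symbol):
    # All NFA states reachable from `states` on one occurrence of `symbol`.
    return {t for s in states for t in MOVE.get((s, symbol), ())}

S0 = eps_closure({0})
S1 = eps_closure(move(S0, "a"))
S2 = eps_closure(move(S0, "b"))
print(sorted(S0))  # [0, 1, 2, 4, 7]
print(sorted(S1))  # [1, 2, 3, 4, 6, 7, 8]
print(sorted(S2))  # [1, 2, 4, 5, 6, 7]
```

Marking S1 and S2 yields no new subsets (moves on a and b lead back to S1 and S2), so the DFA has exactly the three states computed above.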

2.4 Lexical Analyzer Generator

(Figure: the NFA for (a|b)*a with states 0–8 and the resulting DFA with states S0, S1, S2.)

(Figure: Regular Expressions are the input to the Lexical Analyzer Generator, which produces a Lexical Analyzer; the Lexical Analyzer reads the Source Program and produces Tokens.)

LEX is an example of a Lexical Analyzer Generator.

2.4.1 Input to LEX

  • The input to LEX consists primarily of Auxiliary Definitions and Translation Rules.
  • Writing the regular expressions for some languages can be difficult, because they can be quite complex. In those cases, we may use Auxiliary Definitions.
  • We can give names to regular expressions, and we can use these names as symbols to define other regular expressions.
  • An Auxiliary Definition is a sequence of definitions of the form:

d1 → r1
d2 → r2
...
dn → rn

where each di is a distinct name and each ri is a regular expression over the basic symbols in Σ and the previously defined names d1, d2, ..., di-1.

Example: For identifiers in Pascal:

letter → A | B | ... | Z | a | b | ... | z
digit → 0 | 1 | ... | 9
id → letter ( letter | digit )*

If we try to write the regular expression representing identifiers without using auxiliary definitions, the regular expression becomes complex:

(A|...|Z|a|...|z) ( (A|...|Z|a|...|z) | (0|...|9) )*

Example: For unsigned numbers in Pascal:

digit → 0 | 1 | ... | 9
digits → digit+
opt-fraction → ( . digits )?
opt-exponent → ( E (+|-)? digits )?
unsigned-num → digits opt-fraction opt-exponent
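The auxiliary definitions above can be mirrored by composing Python regex strings, each name expanding to the expression it denotes (a sketch using Python's re syntax, not LEX input):

```python
import re

# The auxiliary definitions for Pascal unsigned numbers, composed bottom-up:
# each later definition may use the names defined before it.
digit        = r"[0-9]"
digits       = digit + r"+"
opt_fraction = r"(\." + digits + r")?"
opt_exponent = r"(E[+-]?" + digits + r")?"
unsigned_num = digits + opt_fraction + opt_exponent

assert re.fullmatch(unsigned_num, "3")
assert re.fullmatch(unsigned_num, "3.14")
assert re.fullmatch(unsigned_num, "6.02E+23")
assert not re.fullmatch(unsigned_num, ".5")   # a fraction alone is not allowed
print("unsigned-num pattern:", unsigned_num)
```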

  • Translation Rules comprise an ordered list of Regular Expressions and the Program Code to be executed when that Regular Expression is encountered:

R1    P1
R2    P2
...
Rn    Pn

  • The list is ordered, i.e. the REs are checked in order. If a string matches more than one RE, the RE occurring earlier in the list is given preference and its Program Code is executed.
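Rule ordering can be illustrated with a small Python sketch: a keyword rule is listed before the identifier rule, so a lexeme matched by both is classified by the earlier rule. The rule names and patterns are illustrative, and unlike LEX this sketch classifies whole lexemes rather than scanning a stream:

```python
import re

# Ordered translation rules (R_i, pattern): earlier rules win ties.
RULES = [
    ("BEGIN",  r"begin"),          # R1: listed first, so it wins for "begin"
    ("ID",     r"[A-Za-z]\w*"),    # R2: would also match "begin"
    ("NUMBER", r"\d+"),            # R3
]

def classify(lexeme):
    for name, pattern in RULES:
        if re.fullmatch(pattern, lexeme):
            return name            # "execute P_i" for the first matching R_i
    return None

print(classify("begin"), classify("beta"), classify("42"))
```

The lexeme "begin" matches both R1 and R2, but because R1 occurs earlier in the list it is classified as the keyword BEGIN.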


3. BASICS OF SYNTAX ANALYSIS

  • Syntax Analyzer creates the syntactic structure of the given source program.
  • This syntactic structure is mostly a parse tree.
  • Syntax Analyzer is also known as parser.
  • The syntax of a programming language is described by a context-free grammar (CFG). We will use BNF (Backus-Naur Form) notation in the description of CFGs.
  • The syntax analyzer (parser) checks whether a given source program satisfies the rules implied by a context-free grammar or not.
    • If it does, the parser creates the parse tree of that program.
    • Otherwise, the parser gives error messages.
  • A context-free grammar:
    • gives a precise syntactic specification of a programming language;
    • its design is an initial phase of the design of a compiler;
    • it can be directly converted into a parser by some tools.

3.1 Parser

  • Parser works on a stream of tokens.
  • The smallest item is a token.
  • We categorize the parsers into two groups:
  • Top-Down Parser
    • the parse tree is created top to bottom, starting from the root.
  • Bottom-Up Parser
    • the parse tree is created bottom to top, starting from the leaves.
  • Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time).
  • Efficient top-down and bottom-up parsers can be implemented only for sub-classes of context-free grammars.
    • LL for top-down parsing
    • LR for bottom-up parsing

3.2 Context Free Grammars

  • Inherently recursive structures of a programming language are defined by a context-free grammar.
  • In a context-free grammar, we have:
    • A finite set of terminals (in our case, this will be the set of tokens)
    • A finite set of non-terminals (syntactic-variables)
    • A finite set of production rules of the form A → α, where A is a non-terminal and α is a string of terminals and non-terminals (possibly the empty string)
    • A start symbol (one of the non-terminal symbols)
  • L(G) is the language of G (the language generated by G) which is a set of sentences.
  • A sentence of L(G) is a string of terminal symbols of G.
  • If S is the start symbol of G, then ω is a sentence of L(G) iff S ⇒+ ω, where ω is a string of terminals of G.

  • If G is a context-free grammar, L(G) is a context-free language.

(Figure: the Parser asks the Lexical Analyzer for the next token; the Lexical Analyzer reads the source program and returns tokens; the Parser produces the parse tree.)

  • Two grammars are equivalent if they produce the same language.
  • If S ⇒* α:
    • If α contains non-terminals, it is called a sentential form of G.
    • If α does not contain non-terminals, it is called a sentence of G.

3.2.1 Derivations

Example:

E → E + E | E – E | E * E | E / E | – E
E → ( E )
E → id

  • E ⇒ E+E means that E+E derives from E:
    • we can replace E by E+E;
    • to be able to do this, we must have the production rule E → E+E in our grammar.
  • A sequence of replacements of non-terminal symbols such as E ⇒ E+E ⇒ id+E ⇒ id+id is called a derivation of id+id from E.
  • In general, a derivation step is αAβ ⇒ αγβ if there is a production rule A → γ in our grammar, where α and β are arbitrary strings of terminal and non-terminal symbols.
  • α1 ⇒ α2 ⇒ ... ⇒ αn (αn derives from α1, or α1 derives αn).

  • At each derivation step, we can choose any of the non-terminals in the sentential form of G for the replacement.
  • If we always choose the left-most non-terminal in each derivation step, the derivation is called a left-most derivation.

Example:

E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)

  • If we always choose the right-most non-terminal in each derivation step, the derivation is called a right-most derivation.

Example:

E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)
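Both derivations can be mechanized. The sketch below (an illustration, not from the notes; it assumes E is the only non-terminal and treats sentential forms as plain strings) replaces the left-most or right-most occurrence of E by a production body:

```python
# One derivation step on a sentential form: replace the left-most (or
# right-most) occurrence of the non-terminal "E" by a production body.
def derive(sentential_form, production, leftmost=True):
    if leftmost:
        return sentential_form.replace("E", production, 1)
    i = sentential_form.rfind("E")
    return sentential_form[:i] + production + sentential_form[i + 1:]

# Left-most derivation of -(id+id), matching the example above:
# E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
form = "E"
for body in ["-E", "(E)", "E+E", "id", "id"]:
    form = derive(form, body)
    print("=>", form)
```

Running the same production sequence with leftmost=False reproduces the right-most derivation, where -(E+id) appears as an intermediate form instead of -(id+E).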

  • We will see that the top-down parsers try to find the left-most derivation of the given source program.
  • We will see that the bottom-up parsers try to find the right-most derivation of the given source program in the reverse order.

3.2.2 Parse Tree

  • Inner nodes of a parse tree are non-terminal symbols.
  • The leaves of a parse tree are terminal symbols.
  • A parse tree can be seen as a graphical representation of a derivation.