






























































Prof. Stephen A. Edwards Spring 2002 Columbia University Department of Computer Science
Administrivia
Class Project
Types of Programming Languages:
Imperative, Object-Oriented, Functional, Logic, Dataflow
    Source Program
          ↓
Input → Interpreter → Output
Structure of a Compiler

    Program Text
      ↓  Lexer
    Token Stream
      ↓  Parser
    Abstract Syntax Tree
      ↓  Static semantics (type checking)
    Annotated AST
      ↓  Translation to intermediate form
    Three-address code
      ↓  Code generation
    Assembly Code
int gcd(int a, int b)
{
  while (a != b) {
    if (a > b) a -= b;
    else b -= a;
  }
  return a;
}
int gcd ( int a , int b ) { while ( a
!= b ) { if ( a > b ) a -= b ;
else b -= a ; } return a ; }
A stream of tokens. Whitespace, comments removed.
func
  int
  gcd
  args
    arg int a
    arg int b
  seq
    while
      !=  a b
      if
        >   a b
        -=  a b
        -=  b a
    return a
Abstract syntax tree built from parsing rules.
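As a rough illustration of how such a tree might be represented in memory, the C sketch below builds the while-loop subtree of gcd by hand and prints it in prefix form. The Node layout and every helper name (node, leaf, build_gcd_loop, to_prefix, count_nodes, free_tree) are assumptions made for this example, not part of the course material. A real parser would create the nodes from its grammar rules instead of hard-coding them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_KIDS 3

/* Hypothetical node layout: a label plus up to three children. */
typedef struct Node {
    const char *label;              /* e.g. "while", "!=", "a" */
    struct Node *kid[MAX_KIDS];
    int nkids;
} Node;

/* Allocate a node with the given label and nkids children. */
Node *node(const char *label, int nkids, Node *k0, Node *k1, Node *k2) {
    assert(nkids >= 0 && nkids <= MAX_KIDS);
    Node *n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->label = label;
    n->nkids = nkids;
    n->kid[0] = k0; n->kid[1] = k1; n->kid[2] = k2;
    return n;
}

Node *leaf(const char *label) { return node(label, 0, NULL, NULL, NULL); }

/* Subtree for: while (a != b) { if (a > b) a -= b; else b -= a; } */
Node *build_gcd_loop(void) {
    Node *cond   = node("!=", 2, leaf("a"), leaf("b"), NULL);
    Node *test   = node(">",  2, leaf("a"), leaf("b"), NULL);
    Node *then_  = node("-=", 2, leaf("a"), leaf("b"), NULL);
    Node *else_  = node("-=", 2, leaf("b"), leaf("a"), NULL);
    Node *branch = node("if", 3, test, then_, else_);
    return node("while", 2, cond, branch, NULL);
}

/* Append s to buf without writing past size bytes. */
static void append(char *buf, size_t size, const char *s) {
    size_t used = strlen(buf);
    if (used + 1 < size) strncat(buf, s, size - used - 1);
}

/* Append the tree to buf in prefix form, e.g. "(!= a b)". */
void to_prefix(const Node *n, char *buf, size_t size) {
    if (n->nkids == 0) { append(buf, size, n->label); return; }
    append(buf, size, "(");
    append(buf, size, n->label);
    for (int i = 0; i < n->nkids; i++) {
        append(buf, size, " ");
        to_prefix(n->kid[i], buf, size);
    }
    append(buf, size, ")");
}

/* Count this node and all of its descendants. */
int count_nodes(const Node *n) {
    int total = 1;
    for (int i = 0; i < n->nkids; i++) total += count_nodes(n->kid[i]);
    return total;
}

/* Release a tree built with node() and leaf(). */
void free_tree(Node *n) {
    for (int i = 0; i < n->nkids; i++) free_tree(n->kid[i]);
    free(n);
}
```

Calling to_prefix on the result of build_gcd_loop(), with buf starting as an empty string, yields (while (!= a b) (if (> a b) (-= a b) (-= b a))), which mirrors the nesting of the tree above.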
L0: sne $1, a, b
    seq $0, $1, 0
    btrue $0, L1      % while (a != b)
    sl $3, b, a
    seq $2, $3, 0
    btrue $2, L4      % if (a > b)
    sub a, a, b       % a -= b
    jmp L
L4: sub b, b, a       % b -= a
L5: jmp L
L1: ret a
Idealized assembly language w/ infinite registers
gcd:  pushl %ebp            % Save frame pointer
      movl %esp,%ebp
      movl 8(%ebp),%eax     % Load a from stack
      movl 12(%ebp),%edx    % Load b from stack
.L8:  cmpl %edx,%eax
      je .L3                % while (a != b)
      jle .L5               % if (a > b)
      subl %edx,%eax        % a -= b
      jmp .L
.L5:  subl %eax,%edx        % b -= a
      jmp .L
.L3:  leave                 % Restore SP, BP
      ret
Goal is to translate a stream of characters
i n t sp g c d ( i n t sp a , sp i n t sp b
into a stream of tokens
[Figure: the lexemes int, gcd, int, a, int, b, each labeled with its token type (e.g., ID)]
Each token consists of a token type and its text.
Whitespace and comments are discarded.
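To make the idea of "token type plus text" concrete, here is a minimal hand-written scanner sketch in C. It is only an illustration under simplifying assumptions: the type names (T_ID, T_PUNCT, T_EOF), the Token layout, and the functions skip_blanks, next_token, and token_is are all invented for this example, every punctuation character becomes its own token (so != would come out as two tokens), and only /* ... */ comments are recognized.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token types for this sketch. */
typedef enum { T_ID, T_PUNCT, T_EOF } TokenType;

/* A token is a type plus its text (a slice of the source string). */
typedef struct {
    TokenType type;
    const char *start;
    size_t len;
} Token;

/* Skip whitespace and C-style comments. */
const char *skip_blanks(const char *p) {
    for (;;) {
        while (isspace((unsigned char)*p)) p++;
        if (p[0] == '/' && p[1] == '*') {
            p += 2;
            while (*p && !(p[0] == '*' && p[1] == '/')) p++;
            if (*p) p += 2;          /* step over the closing delimiter */
        } else {
            return p;
        }
    }
}

/* Read one token starting at *pp and advance *pp past it. */
Token next_token(const char **pp) {
    assert(pp != NULL && *pp != NULL);
    const char *p = skip_blanks(*pp);
    Token t = { T_EOF, p, 0 };
    if (*p != '\0') {
        if (isalpha((unsigned char)*p) || *p == '_') {
            t.type = T_ID;           /* identifier or keyword */
            while (isalnum((unsigned char)*p) || *p == '_') p++;
        } else {
            t.type = T_PUNCT;        /* any other single character */
            p++;
        }
        t.len = (size_t)(p - t.start);
    }
    *pp = p;
    return t;
}

/* True if the token's text equals s. */
int token_is(const Token *t, const char *s) {
    return strlen(s) == t->len && strncmp(t->start, s, t->len) == 0;
}
```

Calling next_token repeatedly on the text int gcd(int a, int b) produces nine tokens before T_EOF. The blanks between them never reach the caller, which is the simplification the parser relies on.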
Goal: simplify the job of the parser.
Scanners are usually much faster than parsers.
Discard as many irrelevant details as possible (e.g., whitespace, comments).
Parser does not care that the identifier is “supercalifragilisticexpialidocious.”
Parser rules are only concerned with token types.
class CalcLexer extends Lexer;

LPAREN : '(' ;          // Rules for punctuation
RPAREN : ')' ;
STAR   : '*' ;
PLUS   : '+' ;
SEMI   : ';' ;

protected               // Can only be used as a sub-rule
DIGIT  : '0'..'9' ;     // Any character between 0 and 9
INT    : (DIGIT)+ ;     // One or more digits

WS : (' ' | '\t' | '\n' | '\r')    // Whitespace
     { $setType(Token.SKIP); } ;   // Action: ignore
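For comparison, the grammar above can be approximated by a hand-written scanner. The C sketch below is not what the lexer generator emits; it is a simplified illustration in which the names CalcToken, TOK_*, is_digit, calc_next, and calc_scan are assumptions made for this example. It does try to follow the rules one for one: is_digit plays the role of the protected DIGIT sub-rule, the inner loop implements (DIGIT)+, and the leading loop drops whitespace the way the WS action does.

```c
#include <assert.h>
#include <stddef.h>

/* Token types mirroring the grammar rules, plus end-of-input and error. */
typedef enum {
    TOK_LPAREN, TOK_RPAREN, TOK_STAR, TOK_PLUS, TOK_SEMI,
    TOK_INT, TOK_END, TOK_ERROR
} CalcToken;

/* DIGIT : '0'..'9' ; used only as a helper, it never yields a token. */
int is_digit(char c) { return c >= '0' && c <= '9'; }

/* Return the type of the next token and advance *pp past it. */
CalcToken calc_next(const char **pp) {
    assert(pp != NULL && *pp != NULL);
    const char *p = *pp;
    CalcToken type;

    /* WS : skipped, never returned to the caller. */
    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r')
        p++;

    switch (*p) {
    case '\0': type = TOK_END;         break;   /* stay at the terminator */
    case '(':  type = TOK_LPAREN; p++; break;
    case ')':  type = TOK_RPAREN; p++; break;
    case '*':  type = TOK_STAR;   p++; break;
    case '+':  type = TOK_PLUS;   p++; break;
    case ';':  type = TOK_SEMI;   p++; break;
    default:
        if (is_digit(*p)) {                     /* INT : (DIGIT)+ */
            while (is_digit(*p)) p++;
            type = TOK_INT;
        } else {                                /* no rule matches */
            type = TOK_ERROR;
            p++;
        }
        break;
    }
    *pp = p;
    return type;
}

/* Store up to max token types from src in out; return how many were stored. */
size_t calc_scan(const char *src, CalcToken *out, size_t max) {
    size_t n = 0;
    while (n < max) {
        CalcToken t = calc_next(&src);
        if (t == TOK_END) break;
        out[n++] = t;
    }
    return n;
}
```

Scanning (12 + 3) * 45; with calc_scan gives eight token types: LPAREN, INT, PLUS, INT, RPAREN, STAR, INT, SEMI.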
Rules are names starting with a capital letter.
A character in single quotes matches that character.
LPAREN : '(' ;
A string in double quotes matches the string
IF : "if" ;
A vertical bar indicates a choice:
OP : '+' | '-' | '*' | '/' ;
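The string-literal and choice forms can be pictured as two small matching functions. This C sketch is an assumption-laden illustration: match_literal and is_op are hypothetical names, and the generated lexer need not work this way. It only shows what "matches the string" and "a choice" mean operationally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* IF : "if" ;  Return the number of characters matched, or 0 if the
 * input does not start with the literal. */
size_t match_literal(const char *input, const char *literal) {
    assert(input != NULL && literal != NULL);
    size_t n = strlen(literal);
    return strncmp(input, literal, n) == 0 ? n : 0;
}

/* OP : '+' | '-' | '*' | '/' ;  True if c is one of the alternatives.
 * The explicit '\0' check is needed because strchr also finds the
 * string terminator. */
int is_op(char c) {
    return c != '\0' && strchr("+-*/", c) != NULL;
}
```

With these definitions, match_literal("if (x)", "if") returns 2, and is_op accepts each of the four operator characters while rejecting everything else.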