Syntax and Parsing
COMS W4115
Prof. Stephen A. Edwards
Spring 2002
Columbia University
Department of Computer Science


Last Time

Administrivia

Class Project

Types of Programming Languages:

Imperative, Object-Oriented, Functional, Logic, Dataflow

The Compilation Process

Interpreters

           Source Program
                 ↓
    Input → Interpreter → Output

Structure of a Compiler

    Program Text
      ↓  Lexer
    Token Stream
      ↓  Parser
    Abstract Syntax Tree
      ↓  Static semantics (type checking)
    Annotated AST
      ↓  Translation to intermediate form
    Three-address code
      ↓  Code generation
    Assembly Code

Compiling a Simple Program

    int gcd(int a, int b)
    {
      while (a != b) {
        if (a > b) a -= b;
        else b -= a;
      }
      return a;
    }

After Lexical Analysis

    int gcd ( int a , int b ) { while ( a != b ) { if ( a > b ) a -= b ;
    else b -= a ; } return a ; }

A stream of tokens. Whitespace, comments removed.

After Parsing

    func
      int
      gcd
      args
        arg  int a
        arg  int b
      seq
        while
          !=  a b
          if
            >   a b
            -=  a b
            -=  b a
        return  a

Abstract syntax tree built from parsing rules.
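One way to see what the tree holds is to write it down as data and walk it. The sketch below is not part of the lecture: it encodes the tree as nested Python tuples and runs it with a small tree-walking interpreter. The names `GCD_AST`, `evaluate`, `call`, and `ReturnSignal` are hypothetical, chosen for this illustration only.

```python
# The gcd tree as nested tuples of the form (node_kind, child, child, ...).
# A bare string leaf names a variable.
GCD_AST = (
    "func", "int", "gcd",
    ("args", ("arg", "int", "a"), ("arg", "int", "b")),
    ("seq",
        ("while", ("!=", "a", "b"),
            ("if", (">", "a", "b"),
                ("-=", "a", "b"),
                ("-=", "b", "a"))),
        ("return", "a")),
)


class ReturnSignal(Exception):
    """Carries the value of a `return` node out of nested statements."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def evaluate(node, env):
    """Evaluate one node; `env` maps variable names to integer values."""
    if isinstance(node, str):          # leaf: look up a variable
        return env[node]
    kind, *children = node
    if kind == "!=":
        return evaluate(children[0], env) != evaluate(children[1], env)
    if kind == ">":
        return evaluate(children[0], env) > evaluate(children[1], env)
    if kind == "-=":
        env[children[0]] -= evaluate(children[1], env)
    elif kind == "if":
        cond, then_branch, else_branch = children
        evaluate(then_branch if evaluate(cond, env) else else_branch, env)
    elif kind == "while":
        cond, body = children
        while evaluate(cond, env):
            evaluate(body, env)
    elif kind == "seq":
        for child in children:
            evaluate(child, env)
    elif kind == "return":
        raise ReturnSignal(evaluate(children[0], env))
    else:
        raise ValueError(f"unknown node kind {kind!r}")
    return None


def call(func, *values):
    """Bind argument values to the parameter names and run the body."""
    _, _return_type, _name, args, body = func
    env = {arg[2]: value for arg, value in zip(args[1:], values)}
    try:
        evaluate(body, env)
    except ReturnSignal as signal:
        return signal.value
    return None
```

For example, `call(GCD_AST, 12, 18)` walks the `while` and `if` nodes until `a` and `b` are equal and then returns their common value. A compiler would translate this tree rather than execute it; the interpreter is only a compact way to show that the tree captures the program's structure.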

After Translation into 3-Address Code

    L0: sne   $1, a, b
        seq   $0, $1, 0
        btrue $0, L1      % while (a != b)
        sl    $3, b, a
        seq   $2, $3, 0
        btrue $2, L4      % if (a < b)
        sub   a, a, b     % a -= b
        jmp   L
    L4: sub   b, b, a     % b -= a
    L5: jmp   L
    L1: ret   a

Idealized assembly language w/ infinite registers
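To make the listing concrete, here is a hedged sketch of a tiny Python interpreter for it. Several things are assumptions rather than part of the lecture: the opcode meanings (`sne` = set if not equal, `seq` = set if equal, `sl` = set if less than) are inferred from the comments in the listing, the two `jmp` targets whose label numbers are missing from the listing are guessed to be `L5` and `L0`, and the names `PROGRAM` and `run` are hypothetical.

```python
# The listing as (label, opcode, operands) triples.
PROGRAM = [
    ("L0", "sne",   ("$1", "a", "b")),
    (None, "seq",   ("$0", "$1", 0)),
    (None, "btrue", ("$0", "L1")),     # leave the loop when a == b
    (None, "sl",    ("$3", "b", "a")),
    (None, "seq",   ("$2", "$3", 0)),
    (None, "btrue", ("$2", "L4")),     # take the else branch
    (None, "sub",   ("a", "a", "b")),
    (None, "jmp",   ("L5",)),          # assumed target
    ("L4", "sub",   ("b", "b", "a")),
    ("L5", "jmp",   ("L0",)),          # assumed target
    ("L1", "ret",   ("a",)),
]


def run(program, **registers):
    """Execute the program; `registers` holds the initial values of a and b."""
    labels = {label: i for i, (label, _, _) in enumerate(program) if label}

    def value(operand):
        # An operand is either a register name or an integer constant.
        return registers[operand] if isinstance(operand, str) else operand

    pc = 0
    while True:
        _, op, args = program[pc]
        pc += 1
        if op == "sne":
            registers[args[0]] = int(value(args[1]) != value(args[2]))
        elif op == "seq":
            registers[args[0]] = int(value(args[1]) == value(args[2]))
        elif op == "sl":
            registers[args[0]] = int(value(args[1]) < value(args[2]))
        elif op == "sub":
            registers[args[0]] = value(args[1]) - value(args[2])
        elif op == "btrue":
            if value(args[0]):
                pc = labels[args[1]]
        elif op == "jmp":
            pc = labels[args[0]]
        elif op == "ret":
            return value(args[0])
        else:
            raise ValueError(f"unknown opcode {op!r}")
```

Calling `run(PROGRAM, a=12, b=18)` steps through the instructions until `ret`. The `registers` dictionary grows whenever a new temporary such as `$3` is written, which is the sense in which the machine has "infinite registers."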

After Translation to 80386 Assembly

    gcd:  pushl %ebp            % Save frame pointer
          movl  %esp,%ebp
          movl  8(%ebp),%eax    % Load a from stack
          movl  12(%ebp),%edx   % Load b from stack
    .L8:  cmpl  %edx,%eax
          je    .L3             % while (a != b)
          jle   .L5             % if (a < b)
          subl  %edx,%eax       % a -= b
          jmp   .L
    .L5:  subl  %eax,%edx       % b -= a
          jmp   .L
    .L3:  leave                 % Restore SP, BP
          ret

Lexical Analysis (Scanning)

Goal is to translate a stream of characters

i n t sp g c d ( i n t sp a , sp i n t sp b

into a stream of tokens

    ID   ID   LPAREN  ID   ID  COMMA  ID   ID
    int  gcd          int  a          int  b

Each token consists of a token type and its text.

Whitespace and comments are discarded.
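The character-to-token translation can be sketched as a short hand-written scanner. This is an illustration under stated assumptions, not the course's scanner: the function name `scan` and the `RPAREN` token type are hypothetical, comments are not handled, and keywords such as `int` are tagged `ID` as in the token stream shown here.

```python
def scan(text):
    """Turn a string of characters into a list of (token_type, text) pairs."""
    punctuation = {"(": "LPAREN", ")": "RPAREN", ",": "COMMA"}
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            # Whitespace separates tokens but is never passed on.
            i += 1
        elif ch.isalpha() or ch == "_":
            # Identifier: a letter or underscore, then letters, digits, underscores.
            start = i
            while i < len(text) and (text[i].isalnum() or text[i] == "_"):
                i += 1
            tokens.append(("ID", text[start:i]))
        elif ch in punctuation:
            tokens.append((punctuation[ch], ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens
```

Applied to `"int gcd(int a, int b"`, `scan` yields eight tokens: ID, ID, LPAREN, ID, ID, COMMA, ID, ID, each paired with its text.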

Lexical Analysis

Goal: simplify the job of the parser.

Scanners are usually much faster than parsers.

Discard as many irrelevant details as possible (e.g., whitespace, comments).

The parser does not care that the identifier is “supercalifragilisticexpialidocious.”

Parser rules are only concerned with token types.

An ANTLR File for a Simple Scanner

    class CalcLexer extends Lexer;

    LPAREN : '(' ;           // Rules for punctuation
    RPAREN : ')' ;
    STAR   : '*' ;
    PLUS   : '+' ;
    SEMI   : ';' ;
    protected                // Can only be used as a sub-rule
    DIGIT  : '0'..'9' ;      // Any character between 0 and 9
    INT    : (DIGIT)+ ;      // One or more digits

    WS : (' ' | '\t' | '\n' | '\r')   // Whitespace
         { $setType(Token.SKIP); } ;  // Action: ignore
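For readers without ANTLR at hand, the behavior this grammar describes can be approximated with Python's standard `re` module. The sketch below is an analogue written for illustration, not the code ANTLR generates. The names `RULES`, `MASTER`, and `calc_tokens` are hypothetical, and the protected `DIGIT` sub-rule is inlined as `[0-9]`.

```python
import re

# One (token type, pattern) pair per non-protected rule in the grammar.
RULES = [
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("STAR",   r"\*"),
    ("PLUS",   r"\+"),
    ("SEMI",   r";"),
    ("INT",    r"[0-9]+"),      # (DIGIT)+ with DIGIT inlined
    ("WS",     r"[ \t\n\r]"),   # matched, then skipped
]

# A single pattern with one named group per rule.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES))


def calc_tokens(text):
    """Return the (token_type, text) pairs for `text`, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "WS":     # plays the role of $setType(Token.SKIP)
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For instance, `calc_tokens("(12 + 3) * 4;")` produces LPAREN, INT, PLUS, INT, RPAREN, STAR, INT, SEMI with their texts, and no WS tokens.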

ANTLR Specifications for Scanners

Rules are names starting with a capital letter.

A character in single quotes matches that character.

LPAREN : '(' ;

A string in double quotes matches that string.

IF : "if" ;

A vertical bar indicates a choice:

OP : '+' | '-' | '*' | '/' ;