




























An in-depth exploration of techniques for normalizing context-free grammars (CFGs): eliminating useless variables, testing whether a variable derives some terminal string, removing ε-productions, and reaching Chomsky Normal Form. It covers concepts such as normal forms and nullable variables, with examples and proofs to illustrate them.
1
Eliminating Useless Variables
Removing Epsilon
Removing Unit Productions
Chomsky Normal Form
2
Consider: S -> AB, A -> aA | a, B -> AB
Although A derives all strings of a's, B derives no terminal strings (can you prove this fact?).
Thus, S derives nothing, and the language is empty.
4
Eventually, we can find no more variables.
An easy induction on the order in which variables are discovered shows that each one truly derives a terminal string.
Conversely, any variable that derives a terminal string will be discovered by this algorithm.
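The discovery procedure lends itself to a simple fixed-point computation. Below is a minimal Python sketch, assuming a hypothetical representation in which a grammar is a dict from variables (uppercase strings) to lists of production bodies (tuples of symbols, with terminals lowercase); the representation and the function name derives_terminal_string are illustrative choices, not from the slides.

# Slide 2 example: S -> AB, A -> aA | a, B -> AB
GRAMMAR = {
    "S": [("A", "B")],
    "A": [("a", "A"), ("a",)],
    "B": [("A", "B")],
}

def derives_terminal_string(grammar):
    """Fixed-point discovery of variables that derive some terminal string.

    Basis: a variable with a production whose body is all terminals.
    Induction: a variable with a production whose body consists only of
    terminals and already-discovered variables.
    """
    discovered = set()
    changed = True
    while changed:                      # stop when no more variables can be found
        changed = False
        for var, bodies in grammar.items():
            if var in discovered:
                continue
            for body in bodies:
                if all(sym.islower() or sym in discovered for sym in body):
                    discovered.add(var)
                    changed = True
                    break
    return discovered

print(derives_terminal_string(GRAMMAR))   # {'A'} -- B and S derive no terminal string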
5
The proof is an induction on the height of the least-height parse tree by which a variable A derives a terminal string.
Basis: Height = 1. The tree is a root A whose children a1, ..., an are all terminals, i.e., a production A -> a1...an.
Then the basis of the algorithm tells us that A will be discovered.
7
Discover all variables that derive terminal strings.
For all other variables, remove all productions in which they appear either on the left or the right.
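Continuing the hypothetical dict-of-productions representation, the removal step might look like the sketch below; eliminate_non_deriving is an illustrative name, and it reuses derives_terminal_string from the earlier sketch.

def eliminate_non_deriving(grammar):
    """Keep only variables that derive a terminal string, and drop every
    production that mentions a removed variable on either side."""
    good = derives_terminal_string(grammar)    # from the sketch above
    trimmed = {}
    for var, bodies in grammar.items():
        if var not in good:
            continue                           # variable on the left: drop all its productions
        kept = [body for body in bodies
                if all(sym.islower() or sym in good for sym in body)]
        if kept:
            trimmed[var] = kept
    return trimmed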
8
S -> AB | C, A -> aA | a, B -> bB, C -> c
Basis: A and C are identified because of A -> a and C -> c.
Induction: S is identified because of S -> C.
Nothing else can be identified.
Result: S -> C, A -> aA | a, C -> c
10
Easy inductions in both directions show that when we can discover no more symbols, then we have all and only the symbols that appear in derivations from S.
Algorithm: Remove from the grammar all symbols not discovered to be reachable from S, and all productions that involve these symbols.
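A sketch of the reachability computation, under the same assumed representation. The basis and induction in the docstring (the start symbol is reachable; if A is reachable and A -> w is a production, every symbol of w is reachable) are the standard formulation; the slide stating them is not included in this preview.

def reachable_symbols(grammar, start="S"):
    """Fixed-point discovery of symbols reachable from the start symbol.

    Basis: the start symbol is reachable.
    Induction: if A is reachable and A -> w is a production, every symbol
    of w is reachable.
    """
    reachable = {start}
    frontier = [start]
    while frontier:
        var = frontier.pop()
        for body in grammar.get(var, []):
            for sym in body:
                if sym not in reachable:
                    reachable.add(sym)
                    if sym in grammar:          # only variables have productions to follow
                        frontier.append(sym)
    return reachable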
11
A symbol is useful if it appears in some derivation of some terminal string from the start symbol.
Otherwise, it is useless.
Eliminate all useless symbols by:
(1) Eliminate symbols that derive no terminal string.
(2) Eliminate unreachable symbols.
13
After step (1), every symbol remaining derives some terminal string.
After step (2), the only symbols remaining are all reachable from S.
In addition, they still derive a terminal string, because such a derivation can only involve symbols reachable from S.
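Putting the two steps together in the required order, reusing eliminate_non_deriving and reachable_symbols from the sketches above (again, illustrative only). Running it on the slide 8 grammar shows why the order matters: B is removed in step (1), and only then does A become unreachable in step (2).

def eliminate_useless_symbols(grammar, start="S"):
    """Order matters: first drop symbols that derive no terminal string (step 1),
    then drop symbols unreachable from the start symbol (step 2)."""
    g1 = eliminate_non_deriving(grammar)        # step (1)
    reach = reachable_symbols(g1, start)        # step (2)
    return {var: bodies for var, bodies in g1.items() if var in reach}

# Slide 8 example: S -> AB | C, A -> aA | a, B -> bB, C -> c
G = {"S": [("A", "B"), ("C",)],
     "A": [("a", "A"), ("a",)],
     "B": [("b", "B")],
     "C": [("c",)]}
print(eliminate_useless_symbols(G))
# {'S': [('C',)], 'C': [('c',)]} -- B goes in step (1), then A is unreachable in step (2)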
14
We can almost avoid using productions of the form A -> ε (called ε-productions).
The problem is that ε cannot be in the language of any grammar that has no ε-productions.
Theorem: If L is a CFL, then L - {ε} has a CFG with no ε-productions.
16
S -> AB, A -> aA | ε, B -> bB | A
Basis: A is nullable because of A -> ε.
Induction: B is nullable because of B -> A.
Then, S is nullable because of S -> AB.
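The same fixed-point pattern computes the nullable variables. A minimal sketch under the same assumed representation (ε is represented as an empty tuple); the basis and induction in the comments are the standard formulation that the example above illustrates, since the slide stating the algorithm is not in this preview.

def nullable_variables(grammar):
    """Fixed-point discovery of nullable variables.

    Basis: A is nullable if there is a production A -> ε (empty body).
    Induction: A is nullable if it has a production whose body consists
    entirely of nullable variables.
    """
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, bodies in grammar.items():
            if var in nullable:
                continue
            for body in bodies:
                if all(sym in nullable for sym in body):   # an empty body passes too
                    nullable.add(var)
                    changed = True
                    break
    return nullable

# Slide 16 example: S -> AB, A -> aA | ε, B -> bB | A
G = {"S": [("A", "B")], "A": [("a", "A"), ()], "B": [("b", "B"), ("A",)]}
print(nullable_variables(G))   # {'A', 'B', 'S'}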
17
The proof that this algorithm finds all and only the nullable variables is very much like the proof that the algorithm for symbols that derive terminal strings works.
Do you see the two directions of the proof?
On what is each induction?
19
S -> ABC, A -> aA | ε, B -> bB | ε, C -> ε
A, B, C, and S are all nullable.
New grammar: S -> ABC | AB | AC | BC | A | B | C, A -> aA | a, B -> bB | b
Note: C is now useless. Eliminate its productions.
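A sketch of the construction behind this example, under the same assumed representation: for each old production, emit every variant obtained by optionally dropping nullable symbols, never adding an empty body. The construction slide itself is not in this preview, so this follows the standard version that the example illustrates; eliminate_epsilon reuses nullable_variables from the sketch above.

from itertools import product

def eliminate_epsilon(grammar):
    """Build an ε-production-free grammar for L(old) - {ε}.

    For each old production A -> X1 ... Xn, emit every variant obtained by
    keeping or dropping each nullable Xi, except the empty body.
    """
    nullable = nullable_variables(grammar)        # from the sketch above
    new_grammar = {}
    for var, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            # Each symbol is kept; a nullable symbol may also be dropped (None).
            choices = [(sym, None) if sym in nullable else (sym,) for sym in body]
            for pick in product(*choices):
                variant = tuple(sym for sym in pick if sym is not None)
                if variant:                       # never add A -> ε
                    new_bodies.add(variant)
        if new_bodies:
            new_grammar[var] = sorted(new_bodies)
    return new_grammar

# Slide 19 example: S -> ABC, A -> aA | ε, B -> bB | ε, C -> ε
G = {"S": [("A", "B", "C")], "A": [("a", "A"), ()], "B": [("b", "B"), ()], "C": [()]}
print(eliminate_epsilon(G))
# S gets ABC | AB | AC | BC | A | B | C, A gets aA | a, B gets bB | b,
# and C is left with no productions (now useless), matching the slide.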
20
Prove that for all variables A:
(1) If w ≠ ε and A =>*_old w, then A =>*_new w.
(2) If A =>*_new w, then w ≠ ε and A =>*_old w.
Then, letting A be the start symbol proves that L(new) = L(old) - {ε}.
(1) is an induction on the number of steps by which A derives w in the old grammar.