Lecture 7 - CYK Algorithm and Chomsky Normal Form on Context Free Grammars

Consider the CYK algorithm for parsing context-free grammars

Context Free Grammars

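A context-free grammar $G = (V, T, P, S)$ consists of a finite set of variables $V$, a finite set of terminals $T$, a finite set of productions $P$ of the form $A \rightarrow \alpha$ with $A \in V$ and $\alpha \in (V \cup T)^*$, and a start variable $S \in V$.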

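We define a binary relation $\Rightarrow_G$ on $(V \cup T)^*$ by:

$$
\alpha A \beta \Rightarrow_G \alpha \gamma \beta \quad \text{whenever } (A \rightarrow \gamma) \in P \text{ and } \alpha, \beta \in (V \cup T)^*
$$

We write $\Rightarrow^*$ for the reflexive, transitive closure of this relation, i.e. derivation in zero or more steps.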

Example: CFG for generating arithmetic expressions

Variables:

Terminals:

Productions:

image-20210222091302896

We can build a parse tree for an expression using these productions:

image-20210222091151200

Chomsky Normal Form

A CFG is in Chomsky Normal Form if every production in $P$ is of the form $A \rightarrow BC$ or $A \rightarrow a$, where $A,B,C \in V$ and $a \in T$.

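For every CFG $G$ with $\epsilon \notin L(G)$ there is a CFG $G'$ in Chomsky Normal Form (CNF) such that $L(G) = L(G')$, so any such CFG can be converted to CNF without changing its language. (The empty string has to be excluded because no production of the form $A \rightarrow BC$ or $A \rightarrow a$ can produce it.)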
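The definition above is easy to check mechanically. The following is a minimal sketch, assuming a grammar is stored as a list of `(head, body)` pairs where `body` is a tuple of symbols; the function name `is_cnf` and both sample grammars are hypothetical and only serve as illustration.

```python
def is_cnf(variables, terminals, productions):
    """Return True if every production has the form A -> BC or A -> a."""
    for head, body in productions:
        if head not in variables:
            return False
        # A -> BC: exactly two symbols, both variables.
        is_binary = len(body) == 2 and all(sym in variables for sym in body)
        # A -> a: exactly one symbol, a terminal.
        is_terminal = len(body) == 1 and body[0] in terminals
        if not (is_binary or is_terminal):
            return False
    return True


# Hypothetical grammars, for illustration only.
V = {"S", "A", "B"}
T = {"a", "b"}
not_cnf = [("S", ("a", "S", "b")), ("S", ("a", "b"))]       # bodies mix terminals, wrong length
in_cnf = [("S", ("A", "B")), ("A", ("a",)), ("B", ("b",))]  # only A -> BC and A -> a

print(is_cnf(V, T, not_cnf))
print(is_cnf(V, T, in_cnf))
```

Storing bodies as tuples keeps the two allowed shapes distinguishable by length alone, which is also convenient for the CYK algorithm later in these notes.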

Example productions for a general CFG (not in Chomsky Normal Form):

image-20210222092350358

After converting to Chomsky Normal Form:

image-20210222092341682

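This is a more restricted form, but because every production now has one of only two shapes, algorithms such as CYK can handle all grammars in the same uniform way.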

CYK Algorithm

CYK stands for Cocke-Younger-Kasami. It is a parsing algorithm for context-free grammars in Chomsky Normal Form.

Membership Problem

For a fixed CFG in CNF, if we are given a string $s$ consisting of $n$ terminals, is there a derivation $S \Rightarrow^* s$?

We could solve this by exhaustively enumerating derivations, but the number of candidate derivations grows exponentially with $n$, so this is very inefficient.
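A rough count, added here as a sketch rather than taken from the lecture, shows where the cost comes from. In CNF each use of $A \rightarrow BC$ lengthens the sentential form by one symbol and each use of $A \rightarrow a$ turns one variable into one terminal, so every derivation of a string of $n$ terminals has the same length:

$$
\underbrace{(n - 1)}_{\text{uses of } A \rightarrow BC} + \underbrace{n}_{\text{uses of } A \rightarrow a} = 2n - 1 \text{ steps}
$$

Trying every leftmost derivation of that length means choosing one of $|P|$ productions at each step, which gives up to $|P|^{2n-1}$ candidates.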

Recurrence Relation

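Write $s = x_1 x_2 \dots x_n$. For $i$ and $k$ with $1 \leq i \leq k \leq n$ we consider the set $V(i,k) \subseteq V$ defined by:

$$
V(i,k) = \{A \in V \mid A \Rightarrow^* x_i x_{i+1} \dots x_k\}
$$

We have:

$$
V(i,k) =
\begin{cases}
\{A \in V \mid (A \rightarrow x_i) \in P\} & \text{if } i = k \\
\bigcup_{j=i}^{k-1} \{A \in V \mid (A \rightarrow BC) \in P,\ B \in V(i,j),\ C \in V(j+1,k)\} & \text{if } i < k
\end{cases}
$$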

If $i = k$, we go through all productions to check which ones generate $x_i$, adding the heads of these productions to $V(i,i)$.

If $i < k$, then $A$ derives $x_i \dots x_k$ exactly when there is a production $A \rightarrow BC$ and a split point $j$ with $i \leq j < k$ such that $B$ derives the first part ($B \in V(i,j)$) and $C$ derives the second part ($C \in V(j+1,k)$).

The string $s$ is derivable if and only if $S \in V(1,n)$.

Pseudocode

begin
    for i <- 1 to n do
        V(i,i) <- { A in V | (A -> xi) in P }
    for b <- 1 to n - 1 do
        for i <- 1 to n - b do
            k <- i + b
            V(i,k) <- empty
            for j <- i to k - 1 do
                for (A -> BC) in P do
                    if B in V(i, j) and C in V(j + 1, k) then
                        V(i, k) <- V(i, k) union { A }
    if S in V(1, n) then accept else reject
end
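The pseudocode translates almost line for line into Python. This is a minimal sketch under stated assumptions: productions are stored as `(head, body)` pairs with tuple bodies, the names `cyk_table` and `accepts` are made up for this example, and `GRAMMAR` is a standard textbook CNF grammar chosen for illustration, not necessarily the one used in the lecture.

```python
def cyk_table(word, productions):
    """Fill the table V(i, k) for a grammar in CNF, using 1-based inclusive indices."""
    n = len(word)
    table = {}
    # Base case: V(i, i) holds the heads of productions A -> x_i.
    for i in range(1, n + 1):
        table[(i, i)] = {head for head, body in productions
                         if body == (word[i - 1],)}
    # b = k - i is the offset between the two ends of the substring.
    for b in range(1, n):
        for i in range(1, n - b + 1):
            k = i + b
            cell = set()
            for j in range(i, k):  # split into x_i..x_j and x_{j+1}..x_k
                for head, body in productions:
                    if (len(body) == 2
                            and body[0] in table[(i, j)]
                            and body[1] in table[(j + 1, k)]):
                        cell.add(head)
            table[(i, k)] = cell
    return table


def accepts(word, productions, start="S"):
    """Return True if `start` derives `word`."""
    if not word:
        return False  # with only A -> BC and A -> a, the empty string is never derived
    return start in cyk_table(word, productions)[(1, len(word))]


# Assumed grammar (illustration only): S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a
GRAMMAR = [
    ("S", ("A", "B")), ("S", ("B", "C")),
    ("A", ("B", "A")), ("A", ("a",)),
    ("B", ("C", "C")), ("B", ("b",)),
    ("C", ("A", "B")), ("C", ("a",)),
]

print(accepts("baaba", GRAMMAR))
print(accepts("b", GRAMMAR))
```

The loops over `b`, `i` and `j` each run at most $n$ times and the innermost loop scans every production, so the running time is proportional to $n^3 \cdot |P|$, compared with the exponential cost of exhaustive enumeration.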

Example

We have a CFG with $T = \{a,b\}$, $V = \{S,A,B,C\}$ and the productions:

image-20210222094019622

To decide whether $s = baaba$ can be derived, we compute the values $V(i,k)$. We can trace this with a table, starting with the diagonal entries $V(i,i)$:

image-20210222094057640

Next we look at the entries with $i = k-1$, which cover the substrings of length 2:

image-20210222094150327

$V(1,3) = \emptyset$ since $BB$ (combining the black boxes) is not the body of any production, and neither is any combination of the blue boxes ($SA, AA, SC, AC$).
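Written out with the recurrence, this entry has two split points. The sets below are read off from the pairs listed above, on the assumption that those pairs are exactly the candidates: $V(1,1) = V(2,3) = \{B\}$, $V(1,2) = \{S,A\}$ and $V(3,3) = \{A,C\}$. The bound symbols are named $X, Y, Z$ here to avoid a clash with the grammar's own variables.

$$
V(1,3) =
\underbrace{\{X \mid (X \rightarrow YZ) \in P,\ Y \in V(1,1),\ Z \in V(2,3)\}}_{j = 1:\ \text{candidate body } BB}
\ \cup\
\underbrace{\{X \mid (X \rightarrow YZ) \in P,\ Y \in V(1,2),\ Z \in V(3,3)\}}_{j = 2:\ \text{candidate bodies } SA,\ SC,\ AA,\ AC}
= \emptyset \cup \emptyset = \emptyset
$$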

image-20210222094500316

Next time, on Algorithms II

Travelling salesman lmao