\( \newcommand{\matr}[1] {\mathbf{#1}} \newcommand{\vertbar} {\rule[-1ex]{0.5pt}{2.5ex}} \newcommand{\horzbar} {\rule[.5ex]{2.5ex}{0.5pt}} \newcommand{\E} {\mathrm{E}} \)
deepdream of
          a sidewalk
Show Question
\( \newcommand{\cat}[1] {\mathrm{#1}} \newcommand{\catobj}[1] {\operatorname{Obj}(\mathrm{#1})} \newcommand{\cathom}[1] {\operatorname{Hom}_{\cat{#1}}} \newcommand{\multiBetaReduction}[0] {\twoheadrightarrow_{\beta}} \newcommand{\betaReduction}[0] {\rightarrow_{\beta}} \newcommand{\betaEq}[0] {=_{\beta}} \newcommand{\string}[1] {\texttt{"}\mathtt{#1}\texttt{"}} \newcommand{\symbolq}[1] {\texttt{`}\mathtt{#1}\texttt{'}} \newcommand{\groupMul}[1] { \cdot_{\small{#1}}} \newcommand{\groupAdd}[1] { +_{\small{#1}}} \newcommand{\inv}[1] {#1^{-1} } \newcommand{\bm}[1] { \boldsymbol{#1} } \require{physics} \require{ams} \require{mathtools} \)
Math and science::Theory of Computation

Context-free languages

Context-free languages

A context-free language is a language that can be generated by some context-free grammar.

Below, we define the pieces involved in this definition.

Grammar

A grammar can be considered an input datum of a specific type of string rewrite system that takes in input in the form of a list of production rules.

Below are some examples of some production rules:

\[ \begin{align*} S &\to aSb && (\text{context-free}) \\ S &\to AB && (\text{context-free}) \\ aA &\to C && (\text{not context-free}) \\ bAc &\to B && (\text{not context-free}) \\ CA &\to cA && (\text{not context-free}) \\ \end{align*} \]

A context-free grammar is one type of grammar.

Context-free grammar (my definition)

A grammar is context-free iff all production rules have a LHS consisting of a single symbol.

See the back side for the standard definition and an explanation as to why I think it contains redundancies.

A quality of context-free grammars is that no matter what order rewrites are carried out, the same language will be generated.

The implicit grammar machine

In my opinion, the concept of a context-free grammar cannot be defined without reference to the machine that takes the rules as input and carries out the re-writes. The way the rules are written strongly hint as to the operation of the machine taking them as input. I make some more remarks about this on the back side.


Standard definition

The following is a more typical definition of a context-free grammar.

Context-free grammar (standard definition)

A grammar is context-free iff all production rules have a LHS consisting of a single non-terminal symbol.

I think this definition is superfluous in its reference to terminal symbols. Any production rule set with only single symbols on the LHS implicitly defines it's "non-terminal" symbols as all such LHS symbols. Thus, there is no need for defining any notion of terminal/non-terminal symbols.

The below section is some thoughts on flaws of common formal definitions of context-free grammars.

The implicit grammar machine

Grammars are often discussed without making explicit the fact that an algorithm for generating strings is being described. A grammar is the input to one of these machines, and so one can't properly describe a grammar without describing how the machine works. The Sipser book and the Hopcroft book both fail to explicitly identify the algorithm they are relying on, and consequently, their definition of a grammar contains seemingly redundant variables.

My attempt at a medium level description of a grammar machine is that it is a string rewrite system of the following form. The machine takes in a list of tuples of the form [(string1, string2), (string1, string2), ...]. The first tuple is expected to be the starting rule which will kick-start the string generation loop. An initially empty list of intermediate string is maintained. The second string in the first tuple is added to this intermediate list. Then, the following loop continues until a loop has no effect: loop over every intermediate string, loop over every tuple, carry out a rewrite if a tuple's first string matches a substring in the input and add the generated string back to the intermediate list of strings. The resulting intermediate list of strings becomes the output.

In order for the machine to function correctly, the input will need to be encoded according to whatever encoding is expected. This encoding will place some restrictions on the tuples. For example, for a machine that works only with context-free grammars, the first string of each tuple must be a single symbol; this insures that the rule has no context and that the symbol will never appear in the output strings.

With this type of definition, the definition by Sipser seems to includes unnecessary components: variables, terminals and the start variable. Moving these quantities into the definition of the machine does not detract from the generality of the machine and cleans up the definition of a grammar.

Context-free languages vs regular languages

The three main concepts are context-free grammars, context-free languages and pushdown automata. In my opinion, what context-free grammars are are input to a program that generates a context-free language. In this way, grammars paired with the implicitly described machine is a generating machine while the pushdown automata is a deciding machine. When compared to finite automata, context-free languages are to regular languages, pushdown automata are to non-deterministic finite state machines, and there isn't an obvious equivalent of context-free grammars. One attempt to find an equivalent concept to context-free grammars is to consider an algorithm that moves over a finite state machine in a depth-first or breath-first manner and generates all strings of the language.

An interesting remark about the above comparison between regular and context-free languages is that pushdown automata do not have an equivalently powerful non-deterministic variant. Instead, a non-deterministic pushdown automata is distinctly less powerful and cannot decide all context-free languages. This raises an interesting question as to whether maybe there is an important intermediate language type between regular and context-free languages.

Context

grammar → context-free grammar → context-free language → context-free language closure → Chomsky normal form → pushdown automaton → deterministic pushdown automaton