Math and science::Theory of Computation

Pumping lemma for regular languages

The pumping lemma is useful for showing that a language cannot be represented by a finite state machine.

Can a language be accepted by an automaton?

There are two cases.

If a language contains only finite strings, we can create a finite automaton that handles each case.
If a language contains infinite strings, we look to the pumping lemma to answer this question.

The gist of the lemma

The gist of the pumping lemma is: if a finite set of states is able to represent an infinite string, then during the state transitions between start and accept states, there must be a loop at some point. This property can be used to show, by contradiction, that some languages cannot be represented by finite state automata.

Pumping lemma

If \( A \) is a regular language, then there is a number \( p \) (the pumping length) where if \( s \) is any string in \( A \) of length at least \( p \), then \( s \) may be divided into three pieces, \( s = xyz \), satisfying the following three conditions:

for each \( i \ge 0, xy^iz \in A \)
\( |y| \ge 0 \)
\( |xy| \le p \)

Interpreting the 3 criteria

The definition can be interpreted as: we know there is a loop of some unknown length in the automata. There is some length such that all strings equal to or longer than this length must 'use' the loop for acceptance (the number of states in the FSA is an upper bound, but there might be a smaller upper bound). Let \( p \) be one such length. If there is a string of the language that is larger than \( p \) then it must passes through some number of states before the loop, the loop, and some number of states after the loop. The three conditions correspond to the statements:

We can break the string into \( xy^iz \), where \( x \) passes through the states before the loop, \( y \) corresponds to the loop and \( z \) passes through the states after the loop.
The loop exists, so has a non-zero length (rule out the degenerate case).
If the string goes through the loop multiple times, there are multiple ways of splitting it into the \( xy^iz \) pieces, by moving one or more of the loop sections into the \( x \) or \( z \) portions. The 3rd condition says that it's possible to do the division into \( xy^iz \) without placing any of the looped section in the \( x \) portion. This is useful for showing some languages to be non-regular. With this in mind, the length \( p \) is the combined length of the beginning and the kernel of the repeated section.

My guess is that the 3rd criteria could be expanded to cover the case where we don't put any looped sections in the \( z \) portion either.

Example

Intuition is misleading

The following two examples highlight the need for the pumping lemma—our intuition is not reliable.

Language A = \( \{w \mid w \text{ has an equal number of 0s and 1s} \} \)
Language B = \( \{w \mid w \text{ has an equal number of 01 and 10 substrings} \} \)

Language A is non-regular, language B is regular.

\( \{0^n1^n \mid n \ge 0\} \)

This is a well known example of a non-regular language, proved so using the pumping lemma. The proof outline goes like so:

Let \( p \) be the[a?] pumping length that must exist by the lemma.
Consider the string \( s = 0^p1^p \).
We try to split \( s \) into \( xy^iz \), but find that we can't because,
\( y \) must contain a mix of 0s and 1s, otherwise \( |xy| \) and \( |y| \) wouldn't meet criteria 1 and 2, but
you can't repeat \( y \) if it contains a mix of 0s and 1s, as the language requirement is such that no zeros can follow any 1.
The contradiction implies that the language cannot be represented by a FSA.

Source

Introduction to the Theory of Computation, Sipser

This video quickly conveys the intuitive idea.