Origin of Lebesgue Integration

This article follows the steps of Henri Lebesgue as he came upon his theory of integration. The story could be started earlier, but we don't lose too much by starting with Borel, Lebesgue's advisor, at the end of the 19th century.

Borel and the measure of a set

At the end of the 19th century, Émile Borel was thinking about the problem of measure, that is, the problem of describing the size of things. In geometry, the measure of a solid body in 1, 2 and 3 dimensions is referred to as the length, area or volume of the solid body, respectively. This idea of measure is deeply intuitive; we are giving a single number to an object, and that number is larger for larger objects. If we loosen the restriction to solid bodies, we may extend this notion to a more general setting. We may ask: what is the measure of an arbitrary subset in 1, 2 and 3 dimensions? Specifically, we are asking for the size of a subset of \( \mathbb{R} \), a subset of \( \mathbb{R}^2 \) or a subset of \( \mathbb{R}^3 \). What would it even mean for an arbitrary set to have a measure? Through a geometric lens, it seems reasonable to give a set like the interval \( (1, 3) \) a measure of 2, which matches the idea of length in 1 dimension. But what about a more complicated set like the set of all rational numbers between 1 and 3? The measure of this set is not so obvious. It's not even obvious whether it makes sense for this set to have a measure. These are the types of questions Borel sought to answer.

Borel was searching for a general definition for the measure of a set. He restricted his focus to the one-dimensional case: subsets of \( \mathbb{R} \), such as the interval \( (1, 2) \subset \mathbb{R} \). He was interested in assigning to any subset \( A \subset \mathbb{R} \) a real number \( m_A \in \mathbb{R} \) that describes the measure of \( A \). One way of viewing this is as a search for a function from the set of all subsets of \( \mathbb{R} \) (denoted as \(2^{\mathbb{R}} \)) to \( \mathbb{R} \) itself:

\[ m : 2^{\mathbb{R}} \to \mathbb{R} \]

If there were a way to assign such a measure to each set, what sort of characteristics would the measure have? In 1898, the 27-year-old Borel published his Leçons Sur La Théorie Des Fonctions, a 130-page text covering a range of different topics. A short section on pages 46-50 outlined some requirements for the measure of a set:

A measure is always nonnegative.
The measure of the difference of two sets (a set and a subset) is equal to the difference of their measures.
The measure of a countable union of non-overlapping sets is the sum of their measures.
Every countable set has measure 0.

It is worth pausing to understand these requirements and to consider if they have been chosen well.

Interlude: Leçons, original and translation

Below is a cropped photo of pages 46 and 47 of Borel's Leçons Sur La Théorie Des Fonctions. The full text is available on Internet Archive.

The reader is free to skip this section without missing out on any crucial information.

Photo of page 46 and 47 of Borel's 1898 text.

The section Les ensembles mesurables, starting half way down page 46, is where Borel describes the requirements of a measure. This section is transcribed and translated below.

Les ensembles mesurables

Tous les ensembles que nous considérerons seront formés de points compris entre 0 et 1. Lorsqu'un ensemble sera formé de tous les points compris dans une infinité dénombrable d'intervalles n'empiétant pas les uns sur les autres et ayant une longueur totale s, nous dirons que l'ensemble a pour mesure \( s \). Lorsque deux ensembles n'ont pas de points communs, et que leurs mesures sont \( s \) et \( s' \), l'ensemble obtenu en les réunissant, c'est-à-dire leur somme, a pour mesure \( s + s' \). D'ailleurs, il importe peu dans la définition de la mesure d'un ensemble, ou dans celle de la somme de deux ensembles, que l'on néglige, ou que l'on tienne tel compte que l'on veut des extrémités des intervalles, en infinité dénombrable.

Plus généralement, si l'on a une infinité dénombrable d'ensembles n'ayant deux à deux aucun point commun et ayant respectivement pour mesure \( s_1, s_2, ..., s_n, ..., \) leur somme (ensemble formé par leur réunion) a pour mesure

\[ s_1 + s_2 + ... + s_n + ... \]

Tout cela est une conséquence de la définition de la mesure. Voici maintenant des définitions nouvelles : si un ensemble \( E \) a pour mesure \( s \), et contient tous les points d'un ensemble \( E' \) dont la mesure est \( s' \), l'ensemble \( E - E' \), formé des points de \( E \) qui n'appartiennent pas à \( E' \), sera dit avoir pour mesure \( s - s' \); de plus, si un ensemble est la somme d'une infinité dénombrable d'ensembles sans partie commune, sa mesure sera la somme des mesures de ses parties et enfin les ensembles \( E \) et \( E' \) ayant, en vertu de ces définitions, \( s \) et \( s' \) comme mesures, et \( E \) renfermant tous les points de \( E' \), l'ensemble \( E-E' \) aura pour mesure \( s - s' \).

Le théorème fondamental démontré pages 41-43 nous assure que ces définitions ne seront jamais contradictoires entre elles ('); nous sommes donc libres de les adopter; nous sommes d'ailleurs assurés aussi que la mesure d'un ensemble ne sera jamais une quantité négative; mais un ensemble peut avoir pour mesure zéro et avoir la puissance du continu. Tel est l'ensemble \( E \) considéré tantôt. Si nous reprenons les notations de la page 45 et si nous désignons par \( x_n \) la mesure de \( E_n \;\; (x_n < M/n) \), l'ensemble \( E_n - E_{n+1} \) aura pour mesure \( x_n - x_{n+1} \) (nous savons que \( E_n \) renferme tous les points de \( E_{n+1} \) ). L'ensemble \( A \) des points qui n'appartiennent pas à \( E \) peut être regardé comme formé en ajoutant à \( A \) les ensembles \( E_n - E_{n+1}, E_{n+1} - E_{n+2}, ...; \) sa mesure est donc

\[ 1 - x_{n} + (x_n - x_{n+1}) + (x_{n+1} - x_{n+2}) + ... = 1 \]

puisque \( x_m \) tend vers zéro pour \( m \) infini. Donc, l'ensemble \( E \) obtenu en retranchant cet ensemble de l'ensemble de tous les points 0-1, a pour mesure zéro.

Ainsi un ensemble qui a pour mesure zéro peut être non dénombrable; mais tout ensemble dénombrable a pour mesure zéro; c'est une conséquence aisée de ce qui précède.

Measurable sets

All the sets we will consider are formed by points between 0 and 1. When a set is formed by all the points included in a countably infinite collection of non-overlapping intervals having total length \( s \), we will say that the set has measure \( s \). When two sets have no points in common, and their measures are \( s \) and \( s' \), the set obtained by joining them, that is, their sum, has measure \(s + s' \). Moreover, it does not matter in the definition of the measure of a set, or in that of the sum of two sets, whether one neglects, or takes into account as one wishes, the endpoints of the intervals, which are countably infinite in number.

More generally, if we have a countable infinity of sets, no two of which have a point in common, with respective measures \( s_1, s_2, ..., s_n, ..., \) their sum (the set formed by their union) will have measure

\[ s_1 + s_2 + ... + s_n + ... \]

All this is a consequence of the definition of measure. Here are some new definitions: if a set \( E \) has measure \( s \), and contains all the points of a set \( E' \) whose measure is \( s' \), the set \( E-E' \), formed by the points of \( E \) which do not belong to \( E' \), will be said to have measure \(s-s'\); moreover, if a set is the sum of a countably infinite number of sets without common parts, its measure will be the sum of the measures of its parts and finally, the sets \( E \) and \( E' \) having, by virtue of these definitions, \( s \) and \( s' \) as measures, and \( E \) containing all the points of \( E' \), the set \( E-E' \) will have measure \( s - s' \).

The fundamental theorem proved on pages 41-43 assures us that these definitions will never contradict each other ('); we are therefore free to adopt them; we are also assured that the measure of a set will never be a negative quantity; but a set can have zero measure and have the power of the continuum. Such is the set \( E \) considered earlier. If we take the notations of page 45 and if we designate by \( x_n \) the measure of \( E_n \;\; (x_n < M/n) \), the set \( E_n - E_{n+1} \) will have measure \( x_n - x_{n+1} \) (we know that \( E_n \) contains all the points of \( E_{n+1} \)). The set \( A \) of the points which do not belong to \( E \) can be seen as formed by adding to \( A \) the sets \( E_n - E_{n+1}, E_{n+1} - E_{n+2}, ...\); its measure is thus

\[ 1 - x_{n} + (x_n - x_{n+1}) + (x_{n+1} - x_{n+2}) + ... = 1 \]

since \( x_m \) tends to zero as \( m \) tends to infinity. So the set \( E \), obtained by subtracting this set from the set of all points \( 0-1 \), has zero measure.

Thus a set which has measure zero can be uncountable; but any countable set has measure zero; this is an easy consequence of the above.

Translated with help of www.DeepL.com on 2021-08-08 (translation sharable link). Snapshot of licence at the time of translation.

Interlude over. Back to the 4 properties...

Any system of measuring sets that breaks one of these requirements would be strange enough that it wouldn't feel like a measure. This is more obvious for properties 1 and 2 than for properties 3 and 4. What makes properties 3 and 4 harder to interpret is that they both refer to a type of infinity: countable infinity. When Borel published his text, it had been only about 25 years since Cantor had introduced the idea of multiple types of infinity. These ideas were slow to be accepted, so a reader who struggles to interpret them is in good company. The different infinities are not discussed here, but a grasp of them is probably required to make sense of the rest of this article.

Property 3

Property 3 is repeated here:

The measure of a countable union of non-overlapping sets is the sum of their measures.

Property 3 expresses the idea that if multiple sets do not overlap, and we form a new set from their union, then the measure of the result equals the sum of the measures of the individual sets.

But why is a countably infinite union specified? Why not a finite union, or an uncountably infinite union? Consider these three alternatives in turn.

Finite union

We can word property 3 for finite unions as follows.

Let \( E_1 \) and \( E_2 \) be two non-overlapping subsets of \( \mathbb{R} \). Let \( E \) denote their union, \( E = E_1 \cup E_2 \). Then

\[ m(E) = m(E_1) + m(E_2) \;. \]

Countably infinite union

Set theory allows for an infinite union. Reading the axiom of union might make this more clear. Below, property 3 is laid out explicitly for the case of a countable union.

Let's first construct a set \( E \) from a countable union of subsets of \( \mathbb{R} \). For every natural number \( n \ge 0 \) let \( E_n \) be a subset of \( \mathbb{R} \). We can collect these sets into a set to form a set of sets, \( \{ E_n : n \in \mathbb{N} \} \). Then let \( E \) be the union of all of these sets, \( E = \bigcup \{ E_n : n \in \mathbb{N} \} \). This set is defined to exist by the axiom of union.

Property 3 for a countable union then says:

If all sets \( E_0, E_1, ... \) are non-overlapping, then the following is true:

\[ m(E) = \sum_{n=0}^{\infty} m(E_n) \]

This countable union includes the finite union as a special case: just let all but a finite number of the sets be empty.
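A small numerical sketch may make the countable case feel less abstract. The intervals chosen here, \( E_n = (2^{-(n+1)}, 2^{-n}) \), and the helper name partial_measure are illustrative assumptions, not something taken from Borel. These intervals do not overlap, and together they fill \( (0, 1) \) apart from countably many endpoints, so property 3 suggests that their lengths should add up to 1.

```python
from fractions import Fraction


def partial_measure(n_terms):
    """Sum the lengths of the first n_terms intervals E_n = (1/2^(n+1), 1/2^n)."""
    total = Fraction(0)
    for n in range(n_terms):
        left = Fraction(1, 2 ** (n + 1))
        right = Fraction(1, 2 ** n)
        total += right - left  # length of the n-th interval
    return total


if __name__ == "__main__":
    for n_terms in (1, 2, 5, 10, 20):
        value = partial_measure(n_terms)
        print(n_terms, value, float(value))
```

The partial sums equal \( 1 - 2^{-N} \) after \( N \) terms, so they approach 1 without ever exceeding it, which is the behaviour the infinite sum in property 3 is meant to capture.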

This idea pushes against the boundaries of intuition. Why is it acceptable to consider the measure of an infinite union? What is interesting is that if we try to extend this to uncountably infinite unions, we fail.

Uncountably infinite union

We cannot allow an arbitrary union of sets to have summable measure without the concept of measure becoming useless. An example explains this.

Consider a single point, such as \( r = 0.3 \). We can form a single point set \( \{ r \} \). The interval \( I = (0, 1) \) can be expressed as an uncountably infinite set of points,

\[ I = \{ r \in \mathbb{R} : 0 \lt r \lt 1 \} \]

or as the union of the equivalent point sets,

\[ I = \bigcup \{ \; \{r\} : r \in \mathbb{R}, \; 0 \lt r \lt 1 \} \]

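None of these point sets overlap, so if we were to allow an arbitrary union to preserve summation of measure, the measure of \( (0, 1) \) would be an uncountable sum of the measures of single points. If every point has measure zero, the sum is zero; if every point has the same positive measure, the sum is infinite. We would be forced to accept that \( (0, 1) \), and in fact all intervals, have either infinite measure or zero measure: two results that don't fit with our intuition that the length of \( (0, 1) \) should be non-zero and finite.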

Property 4

Property 4 states:

Every countable set has measure 0.

Consider first a sub-case of this property: every finite set has measure zero. Take a single point set, such as \( \{ 4.21 \} \). It is reasonable to assert that this set should have measure zero. There are an infinite number of rational numbers within \( (0, 1) \), so if each rational were given the same non-zero measure, then \( (0, 1) \) would have infinite measure.

Now consider the set of all rationals within \( (0, 1) \). This is a countably infinite set. Between any two rationals there are infinitely many irrationals (a result covered by 3Blue1Brown). If the rationals were to have anything but zero measure, we would expect the irrationals to have infinitely more measure, and so we would be forced to say that the set \( [0, 1] \) had infinite measure too. Thus, we find ourselves compelled to accept property 4.
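There is also a standard covering argument that points the same way, sketched here in code. It is not taken from Borel's text, and the function names and the cut-off max_denominator are assumptions made for illustration. The idea is to list the rationals in \( (0, 1) \) one after another and to cover the \( n \)-th one with an interval of length \( \varepsilon / 2^n \). However many rationals are listed, the covering intervals have total length below \( \varepsilon \), which suggests that the only sensible measure for the rationals is zero.

```python
from fractions import Fraction
from math import gcd


def rationals_in_unit_interval(max_denominator):
    """Yield each rational p/q in (0, 1) with q <= max_denominator exactly once."""
    for q in range(2, max_denominator + 1):
        for p in range(1, q):
            if gcd(p, q) == 1:  # skip fractions that are not in lowest terms
                yield Fraction(p, q)


def cover_length(eps, max_denominator):
    """Total length of intervals of length eps / 2^n placed around the n-th rational."""
    total = Fraction(0)
    for n, _rational in enumerate(rationals_in_unit_interval(max_denominator), start=1):
        total += eps / 2 ** n
    return total


if __name__ == "__main__":
    eps = Fraction(1, 100)
    for max_denominator in (5, 20, 50):
        count = sum(1 for _ in rationals_in_unit_interval(max_denominator))
        print(max_denominator, count, float(cover_length(eps, max_denominator)))
```

Raising max_denominator lists more and more rationals, yet the total covering length stays below the chosen eps, and eps itself can be made as small as we like.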

Is it possible to satisfy all properties?

With the 4 properties in mind, two questions arise:

Does there actually exist a measure that satisfies all 4?
If we do find such a measure, is it unique?

The 4 properties might seem more obvious than these two questions. In some sense, these questions are quite deep. Many mathematicians present their work as the painstaking construction of a mathematical object, followed by a demonstration that the object has some nice intuitive properties. Here we have the reverse process: we have some nice intuitive properties in mind and we are searching for a mathematical object that fits them all. In addition, we are asking: to what extent might such an object be unique? Uniqueness would allow the requirements to become a definition: "the single object that satisfies our requirements". A definition of an object by stating the properties it must have is often called a descriptive definition. Both Borel and Lebesgue began their theories with descriptive definitions.

It's worth pondering the second question for a little while. The metric system and imperial system assign different numbers to the lengths, areas and volumes of things, yet both seem to "work" in the sense of the 4 requirements. So it seems intuitive that systems that differ by a constant multiple could both be valid measurement systems. In other words, if we have some function \( m : 2^{\mathbb{R}} \to \mathbb{R} \) that assigns a measure to subsets of \( \mathbb{R} \), then the function \( m' = cm \), for any positive constant \( c \), is also a valid measure. According to the 4 criteria above, then, if we can find one measure, we can find infinitely many other valid measures. However, if we add one additional and very reasonable requirement, all but one of these measuring systems disappear: if we require the interval \( [0, 1] \) to have a measure of 1, then any constant multiple \( cm \) with \( c \ne 1 \) assigns some number other than 1 to \( [0, 1] \). With this extra requirement, we cannot produce new valid measuring functions simply by scaling. This sense of playing with possible ways of measuring things and seeing how they behave is very much in the spirit of how Borel and Lebesgue investigated measure.
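The scaling argument can be written out in a couple of lines. This is only a sketch, with \( c > 0 \) an arbitrary constant, \( m \) any function already assumed to satisfy the requirements, and \( E_0, E_1, ... \) non-overlapping sets:

\[ m'(E) = c \, m(E) \ge 0, \qquad m'\Big( \bigcup_{n} E_n \Big) = c \sum_{n} m(E_n) = \sum_{n} m'(E_n), \qquad m'([0, 1]) = c \, m([0, 1]) \;. \]

The first two relations show that nonnegativity and countable additivity survive the scaling. The third shows that if both \( m \) and \( m' \) are required to give \( [0, 1] \) a measure of 1, then \( c = 1 \).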

Borel tackled question 1 (he ignored question 2) by proposing a measuring system for subsets of \( \mathbb{R} \). Borel observed the following. To any set which is an open interval \( (a, b) \) (where \( b > a \)) let us give the measure \( b - a \). With this as a starting point, attempting to adhere to requirement (3) forces our hand and we must say: any set which can be expressed as a countable union of non-overlapping open intervals has a measure that is the sum of the measures of the constituent intervals. There is existing terminology for such sets: open sets. So every open set has a measure that is a sum of the measures of intervals. On the other hand, closed sets (bounded ones, at least) are those that can be expressed as the difference between an interval and an open subset of the interval. Attempting to adhere to requirement (2) forces our hand again, and we must say: any closed set \( C \) formed from the difference \( I \setminus O \) must have a measure equal to \( m(I) - m(O) \). In other words, the measure of \( C \) equals the difference between the measure of the interval and the measure of the open subset removed to form \( C \). In this way, Borel suggested that if our measure system assigns all intervals a measure equal to their "length", and we consider only the subsets of \( \mathbb{R} \) that can be formed by sums and differences of intervals, then we will have a measure system that meets the 4 requirements above.

The reader may have noticed some deficiencies with the above argument. Indeed, Borel did not show that requirement (1) is always satisfied (it is conceivable that some set difference might arrive at a negative measure). Furthermore, sets might be built up from intervals in multiple ways, and it's not clear that our measure system will assign a consistent measure. For example, consider an open set \( S \). There can be many ways of summing up disjoint intervals to arrive at \( S \). If we have two sets of intervals that sum to \( S \):

\[ S = I_1 \cup I_2 \cup ... \cup I_n \]

and

\[ S = I'_1 \cup I'_2 \cup ... \cup I'_k \;, \]

we must have

\[ m(S) = m(I_1) + m(I_2) + ... + m(I_n) \]

and

\[ m(S) = m(I'_1) + m(I'_2) + ... + m(I'_k).\]

But Borel did not show that both of these sums will be equal. This problem is usually phrased as: we must show that our measure system is well defined. Borel's ideas are a preliminary argument. Borel demonstrates that by following a descriptive definition of measure, we can start to narrow in on how our measure system must behave. The issues that Borel didn't address were later addressed rigorously by Lebesgue.

We leave the idea of measure for the moment and move to the problem of integration.

Note: for the rest of the article, the 1-D case—measuring subsets of \( \mathbb{R} \)—will continue to be discussed without much consideration for higher dimensions. Luckily, the ideas carry over intuitively to higher dimensions. The benefit of 1-D is that notation and definitions are easier to write and easier to read compared to notation used for an arbitrary number of dimensions.

Integration

A function maps elements of a set to elements of another set; a measure maps sets to a real number, and an integral takes in two inputs—a function and a subset of the function's domain—and maps to a real number. In this sense, they are all functions, they just differ in their inputs and outputs. Using function notation, the comparison is as follows:

A function \( f : A \to B \) maps an element \( x \in A \) to an element of \( B \), denoted as \( f(x) \).
A measure function \( m : 2^{\mathbb{R}} \to \mathbb{R} \) maps a subset \(E \subset \mathbb{R} \) to a real number \( m(E) \).
An integral function \( l : \mathcal{F} \times 2^{\mathbb{R}} \to \mathbb{R} \) maps a function \( f \in \mathcal{F} \) and a subset \( E \subset \mathbb{R} \) to a real number \( l(f, E) \). Here \( \mathcal{F} \) refers to the set of all functions from \( \mathbb{R} \) to \( \mathbb{R} \).

With this perspective, the integral written as:

\[ \int_{a}^{b} f(t) \dd{t} \]

is just elaborate syntax for:

\[ l(f, [a, b] ) \]

which describes function \(f : \mathbb{R} \to \mathbb{R} \) along with the interval \( [a, b] \) being inputted to the function \( l \), and a real number being returned.
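To make the "integral as a function of two inputs" viewpoint concrete, here is a minimal sketch in code. The name integral and the steps parameter are assumptions made for illustration, and the body uses a midpoint Riemann sum purely as a stand-in: it is not Lebesgue's construction, only one of the many functions that share the signature.

```python
def integral(f, interval, steps=100_000):
    """Stand-in for l(f, [a, b]): a midpoint Riemann sum over the interval.

    The point is the signature (a function and an interval go in, a real
    number comes out), not the particular method used to compute the number.
    """
    a, b = interval
    width = (b - a) / steps
    return sum(f(a + (k + 0.5) * width) for k in range(steps)) * width


if __name__ == "__main__":
    print(integral(lambda t: 1.0, (3.0, 7.5)))    # close to 4.5
    print(integral(lambda t: t * t, (0.0, 1.0)))  # close to 1/3
```

Swapping the body for a different procedure changes the values that come out for badly behaved functions, but not the shape of the mapping, which is why the choice of "the right" integration function needs further justification.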

In terms of meaning, while a measure assigns a size to a set, an integral can be thought of as assigning a size to the region between the graph of the function and its domain. For both measure and integration, the sets that get assigned these sizes are all subsets of the reals, \( \mathbb{R} \), or of higher-dimensional spaces, \( \mathbb{R}^2, \mathbb{R}^3, ... \; \).

The signature of the integration function (its domain and codomain) is clear; however, there are many functions with this signature, and it's not as clear which of them best represents the meaning of integration. Neither is it clear that there is just one such function: there could be many! Many mathematicians, unsatisfied with existing integration functions, searched for better ones, Lebesgue being one of them.

Integration before 1900

By the year 1900, the search for a good integration function had already resulted in various different formulations. Often, mathematicians tackled the subject in passing as they worked on some other topic that required integration, for which the existing ideas of integration somehow fell short.

Here we will just mention some names of Lebesgue's predecessors and move on:

Cauchy (1823)
Dirichlet (never published, but Lipschitz documented his lectures in 1864)
Riemann (1867)
Harnack (1883)
Hölder (1884)
Darboux (1875)
De la Vallée-Poussin (1894)
Stieltjes (1895)

The integrals these people proposed were named eponymously, such as the Riemann integral. We can understand Lebesgue integration without understanding the details of these integrals; it is sufficient to appreciate that in 1900, the search for a better integration definition was still ongoing.

Lebesgue part I: characterizing the integral

Lebesgue wrote his doctoral thesis, Intégrale, longueur, aire (French for Integral, length, area), in 1902. The full text is available on Internet Archive. Lebesgue's advisor was none other than Émile Borel (who was only 4 years older). For his thesis, Lebesgue set out to identify and condense the properties common to all the previous integrals. His realization was that there were properties common to all integrals, and that it was worth investigating whether these properties could be used to actually define integration. Instead of treating the properties as a consequence, Lebesgue inverted the problem and considered the properties as the starting point. In other words, if these properties are to hold, what must be implied about the nature of integration and its formulation? Might this approach lead us to a better constructive definition of an integral? A better definition was indeed needed: the mathematicians listed above found themselves tackling integration because of issues with the definitions available to them.

This inversion of the problem is in the same spirit as Borel's investigation of measure. This is not the only connection between Borel's work on measure and Lebesgue's work on integration; indeed, we arrive at a much more concrete and important connection later on. But for now, let's proceed by listing the properties Lebesgue identified as common to all integration theories.

Lebesgue had written about his integration theory before his thesis; he briefly introduced some of his ideas in the 132nd volume of the French journal Comptes rendus, published in 1901. After his thesis, he presented his theory in a more comprehensive form in Leçons sur l’intégration et la recherche des fonctions primitives. The quotations that appear below come from this text.

The 6 properties

Here are Lebesgue's original words, translated by Pezin & Kotz:

"It is our purpose to associate with every bounded function which is defined in a finite interval \( (a, b) \)—positive, negative, or equal to zero—a certain finite number \( \int_{a}^{b} f(x) dx \) which we will call the integral of \( f \) on \( (a, b) \) and which satisfies the following conditions:

For any \( a \), \( b \) and \( h \), we have:

\[ \int_{a}^{b}f(x) dx = \int_{a-h}^{b-h} f(x+h) dx \]

For any \( a \), \( b \), \( c \) we have:

\[ \int_{a}^{b} f(x) dx + \int_{b}^{c} f(x)dx + \int_{c}^{a} f(x)dx = 0 \]

\( \int_{a}^{b}\left( f(x) + \phi(x) \right) dx = \int_{a}^{b}f(x) dx + \int_{a}^{b} \phi(x) dx \)

If \( f \ge 0 \) and \( b > a \), then also: \( \int_{a}^{b}f(x)dx \ge 0. \)

\( \int_{0}^{1}dx = 1. \)

If \( f_n \) tends increasingly to \( f \), then the integral of \( f_n \) tends to the integral of \( f \).

The significance, necessity, and corollaries of the first five conditions of this problem of integration are more or less evident..."

It's hard to imagine removing any one of conditions 1-5 and ending up with a meaningful sense of integration. The importance of the 6th condition should eventually become clear. In a later publication, Lebesgue reformulates the properties by removing the 6th and reworking the 2nd (the 6th becomes a consequence of the rewritten 2nd).
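As a sanity check, the first three conditions can be tested numerically on a familiar integration procedure. The sketch below uses a signed midpoint Riemann sum as a stand-in; the function names, the sample functions and the sample values of \( a \), \( b \), \( c \) and \( h \) are all assumptions chosen for illustration, and agreement up to rounding and discretisation error is a plausibility check rather than a proof.

```python
import math


def integral(f, a, b, steps=20_000):
    """Signed midpoint Riemann sum; swapping a and b flips the sign."""
    width = (b - a) / steps
    return sum(f(a + (k + 0.5) * width) for k in range(steps)) * width


def f(x):
    return x * x


def phi(x):
    return math.sin(x)


a, b, c, h = 0.0, 1.0, 2.5, 0.7

# Condition 1: shifting the interval and the argument together changes nothing.
shift_gap = integral(f, a, b) - integral(lambda x: f(x + h), a - h, b - h)

# Condition 2: integrating around the cycle a -> b -> c -> a gives zero.
cycle_sum = integral(f, a, b) + integral(f, b, c) + integral(f, c, a)

# Condition 3: the integral of a sum is the sum of the integrals.
sum_gap = integral(lambda x: f(x) + phi(x), a, b) - (
    integral(f, a, b) + integral(phi, a, b)
)

print(shift_gap, cycle_sum, sum_gap)  # all three should be very close to zero
```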

Out of the 6 properties, only (5) assigns a specific value to an integral. So, at this point, there is only a single function, \( f : \mathbb{R} \to \mathbb{R}, f(x) = 1 \), over a specific interval, \( [0, 1] \), for which we know the value of the integral.

Lebesgue part II: searching for conclusions

Lebesgue proceeded by searching for interesting conclusions that might follow once the 6 properties are assumed. If some integration procedure has the 6 properties above, what else can we say about it? Lebesgue found quite a lot to say.

4 conclusions

The following 4 statements are true if one assumes that properties 1-5 above are true.

\( \int_{a}^{b} 0 \dd{x} = 0 \)
\( \int_{a}^{b} -f \dd{x} = - \int_{a}^{b} f \dd{x} \)
if \( f \le g \) and \( a \le b \) then

\[ \int_{a}^{b} f \dd{x} \le \int_{a}^{b} g \dd{x} \]
\( \int_{a}^{b} 1 \dd{x} = b - a \)

What Lebesgue has shown is that it's impossible to design an integration procedure that breaks any of these 4 conclusions without also breaking one or more of the first 5 properties deemed necessary for a reasonable integration procedure. The 6th property comes into play soon.
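The first three conclusions can be recovered in a few lines. What follows is a sketch of one possible route, not necessarily the argument Lebesgue gave. It uses only the additivity condition (3) and the nonnegativity condition (4), together with the fact that every integral is a finite number, and it takes \( a < b \).

\[ \int_{a}^{b} 0 \dd{x} = \int_{a}^{b} (0 + 0) \dd{x} = 2 \int_{a}^{b} 0 \dd{x} \quad \Longrightarrow \quad \int_{a}^{b} 0 \dd{x} = 0 \]

\[ 0 = \int_{a}^{b} \left( f + (-f) \right) \dd{x} = \int_{a}^{b} f \dd{x} + \int_{a}^{b} -f \dd{x} \quad \Longrightarrow \quad \int_{a}^{b} -f \dd{x} = - \int_{a}^{b} f \dd{x} \]

\[ f \le g \quad \Longrightarrow \quad 0 \le \int_{a}^{b} (g - f) \dd{x} = \int_{a}^{b} g \dd{x} - \int_{a}^{b} f \dd{x} \]

The fourth conclusion takes more work, since it also needs the translation condition (1), the normalisation (5) and a passage from rational to real interval lengths.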

Before moving on, it is worth noticing how conclusion (4) gives us a way to calculate any one of the infinitely many integrals of the form \( \int_{a}^{b} 1 \dd{x} \). For example, \( \int_{3}^{7.5} 1 \dd{x} \) must equal \( 4.5 \). Already, the journey out from the 6 minimal properties is finding concrete assignments.

5th conclusion

Lebesgue continued his search for conclusions that follow from properties 1-6. He reported the following finding.

First we need some definitions. Let \( f \) be a bounded function on \( [a, b] \), bounded below by \( l \) and above by \( L \). In other words, \( f \) is a function \( f : [a, b] \to [l, L] \) where both \( [a, b] \subset \mathbb{R} \) and \( [l, L] \subset \mathbb{R} \). Let \( l_0 < l_1 < ... < l_n \) be \( n + 1 \) reals between \( l \) and \( L \), with \( l_0 = l \) and \( l_n = L \). For each pair of adjacent points \( l_{i-1} \) and \( l_{i} \), where \( i = 1, ..., n \), there is a set:

\[ E_i = \{ x \in [a, b] : l_{i-1} \le f(x) \lt l_{i} \} \; , \]

with the last set, \( E_n \), also taking in the points where \( f(x) = l_n \), so that every point of \( [a, b] \) lies in exactly one of the sets. Each \( E_i \) consists of all values within \( [a, b] \) which map through \( f \) to values within \( [l_{i-1}, l_{i}) \). For each of these sets, \( E_i \), we define a function:

\[ \psi_i : \mathbb{R} \to \mathbb{R}, \quad \psi_i(x) := \begin{cases} 1 && \text{ if } x \in E_i \\ 0 && \text{ otherwise } \end{cases} \]

In other words, each \( \psi_i \) is zero everywhere except on the subset \( E_i \), where the inputs are all mapped to 1. Functions like these are called characteristic functions of the sets they are defined upon. For example, \( \psi_1 \) is the characteristic function of the set \( E_1 \).

We are also going to briefly refer to the function \( f \) restricted to a subset of its domain. The notation that will be used is: if \( A \subset [a, b] \) is a subset of \( f \)'s domain, then the function \( f_{|_{A}} : A \to \mathbb{R}, \; f_{|_{A}}(x) = f(x) \) is the function resulting from restricting \( f \) to have domain \( A \). \( f_{|_{A}} \) is otherwise the same as \( f \).

With the required definitions in place, we can continue. Consider one of the subsets \( E_i \subset [a, b] \). If \( f \) is restricted to \( E_i \), denoted as \( f_{|_{E_i}} \), then on \( E_i \) we have the inequality:

\[ \begin{equation} l_{i-1} \psi_i \le f_{|_{E_i}} \le l_{i}\psi_i \label{bounds1} \end{equation} \]

Denote the two sums of \( n \) characteristic functions like so:

\[ \begin{align*} \Psi_{n, \text{ lower}} &= l_0 \psi_1 + l_1 \psi_2 + l_2 \psi_3 + ... + l_{n-1} \psi_n \\ \Psi_{n, \text{ upper}} &= l_1 \psi_1 + l_2 \psi_2 + l_3 \psi_3 + ... + l_n \psi_n \\ \end{align*} \]

and because the sets \( E_1, E_2, ..., E_n \) are pairwise disjoint and together cover \( [a, b] \), \( \eqref{bounds1} \) can be extended to the whole domain of \( f \):

\[ \begin{equation} \Psi_{n, \text{ lower}} \le f \le \Psi_{n, \text{ upper}} \label{bounds2} \end{equation} \]

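We can show that \( \Psi_{n, \text{ lower}} \) and \( \Psi_{n, \text{ upper}} \), when parameterized by \( n \), both converge to \( f \), provided the subdivision is refined so that the largest gap \( l_i - l_{i-1} \) shrinks to zero as \( n \) grows: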

\[ \begin{equation} \lim_{n \to \infty} \Psi_{n, \text{ lower}} = f = \lim_{n \to \infty} \Psi_{n, \text{ upper}} \label{bounds3} \end{equation} \]

[proof]
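Until the proof is filled in, a numerical check may help build confidence. The sketch below is an illustration under stated assumptions: the function \( f(x) = x^2 \) on \( [0, 1] \), equally spaced levels, and all helper names are choices made here, not part of Lebesgue's text. At every sampled point, the value of the lower sum and the value of the upper sum are the two levels that sandwich \( f(x) \), so neither can be further from \( f(x) \) than one level gap.

```python
def level_index(y, levels):
    """Return i with levels[i-1] <= y < levels[i]; the top level goes in the last set."""
    n = len(levels) - 1
    for i in range(1, n + 1):
        if y < levels[i]:
            return i
    return n


def psi_lower_upper(f, x, levels):
    """Values of the lower and upper sums of characteristic functions at x."""
    i = level_index(f(x), levels)
    return levels[i - 1], levels[i]


def worst_gap(f, a, b, low, high, n, samples=1000):
    """Largest distance from f to either sum, over equally spaced sample points."""
    levels = [low + (high - low) * i / n for i in range(n + 1)]
    worst = 0.0
    for k in range(samples + 1):
        x = a + (b - a) * k / samples
        lower, upper = psi_lower_upper(f, x, levels)
        assert lower <= f(x) <= upper  # the sandwich inequality at this point
        worst = max(worst, f(x) - lower, upper - f(x))
    return worst


if __name__ == "__main__":
    for n in (2, 10, 100):
        print(n, worst_gap(lambda x: x * x, 0.0, 1.0, 0.0, 1.0, n))
```

The printed gaps never exceed \( (L - l)/n \), which is the reason both sums converge to \( f \) as the levels are refined.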

So far in this section only standard properties of functions have been used, such as function addition and convergence of functions. Next, the 6th property chosen by Lebesgue will be employed to arrive at Lebesgue's next conclusion:

\[ \begin{equation} \lim_{n \to \infty} \left( \int_{a}^{b} \Psi_{n, \text{ lower}} \dd{x} \right) = \int_{a}^{b} f(x) \dd{x} \label{conclusion5a} \end{equation} \]

and also:

\[ \begin{equation} \lim_{n \to \infty} \left( \int_{a}^{b} \Psi_{n, \text{ upper}} \dd{x} \right) = \int_{a}^{b} f(x) \dd{x} \label{conclusion5b} \end{equation} \]

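These two statements follow from \( \eqref{bounds3} \) just above and from Lebesgue's 6th property, provided each subdivision refines the previous one so that the lower sums increase towards \( f \). The upper sums decrease towards \( f \); they can be handled by applying the same property to their negatives, which increase towards \( -f \).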

The importance of the 5th conclusion

In the statements \( \eqref{conclusion5a} \) and \( \eqref{conclusion5b} \), observe that \( f \) is an arbitrary bounded function from \( [a, b] \) to \( \mathbb{R} \), while \( \Psi_{n, \text{ lower}} \) and \( \Psi_{n, \text{ upper}} \) are both weighted sums of characteristic functions, each of which is a step function taking only the values 0 and 1. Given that we have not yet tried to create a constructive definition of the integral, we don't know how to determine the integral of any of these functions. \( \eqref{conclusion5a} \) and \( \eqref{conclusion5b} \) make the problem much easier: we now know that if there exists an integration procedure adhering to Lebesgue's 6 properties, then constructing it for characteristic functions is all that is needed to determine the integral of any bounded function: it will be equal to the limit of integrals of sums of step functions. After demonstrating this result, Lebesgue directs his attention to these characteristic functions.
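A worked example shows how far this reduction already takes us. The sketch below assumes \( f(x) = x^2 \) on \( [0, 1] \) with equally spaced levels; the function name level_set_sums is hypothetical. Because this \( f \) is increasing, each set \( E_i \) is simply an interval from \( \sqrt{l_{i-1}} \) to \( \sqrt{l_i} \), and the integral of its characteristic function is taken here to be the length of that interval, which is an assumption in the spirit of the 4th conclusion.

```python
import math


def level_set_sums(n):
    """Integrals of the lower and upper sums for f(x) = x**2 on [0, 1].

    With n equal levels l_i = i / n, the set E_i runs from sqrt(l_{i-1}) to
    sqrt(l_i), so the integral of its characteristic function is assumed to
    be that length.
    """
    levels = [i / n for i in range(n + 1)]
    lower = 0.0
    upper = 0.0
    for i in range(1, n + 1):
        length = math.sqrt(levels[i]) - math.sqrt(levels[i - 1])
        lower += levels[i - 1] * length  # contribution of l_{i-1} * psi_i
        upper += levels[i] * length      # contribution of l_i * psi_i
    return lower, upper


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, level_set_sums(n))
```

The two sums differ by \( 1/n \) and squeeze the familiar value \( 1/3 \) between them. Note that the subdivision is made on the range of \( f \), not on its domain, which is the feature that distinguishes this approach from Riemann's.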

Lebesgue part III: step functions and the return to measure

Let \( E \subset [a, b] \) be a subset of the reals. Let \( \psi_E \) be the characteristic function of \( E \). In other words, \( \psi_E \) is defined as:

\[ \psi_E : \mathbb{R} \to \mathbb{R}, \quad \psi_E(x) := \begin{cases} 1 && \text{ if } x \in E \\ 0 && \text{ otherwise } \end{cases} \]

The integral \( \int_{a}^{b} \psi_E(x) \dd{x} \) depends only on the set \( E \). In particular, it seems to depend only on the size of \( E \). To check this intuition, consider \( E \) being the interval \( [a, b] \). We know that:

\[ \begin{align*} \int_{a}^{b} \psi_{E}(x) \dd{x} &= \int_{a}^{b} 1 \dd{x} \\ &= b - a && \text{(from Lebesgue's 4th conclusion)} \end{align*} \]

This matches Borel's belief that the measure of \( [a, b] \) should be \( b - a \). So there is some evidence to suggest that the integral of a characteristic function for a set \( E \) is a measure of the size of \( E \). This was Lebesgue's belief. Lebesgue writes [cite]:

"Here is the problem which is to be solved: Our purpose is to associate with every bounded set \( E \) consisting of points on the \( x \) axis a certain nonnegative number, \( mE \), which will be called the measure of \( E \), and which satisfies the following conditions:

Two congruent sets have the same measure.

A set which is the sum of a finite or countable number of pairwise disjoint sets has a measure equal to the sum of the measures of the summands.

The measure of the set of all points of the interval \( (0, 1) \) equals 1."

Where did these conditions come from? They are similar in nature to those proposed by Borel, but distinctly different from them. Lebesgue chose conditions (1')-(3') so that the 6 properties of integration would be satisfied for characteristic functions. The correspondence for characteristic functions is as follows:

(1') implies (1)
(3') implies (5)
(2') implies (2), (3), (4) and (6)

[proof]

Lebesgue part IV: solving the measure problem

Lebesgue, equipped with the 3 measurement properties, begins again the process of searching for conclusions: what can be said about measurement systems that have these 3 properties? Can enough be said to construct a measurement system? Lebesgue is asking the same questions about measure that Borel did.

Out of Lebesgue's 3 measurement properties, only (3') assigns a measure to a set, specifically, \( m((0, 1)) = 1 \). Our goal is to assign measures to as many sets as possible. The task then is to use what little else is available, just statements (1') and (2'), to re-use this single assignment in order to find measure for other sets.

Open sets

Lebesgue's search focused on the behaviour of open sets. Why? Because every open subset of \( \mathbb{R} \) is a union of open intervals, such as \( (4, 7) \). And for such intervals, we know their measure: \( m((4, 7)) = 3\). What if the intervals making up an open set \( A \) overlap, for example \( A = (4, 7) \cup (5, 10) \)? Luckily, any union of overlapping open intervals can be expressed as a union of non-overlapping open intervals by merging the intervals that overlap: here, \( A = (4, 10) \). In this way, we know that all bounded open sets have a measure. This makes them useful sets to work with.
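Rewriting a union of overlapping intervals as a union of non-overlapping ones is mechanical enough to sketch in code. The version below is a minimal illustration for finitely many open intervals, each written as a pair of endpoints; the names merge_intervals and measure are hypothetical, and an open set made of infinitely many intervals would need a limiting argument that this sketch does not attempt.

```python
def merge_intervals(intervals):
    """Rewrite a finite union of open intervals (a, b) as non-overlapping open intervals."""
    merged = []
    for a, b in sorted(intervals):
        if merged and a < merged[-1][1]:
            # Overlaps the previous interval: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged


def measure(intervals):
    """Sum of the lengths of the merged, non-overlapping intervals."""
    return sum(b - a for a, b in merge_intervals(intervals))


if __name__ == "__main__":
    print(merge_intervals([(4, 7), (5, 10)]))  # the two intervals merge into one
    print(measure([(4, 7), (5, 10)]))
    print(measure([(4, 5), (5, 7)]))           # touching open intervals stay separate
```

Two open intervals that merely share an endpoint are left separate, since the shared endpoint belongs to neither of them; the total length is unaffected either way.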

Inner measure

[todo]

Outer measure

[todo]

Measure: a constructive definition

[todo]

Lebesgue part V: constructing integration

[todo]

TBC

Welcome to my unfinished article! One day, it will be done.

2021.08.07 (Last mod: 2021.08.18)