\( \newcommand{\matr}[1] {\mathbf{#1}} \newcommand{\vertbar} {\rule[-1ex]{0.5pt}{2.5ex}} \newcommand{\horzbar} {\rule[.5ex]{2.5ex}{0.5pt}} \newcommand{\E} {\mathrm{E}} \)
\( \newcommand{\cat}[1] {\mathrm{#1}} \newcommand{\catobj}[1] {\operatorname{Obj}(\mathrm{#1})} \newcommand{\cathom}[1] {\operatorname{Hom}_{\cat{#1}}} \newcommand{\multiBetaReduction}[0] {\twoheadrightarrow_{\beta}} \newcommand{\betaReduction}[0] {\rightarrow_{\beta}} \newcommand{\betaEq}[0] {=_{\beta}} \newcommand{\string}[1] {\texttt{"}\mathtt{#1}\texttt{"}} \newcommand{\symbolq}[1] {\texttt{`}\mathtt{#1}\texttt{'}} \newcommand{\groupMul}[1] { \cdot_{\small{#1}}} \newcommand{\groupAdd}[1] { +_{\small{#1}}} \newcommand{\inv}[1] {#1^{-1} } \newcommand{\bm}[1] { \boldsymbol{#1} } \newcommand{\qed} { {\scriptstyle \Box} } \require{physics} \require{ams} \require{mathtools} \)
Math and science::Analysis

Radon-Nikodym derivative

Measure theory. Recap.

We seek functions, called measures, that assign a non-negative value to sets in a way that reflects each set's size or extent; natural examples are length, area and volume. We will first insist that a measure \( \mu \) be translation invariant. We will then insist that the measure satisfy disjoint additivity, in fact countable disjoint additivity:

\[ \mu \left( \bigcup_{i=1}^{\infty} A_i \right) = \sum_{i=1}^{\infty} \mu(A_i) \]

where the \( A_i \) are pairwise disjoint. Countable disjoint additivity is called sigma-additivity.

σ-algebras

Unfortunately, due to the power of the axiom of choice, a measure is unable to satisfy translation invariance and sigma-additivity without a limitation: not all subsets of a set \( \mathcal{X} \) can be given a measure. Instead we "restrict" ourselves to special collections of subsets of \( \mathcal{X} \) called σ-algebras. It's not much of a restriction, as we will include all useful sets; the restriction is just a way to fence off the pathological sets. "Algebra" in the term refers to closure under union, intersection and complement; the σ prefix (commonly traced to "Summe", the German word for sum) upgrades the algebra to be closed under countable unions. Closure under countable unions together with complements implies closure under countable intersections.

Instead of being simply mentioned in passing, σ-algebras are normally brought along by being incorporated into the definition of a measurable space, which is just the tuple \( (\mathcal{X}, \mathcal{A}) \) where \( \mathcal{A} \) is a σ-algebra over \( \mathcal{X} \). Adding a measure gives the tuple \( (\mathcal{X}, \mathcal{A}, \mu) \), called a measure space.
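On a finite set, the closure that defines a σ-algebra can be computed directly, since countable unions reduce to finite ones. A minimal sketch in Python (the function name `generate_sigma_algebra` and the example generators are illustrative, not from the source):

```python
from itertools import combinations

def generate_sigma_algebra(X, generators):
    """Smallest sigma-algebra on a finite set X containing the
    generator sets: close under complement and pairwise union
    until nothing new appears (De Morgan then gives intersections)."""
    sets = {frozenset(), frozenset(X)} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        for A in list(sets):
            comp = frozenset(X) - A
            if comp not in sets:
                sets.add(comp)
                changed = True
        for A, B in combinations(list(sets), 2):
            union = A | B
            if union not in sets:
                sets.add(union)
                changed = True
    return sets

X = {1, 2, 3, 4}
algebra = generate_sigma_algebra(X, [{1}, {2}])
```

The generators \( \{1\} \) and \( \{2\} \) carve \( \{1,2,3,4\} \) into the atoms \( \{1\}, \{2\}, \{3,4\} \), so the resulting σ-algebra has \( 2^3 = 8 \) members.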

Lebesgue measure

Let \( \mathcal{X} \) be n-dimensional Euclidean space, and let \( \mathcal{A} \) be the smallest σ-algebra containing all open rectangles. The members of this σ-algebra are called Borel sets. If we insist that the measure assign to boxes the standard notion of length/area/volume, which is the product of the side lengths, then the resulting measure exists, is unique, and is called the Lebesgue measure. It is normally denoted \( \mu \).

Lebesgue integration

Lebesgue integration carries out integration by handing all the work over to the Lebesgue measure. For a function \( f : A \to \mathbb{R} \) taking finitely many values, for every value \( u \) in the range, collect as a Borel subset \( C \subseteq A \) the elements mapping to \( u \), i.e. \( \forall a \in C, \; f(a) = u \). The contribution to the integration sum is then \( \mu(C) \cdot u \). Repeat for all non-zero values \( u \) in the range. We write:

\[ \int_{A} f \dd{\mu} \]

For functions that take on countably or uncountably many values, the integral is defined via limits of such simple functions (functions taking on finitely many values).
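The preimage-measuring recipe above can be sketched numerically for a simple (finitely-valued) function, approximating the Lebesgue measure of each preimage by the fraction of a fine grid it occupies (the grid size and the example step function are assumptions for illustration):

```python
import numpy as np

def lebesgue_integral_simple(f, a, b, n_grid=1_000_000):
    """Approximate the Lebesgue integral of a simple function f on
    [a, b]: for each value u in the range, estimate the measure of
    the preimage f^{-1}({u}) on a fine grid and sum u * measure."""
    xs = np.linspace(a, b, n_grid)
    values = f(xs)
    total = 0.0
    for u in np.unique(values):
        measure = (b - a) * np.mean(values == u)  # fraction of the interval
        total += u * measure
    return total

# f = 2 on [0, 0.25), 5 elsewhere on [0, 1]: integral = 2*0.25 + 5*0.75
f = lambda x: np.where(x < 0.25, 2.0, 5.0)
est = lebesgue_integral_simple(f, 0.0, 1.0)
```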

Creating new measures

Let \( (\mathcal{X}, \mathcal{A}, \mu) \) be a measure space. Just think of \( \mu \) as the Lebesgue measure. Let \( f : \mathcal{X} \to [0, \infty) \) be a measurable function. Then if we use the integral of \( f \) over a measurable subset \( A \in \mathcal{A} \) to define a set function \( m_f \):

\[ m_f(A) := \int_{A} f \dd{\mu} \]

then this function satisfies all the criteria of a measure!
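A quick numerical check of this claim, approximating \( m_f \) on intervals with the trapezoid rule (the choice \( f(x) = x^2 \) and the grid size are illustrative):

```python
import numpy as np

def m_f(lo, hi, f, n=200_001):
    """Sketch of m_f([lo, hi]) = integral of f over [lo, hi] with
    respect to the Lebesgue measure, via the trapezoid rule."""
    xs = np.linspace(lo, hi, n)
    ys = f(xs)
    return float(np.sum((ys[1:] + ys[:-1]) / 2 * np.diff(xs)))

f = lambda x: x**2
whole = m_f(0.0, 1.0, f)                     # ~ 1/3
parts = m_f(0.0, 0.5, f) + m_f(0.5, 1.0, f)  # disjoint split of [0, 1]
```

Splitting \( [0,1] \) into \( [0, 0.5] \) and \( [0.5, 1] \) and summing reproduces \( m_f([0,1]) \approx 1/3 \), the disjoint additivity a measure must satisfy.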

Being a measure, we can integrate with respect to it, and when doing so, we will use notation like:

\[ \int_{A} g \dd{m_f} \]

With this notation, we can write:

\[ \int_{A} f \dd{\mu} = \int_{A} 1 \dd{m_f} \]

This gives us a symbolic relationship (like a syntax rewrite rule):

\[ \dd{m_f} = f \dd{\mu} \]

What is quite amazing is the converse: if we have any two σ-finite measures \( v \) and \( u \) defined on the same measurable space \( (\mathcal{X}, \mathcal{A}) \), then as long as \( u(A) = 0 \implies v(A) = 0 \) (i.e. \( v \) is absolutely continuous with respect to \( u \)), there is a function \( f \) connecting them in the sense that:

\[ v(A) = \int_{A} f \dd{u} \]

This function \( f \) is called a Radon-Nikodym derivative of \( v \) with respect to \( u \). It is unique almost everywhere, meaning any two such functions can differ only on a set of \( u \)-measure zero.
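On a finite set, measures are just weight vectors, and the Radon-Nikodym derivative can be read off pointwise as a ratio. A discrete sketch (the weights below are made up for illustration):

```python
# u gives zero weight only where v does too, so v << u holds and
# f = v/u (with 0/0 := 0) reproduces v(A) = sum over x in A of f(x)*u(x).
u = {"a": 0.5, "b": 0.25, "c": 0.25, "d": 0.0}
v = {"a": 0.1, "b": 0.6, "c": 0.3, "d": 0.0}

f = {x: (v[x] / u[x] if u[x] > 0 else 0.0) for x in u}

def measure_via_f(A):
    """v(A) recovered by integrating f against u over A."""
    return sum(f[x] * u[x] for x in A)
```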

Probability distribution

A measure \( P \) on a measurable space \( (\mathcal{X}, \mathcal{A}) \) that satisfies:

\[ P(\mathcal{X}) = 1 \]
is called a probability distribution or probability measure. \( P(A) \) is the probability of \( A \). If \( P \) has a Radon-Nikodym derivative \( p \) with respect to another measure \( \mu \), i.e.:

\[ P(A) = \int_{A} p \dd{\mu} \]

then \( p \) is called a probability density of \( P \) with respect to \( \mu \).
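For example, the standard normal distribution has its familiar bell-curve density with respect to the Lebesgue measure; integrating the density over an interval recovers the probability of that interval. A numerical sketch (the grid size is an arbitrary choice):

```python
import math

def p(x):
    """Standard normal density: the Radon-Nikodym derivative of the
    standard Gaussian measure with respect to the Lebesgue measure."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def P(lo, hi, n=100_000):
    """P([lo, hi]) = integral of p over [lo, hi], trapezoid rule."""
    h = (hi - lo) / n
    total = 0.5 * (p(lo) + p(hi))
    total += sum(p(lo + i * h) for i in range(1, n))
    return total * h

prob = P(-1.0, 1.0)  # the familiar one-sigma probability, ~0.6827
```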

The Radon-Nikodym derivative is used to justify importance sampling. Can you remember how?


Importance sampling

Often, we wish to compute an expectation under a probability measure \( P \):

\[ \mathbb{E}_P[g] = \int_{\mathcal{X}} g(x)\,\dd{P(x)}.\]

If direct sampling from \( P \) is difficult, we can instead sample from another measure \( Q \) that is easier to handle. To connect them, we require that \( P \) is absolutely continuous with respect to \( Q \), written \( P \ll Q \). This guarantees the existence of a Radon–Nikodym derivative:

\[ f = \frac{\dd{P}}{\dd{Q}}. \]

This function \( f \) is called the importance weight. Using the change-of-measure identity, we can rewrite the expectation as:

\[ \mathbb{E}_P[g] = \int g(x)\,\dd{P(x)} = \int g(x) f(x)\,\dd{Q(x)} = \mathbb{E}_Q[g(x)f(x)]. \]

Hence, if we draw samples \( x_1, \dots, x_n \sim Q \), the empirical estimate

\[ \frac{1}{n}\sum_{i=1}^n g(x_i) f(x_i) \]

is an unbiased estimator of \( \mathbb{E}_P[g] \). The method is known as importance sampling. It relies entirely on the Radon–Nikodym relationship \( \dd{P} = f\,\dd{Q} \), which states precisely the condition under which reweighting samples from one distribution yields expectations under another, even on non-Euclidean or mixed spaces.
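The identity above can be sketched as a Monte Carlo experiment: take \( P = \mathcal{N}(0,1) \), a wider proposal \( Q = \mathcal{N}(0, 2^2) \), and estimate \( \mathbb{E}_P[x^2] = 1 \) by reweighting samples from \( Q \) (the specific target, proposal, seed and sample size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_pdf(x, sigma):
    """Density of N(0, sigma^2) with respect to the Lebesgue measure."""
    return np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

n = 200_000
x = rng.normal(0.0, 2.0, size=n)             # samples from Q = N(0, 4)
w = normal_pdf(x, 1.0) / normal_pdf(x, 2.0)  # importance weights f = dP/dQ
estimate = np.mean(x**2 * w)                 # ~ E_P[x^2] = 1
```

Because the proposal is wider than the target, the weights stay bounded and the estimator is well behaved; a narrower proposal could instead produce extreme weights and huge variance.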


Source

Theory of Point Estimation, Lehmann and Casella.
ChatGPT.