
One square from two. Posteriors with Gaussians.

\( (a-x)^2 + (x-b)^2 \)

This expression appears when dealing with posterior probabilities for the mean parameter of a Gaussian distribution. Can you remember how?

Minimum at \( x = \frac{a + b}{2} \)

Let \( a < b \) and define \( d=b - a \). If \( x \) is constrained to lie between \( a \) and \( b \), the expression is maximized when \( x=a \) or \( x=b \), where it takes the value \( d^2 \). If \( x \) is not constrained to \( [a, b] \), the expression grows without bound. The expression is minimized at the midpoint, \( x=a+\frac{d}{2} \). Let \( h=\frac{d}{2} \) be the half-width of the interval. At the midpoint, \( x=a + h \), the expression becomes:

\[ h^2 + h^2 = 2h^2 = \frac{d^2}{2} \]

If you then move \( x \) slightly away from the midpoint, \( x \to x + \varepsilon \), the expression becomes:

\[ (h + \varepsilon)^2 + (h - \varepsilon)^2 = 2h^2 + 2\varepsilon^2. \]

The expression increases for any \( \varepsilon \neq 0 \). It is indeed minimized at the midpoint \( x = a + h \), where it takes the value \( 2h^2 = \frac{d^2}{2} \).
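This is easy to sanity-check numerically. Below is a minimal sketch of such a check; the values of \( a \) and \( b \) are arbitrary illustration choices.

```python
import numpy as np

# Evaluate f(x) = (a - x)^2 + (x - b)^2 on a fine grid and locate its
# minimum. We expect the minimizer at the midpoint (a + b) / 2 and the
# minimum value d^2 / 2, where d = b - a.
a, b = 1.0, 4.0
d = b - a

x = np.linspace(a - 2.0, b + 2.0, 100001)
f = (a - x) ** 2 + (x - b) ** 2

print(x[np.argmin(f)])  # 2.5  == (a + b) / 2
print(f.min())          # 4.5  == d**2 / 2
```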

From two squares to one square

The expression:

\[ (a-x)^2 + (x-b)^2 \]

can be rewritten as one square involving \( x \):

\[ \begin{aligned} (a-x)^2 + (x-b)^2 &= a^2 + x^2 - 2ax + x^2 + b^2 - 2bx \\ &= 2x^2 - 2x(a + b) + a^2 + b^2 \\ &= 2((x - \frac{a + b}{2})^2 - \frac{(a + b)^2}{4}) + a^2 + b^2 \\ &= 2(x - \frac{a + b}{2})^2 - \frac{(a^2 + 2ab + b^2)}{2} + \frac{2a^2 + 2b^2}{2} \\ &= 2(x - \frac{a + b}{2})^2 + \frac{a^2 + b^2 - 2ab}{2} \\ &= 2(x - \frac{a + b}{2})^2 + \frac{(a - b)^2}{2} \end{aligned} \]

Here again we see that the expression is minimized when \( x = \frac{a + b}{2} \), and the minimum value is \( \frac{(a - b)^2}{2} \).
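The identity can also be confirmed symbolically. Here is a minimal sketch using sympy (an arbitrary tool choice for illustration):

```python
import sympy as sp

# Check that (a - x)^2 + (x - b)^2 equals
# 2*(x - (a + b)/2)^2 + (a - b)^2 / 2 as polynomials in x.
a, b, x = sp.symbols('a b x')

lhs = (a - x) ** 2 + (x - b) ** 2
rhs = 2 * (x - (a + b) / 2) ** 2 + (a - b) ** 2 / 2

print(sp.expand(lhs - rhs))  # 0, so the two forms are identical
```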


Posterior probability with Gaussian prior and likelihood

Assume that a random variable \( Y \) is distributed according to a Gaussian distribution with unknown mean \( X \) and known (fixed) variance \( \sigma^2 \), \( Y \sim \mathcal{N}(X, \sigma^2) \). Furthermore, assume that the prior distribution of \( X \) is Gaussian, \( X \sim \mathcal{N}(\mu_{x}, \sigma_{x}^2) \). In the simplest case, imagine we are given a single observation \( y_0 \) of \( Y \). The probability of this observation paired with a value of \( X \) is:

\[ \begin{aligned} P(Y=y_0, X=x) &= \mathcal{N}(y_0; x, \sigma^2) \mathcal{N}(x; \mu_{x}, \sigma_{x}^2) \\ &= \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_0 - x)^2}{2\sigma^2}\right) \frac{1}{\sqrt{2\pi\sigma_{x}^2}} \exp\left(-\frac{(x - \mu_{x})^2}{2\sigma_{x}^2}\right) \\ &= \operatorname{const} \exp\left(-\frac{(y_0 - x)^2}{2\sigma^2} - \frac{(x - \mu_{x})^2}{2\sigma_{x}^2}\right) \end{aligned} \]

For the moment, make the further assumption that both \( \sigma \) and \( \sigma_{x} \) are 1. Then the expression becomes:

\[ P(Y=y_0, X=x) = \operatorname{const} \exp\left(-\frac{(y_0 - x)^2}{2} - \frac{(x - \mu_{x})^2}{2}\right) \]

The exponent contains the now familiar form:

\[ (a-x)^2 + (x-b)^2 \]

which we know can be rewritten as:

\[ 2(x - \frac{a + b}{2})^2 + \operatorname{const}. \]

Doing so gives us:

\[ \begin{aligned} P(Y=y_0, X=x) &= \operatorname{const} \exp\left(-\frac{(y_0 - x)^2}{2} - \frac{(x - \mu_{x})^2}{2}\right) \\ &= \operatorname{const} \exp\left(-\frac{1}{2} 2(x - \frac{y_0 + \mu_{x}}{2})^2 + \operatorname{const}\right) \\ &= \operatorname{const} \exp\left(-\frac{1}{2} 2(x - \frac{y_0 + \mu_{x}}{2})^2 \right) \\ &= \operatorname{const} \exp\left(-\frac{1}{2} \frac{(x - \frac{y_0 + \mu_{x}}{2})^2}{\frac{1}{2}} \right) \\ \end{aligned} \]

So we see that the posterior probability of \( X \) given the observation \( y_0 \) is Gaussian with mean \( \frac{y_0 + \mu_{x}}{2} \) and variance \( \frac{1}{2} \).
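This result can be checked numerically. Below is a minimal sketch that evaluates the unnormalized posterior on a grid, still assuming \( \sigma = \sigma_x = 1 \); the values of \( y_0 \) and \( \mu_x \) are arbitrary illustration choices.

```python
import numpy as np

# Grid approximation of the posterior with sigma = sigma_x = 1.
y0, mu_x = 2.0, 0.0

x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]

# Unnormalized posterior: likelihood times prior, constants dropped.
post = np.exp(-0.5 * (y0 - x) ** 2 - 0.5 * (x - mu_x) ** 2)
post /= post.sum() * dx  # normalize so the density integrates to 1

mean = (x * post).sum() * dx
var = ((x - mean) ** 2 * post).sum() * dx
print(mean)  # ~1.0 == (y0 + mu_x) / 2
print(var)   # ~0.5
```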

Precisions add

Calling 1/variance the precision, we see that the precision of the posterior distribution is the sum of the precisions of the prior and the likelihood:

\[ \begin{aligned} \text{precision of posterior} &= \frac{1}{\sigma_{x}^2} + \frac{1}{\sigma^2} \\ &= \frac{1}{1} + \frac{1}{1} \\ &= 2 \end{aligned} \]
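The additivity of precisions is not specific to unit variances. As a sanity check, here is a minimal sketch (all numbers are arbitrary illustration values) that computes the posterior on a grid for general \( \sigma \) and \( \sigma_x \) and compares its precision against the sum of the prior and likelihood precisions:

```python
import numpy as np

# Grid approximation of the posterior for general variances.
y0, mu_x = 2.0, 0.0
sigma, sigma_x = 0.5, 2.0

x = np.linspace(-10.0, 10.0, 400001)
dx = x[1] - x[0]

post = np.exp(-0.5 * (y0 - x) ** 2 / sigma ** 2
              - 0.5 * (x - mu_x) ** 2 / sigma_x ** 2)
post /= post.sum() * dx

mean = (x * post).sum() * dx
var = ((x - mean) ** 2 * post).sum() * dx

print(1.0 / var)                              # posterior precision from the grid
print(1.0 / sigma ** 2 + 1.0 / sigma_x ** 2)  # 4.25: likelihood + prior precisions
```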

Standard notation

If \( Y \sim \mathcal{N}(X, \sigma^2) \) and \( X \sim \mathcal{N}(\mu_x, \sigma_x^2) \), then the posterior probability satisfies:

\[ P(X = x \mid Y = y_0) \propto \exp\left( -\frac{(y_0 - x)^2}{2\sigma^2} - \frac{(x - \mu_x)^2}{2\sigma_x^2} \right). \]

When \( \sigma = \sigma_x = 1 \), the posterior simplifies to:

\[ P(X = x \mid Y = y_0) \propto \exp\left( -\frac{\left(x - \frac{y_0 + \mu_x}{2}\right)^2}{\frac{1}{2}} \right). \]

This is a Gaussian distribution with:

\[ \text{Posterior mean: } \mu_\text{posterior} = \frac{y_0 + \mu_x}{2}, \] \[ \text{Posterior variance: } \sigma_\text{posterior}^2 = \frac{1}{2}. \]