Let $X$ and $Y$ be two random variables. The covariance between $X$ and $Y$ is defined as:

$$\mathrm{Cov}(X, Y) = \mathbb{E}\left[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\right]$$
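As a quick sanity check, here is a minimal numpy sketch (the variable names and sample data are illustrative, not from the original) that estimates this expectation from samples and compares it against `np.cov`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two correlated random variables, sampled empirically (illustrative data).
x = rng.normal(size=10_000)
y = 2.0 * x + rng.normal(size=10_000)

# Cov(X, Y) = E[(X - E[X])(Y - E[Y])], estimated by a sample mean.
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

print(cov_xy)              # ~2.0, since Cov(X, 2X + noise) = 2 Var(X)
print(np.cov(x, y)[0, 1])  # numpy's (unbiased) estimate, very close
```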
Let the vector $\mathbf{Z}$ be defined like so: $\mathbf{Z} = \begin{bmatrix} X \\ Y \end{bmatrix}$. Thus,
$\mathbf{Z}$ is a vector of random variables.
The covariance matrix for $\mathbf{Z}$ is defined as:

$$\mathrm{Cov}(\mathbf{Z}) = \mathbb{E}\left[\left(\mathbf{Z} - \mathbb{E}[\mathbf{Z}]\right)\left(\mathbf{Z} - \mathbb{E}[\mathbf{Z}]\right)^T\right]$$
where the expectation is an elementwise operation. Inside the expectation, a $2 \times 1$ column vector is multiplied by its $1 \times 2$ transpose, which produces a $2 \times 2$ matrix. (Yes, that multiplication is valid!)
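A minimal sketch of this definition, assuming samples of $\mathbf{Z}$ stacked one per row (names and data illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 samples of Z = [X, Y], one sample per row (illustrative data).
x = rng.normal(size=10_000)
y = 2.0 * x + rng.normal(size=10_000)
z = np.column_stack([x, y])

# Center each component, then average the outer products (Z - E[Z])(Z - E[Z])^T.
centered = z - z.mean(axis=0)
cov = (centered.T @ centered) / len(z)

print(cov)          # ~[[1, 2], [2, 5]] for this data
print(np.cov(z.T))  # numpy's (unbiased) estimate for comparison
```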
Matrix interpretation
An interpretation of such a $2 \times 1$ by $1 \times 2$ matrix multiplication is:

$$\begin{bmatrix} a \\ b \end{bmatrix} \begin{bmatrix} x & y \end{bmatrix} = \begin{bmatrix} ax & ay \\ bx & by \end{bmatrix}$$

The first matrix can be considered a transformation matrix which transforms a single dimension into 2 dimensions:
$a$ is the factor by which the input scalar is multiplied to produce the first output dimension;
$b$ is the same quantity for the second output dimension. The matrix $\begin{bmatrix} x & y \end{bmatrix}$ can
be considered a list of two separate scalars that will be transformed separately.
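A tiny numpy sketch of this interpretation (the values are illustrative):

```python
import numpy as np

col = np.array([[2.0], [3.0]])  # 2x1 "transformation" column [a, b]^T
row = np.array([[5.0, 7.0]])    # 1x2 row of scalars [x, y] to transform

# Each scalar in the row is mapped into 2 dimensions by the column.
print(col @ row)
# [[10. 14.]
#  [15. 21.]]
```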
For the case of $\mathbf{v} \mathbf{v}^T$, if $\mathbf{v}$ has $D$ dimensions, then the output is $D$ vectors combined horizontally into
a matrix, where each vector is the original $\mathbf{v}$ multiplied by one of its components.
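A short sketch illustrating this column structure for a 3-dimensional $\mathbf{v}$ (values illustrative):

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])

# Outer product v v^T: column j is v scaled by component v[j].
outer = np.outer(v, v)
print(outer)

# Verify: each column equals the original v times the corresponding component.
for j, component in enumerate(v):
    assert np.allclose(outer[:, j], v * component)
```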
For the 2 dimensional covariance matrix we have:

$$\mathrm{Cov}(\mathbf{Z}) = \begin{bmatrix} \mathrm{Var}(X) & \mathrm{Cov}(X, Y) \\ \mathrm{Cov}(Y, X) & \mathrm{Var}(Y) \end{bmatrix}$$
The covariance matrix is symmetric, like all matrices of the form $\mathbf{v} \mathbf{v}^T$ (and the elementwise expectation preserves that symmetry). Its diagonal holds the variance of each random variable.
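A quick check of both properties on sampled data (illustrative, as before):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 2.0 * x + rng.normal(size=10_000)

cov = np.cov(x, y)  # 2x2 covariance matrix

# Symmetric, with the variances on the diagonal.
assert np.allclose(cov, cov.T)
assert np.allclose(np.diag(cov), [np.var(x, ddof=1), np.var(y, ddof=1)])
```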
Random variable interpretation
Covariance is the expected value of the random variable $(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])$. Imagine the probability mass functions of $X$ and $Y$, then of $X - \mathbb{E}[X]$ and $Y - \mathbb{E}[Y]$, then the 2 dimensional joint variable $\left(X - \mathbb{E}[X],\ Y - \mathbb{E}[Y]\right)$, then finally the 1 dimensional product $(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])$. The covariance is a single value: the expectation of this product variable, i.e. the probability-weighted sum of its values.
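A minimal sketch of this view on a small discrete joint pmf (the table values are illustrative, not from the original):

```python
import numpy as np

# Joint pmf p(x, y) over a few support points (illustrative values).
xs = np.array([0.0, 1.0])
ys = np.array([0.0, 1.0, 2.0])
p = np.array([[0.20, 0.10, 0.05],
              [0.05, 0.20, 0.40]])  # rows: x, cols: y; entries sum to 1

# Marginal means E[X] and E[Y].
ex = np.sum(xs[:, None] * p)
ey = np.sum(ys[None, :] * p)

# The product random variable (X - E[X])(Y - E[Y]) evaluated on the support...
product = (xs[:, None] - ex) * (ys[None, :] - ey)

# ...and the covariance as its probability-weighted sum (the expectation).
cov_xy = np.sum(product * p)
print(cov_xy)
```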