Preloaded dot product
The matrix products \( W^{T}W \) and \( (WW^{T})^{-1} \) can be thought of as preloading a dot product: each computes the dot product of two vectors after both have been transformed by \( W \) or \( W^{-1} \), respectively.
\( W^{T}W \)
Consider the dot product \( x^{T}y \) of two vectors \( x \) and \( y \). Now consider the transformed vectors \( Wx \) and \( Wy \). The dot product of the transformed vectors is:
\[ (Wx)^{T}(Wy) = x^{T}W^{T}Wy \]
So when looking at a matrix product \( W^{T}W \), know that it will transform and then take the dot product of any two vectors placed on either side of the product.
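As a quick numerical sketch of this (using NumPy; the particular \( W \), \( x \), and \( y \) below are just arbitrary random choices):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3))
x = rng.standard_normal(3)
y = rng.standard_normal(3)

# Dot product of the transformed vectors, computed directly...
direct = (W @ x) @ (W @ y)
# ...and via the "preloaded" matrix W^T W placed between x and y.
preloaded = x @ (W.T @ W) @ y

assert np.isclose(direct, preloaded)
```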
Special case: length² of transformed vector
Let \( b = Wx \). What is the length-squared of \( b \)? Let's make this a function that takes in any vector \( x \) and returns the length-squared of the transformed vector \( Wx \). This function is simply \( f(x) = (Wx)^{T}(Wx) = x^{T}W^{T}Wx \). We can see \( W^{T}W \) as being the implementation of this function.
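The same kind of sketch, for the length-squared special case (again with an arbitrary random \( W \) and \( x \)):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

# f(x) = x^T W^T W x is the squared length of the transformed vector b = Wx.
b = W @ x
assert np.isclose(b @ b, x @ (W.T @ W) @ x)
```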
\( WW^{T} \)
\( WW^{T} \) acts similarly, but with an inverse transformation. Again, consider the dot product \( x^{T}y \) of two vectors \( x \) and \( y \). Now consider the transformed vectors \( W^{-1}x \) and \( W^{-1}y \), where we are now inverting the transformation \( W \). The dot product of the transformed vectors is:
\[ (W^{-1}x)^{T}(W^{-1}y) = x^{T}(W^{-1})^{T}W^{-1}y = x^{T}(WW^{T})^{-1}y \]
So when looking at a matrix product \( WW^{T} \), know that when this matrix is inverted, \( (WW^{T})^{-1} \), it will apply the inverse transformation to and then take the dot product of any two vectors placed on either side of the product.
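A numerical sketch of the same identity (using NumPy; a random square \( W \) is invertible with probability 1, so no special care is taken here):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3))   # assumed invertible
x = rng.standard_normal(3)
y = rng.standard_normal(3)

# Dot product after applying the inverse transformation to both vectors...
W_inv = np.linalg.inv(W)
direct = (W_inv @ x) @ (W_inv @ y)
# ...and via the preloaded matrix (W W^T)^{-1} placed between x and y.
preloaded = x @ np.linalg.inv(W @ W.T) @ y

assert np.isclose(direct, preloaded)
```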
\( QQ^{T} = (QQ^{T})^{-1} \) for orthonormal \( Q \)
When \( Q \) is orthonormal, \( Q^{-1} = Q^{T} \), so it's easy to see that:
\[ (QQ^{T})^{-1} = (Q^{T})^{-1}Q^{-1} = QQ^{T} \]
So for orthonormal \( Q \), \( QQ^{T} \) could be thought of as preloading the inverse transformation followed by the dot product of two vectors, even though no matrix inverse appears in the formula. However, more pertinently, we have \( QQ^{T} = QQ^{-1} = I \), so \( QQ^{T} \) is the identity matrix. And so we have arrived at the intuitive idea that a rotation transformation, \( Q \), doesn't change the length of a vector or the angle between two vectors.
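As a small check (using NumPy, and building an orthonormal \( Q \) from the QR decomposition of a random matrix):

```python
import numpy as np

rng = np.random.default_rng(0)

# Orthonormal (square) Q from the QR decomposition of a random matrix.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))

# Q Q^T is the identity, so it equals its own inverse.
assert np.allclose(Q @ Q.T, np.eye(3))
assert np.allclose(Q @ Q.T, np.linalg.inv(Q @ Q.T))

# Lengths (and hence angles) are preserved by Q.
x = rng.standard_normal(3)
assert np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))
```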
\( QQ^{T} \) when not full rank (projection)
If \( Q \) has say 2 orthonormal columns in \( \mathbb{R}^3 \), then it doesn't have an inverse, but we can ask for the projection of a vector \( b \) onto the column space of \( Q \) by computing \( QQ^{T}b \). Here, \( Q^{T}b \) gives the coordinates of \( b \) along the orthonormal columns of \( Q \), and then \( Q(Q^{T}b) \) rehydrates those coordinates so that the projection is expressed in the original space.
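A sketch of the projection case (using NumPy; the 2 orthonormal columns come from a thin QR decomposition of a random 3×2 matrix):

```python
import numpy as np

rng = np.random.default_rng(0)

# Q: two orthonormal columns in R^3, from a thin QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((3, 2)))
b = rng.standard_normal(3)

# Q^T b: coordinates of b along the columns of Q;
# Q (Q^T b): the projection, expressed back in R^3.
proj = Q @ (Q.T @ b)

# The residual b - proj is orthogonal to the column space of Q.
assert np.allclose(Q.T @ (b - proj), 0)
```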
Example
Consider a random vector \( X \) taking values in \( \mathbb{R}^n \), distributed as a standard normal, \( X \sim N(\mathbf{0}, I) \). We can derive a random variable \( Y = WX \) for some invertible matrix \( W \). Using the change of variable formula for distributions, we can derive the distribution of \( Y \).
If \( X \) and \( Y \) were 1-dimensional random variables, with \( Y = g(X) \), the change of variable formula would be:
\[ f_Y(y) = f_X\!\left(g^{-1}(y)\right)\left|\frac{d}{dy}g^{-1}(y)\right| \]
The multidimensional version of this formula, with \( Y = WX \), is:
\[ f_Y(y) = f_X\!\left(W^{-1}y\right)\left|\det\left(W^{-1}\right)\right| \]
and applying this to the standard normal distribution, we get:
\[ f_Y(y) = \frac{1}{(2\pi)^{n/2}}\exp\!\left(-\tfrac{1}{2}(W^{-1}y)^{T}(W^{-1}y)\right)\left|\det\left(W^{-1}\right)\right| = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}\exp\!\left(-\tfrac{1}{2}\,y^{T}\Sigma^{-1}y\right) \]
where \( \Sigma = WW^{T} \) is the covariance matrix of \( Y \) (a.k.a. the cross-covariance matrix \( \Sigma = \operatorname{Cov}(Y, Y) \)).
So while the covariance matrix \( \Sigma \) of \( Y \) is specified to parameterize the multivariate normal distribution, under the hood, this matrix is preloading the dot product of two vectors that are brought back to the original space by inverting the transformation \( W \).
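As a numerical sketch of this correspondence (using NumPy and SciPy, with an arbitrary invertible \( W \)), the density of \( Y = WX \) obtained from the change of variable formula matches the multivariate normal density with covariance \( \Sigma = WW^{T} \):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
n = 3
W = rng.standard_normal((n, n))   # assumed invertible
y = rng.standard_normal(n)

# Change of variable: f_Y(y) = f_X(W^{-1} y) |det W^{-1}|, with X ~ N(0, I).
x = np.linalg.solve(W, y)                              # W^{-1} y
f_X = np.exp(-0.5 * x @ x) / (2 * np.pi) ** (n / 2)    # standard normal density
f_Y_change_of_var = f_X / abs(np.linalg.det(W))

# Multivariate normal density with covariance Sigma = W W^T.
f_Y_mvn = multivariate_normal(mean=np.zeros(n), cov=W @ W.T).pdf(y)

assert np.isclose(f_Y_change_of_var, f_Y_mvn)
```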
Note: the fact that \( WW^{T} \) is the covariance matrix of \( Y \) is a separate result, a special case of the more general theorem:
\[ \operatorname{Cov}(AX, BY) = A\operatorname{Cov}(X, Y)B^{T} \]
for which there is another page.
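For a quick empirical check of the special case used here (using NumPy; the sample size and tolerance are arbitrary), the sample covariance of draws of \( Y = WX \) is close to \( WW^{T} \):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
W = rng.standard_normal((n, n))

# Sample X ~ N(0, I) and transform each sample: Y = W X.
X = rng.standard_normal((100_000, n))
Y = X @ W.T

# The sample covariance of Y is close to W W^T (= W Cov(X) W^T with Cov(X) = I).
assert np.allclose(np.cov(Y, rowvar=False), W @ W.T, atol=0.1)
```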