Rather than thinking of the matrix transpose as a swap of rows and
columns, we can characterize it in a more meaningful way:
Matrix transpose
Let \( A \) be an \( m \times n \) matrix, and let \( x \in \mathbb{R}^n \)
and \( y \in \mathbb{R}^m \) be column vectors. Then the transpose of \( A \),
denoted \( A^{T} \), is the matrix such that the following holds:
\[ (Ax)^{T} y = x^{T} (A^{T} y) \]
So each element of \( x \) is multiplied by every element of \( y \), with the
entries of \( A \) acting as weights.
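Written out in components, this pairing becomes explicit. The following is a sketch in index notation introduced here for illustration: \( A_{ij} \) denotes the entry of \( A \) in row \( i \) and column \( j \), and \( x_j \), \( y_i \) are the entries of \( x \) and \( y \). Regrouping the double sum moves \( A \) from acting on \( x \) to acting, transposed, on \( y \):

\[
(Ax)^{T} y
= \sum_{i=1}^{m} \Big( \sum_{j=1}^{n} A_{ij} x_j \Big) y_i
= \sum_{j=1}^{n} x_j \Big( \sum_{i=1}^{m} A_{ij} y_i \Big)
= x^{T} (A^{T} y)
\]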
The two possible orders of multiplication are:
1. \( (Ax)^{T} y \): convert \( x \) to a single vector in the same space as
\( y \), then do a dot product.
2. \( x^{T} (A^{T} y) \): carry out two dot products with \( y \), one for each
of the vectors that \( x \) will scale, then mix the results based on the
elements of \( x \).
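The two orders can be checked numerically. The following is a minimal sketch in plain Python, not taken from the original text: the helper names `matvec`, `transpose`, and `dot`, and the values chosen for `A`, `x`, and `y`, are assumptions made for illustration, with `A` taken to be \( 3 \times 2 \).

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]


def transpose(M):
    """Return the transpose of M (a list of rows)."""
    return [list(col) for col in zip(*M)]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Illustrative values only: A is 3 x 2, x is in R^2, y is in R^3.
A = [[1, 2],
     [3, 4],
     [5, 6]]
x = [7, 8]
y = [1, 0, -1]

# First order: turn x into one vector in the same space as y, then dot with y.
ax = matvec(A, x)      # [23, 53, 83]
first = dot(ax, y)     # -60

# Second order: dot each column of A with y, then mix by the elements of x.
aty = matvec(transpose(A), y)   # [-4, -4]
second = dot(x, aty)            # -60

print(first, second)
```

Both orders give the same scalar; they differ only in whether the intermediate result lives in the space of \( y \) (`ax`) or is a pair of completed dot products (`aty`).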
With the first order, the \( 3 \times 2 \) intermediate result consists of
two vectors in the same space as \( y \), transposed to row vectors. With the
second order, the \( 2 \times 1 \) intermediate result consists of two scalars
representing completed dot products with \( y \).