Fisher information
Fisher information tells you how much information a potential observation of a random variable would provide about a parameter of that variable's probability distribution. It can be considered "information" in the sense that it is additive over independent samples: the Fisher information of a collection of independent observations is the sum of the Fisher information of each observation.
Setup
Let \( X \) be a random variable, \( X : \Omega \to Z \), where \( Z \subseteq \mathbb{R}^n \). Let \( X \) have a probability distribution \( X \sim p_{\theta}(x) \) taken from a family of such distributions, parameterized by \( \theta \in \Theta \). We will assume \( \theta \) to be a real number, but the ideas apply even if \( \theta \) is a vector of reals.
I write \( p(x; \theta) \) interchangeably with \( p_{\theta}(x) \). The latter is useful when emphasizing a distribution for a fixed \( \theta \). But, for Fisher Information, we will be interested in viewing \( p \) as a function of \( \theta \) as well.
See the reverse for a more precise setup.
There are three commonly used and equivalent ways to define Fisher information. We start with the relative entropy perspective.
The entropy of a single distribution \( p_{\theta} \) is:
\[ H(p_{\theta}) = -\int_{Z} p_{\theta}(x) \log p_{\theta}(x) \, dx \]
The relative entropy of another member distribution \( p_{\phi} \) with respect to \( p_{\theta} \) is:
\[ H(p_{\phi} | p_{\theta}) = \int_{Z} p_{\phi}(x) \log \frac{p_{\phi}(x)}{p_{\theta}(x)} \, dx \]
Then we can define Fisher Information:
Fisher Information
Fisher Information \( I(\theta) \) is the second derivative of [what?] with respect to [what?] evaluated at \( \theta \):
From the perspective of how quickly \( p_{\phi} \) deviates from \( p_{\theta} \):
Consider the map \( \phi \mapsto H(p_{\phi} | p_{\theta}) \) for fixed \( \theta \). This function is minimized at \( \phi = \theta \), with a value of zero, and the first derivative is zero at this point also. The second derivative gives a measure of how quickly the distribution \( p_{\phi} \) deviates from \( p_{\theta} \) as \( \phi \) moves away from \( \theta \). This is the Fisher information \( I(\theta) \) of the family of distributions, evaluated at \( \theta \):
\[ I(\theta) = \left. \frac{\partial^2}{\partial \phi^2} H(p_{\phi} | p_{\theta}) \right|_{\phi = \theta} \]
All of this is defined without ever observing data.
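As a sanity check on the relative entropy perspective, here is a minimal numerical sketch. It assumes a Bernoulli family (my choice for illustration, not part of the card), where \( \theta \) is the success probability and the integral over \( Z \) becomes a sum over \( \{0, 1\} \). The helper names `bernoulli_relative_entropy` and `second_derivative_at` are hypothetical. The sketch estimates the second derivative of the relative entropy in \( \phi \) at \( \phi = \theta \) by finite differences and compares it to the known Fisher information of a Bernoulli distribution, \( 1 / (\theta (1 - \theta)) \).

```python
import math


def bernoulli_relative_entropy(phi: float, theta: float) -> float:
    """Relative entropy of Bernoulli(phi) with respect to Bernoulli(theta).

    Computes sum over x in {0, 1} of p_phi(x) * log(p_phi(x) / p_theta(x)).
    Both parameters must lie strictly between 0 and 1.
    """
    return (phi * math.log(phi / theta)
            + (1 - phi) * math.log((1 - phi) / (1 - theta)))


def second_derivative_at(f, x: float, h: float = 1e-4) -> float:
    """Central finite-difference estimate of f''(x) with step h."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2


theta = 0.3  # illustrative parameter value (an assumption)

# Second derivative of phi -> H(p_phi | p_theta), evaluated at phi = theta.
numeric = second_derivative_at(
    lambda phi: bernoulli_relative_entropy(phi, theta), theta
)

# Known closed form for the Fisher information of Bernoulli(theta).
closed_form = 1 / (theta * (1 - theta))

print(numeric, closed_form)
```

The two printed values should agree to several decimal places. The relative entropy is exactly zero at \( \phi = \theta \), so near that point the function is dominated by its quadratic term, and the curvature there is what the card calls \( I(\theta) \).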
Can you remember two other equivalent forms of the definition?