Say you observe data \( x \in \mathbb{R} \) from a distribution that
depends on a parameter \( \theta \in \mathbb{R} \), \( x \sim p(x | \theta) \).
The score function is simply the derivative of
\( \log p(x | \theta) \) with respect to \( \theta \):
\[ \frac{\partial}{\partial \theta} \log p(x | \theta). \]
Here the data are held fixed and the score is viewed as a function of \( \theta \),
which is why the score function is described as the derivative of the log-likelihood
with respect to the parameter \( \theta \). Typically, it is calculated from
a list of observations \( x_1, \ldots, x_n \).
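
To make the definition concrete, here is a minimal sketch for one specific model that is not part of the text above: the observations are assumed to be independent draws from a normal distribution with unknown mean \( \theta \) and known standard deviation `sigma`. Under that assumption the log-likelihood is a sum over observations, and the score works out to \( \sum_i (x_i - \theta) / \sigma^2 \). The function names and the sample values are made up for illustration; the finite-difference check is only there to confirm that the analytic score matches the derivative of the log-likelihood.

```python
import math


def log_likelihood(theta, xs, sigma=1.0):
    """Log-likelihood of independent Normal(theta, sigma^2) observations (assumed model)."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - sum((x - theta) ** 2 for x in xs) / (2 * sigma ** 2))


def score(theta, xs, sigma=1.0):
    """Analytic score: derivative of the log-likelihood with respect to theta."""
    return sum(x - theta for x in xs) / sigma ** 2


def numerical_score(theta, xs, sigma=1.0, h=1e-5):
    """Central finite-difference approximation of the same derivative."""
    upper = log_likelihood(theta + h, xs, sigma)
    lower = log_likelihood(theta - h, xs, sigma)
    return (upper - lower) / (2 * h)


# Hypothetical observations, held fixed while theta varies.
xs = [1.2, 0.7, 2.1, 1.5]

for theta in (0.5, 1.0, sum(xs) / len(xs)):
    print(theta, score(theta, xs), numerical_score(theta, xs))
```

In this model the score is positive when \( \theta \) is below the sample mean, negative when it is above, and zero at the sample mean, which is where the log-likelihood peaks.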