Gompertz distribution
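The Gompertz distribution with parameters \( (k, \alpha) \) is defined equivalently, for \( t \ge 0 \), by any one of:
- Probability density function:
\[ f(t) = k e^{\alpha t} \exp\left( -\frac{k}{\alpha} \left( e^{\alpha t} - 1 \right) \right) \]
- Survival function:
\[ S(t) = \exp\left( -\frac{k}{\alpha} \left( e^{\alpha t} - 1 \right) \right) \]
- Hazard function:
\[ h(t) = k e^{\alpha t} \]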
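The three equivalent definitions can be sketched in Python. This is a minimal sketch, assuming the parameterization in which the hazard is \( h(t) = k e^{\alpha t} \) (the form used in the translation proof further down); the function names and the parameter values are illustrative and not taken from the text.

```python
import math

def gompertz_hazard(t, k, alpha):
    """Hazard h(t) = k * exp(alpha * t) (assumed parameterization)."""
    return k * math.exp(alpha * t)

def gompertz_survival(t, k, alpha):
    """Survival S(t) = exp(-(k / alpha) * (exp(alpha * t) - 1))."""
    # expm1(x) computes exp(x) - 1 accurately for small x.
    return math.exp(-(k / alpha) * math.expm1(alpha * t))

def gompertz_pdf(t, k, alpha):
    """Density f(t) = h(t) * S(t)."""
    return gompertz_hazard(t, k, alpha) * gompertz_survival(t, k, alpha)

# Illustrative parameter values, chosen only for this example.
k, alpha = 0.01, 0.1
t = 10.0
ratio = gompertz_pdf(t, k, alpha) / gompertz_survival(t, k, alpha)
print(ratio, gompertz_hazard(t, k, alpha))  # the two values should agree
```

Defining the density as the product of hazard and survival keeps the three functions consistent by construction, so any one of them determines the other two.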
Gompertz distribution. Motivation.
A survival function can be expressed as:
The transformation \( y \) is constrained so that all 3 of the following statements hold:
From here on, we are concerned with the situation where \( t \ge 0 \).
Gompertz (1825) assumed that \( y \) took the form:
Gompertz describes the motivation in detail, relating it to a geometric progression of deaths within long, fixed-length periods. The transformation satisfies the constraints above, on the condition that \( k > 0 \). That \( k \) must be positive is implied by the survival function being a positive function.
Gompertz distribution. Properties.
Below are some properties of the Gompertz distribution.
Mode
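Differentiating \( f(t) = h(t) S(t) \), with \( h(t) = k e^{\alpha t} \) and \( S'(t) = -f(t) \), and equating to zero, we find the mode:
\[ f'(t) = f(t) \left( \alpha - k e^{\alpha t} \right) = 0 \]
If \( \alpha \le k \), then the mode is at \( t = 0 \). If \( \alpha > k \), then the mode is at \( t_m = \frac{1}{\alpha} \ln\left( \frac{\alpha}{k} \right) \). Wikipedia also notes that when the mode is positive, the cumulative distribution evaluated at \( t_m \) is always between 0 and 0.6321:
\[ 0 < F(t_m) < 1 - e^{-1} \approx 0.6321 \]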
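The mode and the bound on the cumulative distribution at the mode can be checked numerically. This is a minimal sketch, assuming the parameterization with hazard \( k e^{\alpha t} \); the function names and parameter values are illustrative.

```python
import math

def gompertz_pdf(t, k, alpha):
    """Density f(t) = k e^{alpha t} exp(-(k/alpha)(e^{alpha t} - 1))."""
    return k * math.exp(alpha * t) * math.exp(-(k / alpha) * math.expm1(alpha * t))

def gompertz_cdf(t, k, alpha):
    """Cumulative distribution F(t) = 1 - S(t)."""
    return -math.expm1(-(k / alpha) * math.expm1(alpha * t))

def gompertz_mode(k, alpha):
    """Mode of the density: 0 if alpha <= k, else ln(alpha / k) / alpha."""
    if alpha <= k:
        return 0.0
    return math.log(alpha / k) / alpha

# Illustrative parameter values with alpha > k, so the mode is positive.
k, alpha = 0.01, 0.1
t_m = gompertz_mode(k, alpha)
F_at_mode = gompertz_cdf(t_m, k, alpha)
print(f"mode t_m = {t_m:.4f}, F(t_m) = {F_at_mode:.4f}")
```

For these values \( \alpha / k = 10 \), so \( t_m = 10 \ln 10 \approx 23.03 \) and \( F(t_m) = 1 - e^{-0.9} \approx 0.593 \), which lies below the bound \( 1 - e^{-1} \).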
Translation
If \( X : \Omega \to [0, \infty) \) is Gompertz distributed with parameters \( (k, \alpha) \), then, conditional on \( X > t_a \), the shifted variable \( Y = X - t_a \) is Gompertz distributed with parameters \( (h(t_a), \alpha) \), where \( h(t_a) = k e^{\alpha t_a} \).
Proof. Here \( v \) plays the role of \( t_a \). With the change of variable \( t' = t - v \), we will show that \( S(t) = S(v) S^*(t') \), where \( S^* \) is the same survival function as \( S \) but with the parameter \( k \) replaced by \( h(v) = k e^{\alpha v} \). It then follows that \( S(t \mid t > v) = S(t) / S(v) = S^*(t') \).
Let \( t' = t - v \).
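A sketch of the algebra, assuming the survival function has the form \( S(t) = \exp\left( -\frac{k}{\alpha} (e^{\alpha t} - 1) \right) \), which is the form consistent with the hazard \( k e^{\alpha v} \) used in the proof:

\[
\begin{aligned}
S(t) &= \exp\left( -\frac{k}{\alpha} \left( e^{\alpha (t' + v)} - 1 \right) \right) \\
     &= \exp\left( -\frac{k}{\alpha} \left( e^{\alpha v} e^{\alpha t'} - e^{\alpha v} + e^{\alpha v} - 1 \right) \right) \\
     &= \exp\left( -\frac{k e^{\alpha v}}{\alpha} \left( e^{\alpha t'} - 1 \right) \right) \exp\left( -\frac{k}{\alpha} \left( e^{\alpha v} - 1 \right) \right) \\
     &= S^*(t') \, S(v)
\end{aligned}
\]

Dividing both sides by \( S(v) \) gives the conditional survival function, \( S(t) / S(v) = S^*(t') \).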
The consequence is that truncating a Gompertz distribution at time \( t_a \) (i.e. setting to zero all the probability mass at times less than \( t_a \)), re-normalizing, and changing variables so that \( t_0 = t_a \) leaves a Gompertz distribution with parameters \( (h(t_a), \alpha) \).
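The translation property can be checked numerically by comparing the conditional survival \( S(t) / S(v) \) with a Gompertz survival function whose first parameter is \( h(v) \). This is a minimal sketch, assuming the survival function \( S(t) = \exp\left( -\frac{k}{\alpha} (e^{\alpha t} - 1) \right) \); the helper names and the numbers are illustrative.

```python
import math

def gompertz_survival(t, k, alpha):
    """Survival S(t) = exp(-(k / alpha) * (exp(alpha * t) - 1))."""
    return math.exp(-(k / alpha) * math.expm1(alpha * t))

def conditional_survival(t, v, k, alpha):
    """P(X > t | X > v) = S(t) / S(v), valid for t >= v."""
    return gompertz_survival(t, k, alpha) / gompertz_survival(v, k, alpha)

# Illustrative values: truncate at v = 20.
k, alpha, v = 0.01, 0.1, 20.0
k_shifted = k * math.exp(alpha * v)  # h(v), the hazard at the truncation time

for t in (20.0, 25.0, 40.0, 60.0):
    lhs = conditional_survival(t, v, k, alpha)
    rhs = gompertz_survival(t - v, k_shifted, alpha)
    print(f"t={t:5.1f}  S(t)/S(v)={lhs:.6e}  S*(t-v)={rhs:.6e}")
```

The two columns should agree up to floating-point rounding for every \( t \ge v \).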
\( f, F, S \) and \( h \), definitions
Let \( T \subset \mathbb{R} \) be the codomain of a random variable. Typically, the codomain is non-negative, \( T \subseteq [0, \infty) \). Use \( t \in T \) to denote a generic value in the codomain. Let \( t_0 \) denote the infimum of the codomain (typically \( t_0 = 0 \)).
Let \( f : T \to [0, \infty) \) be a probability density function of the underlying random variable. We then make a number of definitions:
- Cumulative distribution function: \( F : T \to [0, 1] \).
\[ F(t) = \int_{t_0}^t f(x) \, dx \]
- Survival function: \( S : T \to [0, 1] \).
\[ S(t) = 1 - F(t) \]
- Hazard function: \( h : T \to [0, \infty) \).
\[ h(t) = \frac{f(t)}{S(t)} \]
Also called the intensity function.
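These definitions apply to any density, not only the Gompertz one. The following is a minimal sketch that builds \( F \), \( S \) and \( h \) from a density by numerical integration; the trapezoidal rule, the function names and the exponential test density are choices made for this illustration.

```python
import math

def cdf_from_pdf(pdf, t, t0=0.0, steps=10_000):
    """Approximate F(t), the integral of pdf from t0 to t, by the trapezoidal rule."""
    if t <= t0:
        return 0.0
    width = (t - t0) / steps
    total = 0.5 * (pdf(t0) + pdf(t))
    for i in range(1, steps):
        total += pdf(t0 + i * width)
    return total * width

def survival_from_pdf(pdf, t, t0=0.0):
    """S(t) = 1 - F(t)."""
    return 1.0 - cdf_from_pdf(pdf, t, t0)

def hazard_from_pdf(pdf, t, t0=0.0):
    """h(t) = f(t) / S(t); only meaningful where S(t) > 0."""
    return pdf(t) / survival_from_pdf(pdf, t, t0)

# Test density: exponential with rate 0.5, whose hazard is constant.
rate = 0.5

def exponential_pdf(t):
    return rate * math.exp(-rate * t)

for t in (0.5, 2.0, 5.0):
    print(t, hazard_from_pdf(exponential_pdf, t))  # each value should be close to the rate
```

The exponential density is a convenient check because its hazard is the same at every time, so any drift in the printed values would come from the integration error.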
Survival function. Intuition.
The survival function maps a time \( t \ge t_0 \) to a probability: the probability that the event has not yet occurred by time \( t \). This is the most natural representation to work with when answering a question such as: what is the probability that I will live until at least age 46?
The survival function acts as a re-normalizing factor in the density conditioned on surviving until \( t_a \):
\[ f_{t > t_a}(t) = \frac{f(t)}{S(t_a)}, \quad t \ge t_a \]
When \( t = t_a \), \( f_{t > t_a} \) is the hazard function:
\[ f_{t > t_a}(t_a) = \frac{f(t_a)}{S(t_a)} = h(t_a) \]
and so the hazard function is the continually re-normalized density function. When worrying about dying, the hazard function \( h(t_b) \) tells us the danger of dying at \( t = t_b \), given that one has lived until \( t_b \). If you knew that you would go to war for 4 years once you reached 18, then your hazard function would spike sharply at 18 and drop sharply again when the war ended or you were discharged. High hazard values mark times at which you should exercise caution.
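The re-normalization can be sketched in Python: dividing the density by the survival probability at \( t_a \) gives the density conditioned on surviving until \( t_a \), and evaluating it at \( t = t_a \) gives the hazard. This is a minimal sketch using a Gompertz density with hazard \( k e^{\alpha t} \) as the example; the function names and parameter values are illustrative.

```python
import math

def survival(t, k, alpha):
    """Gompertz survival S(t) = exp(-(k / alpha) * (exp(alpha * t) - 1))."""
    return math.exp(-(k / alpha) * math.expm1(alpha * t))

def pdf(t, k, alpha):
    """Gompertz density f(t) = k * exp(alpha * t) * S(t)."""
    return k * math.exp(alpha * t) * survival(t, k, alpha)

def conditional_pdf(t, t_a, k, alpha):
    """Density of the event time given survival until t_a; zero before t_a."""
    if t < t_a:
        return 0.0
    return pdf(t, k, alpha) / survival(t_a, k, alpha)

# Illustrative values: condition on having survived until t_a = 30.
k, alpha, t_a = 0.01, 0.1, 30.0
hazard_at_t_a = k * math.exp(alpha * t_a)
print(conditional_pdf(t_a, t_a, k, alpha), hazard_at_t_a)  # the two values should agree
```

Evaluating `conditional_pdf` at later times shows how the same density looks to someone who has already reached \( t_a \), which is the sense in which the hazard is a continually re-normalized density.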