
Gompertz distribution

The Gompertz distribution, for $t \geq 0$ with parameters $k > 0$ and $α > 0$, can be defined equivalently by any one of the following:

Probability density function

f (t) = k e^{α t} e^{- \frac{k}{α} (e^{α t} - 1)}

Survival function

S (t) = e^{- \frac{k}{α} (e^{α t} - 1)}

Hazard function

h (t) = k e^{α t}
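The three functions can be checked against each other numerically. Below is a minimal Python sketch that assumes the parameterization above with $k > 0$ and $α > 0$. The function names `gompertz_pdf`, `gompertz_survival`, `gompertz_hazard` and `integrate` are illustrative and not taken from any library. The sketch checks that $f(t) = h(t) S(t)$ and that the density integrates to approximately 1.

```python
import math

def gompertz_survival(t: float, k: float, alpha: float) -> float:
    # S(t) = exp(-(k/alpha)(e^{alpha t} - 1)); expm1 computes e^x - 1 accurately
    return math.exp(-(k / alpha) * math.expm1(alpha * t))

def gompertz_hazard(t: float, k: float, alpha: float) -> float:
    # h(t) = k e^{alpha t}
    return k * math.exp(alpha * t)

def gompertz_pdf(t: float, k: float, alpha: float) -> float:
    # f(t) = k e^{alpha t} e^{-(k/alpha)(e^{alpha t} - 1)}, written out directly
    return k * math.exp(alpha * t - (k / alpha) * math.expm1(alpha * t))

def integrate(g, a: float, b: float, n: int = 20_000) -> float:
    # Composite trapezoidal rule for the integral of g over [a, b]
    step = (b - a) / n
    total = 0.5 * (g(a) + g(b)) + sum(g(a + i * step) for i in range(1, n))
    return total * step

k, alpha = 0.01, 0.1  # hypothetical example parameters
t = 30.0
# The explicit density and the product h(t) S(t) should agree
print(gompertz_pdf(t, k, alpha), gompertz_hazard(t, k, alpha) * gompertz_survival(t, k, alpha))
# The density should integrate to roughly 1 over a long enough interval
print(integrate(lambda x: gompertz_pdf(x, k, alpha), 0.0, 200.0))
```

The upper limit 200 is an arbitrary cutoff. With these parameters, the remaining mass beyond it is numerically zero.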

Gompertz distribution. Motivation.

A survival function can be expressed as:

\begin{aligned} S (t) & = \exp (- \int_{t_{0}}^{t} h (x) d x) && \text{(where $h$ is the hazard function)} \\ & = \exp (- y (t)) && \text{(thus defining $y$)} \end{aligned}

The transformation $y$ (the cumulative hazard) must satisfy all three of the following, so that $S (t_{0}) = 1$, $S (\infty) = 0$ and $S$ is non-increasing:

\begin{aligned} y (t_{0}) & = 0 \\ y (\infty) & = \infty \\ y^{'} (t) & \geq 0 \end{aligned}

From here on, we are concerned with the situation where $t \geq 0$ (so $t_{0} = 0$).

Gompertz (1825) assumed that $y$ took the form:

y (t) = \frac{k}{α} (e^{α t} - 1) .

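Gompertz describes the motivation in detail, relating it to a geometric progression of deaths within long, fixed-length periods. The transformation satisfies the constraints above provided $k > 0$ and $α > 0$. The condition $y (0) = 0$ holds for any parameters. The condition $y^{'} (t) = k e^{α t} \geq 0$, together with $y (\infty) = \infty$, requires $k > 0$. The condition $y (\infty) = \infty$ also requires $α > 0$, because for $α < 0$ the function $y$ tends to the finite limit $- k / α$. In other words, $k$ must be positive because the hazard $h (t) = y^{'} (t) = k e^{α t}$ cannot be negative, i.e. $S$ must be non-increasing.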
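Differentiating $y$ recovers the density and hazard stated at the top of this note. The short derivation below uses only $S = e^{- y}$, $h = f / S$ and $f = - S^{'}$. The last of these follows from $S = 1 - F$.

\begin{aligned} y^{'} (t) & = \frac{d}{d t} [\frac{k}{α} (e^{α t} - 1)] = k e^{α t} \\ f (t) & = - S^{'} (t) = - \frac{d}{d t} e^{- y (t)} = y^{'} (t) e^{- y (t)} = k e^{α t} e^{- \frac{k}{α} (e^{α t} - 1)} \\ h (t) & = \frac{f (t)}{S (t)} = \frac{y^{'} (t) e^{- y (t)}}{e^{- y (t)}} = k e^{α t} \end{aligned}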

Gompertz distribution. Properties.

Below are some properties of the Gompertz distribution.

Mode

Differentiating $f (t)$ and equating to zero, we find the mode:

\begin{aligned} 0 = f^{'} (t) & = (- k e^{α t} + α) k e^{- \frac{k}{α} (e^{α t} - 1) + α t} \\ & \implies α = k e^{α t} \\ & \implies t = \frac{1}{α} \ln (\frac{α}{k}) \end{aligned}

If $α \leq k$, then $f^{'} (t) \leq 0$ for all $t \geq 0$, so the mode is at $t = 0$. If $α > k$, then the mode is at $t_{m} = \frac{1}{α} \ln (\frac{α}{k}) > 0$. Wikipedia also notes that when the mode is positive, the cumulative distribution function evaluated at $t_{m}$ always lies between 0 and $1 - e^{-1} \approx 0.6321$:

0 < F (t_{m}) < 1 - e^{- 1}
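The bound can be seen directly by substituting $t_{m}$ into $S$, which gives $F (t_{m}) = 1 - e^{k / α - 1}$. This lies in $(0, 1 - e^{-1})$ whenever $0 < k / α < 1$. As a numerical sanity check, the minimal Python sketch below compares the closed-form mode against a brute-force grid search of the density and evaluates $F (t_{m})$. The parameter values are hypothetical, chosen with $α > k$ so that the mode is positive. The helper names are illustrative.

```python
import math

def gompertz_pdf(t, k, alpha):
    # f(t) = k e^{alpha t} exp(-(k/alpha)(e^{alpha t} - 1))
    return k * math.exp(alpha * t) * math.exp(-(k / alpha) * math.expm1(alpha * t))

def gompertz_cdf(t, k, alpha):
    # F(t) = 1 - S(t)
    return 1.0 - math.exp(-(k / alpha) * math.expm1(alpha * t))

def gompertz_mode(k, alpha):
    # Mode is at t = 0 when alpha <= k, otherwise at (1/alpha) ln(alpha/k)
    return 0.0 if alpha <= k else math.log(alpha / k) / alpha

def grid_argmax(g, lo, hi, n=200_000):
    # Brute-force maximizer of g over an evenly spaced grid on [lo, hi]
    step = (hi - lo) / n
    return max((lo + i * step for i in range(n + 1)), key=g)

k, alpha = 0.01, 0.1  # hypothetical parameters with alpha > k
t_m = gompertz_mode(k, alpha)
t_grid = grid_argmax(lambda t: gompertz_pdf(t, k, alpha), 0.0, 100.0)
F_m = gompertz_cdf(t_m, k, alpha)
print(t_m, t_grid)                           # closed form vs. grid search
print(F_m, 1.0 - math.exp(k / alpha - 1.0))  # F(t_m) vs. 1 - e^{k/alpha - 1}
print(0.0 < F_m < 1.0 - math.exp(-1.0))      # the bound from above
```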

Translation

If $X : Ω \to [0, \infty)$ is Gompertz distributed with parameters $(k, α)$, then, conditional on $X > t_{a}$, the shifted variable $Y = X - t_{a}$ is Gompertz distributed with parameters $(h (t_{a}), α)$, where $h (t_{a}) = k e^{α t_{a}}$.

Proof. Writing $v$ for $t_{a}$ and using the change of variable $t^{'} = t - v$, we will show that $S (t) = S (v) S^{*} (t^{'})$. Here $S^{*}$ is the same survival function as $S$, but with the parameter $k$ replaced by $h (v) = k e^{α v}$. It then follows that $S (t \mid t > v) = S (t) / S (v) = S^{*} (t^{'})$.

Let $t^{'} = t - v$ .

\begin{aligned} S (t) & = e^{- \frac{k}{α} (e^{α t} - 1)} && \text{by definition of $S$} \\ & = e^{- \frac{k}{α} (e^{α (v + t^{'})} - 1)} && \text{substitute $t = v + t^{'}$} \\ & = e^{- \frac{k}{α} (e^{α v} e^{α t^{'}} - 1)} && \text{expand} \\ & = e^{- \frac{k}{α} (e^{α v} - e^{α v} + e^{α v} e^{α t^{'}} - 1)} && \text{add zero, $e^{α v} - e^{α v}$} \\ & = e^{- \frac{k}{α} (e^{α v} + e^{α v} (e^{α t^{'}} - 1) - 1)} && \text{group} \\ & = e^{- \frac{k}{α} (e^{α v} - 1 + e^{α v} (e^{α t^{'}} - 1))} && \text{rearrange} \\ & = e^{- \frac{k}{α} (e^{α v} - 1)} e^{- \frac{k e^{α v}}{α} (e^{α t^{'}} - 1)} && \text{split the exponential} \\ & = S (v) S^{*} (t^{'}) && \text{by definition of $S^{*}$} \end{aligned}

Consequently, consider truncating a Gompertz distribution at time $t_{a}$, i.e. setting to zero all probability mass at times less than $t_{a}$ and renormalizing by $S (t_{a})$. If we then change variable so that $t_{a}$ becomes the new origin $t_{0}$, the result is a Gompertz distribution with parameters $(h (t_{a}), α)$.
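The translation property can be checked numerically. The minimal Python sketch below compares the conditional survival $S (t_{a} + t^{'}) / S (t_{a})$ with a fresh Gompertz survival function whose $k$ parameter is $h (t_{a}) = k e^{α t_{a}}$. The parameter values and the helper name `shifted_params` are hypothetical.

```python
import math

def gompertz_survival(t, k, alpha):
    # S(t) = exp(-(k/alpha)(e^{alpha t} - 1))
    return math.exp(-(k / alpha) * math.expm1(alpha * t))

def shifted_params(k, alpha, t_a):
    # Parameters of the conditional, shifted distribution: (h(t_a), alpha)
    return k * math.exp(alpha * t_a), alpha

k, alpha, t_a = 0.01, 0.1, 15.0  # hypothetical values
k_star, alpha_star = shifted_params(k, alpha, t_a)

rows = []
for t_prime in (0.0, 1.0, 5.0, 20.0):
    conditional = gompertz_survival(t_a + t_prime, k, alpha) / gompertz_survival(t_a, k, alpha)
    shifted = gompertz_survival(t_prime, k_star, alpha_star)
    rows.append((t_prime, conditional, shifted))
    print(t_prime, conditional, shifted)  # the two columns should agree
```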

$f, F, S$ and $h$, definitions

Let $T \subset \mathbb{R}$ be the codomain of a random variable. Typically, the codomain is non-negative, $T \subseteq [0, \infty)$. Use $t \in T$ to denote a generic value in the codomain. Let $t_{0}$ denote the infimum of the codomain (typically $t_{0} = 0$).

Let $f : T \to [0, \infty)$ be the probability density function of the underlying random variable. We then make a number of definitions:

Cumulative distribution function: $F : T \to [0, 1]$.
$F (t) = \int_{t_{0}}^{t} f (x) d x$
Survival function: $S : T \to [0, 1]$.
$S (t) = 1 - F (t)$
Hazard function: $h : T \to [0, \infty)$.
$h (t) = \frac{f (t)}{S (t)}$

The hazard function is also called the intensity function.
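These definitions can be applied numerically to any density. The minimal Python sketch below makes two simplifying assumptions: the density is negligible beyond a chosen cutoff, and the integral for $F$ is approximated by a running trapezoidal rule on a grid. Neither choice is part of the definitions. The helper names are hypothetical. The sketch compares the numeric $S$ and $h$ against the Gompertz closed forms.

```python
import math

def cumulative_trapezoid(values, step):
    # Running trapezoidal integral: out[i] approximates the integral from grid[0] to grid[i]
    out = [0.0]
    for a, b in zip(values, values[1:]):
        out.append(out[-1] + 0.5 * (a + b) * step)
    return out

def fsh_from_density(f, t0, t_max, n=100_000):
    # Returns grid, F, S, h for density f on [t0, t_max], following the definitions above.
    # h is numerically unreliable in the far tail, where S is tiny.
    step = (t_max - t0) / n
    grid = [t0 + i * step for i in range(n + 1)]
    dens = [f(t) for t in grid]
    F = cumulative_trapezoid(dens, step)                          # F(t) = integral of f from t0 to t
    S = [1.0 - Ft for Ft in F]                                    # S(t) = 1 - F(t)
    h = [d / s if s > 0 else math.inf for d, s in zip(dens, S)]   # h(t) = f(t) / S(t)
    return grid, F, S, h

k, alpha = 0.01, 0.1  # hypothetical Gompertz parameters

def pdf(t):
    # Gompertz density f(t) = k e^{alpha t} e^{-(k/alpha)(e^{alpha t} - 1)}
    return k * math.exp(alpha * t) * math.exp(-(k / alpha) * math.expm1(alpha * t))

grid, F, S, h = fsh_from_density(pdf, 0.0, 60.0)
i = 50_000  # grid index for t = 30
print(grid[i], S[i], math.exp(-(k / alpha) * math.expm1(alpha * grid[i])))  # numeric vs. closed-form S
print(h[i], k * math.exp(alpha * grid[i]))                                  # numeric vs. closed-form h
```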

Survival function. Intuition.

The survival function maps a time $t \geq t_{0}$ to the probability that the event has not yet occurred by time $t$. It is the most natural representation for answering questions such as: what is the probability that I will live until at least age 46?

The survival function acts as a re-normalizing factor in

f_{t > t_{a}} (t) = \frac{f (t)}{S (t_{a})}, \quad t \geq t_{a}

which transforms $f$ to incorporate the information that the event has not occurred by time $t_{a}$. While $f$ is normalized so that $F (\infty) = 1$, $f_{t > t_{a}}$ must be normalized by the remaining probability mass $1 - F (t_{a}) = S (t_{a})$. This is less than 1 whenever some probability mass lies before $t_{a}$.
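The effect of the re-normalization can be checked numerically. The minimal Python sketch below uses a Gompertz density with hypothetical parameters and integrates with the trapezoidal rule up to a finite cutoff, chosen here so that the mass beyond it is negligible. It shows that $f$ integrated over $[t_{a}, \infty)$ gives $S (t_{a})$, while $f / S (t_{a})$ integrates to approximately 1. The helper names are illustrative.

```python
import math

K, ALPHA = 0.01, 0.1  # hypothetical Gompertz parameters

def pdf(t):
    # f(t) = k e^{alpha t} e^{-(k/alpha)(e^{alpha t} - 1)}
    return K * math.exp(ALPHA * t) * math.exp(-(K / ALPHA) * math.expm1(ALPHA * t))

def survival(t):
    # S(t) = exp(-(k/alpha)(e^{alpha t} - 1))
    return math.exp(-(K / ALPHA) * math.expm1(ALPHA * t))

def conditional_pdf(t, t_a):
    # f_{t > t_a}(t) = f(t) / S(t_a) for t >= t_a, and 0 before t_a
    return pdf(t) / survival(t_a) if t >= t_a else 0.0

def integrate(g, a, b, n=50_000):
    # Composite trapezoidal rule on [a, b]
    step = (b - a) / n
    return step * (0.5 * (g(a) + g(b)) + sum(g(a + i * step) for i in range(1, n)))

t_a = 20.0
raw_mass = integrate(pdf, t_a, 80.0)                            # approximately S(t_a), less than 1
total = integrate(lambda t: conditional_pdf(t, t_a), t_a, 80.0)  # approximately 1 after renormalizing
print(raw_mass, survival(t_a))
print(total)
```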

Evaluated at $t = t_{a}$, $f_{t > t_{a}}$ equals the hazard function:

f_{t > t_{a}} (t_{a}) = h (t_{a})

and so the hazard function is the continually re-normalized density function. When worrying about dying, $h (t_{b})$ gives the instantaneous risk of dying at $t = t_{b}$ (a rate, not a probability), given that one has lived until $t_{b}$. Suppose you knew that you would go to war for 4 years once you reach 18. Your hazard function would then spike sharply at 18 and drop sharply again when the war ends or you are discharged. High hazard values mark times at which you should exercise caution.
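The war example can be made concrete by going from a hazard back to a survival function via $S (t) = \exp (- \int_{0}^{t} h (x) d x)$. The minimal Python sketch below builds a hypothetical hazard: a Gompertz baseline plus a constant extra hazard between ages 18 and 22. All numbers are made up for illustration. The integral is approximated with the midpoint rule.

```python
import math

K, ALPHA = 0.0005, 0.085        # hypothetical baseline Gompertz parameters
WAR_START, WAR_END = 18.0, 22.0  # war years from the example above
WAR_EXTRA = 0.05                 # hypothetical extra hazard per year while at war

def hazard(t):
    # Baseline Gompertz hazard k e^{alpha t}, plus a spike during the war years
    base = K * math.exp(ALPHA * t)
    return base + (WAR_EXTRA if WAR_START <= t < WAR_END else 0.0)

def survival(t, n=10_000):
    # S(t) = exp(-integral_0^t h(x) dx), integral approximated by the midpoint rule
    step = t / n
    cumulative = sum(hazard((i + 0.5) * step) for i in range(n)) * step
    return math.exp(-cumulative)

for age in (10, 18, 22, 30):
    print(age, round(survival(age), 4))
# The drop in survival between 18 and 22 is dominated by the war term
print(survival(22.0) / survival(18.0))
```

A piecewise hazard like this is easier to reason about than the equivalent density. The spike shows up in $S$ only as a steeper decline over the war years.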

Source

Page 513 of Philosophical Transactions for the Year 1825

"Maximum-likelihood Estimation of the Parameters of the Gompertz Survival Function", by Garg et al. (1970)
