Math and science::INF ML AI

Symbol codes

A binary symbol code \( C \) for ~~an ensemble~~ a probability space \( (\Omega, \mathcal{A}_x, \mathcal{P}_x) \) is a mapping from the range of events \( \mathcal{A}_x = \{a_1, ..., a_n\} \), to [...]. Let \( x \in \mathcal{A}_x \). \( c(x) \) denotes the codeword corresponding to \( x \) and \( l(x) \) will denote its length, with shorthand \( l_i = l(a_i) \).

The extended code, denoted \( C^+ \), is a mapping from [...] to [...] obtained by concatenation:

\[ c^+(x_1x_2...x_N) = c(x_1)c(x_2)...c(x_N) \]

A code \( C(X) \) is said to be [...] if, under the extended code \( C^+ \), no two distinct strings have the same encoding. So:

\[ \forall x,y \in \mathcal{A}_x^+, x \neq y \implies c^+(x) \neq c^+(y) \]

or equivalently:

\[ \forall x,y \in \mathcal{A}_x^+, c^+(x) = c^+(y) \implies x = y \]

A symbol code is a prefix code if [...]. A prefix code is also known as a self-punctuating or instantaneous code, as an encoded string can be decoded from left to right without having to look ahead to subsequent codewords. A prefix code is uniquely decodable.

The expected length \( L(C, X) \) of a symbol code \( C \) for ensemble \( X \) is:

\[ L(C,X) = \sum_{x \in \mathcal{A}_x} P(x)l(x) \]

Alternatively written as:

\[ L(C, X) = \sum_{i=1}^{I} p_il_i \]

Where \( I = |\mathcal{A}_x| \)

09.05.2019