\( \newcommand{\matr}[1] {\mathbf{#1}} \newcommand{\vertbar} {\rule[-1ex]{0.5pt}{2.5ex}} \newcommand{\horzbar} {\rule[.5ex]{2.5ex}{0.5pt}} \newcommand{\E} {\mathrm{E}} \)
deepdream of
          a sidewalk
Show Question
\( \newcommand{\cat}[1] {\mathrm{#1}} \newcommand{\catobj}[1] {\operatorname{Obj}(\mathrm{#1})} \newcommand{\cathom}[1] {\operatorname{Hom}_{\cat{#1}}} \newcommand{\multiBetaReduction}[0] {\twoheadrightarrow_{\beta}} \newcommand{\betaReduction}[0] {\rightarrow_{\beta}} \newcommand{\betaEq}[0] {=_{\beta}} \newcommand{\string}[1] {\texttt{"}\mathtt{#1}\texttt{"}} \newcommand{\symbolq}[1] {\texttt{`}\mathtt{#1}\texttt{'}} \newcommand{\groupMul}[1] { \cdot_{\small{#1}}} \newcommand{\groupAdd}[1] { +_{\small{#1}}} \newcommand{\inv}[1] {#1^{-1} } \newcommand{\bm}[1] { \boldsymbol{#1} } \require{physics} \require{ams} \require{mathtools} \)
Math and science::INF ML AI

Belief networks

A belief network (called a Bayesian network by Koller) is a grapical model that represents a distribution of the form
\[ p(x_1, ..., x_D) = \prod_{i=1}^{D} p(x_i \vert pa(x_i)) \\\text{where } pa(x_i) \text{ represents the parental variables of variable } x_i\]

Parental variables is just a term for the variables that appear as conditional variables for a given variable.

A belief network represents a distribution as a directed graph where the \( i \)th node corresponds to the factor \( p(x_i \vert pa(x_i)) \), with directed connections from all \( pa(x_i) \) nodes to the \( i \)th node. 

According to the rules above, a distribution \( p(x_1, x_2, x_3, x_4) \) with no independence statements can be modelled with a cascade graph like so:


Derivation

Without loss of generality, using Bayes' rule the distribution \( p(x_1, ..., x_n) \) can be represented as:

\[
\begin{aligned}
p(x_1, ..., x_n) &= p(x_1 \vert x_2, ..., x_n)p(x_2, ..., x_n) \\
&= p(x_1 \vert x_2, ..., x_n)p(x_2 \vert x_3, ..., x_n)p(x_3, ..., x_n) \\
&= p(x_n) \prod_{i=1}^{n-1} p(x_i \vert x_{i+1}, ..., x_n)
\end{aligned}
\]

More accurately, a belief network represents a set of distributions that have the same dependency relationships, not just a single distribution with speficied probabilities.

Example

Deleting any connection represents the existance of some independence. For example:


Source

Bayesian Reasoning and Machine Learning
David Barber
p38-39