next up previous
Next: Subthreshold CMOS Circuits Up: Building Bayesian Networks with Previous: Graphical Models

Subsections

The Sum-Product Algorithm

Bayesian networks are so named because the can be solved using Bayes rule:

$\displaystyle P(A\vert B) = \frac{P(A\vert B)P(A)}{P(B)}$ (1)

Unfortunately, for large networks, Bayes' simple rule becomes unruly, and so it is necessary to use a message-passing algorithm. The sum-product algorithm is a general form of the forward-backward algorithm [8]. It attempts to compute various marginal functions associated with the global function in a factor graph. Once one or more nodes have been observed to be of some value, messages are passed inward from an arbitrary set of nodes at the edge of a graph. When they reach an edge, their direction is reversed, and when they reach their origin, they are absorbed.

Message Passing

The first messages are passed from some set of singly-connected function nodes to the variable nodes that they depend upon. No computation is necessary in this step because the messages are simple identity messages that specify the apriori knowledge stored in the function that creates them. The variable nodes then pass the same identity message along to the next function node that they are connected to. These steps are marked ``1" and ``2" respectively in figure 2 below.

Figure 2: Message passing pattern for the sum-product algorithm. Dark nodes represent functions and open nodes represent variables. Each arrow represents a message being passed, and the number by each arrow specifies the time at which that message is sent.
\fbox{\includegraphics[width=15cm]{graphics/software/messages.eps}}

Assuming the neighboring function node is connected to more than one variable, it must wait for all but one of its neighbors to send it a message. Once those messages are received, it creates a message of its own and passes that message along to the variable that did not send it a message. This message is created from the messages passed to it and its internal apriori knowledge. Equation 2 show how this message is created.

$\displaystyle \sum_{\sim\{x\}} \left ( f(x)\prod_{y\in n(f)\setminus\{x\}}\mu_{h\rightarrow x}(y) \right )$ (2)

In equation 2, $ x = n(f)$ is the set of arguments to the function $ f$, and $ \mu_{h\rightarrow x}(y)$ is a message from some neighbor $ y$ to the function. This marginalization is somewhat difficult to understand, but is quick to compute. In English, for each possible argument to $ f$, we compute the value of the $ f$, which is a piece of apriori knowledge, then multiply it by each message we have received according to that same argument. The sum of these computations becomes the message to the next variable, marked ``3" in figure 2.

Step 4 is the same as step 2. In step 5, the functions at the periphery of the graph reverse the flow of messages, performing the same marginalization as in step 3. Step 6 requires a different type of marginalization:

$\displaystyle \prod_{h\in n(x)\setminus \{f\}}\mu_{h\rightarrow x}(x)$ (3)

In this case, $ x$ is the variable creating the message, and $ n(x)\setminus \{f\}$ is the set of functions that have sent $ x$ a message. $ n(x)\setminus \{f\}$ does not include the message received while the messages were traveling in the other direction. Functions that receive only a single message may omit this marginalization, as it is trivial, and simply pass the message on to its one neighbor not in $ n(x)\setminus \{f\}$.

The final messages, 7 and 8, are computed as shown above, and the algorithm terminates where it began.

Computing marginal posterior distributions

While messages themselves are not useful in making decisions, they facilitate the calculation of the marginal posterior distribution at each variable in the graph. Once a variable has received all of the messages in the forward and backwards directions, computation of the distribution is trivial--simply the product of the messages received from every neighboring function, scaled so their sum is unity [2].

$\displaystyle p(x\vert\textbf{v}) = \beta \prod_{h\in n(x)\{f\}}\mu_{h\rightarrow x}(x)$ (4)

Here, $ \textbf{v}$ is the set of observations, and $ \beta$ is computed so that $ \sum_x P(x\vert\textbf{v}) = 1$. The resulting value represents the probability that any particular node did or did not happen given the observations made and the set of relationships between the nodes of the graph, which is the solution to the network.


next up previous
Next: Subthreshold CMOS Circuits Up: Building Bayesian Networks with Previous: Graphical Models
Samuel Luckenbill 2002-05-08