Bayesian networks are so named because the can be solved using Bayes rule:
Unfortunately, for large networks, Bayes' simple rule becomes unruly, and so it is necessary to use a message-passing algorithm. The sum-product algorithm is a general form of the forward-backward algorithm [8]. It attempts to compute various marginal functions associated with the global function in a factor graph. Once one or more nodes have been observed to be of some value, messages are passed inward from an arbitrary set of nodes at the edge of a graph. When they reach an edge, their direction is reversed, and when they reach their origin, they are absorbed.
The first messages are passed from some set of singly-connected function nodes to the variable nodes that they depend upon. No computation is necessary in this step because the messages are simple identity messages that specify the apriori knowledge stored in the function that creates them. The variable nodes then pass the same identity message along to the next function node that they are connected to. These steps are marked ``1" and ``2" respectively in figure 2 below.
|
Assuming the neighboring function node is connected to more than one variable, it must wait for all but one of its neighbors to send it a message. Once those messages are received, it creates a message of its own and passes that message along to the variable that did not send it a message. This message is created from the messages passed to it and its internal apriori knowledge. Equation 2 show how this message is created.
In equation 2,
is the set of arguments to the
function
, and
is a message from some
neighbor
to the function. This marginalization is somewhat difficult
to understand, but is quick to compute. In English, for each possible
argument to
, we compute the value of the
, which is a piece of
apriori knowledge, then multiply it by each message we have received
according to that same argument. The sum of these computations becomes
the message to the next variable, marked ``3" in figure 2.
Step 4 is the same as step 2. In step 5, the functions at the periphery of the graph reverse the flow of messages, performing the same marginalization as in step 3. Step 6 requires a different type of marginalization:
In this case,
is the variable creating the message, and
is the set of functions that have sent
a
message.
does not include the message received
while the messages were traveling in the other direction. Functions that
receive only a single message may omit this marginalization, as it is
trivial, and simply pass the message on to its one neighbor not in
.
The final messages, 7 and 8, are computed as shown above, and the algorithm terminates where it began.
While messages themselves are not useful in making decisions, they facilitate the calculation of the marginal posterior distribution at each variable in the graph. Once a variable has received all of the messages in the forward and backwards directions, computation of the distribution is trivial--simply the product of the messages received from every neighboring function, scaled so their sum is unity [2].
Here,
is the set of observations, and
is computed so
that
. The resulting value represents the
probability that any particular node did or did not happen given the
observations made and the set of relationships between the nodes of the
graph, which is the solution to the network.