We have seen that if all variables are mutually independent, it is easy to calculate the joint distribution (e.g., n coin flips) by multiplying the individual probabilities. However, unconditional independence is rare and unrealistic. It is also not very useful: some variables are easy to measure, while others are hidden (latent), and we hope to infer them from the observed ones. If everything were independent of everything else, the observed variables would tell us nothing about the latent ones.
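As a quick worked example (assuming fair, independent coins), the joint distribution of n coin flips is just the product of the individual probabilities:
$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i) = \left(\frac{1}{2}\right)^{n}$$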
We also saw that if variables are not mutually independent, we can use the chain rule to calculate the joint distribution.
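For reference, the chain rule writes any joint distribution as a product of conditional probabilities, with no independence assumption needed:
$$P(X_1, \ldots, X_n) = P(X_1) P(X_2 | X_1) \cdots P(X_n | X_1, \ldots, X_{n-1}) = \prod_{i=1}^{n} P(X_i | X_1, \ldots, X_{i-1})$$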
What if we don't have full independence, but one set of variables is independent of another set of variables given a third? We can exploit this partial independence, which we call conditional independence, to simplify the calculations.
Definition 10 (Conditional independence) Let X, Y, and Z be random variables. We say that X is conditionally independent of Y given Z provided the probability distribution governing X is independent of the value of Y given the value of Z; that is:
$$(\forall x, y, z) \quad P(X = x | Y = y, Z = z) = P(X = x | Z = z)$$
more compactly, we write
$$P(X | Y, Z) = P(X | Z)$$
or equivalently:
$$P(X, Y | Z) = P(X | Z) P(Y | Z)$$
We write:
$$X \perp Y \mid Z$$
What happens to the chain rule in the case of conditional independence? By the chain rule:
$$P(Z, X, Y) = P(Z) P(X|Z) P(Y|Z, X)$$
Given that:
$$P(Y|Z, X) = P(Y|Z)$$
we can then simplify the chain rule as follows:
$$P(Z, X, Y) = P(Z) P(X|Z) P(Y|Z)$$
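To make this concrete, here is a minimal numerical sketch in Python with NumPy. The probability tables (P_Z, P_X_given_Z, P_Y_given_Z) are made-up values chosen only for illustration: it builds the joint distribution from the simplified chain rule and then checks that P(Y|Z, X) = P(Y|Z) holds in the resulting joint, i.e. that X and Y really are conditionally independent given Z.
```python
import numpy as np

# Hypothetical conditional probability tables for binary Z, X, Y.
# (The numbers are arbitrary, chosen only for illustration.)
P_Z = np.array([0.6, 0.4])                   # P(Z=z)
P_X_given_Z = np.array([[0.9, 0.1],          # P(X=x | Z=0)
                        [0.3, 0.7]])         # P(X=x | Z=1), rows indexed by z
P_Y_given_Z = np.array([[0.2, 0.8],          # P(Y=y | Z=0)
                        [0.5, 0.5]])         # P(Y=y | Z=1)

# Simplified chain rule: P(Z=z, X=x, Y=y) = P(z) P(x|z) P(y|z)
joint = (P_Z[:, None, None]
         * P_X_given_Z[:, :, None]
         * P_Y_given_Z[:, None, :])          # shape (z, x, y)
assert np.isclose(joint.sum(), 1.0)

# Check conditional independence: P(Y=y | Z=z, X=x) equals P(Y=y | Z=z)
P_ZX = joint.sum(axis=2)                     # P(Z=z, X=x)
P_Y_given_ZX = joint / P_ZX[:, :, None]      # P(Y=y | Z=z, X=x)
for z in range(2):
    for x in range(2):
        assert np.allclose(P_Y_given_ZX[z, x, :], P_Y_given_Z[z, :])

print("P(Y | Z, X) == P(Y | Z) for all values, as expected.")
```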
Conditional independence is a key concept in graphical models such as Bayes Nets, which encode probabilistic relationships among variables.