The view that MDL is an approximation to Bayesian model comparison is explained in David MacKay's Information Theory, Inference, and Learning Algorithms (see link below). As Shannon showed, the optimal description length for data D, given assumptions H, is the "Shannon information content" log_2(1/P(D|H)). In Bayesian inference, P(D|H) is the marginal likelihood of the model H, also known as the evidence for the model. Thus an exact implementation of MDL should return precisely the evidence.
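As a minimal sketch of this correspondence (not taken from MacKay's book, and with hypothetical function names), consider a Bernoulli model with a uniform prior on the bias parameter. A prequential code that encodes each symbol with the Laplace predictor assigns the data a code length of exactly -log_2 P(D|H), the negative log evidence:

```python
import math

def prequential_code_length(bits):
    """Code length (in bits) of a binary sequence under the Laplace
    predictor, i.e. a prequential MDL code for the Bernoulli model
    with a uniform prior on the bias parameter (illustrative sketch)."""
    length = 0.0
    ones = zeros = 0
    for x in bits:
        # Predictive probability of the next symbol given the counts so far
        total = ones + zeros + 2
        p = (ones + 1) / total if x == 1 else (zeros + 1) / total
        length += -math.log2(p)
        if x == 1:
            ones += 1
        else:
            zeros += 1
    return length

def log2_evidence(bits):
    """log2 of the Bayesian evidence P(D|H) for the same model:
    the integral of theta^n1 * (1-theta)^n0 over a uniform prior,
    which equals n1! * n0! / (n + 1)!."""
    n1 = sum(bits)
    n0 = len(bits) - n1
    return (math.lgamma(n1 + 1) + math.lgamma(n0 + 1)
            - math.lgamma(n1 + n0 + 2)) / math.log(2)

data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
print(prequential_code_length(data))  # description length in bits
print(-log2_evidence(data))           # -log2 P(D|H): the same number
```

Because the chain rule makes the product of the predictive probabilities equal to the marginal likelihood, the two printed values agree, illustrating that this MDL code returns precisely the evidence.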
External links