The view that MDL is an approximation to Bayesian model comparison is explained in David MacKay's Information Theory, Inference, and Learning Algorithms (see link below). As Shannon showed, the optimal description length for data D, given assumptions H, is the "Shannon information content" log_2(1/P(D|H)). In Bayesian inference, P(D|H) is the marginal likelihood of the model H, also known as the evidence for the model. Thus an exact implementation of MDL should return precisely the evidence.
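As a minimal sketch of this correspondence (not taken from MacKay's book, and with hypothetical function names), consider a Bernoulli model with a uniform prior on the bias parameter. A prequential code that encodes each symbol with the Laplace predictor assigns the data a code length of exactly -log_2 P(D|H), the negative log evidence:

```python
import math

def prequential_code_length(bits):
    """Code length (in bits) of a binary sequence under the Laplace
    predictor, i.e. a prequential MDL code for the Bernoulli model
    with a uniform prior on the bias parameter (illustrative sketch)."""
    length = 0.0
    ones = zeros = 0
    for x in bits:
        # Predictive probability of the next symbol given the counts so far
        total = ones + zeros + 2
        p = (ones + 1) / total if x == 1 else (zeros + 1) / total
        length += -math.log2(p)
        if x == 1:
            ones += 1
        else:
            zeros += 1
    return length

def log2_evidence(bits):
    """log2 of the Bayesian evidence P(D|H) for the same model:
    the integral of theta^n1 * (1-theta)^n0 over a uniform prior,
    which equals n1! * n0! / (n + 1)!."""
    n1 = sum(bits)
    n0 = len(bits) - n1
    return (math.lgamma(n1 + 1) + math.lgamma(n0 + 1)
            - math.lgamma(n1 + n0 + 2)) / math.log(2)

data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
print(prequential_code_length(data))  # description length in bits
print(-log2_evidence(data))           # -log2 P(D|H): the same number
```

Because the chain rule makes the product of the predictive probabilities equal to the marginal likelihood, the two printed values agree, illustrating that this MDL code returns precisely the evidence.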
External links