The name comes from annealing in metallurgy, which is a technique involving heating and controlled cooling of a material in order to improve its properties by removing crystal defects. In this process the metal reaches its most stable configuration, minimizing its total internal energy.
The principles involved in simulated annealing are similar. Each point in the search space has an energy associated with it, which indicates how good it is. The goal is to find the point with the minimum energy. The algorithm starts off at an arbitrary point; at each step chooses some neighbor of the current point and moves to that point with a certain probability. Neighbors are points that are "close" to each other in a problem-dependent fashion (For example, in the traveling salesman problem, we may define two tours to be neighbors if and only if one can be converted to the other by interchanging a pair of adjacent cities). The probability of transition is a function of the energy difference between the two points and a global time-dependent parameter called the temperature.
Let δE be the difference in energy, and T the temperature. If δE is negative (i.e., the new point has a smaller energy) then the algorithm moves to the new point with probability 1. If not, it does so with probability e-δE/T. This rule is deliberately similar to the Maxwell-Boltzmann distribution governing the distribution of molecular energies.
Together with the genetic algorithm and the ant colony algorithm, simulated annealing is an important technique for solving optimization problems with large solution spaces when specific techniques do not apply. Interestingly, all these algorithms were discovered by observing the computational achievements of nature.
See also: Automatic label placementControlling the temperature
It is clear that the behavior of the algorithm is crucially dependent on the temperature: If T is 0, it reduces to the greedy algorithm, always moving to a point of lower energy. If T is infinity, it moves around randomly. In general, the algorithm is sensitive to coarser energy variations for large T and finer variations for small T. This is exploited in designing the annealing schedule, which is the procedure for varying T with time (by time is meant the number of iterations). At first T is set to infinity, and is gradually decreased to zero ("cooling"). This enables the algorithm to initially get to the general region of the search space containing good solutions, and later hone in on the optimum. The exact annealing schedule, however, cannot be generically prescribed; it must be chosen depending on the problem. It has been observed that designing an effective cooling function is more of an art than a science!