Naively placed labels overlap a lot, resulting in a map that's very hard to read. Therefore, a GIS must allow a few possible placements of each label, and often also an option of resizing, rotating, or even removing the label. Then, it must select a set of placements that results in the least overlap, and has other desirable properties. For every but the most trivial setups, the problem is NP-Hard.
The simplest greedy algorithm places consecutive labels on the map in positions that result in minimal extra overlap of labels. Its results are not satisfactory even for very simple problems, but it is extremely fast.
Slightly more complex algorithms rely on local optimization to reach a local optimum of a placement evaluation function - in each iteration placement of a single label is moved to another position, and if it improves the result, the move is preserved. It performs reasonably well for maps that are not too densely labelled. Slightly more complex variations try moving 2 or more labels at the same time. The algorithm ends after reaching some local optimum.
The algorithm that gives the best results with relatively good performance - simulated annealing - is very simple. It works like local optimization, but it may keep a change even if it worsens the result. The chance of keeping such change is , where is change is the evaluation function, and is the temperature. The temperature is gradually lowered according to the annealing schedule. When the temperature is high, simulated annealing performs almost random changes to the label placement, being able to escape a local optimum. Later, when hopefully a very good local optimum has been found, it behaves in a manner similar to local optimization. The main challenges in developing a simulated annealing solution are choosing a good evaluation function and a good annealing schedule - generally too fast cooling will degrade the solution, and too slow cooling will degrade the performance, but the schedule is usually quite a complex algorithm, with more than just one parameter.
Another class of algorithms are the various genetic algorithms. They are often quite complex, and they usually perform worse than simulated annealing.
Other algorithms are also used, like various graph solutions, integer programming etc.
One simple optimization that's important on real maps is dividing a set of labels into smaller sets that can be solved independently. Two labels are rivals if they can overlap in one of the possible placements. Transitive closure of this relation divides the set of labels into possibly much smaller sets. On uniformly and densely labelled maps, usually the single set will contain the majority of labels, and on maps for which the labelling is not uniform it may bring very big performance benefits. For example when labelling a map of the world, North America may be probably labelled independently from Eurasia etc.