Prisoner's dilemma

The prisoner's dilemma is an example of a non-zero-sum game that demonstrates a conflict between rational individual behavior and the benefits of cooperation in certain situations.

Table of contents

1 Brief Outline
2 The Iterated Prisoner's Dilemma
3 Variants

3.1 Chicken
3.2 Assurance Game

4 Friend or Foe
5 References
6 See Also
7 External Link

Brief Outline

The classical Prisoner's Dilemma is as follows:

Two suspects are arrested by the police. The police have insufficient evidence for a conviction, and having separated them, visit each of them and offer the same deal: If you confess and your accomplice remains silent, he gets the full 10-year sentence and you go free. If you both stay silent, all we can do is give you both 6 months for a minor charge. If you both confess, you each get 5 years.

Each prisoner individually reasons like this: Either my accomplice confessed or he did not. If he did, and I remain silent, I get 10 years, while if I confess I only get 5. If he remained silent, then by confessing I go free, while by remaining silent I get 6 months. In either case, it is better for me if I confess. Since each of them reasons the same way, both confess, and get 5 years. But although each followed what seemed to be rational argument to achieve the best result, if they had instead both remained silent, they would only have served 6 months.

This illustrates that if the two had been able to communicate and cooperate, they would have been able to achieve a result better than what each could achieve alone. There are many situations in the social sciences that have this same property, so understanding these situations and how to resolve them is important. In particular, it demonstrates that there are situations where na�ve reasoning in one's own best interest does not always lead to a best result.

In political science, the Prisoner's Dilemma is often used to illustrate the problem of two states getting into an arms race. Both will reason that they have two options, either to increase military expenditure or to make an agreement to reduce weapons. Neither state can be certain that the other one will keep to such an agreement; therefore, they both incline towards military expansion. The irony is that both states seem to act rationally, but the result is completely irrational.

The Iterated Prisoner's Dilemma

In his book The Evolution of Cooperation (1984), Robert Axelrod explored an extension to the classical PD scenario, which he called the "iterated prisoner's dilemma." In this, participants have to choose their mutual strategy again and again, and have memory of their previous encounters. The example he used was two people meeting and exchanging closed bags, with the understanding that one of them contains money, and the other contains an item being bought. Either player can choose to honor the deal by putting into his bag what he agreed, or he can defect by handing over an empty bag. This mirrors the prisoner's dilemma: for each participant individually, it is better to defect, though both would be better off if both cooperated.

Axelrod discovered that when these encounters were repeated over a long period of time with many players, each with different strategies, "greedy" strategies tended to do very poorly in the long run while more "altruistic" strategies did better, as judged purely by self-interest. He used this to show a possible mechanism to explain what had previously been a difficult hole in Darwinian theory: how can apparently altruistic behavior evolve from the purely selfish mechanisms of natural selection?

The best deterministic strategy was found to be "Tit for Tat", which Anatol Rapoport developed and entered in a contest for the best strategy that could be written as a computer algorithm. It was as simple as any entered, and won the contest. The strategy is simply to cooperate on the first iteration of the game; after that, do what your opponent did on the previous move. A slightly better strategy is "Tit for Tat with forgiveness". When your opponent defects, on the next move you sometimes cooperate anyway with small probability (around 1%-5%). This allows for occasional recovery from getting trapped in a cycle of defections. The exact probability depends on the lineup of opponents. "Tit for Tat with forgiveness" is best when miscommunication is introduced to the game. That means that sometimes your move is incorrectly reported to your opponent: you cooperate but your opponent hears that you defected.

For the iterated Prisoner's Dilemma, it is not always correct to say that any given strategy is the best. For example, consider a population where everyone defects every time, except for a single individual following the Tit-for-Tat strategy. That individual is at a slight disadvantage because of the loss on the first turn. In such a population, the optimal strategy for that individual is to defect every time. In a population with a certain percentage of always-defectors and the rest being Tit-for-Tat players, the optimal strategy for an individual depends on the percentage, and on the length of the game. Simulations of populations have been done, where individuals with low scores die off, and those with high scores reproduce. The mix of algorithms in the final population generally depends on the mix in the initial population.

If an iterated Prisoner's Dilemma is going to be iterated exactly N times, for some known constant N, then there is another interesting fact. The Nash equilibrium is to defect every time. That is easily proved by induction. You might as well defect on the last turn, since your opponent will not have a chance to punish you. Therefore, you will both defect on the last turn. Then, you might as well defect on the second-to-last turn, since your opponent will defect on the last no matter what you do. And so on. For cooperation to remain appealing, then, the future must be indeterminate for both players. One solution is to make the total number of turns N random.

Another odd case is "play forever" prisoner's dilemma. The game is repeated an infinite number of times, and your score is the average (suitably computed).

The Prisoner's Dilemma game is fundamental to certain theories of human cooperation and trust. On the assumption that transactions between two people requiring trust can be modelled by the Prisoner's Dilemma, cooperative behavior in populations may be modelled by a multi-player, iterated, version of the game. It has, consequently, fascinated many, many scholars over the years. A not entirely up-to-date estimate (Grofman and Pool 1975) puts the count of scholarly articles devoted to it at over 2,000.

Variants

There are also some variants of the game, with subtle but important differences in the payoff matrices, which are listed below:-

Chicken

Another important non zero-sum game type is called "Chicken", in which it is always in a player's interest to co-operate whatever the other player does. In Chicken mutual defection is the worst possible outcome (hence an unstable equilibrium), but in the Prisoner's Dilemma the worst possible outcome is cooperating while the other person defects (so both defecting is a stable equilibrium). In both games, "both cooperate" is an unstable equilibrium.

A typical payoff matrix would read:

If both players cooperate, they each get +5.
If one cooperates and the other defects, the first gets +1 and the other gets +10.
If both defect, they each get -20.

Chicken is named after the car racing game. Two cars drive towards each other for an apparent head-on collision. Each player can swerve to avoid the crash (cooperate) or keep going (defect). Another example often given is that of two farmers who use the same irrigation system for their fields. The system can be adequately maintained by one person, but both farmers gain equal benefit from it. If one farmer does not do his share of maintenance, it is still in the other farmer's interests to do so, because he will be benefiting whatever the other one does. Therefore, if one farmer can establish himself as the dominant defector - ie, if the habit becomes ingrained that the other one does all the maintenance work - he will be likely to continue to do so.

Assurance Game

An Assurance Game has a similar structure to the Prisoner's Dilemma, except that the rewards for mutual co-operation are higher than those for defection. A typical pay-off matrix would read:

If both players cooperate, they each get +10.
If you cooperate and the other guy defects, you get +1 and they get +5.
If both defect, they each get +3.

The Assurance Game is potentially very stable because it always gives the highest rewards to players who establish a habit of mutual co-operation. However, there is still the problem that the players might not realise that it is in their interests to co-operate. They might, for example, mistakenly believe that they are playing a Prisoner's Dilemma or Chicken game, and arrange their strategies accordingly.

Friend or Foe

Friend or Foe is a game show currently airing on the Game Show Network. It is an example of the Prisoner's Dilemma game tested by real people, but in an artificial setting. On the game show, three pairs of people compete. As each pair is eliminated, they play a game of Prisoner's Dilemma to determine how their winnings are split. If they both cooperate ("Friend"), they share the winnings 50-50. If one cooperates and the other defects ("Foe"), the defector gets all the winnings and the cooperator gets nothing. If both defect, both leave with nothing. Notice that the payoff matrix is slightly different from the standard one given above, as the payouts for the "both defect" and the "I cooperate and opponent defects" cases are identical. This makes the "both defect" a neutral equilibrium, compared with being a stable equilibrium in standard Prisoner's Dilemma. If you know your opponent is going to vote "Foe", then your choice does not affect your winnings. In a certain sense, "Friend or Foe" is between "Prisoner's Dilemma" and "Chicken".

The payoff matrix is

If both players cooperate, they each get +1.
If both defect, they each get 0.
If you cooperate and the other person defects, you get +0 and they get +2.

Friend or Foe would be useful for someone who wanted to do a real-life analysis of Prisoner's Dilemma. Notice that you only get to play once, so all the issues involving repeated playing are not present. The "tit for tat" strategy does not come into play, because you only play once.

References

Robert Axelrod The Evolution of Cooperation (1984)
Grofman and Pool (1975). Bayesian Models for Iterated Prisoner's Dilemma Games. General Systems 20:185-94.
William Poundstone Prisoner's Dilemma: John von Neumann, Game Theory, and the Puzzle of the Bomb. Doubleday, 1992. ISBN 0385415672. A wide-ranging popular introduction, as the title indicates.

External Link

Play the iterated prisoner's dilemma online