A Concise Introduction to Game Theory (1)

Game theory is most often described as a branch of applied mathematics and economics that studies situations where players choose different actions in an attempt to maximize their returns. The essential feature, however, is that it provides a formal modelling approach to social situations in which decision makers interact with other minds. Game theory extends the simpler optimization approach developed in neoclassical economics.

Game theory (博弈論), sometimes also called 對策論 or 賽局理論 in Chinese, is a branch of applied mathematics that is now widely applied in biology, economics, international relations, computer science, political science, military strategy, and many other disciplines. It mainly studies interactions within formalized incentive structures ("games"). It is the mathematical theory and method for studying phenomena of a conflicting or competitive nature, and it is also an important branch of operations research.

Prisoner's dilemma: In game theory, the prisoner's dilemma is a type of non-zero-sum game in which each of two players can either "cooperate" with or "defect" from (i.e. betray) the other player. In this game, as in game theory generally, each individual player ("prisoner") is assumed to care only about maximizing his or her own payoff, without any concern for the other player's payoff. In the classic form of this game, cooperating is strictly dominated by defecting, so that the only possible equilibrium for the game is for all players to defect. In simpler terms, no matter what the other player does, a player always gains a greater payoff by playing defect. Since playing defect is more beneficial than cooperating in every situation, all rational players will play defect.

The unique equilibrium for this game is a Pareto-suboptimal solution: rational choice leads the two players to both play defect, even though each player's individual reward would be greater if they both played cooperate. In equilibrium, each prisoner chooses to defect even though both would be better off by cooperating, hence the dilemma.

In the iterated prisoner's dilemma the game is played repeatedly. Thus each player has an opportunity to "punish" the other player for previous non-cooperative play. Cooperation may then arise as an equilibrium outcome: the incentive to defect is overcome by the threat of punishment, leading to the possibility of a cooperative outcome. If the game is infinitely repeated, cooperation may be a Nash equilibrium, although both players always defecting remains an equilibrium as well.

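The prisoner's dilemma is a representative example of a non-zero-sum game in game theory; it shows that the best choice for the individual is not necessarily the best choice for the group. Although the dilemma itself is only a model, similar situations arise frequently in real life, for example in price competition and environmental protection.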

The outcome of a one-shot prisoner's dilemma is not the same as that of a prisoner's dilemma repeated many times.

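In the repeated prisoner's dilemma, the game is played over and over. Each participant therefore has the chance to "punish" the other for uncooperative behaviour in the previous round. Cooperation may then emerge as an equilibrium outcome. The motive to cheat may be overcome by the threat of punishment, which can lead to a better, cooperative result. As the number of repetitions approaches infinity, the Nash equilibrium can tend toward the Pareto optimum.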
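The punishment mechanism described above can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the per-round sentences are borrowed from the classic two-prisoner story (six months, two years, ten years, or going free), the two strategies `always_defect` and `tit_for_tat` are illustrative choices that the original text does not name, and the 10-round horizon is arbitrary.

```python
# Years in jail for (my_move, other_move); "C" = stay silent, "D" = betray.
# Assumption: each round reuses the sentences of the classic one-shot story.
YEARS = {("C", "C"): 0.5, ("C", "D"): 10.0, ("D", "C"): 0.0, ("D", "D"): 2.0}


def always_defect(own_history, other_history):
    """Hypothetical strategy: betray in every round."""
    return "D"


def tit_for_tat(own_history, other_history):
    """Hypothetical strategy: cooperate first, then copy the opponent's last move."""
    return other_history[-1] if other_history else "C"


def play(strategy_a, strategy_b, rounds):
    """Return the total years served by A and by B over the given rounds."""
    hist_a, hist_b = [], []
    total_a = total_b = 0.0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        total_a += YEARS[(move_a, move_b)]
        total_b += YEARS[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return total_a, total_b


if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat, 10))      # (5.0, 5.0)
    print(play(tit_for_tat, always_defect, 10))    # (28.0, 18.0)
    print(play(always_defect, always_defect, 10))  # (20.0, 20.0)
```

Against a tit-for-tat opponent, the permanent defector gains only in the first round and is punished in every later one, ending with 18 years instead of the 5 it would have served by cooperating throughout. This is the sense in which the threat of punishment can make cooperation worthwhile in the repeated game.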

The classical prisoner's dilemma

The Prisoner's dilemma was originally framed by Merrill Flood and Melvin Dresher working at RAND in 1950. Albert W. Tucker formalized the game with prison sentence payoffs and gave it the "Prisoner's Dilemma" name (Poundstone, 1992).

The classical prisoner's dilemma (PD) is as follows:

Two suspects, A and B, are arrested by the police. The police have insufficient evidence for a conviction, and, having separated both prisoners, visit each of them to offer the same deal: if one testifies for the prosecution against the other and the other remains silent, the betrayer goes free and the silent accomplice receives the full 10-year sentence. If both stay silent, both prisoners are sentenced to only six months in jail for a minor charge. If each betrays the other, each receives a two-year sentence. Each prisoner must make the choice of whether to betray the other or to remain silent. However, neither prisoner knows for sure what choice the other prisoner will make. So this dilemma poses the question: How should the prisoners act?

The dilemma can be summarized thus:

	Prisoner B stays silent	Prisoner B betrays
Prisoner A stays silent	Both serve six months	Prisoner A serves ten years; Prisoner B goes free
Prisoner A betrays	Prisoner A goes free; Prisoner B serves ten years	Both serve two years

The dilemma arises when one assumes that both prisoners only care about minimizing their own jail terms. Each prisoner has two options: to cooperate with his accomplice and stay quiet, or to defect from their implied pact and betray his accomplice in return for a lighter sentence. The outcome of each choice depends on the choice of the accomplice, but each prisoner must choose without knowing what his accomplice has chosen to do.

Let's assume the protagonist prisoner is working out his best move. If his partner stays quiet, his best move is to betray, as he then walks free instead of receiving the minor six-month sentence. If his partner betrays, his best move is still to betray, as he then receives a two-year sentence instead of the full ten years. The other prisoner, reasoning in the same way, arrives at the same conclusion and therefore also betrays.
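The case-by-case reasoning above can be checked mechanically. The sketch below encodes the sentences from the table as a lookup keyed by (my move, other's move); the names `YEARS`, `best_response`, and `strictly_dominates` are illustrative and not part of the original text.

```python
# Years in jail for one prisoner, keyed by (my_move, other_move).
YEARS = {
    ("silent", "silent"): 0.5,
    ("silent", "betray"): 10.0,
    ("betray", "silent"): 0.0,
    ("betray", "betray"): 2.0,
}
MOVES = ("silent", "betray")


def best_response(other_move):
    """Return the move that minimizes my own sentence given the other's move."""
    return min(MOVES, key=lambda my_move: YEARS[(my_move, other_move)])


def strictly_dominates(move, alternative):
    """True if `move` gives a strictly shorter sentence than `alternative`
    whatever the other prisoner does."""
    return all(YEARS[(move, o)] < YEARS[(alternative, o)] for o in MOVES)


if __name__ == "__main__":
    for other in MOVES:
        print(f"If the other prisoner plays {other!r}, "
              f"my best response is {best_response(other)!r}")
    print("betray strictly dominates silent:",
          strictly_dominates("betray", "silent"))
```

Both best responses come out as "betray", which is exactly what it means for betrayal to be a strictly dominant strategy.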

If reasoned from the perspective of the optimal outcome for the group (of two prisoners), the correct choice would be for both prisoners to cooperate with each other, as this would reduce the total jail time served by the group to one year. Any other decision would be worse for the two prisoners considered together. When the prisoners both betray each other, each prisoner achieves a worse outcome than if they had cooperated.

This demonstrates very elegantly that in a non-zero-sum game the Pareto optimum and the Nash equilibrium can diverge. In other words, a non-zero-sum game may not have a solution that is both optimal and stable: the Pareto optimum can be unstable and the Nash equilibrium can be sub-optimal.
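To make the gap between stability and optimality concrete, the following sketch enumerates the four outcomes of the table and tests each for the Nash and Pareto properties. It assumes that shorter sentences are strictly preferred and uses the standard textbook definitions; the helper names `is_nash` and `is_pareto_optimal` are illustrative.

```python
# (A's move, B's move) -> (years served by A, years served by B)
OUTCOMES = {
    ("silent", "silent"): (0.5, 0.5),
    ("silent", "betray"): (10.0, 0.0),
    ("betray", "silent"): (0.0, 10.0),
    ("betray", "betray"): (2.0, 2.0),
}
MOVES = ("silent", "betray")


def is_nash(a_move, b_move):
    """Neither prisoner can shorten his own sentence by deviating alone."""
    years_a, years_b = OUTCOMES[(a_move, b_move)]
    a_stays = all(OUTCOMES[(alt, b_move)][0] >= years_a for alt in MOVES)
    b_stays = all(OUTCOMES[(a_move, alt)][1] >= years_b for alt in MOVES)
    return a_stays and b_stays


def is_pareto_optimal(profile):
    """No other outcome is at least as good for both and better for one."""
    years_a, years_b = OUTCOMES[profile]
    for other_a, other_b in OUTCOMES.values():
        no_worse = other_a <= years_a and other_b <= years_b
        better = other_a < years_a or other_b < years_b
        if no_worse and better:
            return False
    return True


if __name__ == "__main__":
    for profile, years in OUTCOMES.items():
        print(profile, "total years:", sum(years),
              "| Nash:", is_nash(*profile),
              "| Pareto optimal:", is_pareto_optimal(profile))
```

Under these definitions, mutual betrayal is the only Nash equilibrium and the only outcome that is not Pareto optimal, while mutual silence is Pareto optimal, gives the smallest total (one year), and is not stable.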

Alternatively, the "Stay Silent" and "Betray" strategies may be known as "don't confess" and "confess", or by the more standard terms "cooperate" and "defect", respectively.

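	A stays silent (cooperates)	A confesses (defects)
B stays silent (cooperates)	Both serve six months	A is released immediately; B serves 10 years
B confesses (defects)	A serves 10 years; B is released immediately	Both serve 2 years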

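As in other examples from game theory, the prisoner's dilemma assumes that every participant ("prisoner") is self-interested: each seeks to maximize his own benefit and does not care about the other participant's benefit. If the payoff of one of a participant's strategies is lower than that of another strategy in every circumstance, that strategy is called "strictly dominated", and a rational participant will never choose it. In addition, no outside force interferes with individual decisions; each participant is free to choose a strategy entirely as he wishes.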

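Which strategy should a prisoner choose in order to make his own sentence as short as possible? Because the two prisoners are held in isolation, neither knows the other's choice; and even if they could talk, neither could fully trust the other not to go back on his word. In terms of individual rational choice, the sentence obtained by informing on and betraying the other is always shorter than the sentence obtained by staying silent. Consider how two rational prisoners in the dilemma would choose: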

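If the other stays silent, betraying him sets me free, so I will choose to betray.
If the other betrays and accuses me, I must also accuse him to get the shorter sentence, so I will again choose to betray.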

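The two face the same situation, so the rational reasoning of each leads to the same conclusion: betray. Betrayal is the dominant strategy of the two. The only Nash equilibrium this game can reach is therefore for both participants to betray each other, with the result that both serve 2 years.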

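The Nash equilibrium of this game is clearly not the Pareto-optimal solution that takes the group's interest into account. For the group as a whole, if both participants cooperate and stay silent, each is sentenced to only six months; the collective outcome is better than when both betray each other and are sentenced to 2 years. Under the assumptions above, however, both are rational individuals who pursue only their own interests. In equilibrium both prisoners choose to betray, so both receive longer sentences than under cooperation and the collective outcome is worse. This is where the "dilemma" lies. The example neatly shows that in a non-zero-sum game the Pareto optimum and the Nash equilibrium can conflict.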

changjiang

嘯傲江湖之風雲再起

Posted by changjiang on PIXNET (痞客邦)
