
Seven rules to avoid the tragedy of the commons


Highlights


  • We propose strategies to stabilize cooperation in the iterated public-goods game.

  • They can recover cooperation in the presence of implementation error.

  • They are secured from repeated exploitation.

  • They suppress neutral drift to unconditional cooperation.

  • For an n-person case, the memory length of such strategies cannot be less than n.


Abstract

Cooperation among self-interested players in a social dilemma is fragile and easily disrupted by mistakes. In this work, we study the repeated n-person public-goods game and search for a strategy that forms a cooperative Nash equilibrium in the presence of implementation error, with a guarantee that the resulting payoff will be no less than any co-player’s. By enumerating strategic possibilities for n=3, we show that such a strategy indeed exists when its memory length m equals three. This means that a deterministic strategy can be publicly employed to stabilize cooperation against error while avoiding the risk of being exploited. We furthermore show that, for the general n-person public-goods game, m ≥ n is necessary to satisfy the above criteria.

1. Introduction

Conflicts between individual and collective interests are observed across a variety of fields, from genetics to international politics. For example, genes can inflict damage on other genes in the same genome to spread at a higher rate, even if this threatens the ‘host’ (Archetti, 2003, Burt, Trivers, 2009, Friberg, Rice, 2008, Haig, 1993, Hurst, Atlan, Bengtsson, 1996). Some microorganisms spend energy to produce chemicals that benefit the whole population, which can be exploited by non-cooperating mutants (Crespi, 2001, Gore, Youk, Van Oudenaarden, 2009, Greig, Travisano, 2004, Rankin, Bargum, Kokko, 2007, Velicer, 2003). The same conflict exists in cooperative mammals when they hunt in groups or stand guard against predators (Bednarz, 1988, Clutton-Brock, O’riain, Brotherton, Gaynor, Kansky, Griffin, Manser, 1999, Packer, Scheel, Pusey, 1990). One such cooperative species is Homo sapiens, the political animal (Kollock, 1998). Not surprisingly, human societies have constantly experienced the Tragedy of the Commons (Baer, Harte, Haya, Herzog, Holdren, Hultman, Kammen, Norgaard, Raymond, 2000, Dietz, Ostrom, Stern, 2003, Hardin, 1968, Ostrom, Burger, Field, Norgaard, Policansky, 1999) and struggled to preserve common goods against it. There are a few conventional ways to achieve this goal: the first is the ‘Leviathan’ solution involving governmental regulations (Carruthers, Stoner, 1981, Heilbroner, 1980, Ophuls, 1973). The next is the market mechanism, which treats the common resource as private property (Demsetz, 1974, Sinn, 1984, Smith, 1981).
The third one is institutional design for collective actions of civil society (Ostrom, 1990) such as punishment (Boyd, Gintis, Bowles, Richerson, 2003, Boyd, Richerson, 1992, Chen, Szolnoki, Perc, 2015, De Quervain, Fischbacher, Treyer, Schellhammer, et al., 2004, Fehr, Gächter, 2002, Gürerk, Irlenbusch, Rockenbach, 2006, Hauert, Traulsen, Brandt, Nowak, Sigmund, 2007, Szolnoki, Perc, 2015, Szolnoki, Perc, 2017) combined with a reputation system (Brandt, Hauert, Sigmund, 2003, Rockenbach, Milinski, 2006). However, the above answers are becoming hard to justify on a global scale because there is no world government (Paavola, 2012), market failure is likely to occur (Stern, 2007), and institutionalism tends to fail for a large group of people (Araral, 2014). The question is then what can be done about players that refuse to contribute.

To give a concrete form to this question, let us consider the n-person public-goods (PG) game (Boyd, Richerson, 1988, Milinski, Semmann, Krambeck, Marotzke, 2006, Milinski, Sommerfeld, Krambeck, Reed, Marotzke, 2008, Perc, Jordan, Rand, Wang, Boccaletti, Szolnoki, 2017). In our setting, each player may either cooperate (c) by contributing a private token to a public pool or defect (d) by refusing to do so. The tokens are then multiplied by a factor ρ and equally distributed to the players. The multiplication factor ρ must be greater than one and smaller than the number of players to describe the conflict between individual and collective interests. A player’s payoff is then defined as

(1)    ρ n_c / n         when the player cooperates (c),
       1 + ρ n_c / n     when the player defects (d),
where n_c is the number of cooperators including the focal player. Everyone prefers d, and zero contribution is a Nash equilibrium of this one-shot game. In many circumstances, however, the players are bound to interact repeatedly for a long time. Under the shadow of the future, it becomes possible to devise strategies conditioned on previous interactions, whereby reciprocity (Frank, Obradovich, Sun, Woon, LeVeck, Rahwan, 2018, Hamilton, Axelrod, 1981, Trivers, 1971) comes into play in organizing collective efforts for the public pool. For example, a generalized version of tit-for-tat (TFT) forms a Nash equilibrium if the error probability is strictly zero (Boyd, Richerson, 1988, Lichbach, 1992), i.e., in a noiseless environment (Fudenberg, Levine, 1998, Szabó, Fath, 2007, Szolnoki, Perc, Szabó, 2009).
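As a sanity check, Eq. (1) is easy to express in code. The sketch below is our own illustration; the function and variable names are ours, not from the paper:

```python
def pg_payoff(cooperates: bool, n_c: int, n: int, rho: float) -> float:
    """Payoff in the one-shot n-person public-goods game of Eq. (1).

    n_c counts the cooperators, including the focal player if they
    cooperate; rho is the multiplication factor with 1 < rho < n.
    """
    share = rho * n_c / n                         # everyone's equal share of the pot
    return share if cooperates else 1.0 + share   # a defector also keeps the token

# With n=3 and rho=2: full cooperation pays 2.0 to everyone, while a lone
# defector facing two cooperators earns 1 + 4/3, illustrating the dilemma.
```

Because ρ < n, switching from c to d raises one's own payoff by 1 − ρ/n > 0, while ρ > 1 makes full cooperation socially optimal; this is exactly the conflict described above.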


In terms of the repeated PG game, our question is the following: what can we advise players of this game in the presence of implementation error, if they wish to achieve full cooperation without being exploited repeatedly by others? The direct n-person generalization of TFT cannot be an answer because it results in a series of wasteful retaliations when someone defects by mistake. An all-or-none strategy (Pinheiro et al., 2014), a generalization of win-stay-lose-shift (Kraines, Kraines, 1989, Liu, Chen, Zhang, Wang, Perc, 2012, Nowak, Sigmund, 1993, Posch, 1999), is a strong candidate for our purpose because it constitutes a subgame perfect Nash equilibrium if the benefit-cost ratio of cooperation is sufficiently high (Hilbe et al., 2017). Unfortunately, the cost-benefit analysis per se is often a difficult issue in practice. Another drawback is that this strategy systematically yields a higher payoff to a co-player who plays unconditional defection (AllD). Although the idea of the zero-determinant (ZD) strategies (Hilbe, Wu, Traulsen, Nowak, 2014, Press, Dyson, 2012, Szolnoki, Perc, 2014, Szolnoki, Perc, 2014) may help to gain control over the situation in the n-person case (Pan et al., 2015), the probabilistic retaliation prescribed by most ZD strategies can hardly be an available option when it comes to policymaking (Dror, 1983). Indirect reciprocity (McNamara, Doodson, 2015, Nax, Perc, Szolnoki, Helbing, 2015, Nowak, Sigmund, 1998, Ohtsuki, Iwasa, 2004, Ohtsuki, Iwasa, Nowak, 2009, Panchanathan, Boyd, 2004, Sugden, 2004, Suzuki, Akiyama, 2005, Suzuki, Akiyama, 2007, Suzuki, Akiyama, 2007) is also difficult to bring into action because it requires an agreement on every player’s reputation within the time scale of successive interactions. To sum up, we need a deterministic strategy of direct reciprocity that solves the repeated PG game for an arbitrary multiplication factor ρ ∈ (1, n) and a nonzero error probability e > 0.
It is reasonable to say that the players may safely adopt a certain strategy Ω if it satisfies the following three criteria (Yi et al., 2017) (see Fig. 1(a)).

  • 1.

Efficiency: Mutual cooperation is achieved with probability one as the error probability e approaches zero, when all players use this strategy Ω. Mathematically speaking, we consider a strategy profile P = {s_1, s_2, …, s_i, …, s_n} of n players, and the relevant observable is player i’s long-term payoff, defined as

(2)    f_i ≡ lim_{T→∞} (1/T) Σ_{t=0}^{T−1} F_i(t),

where F_i(t) denotes the instantaneous payoff at time t that player i receives with s_i. When everyone uses Ω with e > 0, the Markovian dynamics of strategic interaction converges to a unique stationary distribution, from which f_i is readily calculated for a given strategy profile (Nowak, 1990, Nowak, Sigmund, El-Sedy, 1995). The efficiency criterion essentially means that lim_{e→0+} f_i = ρ when P = P_Ω ≡ {Ω, Ω, …, Ω}.


  • 2.

    Defensibility: The strategy Ω ensures that none of the co-players can obtain a higher long-term payoff against Ω, regardless of the co-players’ strategies and the initial state, when e = 0. This condition assures lim_{e→0+}(f_i − f_j) ≥ 0, where s_i = Ω and j denotes any possible co-player of i. When combined with efficiency, this criterion is strong enough for P_Ω to be a cooperative Nash equilibrium, in which lim_{e→0+} f_i = ρ for every player i. To verify this statement, suppose that a player, say j, unilaterally switches to another strategy while the others keep using Ω. Player j’s resulting long-term payoff is denoted as f_j, and each of the other Ω-using players obtains a certain payoff ϕ, which may not equal ρ. Because P_Ω is Pareto optimal, the total payoff of the n players becomes less than or equal to that of P_Ω, i.e., (n−1)ϕ + f_j ≤ nρ. Defensibility means that ϕ ≥ f_j for j’s every possible choice, which leads to the conclusion that f_j ≤ ρ.

  • 3.

  • Distinguishability: If s_i = Ω and all its co-players are unconditional cooperators (AllC), player i can exploit them to earn a strictly higher long-term payoff than the co-players’. That is, f_i > f_j when j is an AllC player. This may be sharpened further by requiring the same inequality for every possible number of Ω-players between one and n−1, but such refinement turns out to be irrelevant in this work. This criterion is introduced to suppress invasion of AllC due to neutral drift (Imhof, Fudenberg, Nowak, 2005, Imhof, Fudenberg, Nowak, 2007, Imhof, Nowak, 2010). It should be noted, however, that this criterion does not fully eliminate the possibility of a second-order drift via a third strategy between Ω and AllC.
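The long-term payoff of Eq. (2) in the first criterion is a Cesàro average, which for an ergodic chain equals the expectation under the stationary distribution. A minimal sketch on a toy two-state chain (the transition probabilities and payoffs below are illustrative stand-ins, not the game's actual dynamics):

```python
import random

# Transition probabilities and per-state payoffs of a toy ergodic chain.
P = {0: {0: 0.9, 1: 0.1},
     1: {0: 0.2, 1: 0.8}}
F = {0: 3.0, 1: 1.0}

# Exact stationary distribution of a two-state chain: pi_0/pi_1 = p_10/p_01.
pi_0 = 0.2 / (0.1 + 0.2)
stationary_payoff = pi_0 * F[0] + (1.0 - pi_0) * F[1]

# Time average of Eq. (2), estimated by direct simulation.
random.seed(42)
state, total, T = 0, 0.0, 200_000
for _ in range(T):
    total += F[state]
    state = 1 if random.random() < P[state][1] else 0
time_average = total / T   # approaches stationary_payoff as T grows
```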


Fig. 1. (a) Illustration of the three criteria imposed on a strategy Ω. [i] Efficiency: Mutual cooperation is realized when all players use Ω. [ii] Defensibility: It is guaranteed that a player using Ω never has a lower long-term payoff than those of the co-players, whatever strategies they use. [iii] Distinguishability: When the co-players are naive cooperators, a player using Ω has a strictly higher payoff than the others’. (b) An example of a transition graph. Suppose that Alice takes d at the state (cc, cd, cd). There are four possible next states, (cd, dc, dc), (cd, dc, dd), (cd, dd, dc), and (cd, dd, dd), depending on the moves of Bob and Charlie. In this manner, a memory-2 strategy is represented by a graph with 2^6 = 64 nodes, each of which has 4 outgoing links.

Note the seemingly conflicting requirements expressed in these criteria: Ω must recover cooperation from erroneous defection while protecting itself from malicious ones. It is very doubtful that one can tell the other players’ intentions, however, especially when they have longer memories and better computational power. Worse, they may even conspire together to entrap our focal player. The dilemma between efficiency and defensibility is so severe that one often feels almost forced to compromise one of them, but we have to ask ourselves whether they are really mutually exclusive.

In fact, it is known that the criteria can be met in the Prisoner’s Dilemma (PD) game, an equivalent of the two-person PG game. The resulting strategy is based on TFT but able to correct the player’s own error (Yi et al., 2017). It is tempting to apply this strategy to the n-person game, e.g., by reducing the situation to an effective two-person game between one player and the others. However, this idea does not work when n > 2 for the following reason: suppose that everyone has adopted the strategy. When someone makes a mistake, the other players will respond by taking d to defend themselves. Although the first player tries to redeem the mistake, the point is that the other n−1 players see each other choosing d. As long as some other players are defecting, the strategy will advise against returning to c for the sake of defensibility, so it fails to reestablish cooperation. This ‘observer’ effect illustrates a fundamental difficulty of the n-person game when n > 2.
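The breakdown described above can be reproduced in a few lines. The sketch below assumes one natural generalization of TFT, "cooperate iff both co-players cooperated in the previous round"; it is our own illustration, not a strategy analyzed in this work:

```python
def tft3_next(moves):
    """One synchronous update: player i cooperates iff both co-players
    played c in the previous round."""
    return tuple(
        'c' if all(m == 'c' for j, m in enumerate(moves) if j != i) else 'd'
        for i in range(len(moves))
    )

# Alice defects once by mistake; nobody deviates afterwards.
history = [('d', 'c', 'c')]
for _ in range(10):
    history.append(tft3_next(history[-1]))
# One round later the two others retaliate, and from the second round on
# the group is locked into ('d', 'd', 'd'): full cooperation never returns.
```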

Now, our question on the Tragedy of the Commons boils down to whether it is possible to meet the three criteria of efficiency, defensibility, and distinguishability for n > 2. In this work, we report two findings: First, when the number of players is n=3, we can explicitly construct Ω whose memory length m is three. Second, we show for the general n-person case that m must be greater than or equal to n for a strategy to satisfy the efficiency and the defensibility criteria simultaneously. In other words, the Tragedy of the Commons among three players can be safely solved for an arbitrary multiplication factor, in the sense of a cooperative Nash equilibrium, when the error probability is vanishingly small yet nonzero. At the same time, such a solution will become more and more intricate as the number of players increases.

2. Methods

Let us explain how to check the above criteria in the three-person PG game by means of direct enumeration. First of all, we have to define the game. We will denote the three players as Alice, Bob, and Charlie. The payoff matrix from Alice’s point of view is defined as

(3)    M = (           0          1         2
            c          ρ        2ρ/3      ρ/3
            d      1 + 2ρ/3    1 + ρ/3     1   ),
where the column indices represent the number of defectors among Bob and Charlie. The next step is to choose an appropriate strategy space. It is common to classify strategies according to their memory length m (Baek et al., 2016). For example, if a strategy has m=2, it refers to two previous time steps to make a decision at time t. The players’ memory state in total can be written as S_t = (A_{t−2}A_{t−1}, B_{t−2}B_{t−1}, C_{t−2}C_{t−1}), where A_t, B_t, and C_t represent the moves, taken from {c, d}, of Alice, Bob, and Charlie at time t, respectively. The number of states is thus 2^{3m} = 64, but the actual number can be reduced to 40 because Alice’s moves will not be affected even if Bob and Charlie exchange their names (Table 1). For this reason, the number of possible strategies for Alice amounts to N(m=2) = 2^40 ≈ 1.1 × 10^12. This is an upper bound for direct enumeration because the number increases to N(m=3) = 2^288 ≈ 5.0 × 10^86, which is comparable to the estimated number of protons in the universe. We therefore begin by restricting ourselves to m=2.
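The state and strategy counts above follow from Burnside's lemma applied to the two-element group {identity, swap Bob and Charlie}: a swapped state coincides with the original iff Bob's and Charlie's histories are identical. A minimal sketch (our own code):

```python
def n_states(m: int) -> int:
    """Memory-m states of the three-person game, modulo the
    Bob-Charlie exchange symmetry (Burnside's lemma)."""
    total = 2 ** (3 * m)        # all (A-history, B-history, C-history) triples
    fixed = 2 ** m * 2 ** m     # states with B-history == C-history
    return (total + fixed) // 2

def n_strategies(m: int) -> int:
    return 2 ** n_states(m)     # one binary prescription (c or d) per state
```

This reproduces 40 states and N(m=2) = 2^40 ≈ 1.1 × 10^12, as well as 288 states and N(m=3) = 2^288 for memory length three.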


Table 1. Abbreviated notation for the memory states (m=2). Each pair of digits counts the defectors among Bob and Charlie at t−2 and t−1; the bars mark the case in which the same co-player defected at both steps.

Abbreviated    Complete notation
(**, 22)       (**, dd, dd)
(**, 21)       (**, dd, dc) or (**, dc, dd)
(**, 20)       (**, dc, dc)
(**, 12)       (**, cd, dd) or (**, dd, cd)
(**, 11)       (**, cd, dc) or (**, dc, cd)
(**, 1̄1̄)       (**, dd, cc) or (**, cc, dd)
(**, 10)       (**, dc, cc) or (**, cc, dc)
(**, 02)       (**, cd, cd)
(**, 01)       (**, cc, cd) or (**, cd, cc)
(**, 00)       (**, cc, cc)
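The abbreviation of Table 1 can be encoded as a small helper function. This is our own hypothetical implementation, with the barred label detected by checking whether the same co-player defected at both steps:

```python
def abbreviate(b: str, c: str) -> str:
    """Map Bob's and Charlie's two-step histories (e.g. 'dd', 'dc')
    to the abbreviated label of Table 1."""
    d2 = (b[0] == 'd') + (c[0] == 'd')    # defectors at t-2
    d1 = (b[1] == 'd') + (c[1] == 'd')    # defectors at t-1
    if (d2, d1) == (1, 1) and b in ('dd', 'cc'):
        return '1\u03041\u0304'           # barred: one co-player defected twice
    return f'{d2}{d1}'
```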

We are now ready to deal with the criteria. Suppose that Alice is using a certain strategy, s_Alice. Among all the transitions between every pair of states, only some are allowed by s_Alice: note that S_{t+1} = (A_{t−1}A_t, B_{t−1}B_t, C_{t−1}C_t) shares A_{t−1}, B_{t−1}, and C_{t−1} with S_t. From Alice’s point of view, her strategy has already determined A_t from S_t, leaving only two unknowns, B_t and C_t. Therefore, every state can be followed by one of four possibilities, depending on Bob’s and Charlie’s moves. In graph-theoretic terms, each state can be mapped to a node, so that every possible transition from S_t to S_{t+1} allowed by s_Alice is completely specified by a graph of 64 nodes, each of which has 4 outgoing links, as shown in Fig. 1(b). Hereafter, we call this the transition graph of the strategy. The defensibility criterion requires that Alice must not be exploited repeatedly by any of her co-players. Let us define a loop as a sequence of consecutive states S_t → S_{t+1} → ⋯ → S_{t+ν} with S_t = S_{t+ν}. This is an important unit of analysis because only states on a loop can be visited repeatedly to affect the players’ long-term payoffs. For s_Alice to be defensible, it should satisfy the inequalities Σ_{τ=0}^{ν−1} [F_Alice(τ) − F_Bob(τ)] ≥ 0 and Σ_{τ=0}^{ν−1} [F_Alice(τ) − F_Charlie(τ)] ≥ 0 against any finite-memory strategies of Bob and Charlie, where the payoffs are evaluated along every possible loop S_t → S_{t+1} → ⋯ → S_{t+ν} of the transition graph of s_Alice. In other words, Alice’s strategy must not contain a ‘risky’ loop, along which the sum of Alice’s payoffs is smaller than that of either Bob or Charlie. The absence of risky loops in the transition graph of s_Alice is a sufficient condition for f_Alice ≥ f_Bob and f_Alice ≥ f_Charlie with arbitrary strategies of Bob and Charlie when e → 0+ (Yi et al., 2017). The existence of risky loops can be investigated by means of the Floyd-Warshall algorithm (Hougardy, 2010).
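The core of the risky-loop test is negative-cycle detection: if each transition u → v carries the weight F_Alice(u) − F_Bob(u), a risky loop is exactly a cycle of negative total weight. A minimal Floyd-Warshall sketch on a toy graph (our own code; the graph below is illustrative, not an actual strategy's transition graph):

```python
INF = float('inf')

def has_risky_loop(n_nodes: int, edges) -> bool:
    """edges: iterable of (u, v, w) with w = payoff difference at u.
    Returns True iff the graph contains a negative-weight cycle."""
    dist = [[INF] * n_nodes for _ in range(n_nodes)]
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
    for k in range(n_nodes):          # standard Floyd-Warshall relaxation
        for i in range(n_nodes):
            for j in range(n_nodes):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return any(dist[i][i] < 0 for i in range(n_nodes))
```

In the actual computation such a test would be run for each strategy with Bob's and with Charlie's payoffs over the 64-node transition graph.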

To reduce the strategy space to search, we first check the defensibility under the assumption that Bob or Charlie always defects (AllD). Then, we can exclude the states (**, 20), (**, 11), (**, 10), (**, 02), (**, 01), and (**, 00) from consideration, because they are inconsistent with this assumption. We are left with 16 states originating from (**, 22), (**, 21), (**, 12), and (**, 1̄1̄). The number of possible (sub-)strategies is thus 2^16 = 65,536, which is readily tractable. By an exhaustive search for defensibility, we obtain 48 sub-strategies out of the 2^16 possibilities. As a consequence, the number of strategies is reduced to 48 × 2^24 = 805,306,368. For these remaining strategies, we comprehensively check the defensibility criterion by using a supercomputer, without the assumption that Bob or Charlie is an AllD player.
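This exclusion of states can be checked mechanically. The toy sketch below (our own code) keeps exactly those abbreviated classes of Table 1 in which at least one co-player defected at both previous steps:

```python
# Representative (Bob, Charlie) histories of the ten classes in Table 1.
classes = {
    '22': ('dd', 'dd'), '21': ('dd', 'dc'), '20': ('dc', 'dc'),
    '12': ('cd', 'dd'), '11': ('cd', 'dc'), '1\u03041\u0304': ('dd', 'cc'),
    '10': ('dc', 'cc'), '02': ('cd', 'cd'), '01': ('cc', 'cd'),
    '00': ('cc', 'cc'),
}

# Consistent with "Bob or Charlie is AllD" iff some co-player played dd.
consistent = {label for label, (b, c) in classes.items()
              if b == 'dd' or c == 'dd'}
n_substrategies = 2 ** (len(consistent) * 4)   # 4 Alice-histories per class
```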

The efficiency and distinguishability criteria can be checked by calculating f_Alice when P = {s_Alice, s_Bob, s_Charlie} = {s_Alice, s_Alice, s_Alice} and {s_Alice, AllC, AllC}, respectively. The long-term payoff f_Alice is calculated from the stationary probability distribution over the states, which can be obtained by linear algebraic calculation. If s_Alice fulfills all these criteria, it is an Ω strategy, and we will also call it ‘successful’.
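A sketch of this linear-algebra step (our own code): a deterministic power iteration on a toy three-state stochastic matrix stands in for the exact eigenvector computation, and the long-term payoff is the stationary average of the per-state payoffs. All numbers are illustrative:

```python
def stationary(P, iters=10_000):
    """Stationary distribution of a row-stochastic matrix by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

P = [[0.5, 0.5, 0.0],      # toy transition matrix over three memory states
     [0.1, 0.6, 0.3],
     [0.0, 0.4, 0.6]]
F_alice = [3.0, 2.0, 0.0]  # toy per-state payoffs for Alice

pi = stationary(P)
f_alice = sum(p * f for p, f in zip(pi, F_alice))  # long-term payoff, Eq. (2)
```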

3. Result

3.1. Successful strategies for the three-person game

Our first result is an impossibility: no memory-2 strategy satisfies the efficiency and defensibility criteria together, according to our direct enumeration of N(m=2) ≈ 1.1 × 10^12 cases. Although 3,483,008 strategies pass the defensibility criterion, none of them satisfies the efficiency criterion. The joint application of defensibility and efficiency turns out to be too demanding for strategies with m ≤ 2.

However, we have a class of strategies that are partially efficient. Out of the 3,483,008 defensible strategies, 544 show stationary probability ≈ 1/8 at (cc, cc, cc), whereas this probability is close to zero for the others. We further impose the distinguishability criterion and obtain 256 strategies that are defensible, distinguishable, and partially efficient. Each of them will be called a partially successful strategy (PS2), and their full list is given in Table 2.

Table 2. Prescriptions of the partially successful strategies (PS2) for m=2. Each entry is Alice’s move A_t at the given state; ‘*’ indicates that either move is allowed, a bracketed string lists the moves taken by the different PS2 variants, and the roman numerals refer to the rules listed in the Discussion.

State      A_t          Rule     State      A_t          Rule
(cc, 00)   c            i        (cd, 00)   c            ii
(dc, 00)   d            ii       (dd, 00)   c            vi
(cc, 01)   d                     (cd, 01)   *
(dc, 01)   [ddcdcdcd]            (dd, 01)   *
(cc, 02)   d                     (cd, 02)   [ddccdddd]
(dc, 02)   c            iii      (dd, 02)   d
(cc, 10)   [cdccccdd]   i        (cd, 10)   d
(dc, 10)   c            i        (dd, 10)   d
(cc, 11)   d                     (cd, 11)   c            iv
(dc, 11)   d                     (dd, 11)   *
(cc, 1̄1̄)   d                     (cd, 1̄1̄)   d
(dc, 1̄1̄)   d                     (dd, 1̄1̄)   d
(cc, 12)   [ccdddddd]            (cd, 12)   d
(dc, 12)   d                     (dd, 12)   d
(cc, 20)   c            i        (cd, 20)   d
(dc, 20)   *            i        (dd, 20)   d
(cc, 21)   d                     (cd, 21)   d
(dc, 21)   c            v        (dd, 21)   d
(cc, 22)   d                     (cd, 22)   d
(dc, 22)   *                     (dd, 22)   d

The low efficiency of a PS2 is explained as follows: when all players adopt a PS2, some states converge to (cc, cc, cc) and some others to (dd, dd, dd). The respective sets of states will be denoted as the c- and d-clusters (Fig. 2). We note that the fully defective state (dd, dd, dd) is robust against one-bit error because (dc, dd, dd), (dd, dc, dd), and (dd, dd, dc) belong to the d-cluster. This is actually a necessary condition for being defensible against AllD: a player must defect when one of the co-players keeps defecting. Suppose that the players are trapped in the fully defective state. Even if Charlie cooperates by mistake, Alice and Bob have no chance to change their moves because these two cannot distinguish each other from an AllD player. When the same argument is applied to the n-person game, the fully defective state must be robust against (n−2)-bit error. To escape from (dd, dd, dd), two players, say Bob and Charlie, have to make errors at the same time. Then, Alice may turn to cooperation in a subsequent round. As a consequence, the probability of escaping from the d-cluster, denoted as P_esc(d), is of O(e^2) for n=3. The probability of escaping from the c-cluster, P_esc(c), also turns out to be of O(e^2). Because the escape probabilities have the same order of magnitude, the system can transit back and forth between the c- and d-clusters, so that the clusters occupy similar amounts of stationary probability even in the limit e → 0. This is the reason that the stationary probability of (cc, cc, cc) is significantly less than 100%.
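The balance between the two clusters can be made concrete with a toy two-cluster model; the coefficients a and b below are arbitrary illustrative constants, not values derived from any particular PS2:

```python
def coop_weight(p_esc_c: float, p_esc_d: float) -> float:
    """Stationary weight of the c-cluster in a two-cluster chain with the
    given escape probabilities (balance between the two macrostates)."""
    return p_esc_d / (p_esc_c + p_esc_d)

a, b = 1.0, 3.0
# Both escape probabilities of O(e^2): the weight is stuck at b/(a+b) < 1,
# independently of how small e becomes.
w_small_e = coop_weight(a * 1e-2 ** 2, b * 1e-2 ** 2)
w_tiny_e = coop_weight(a * 1e-5 ** 2, b * 1e-5 ** 2)
# If P_esc(c) were O(e^3) instead, the weight would approach 1 as e -> 0.
w_fixed = coop_weight(a * 1e-5 ** 3, b * 1e-5 ** 2)
```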

Fig. 2. The c- and d-clusters of a PS2 (m=2).

To make the strategy efficient, the c-cluster must be robust against any two-bit error, i.e., P_esc(c) must be of O(e^3). Our finding is that it is possible to design successful strategies by making the memory length longer and overriding some of the moves prescribed by the PS2. We have enumerated all possible occurrences of two-bit errors and introduced moves to correct these errors, as shown in Table 3. In Fig. 3, we depict the paths due to two-bit flip errors with brown arrows. To recover mutual cooperation, we add recovery paths as indicated by the green arrows. The strategy goes as follows: (i) each player usually follows one of the PS2’s; (ii) if the memory of three consecutive states shows an unusual transition, such as those represented by the brown arrows, the players activate “Plan B” and follow the green arrows. In other words, we override the moves in Table 3, whereby the memory-2 PS2’s are extended to memory-3 successful strategies. We have confirmed that the stationary probability of the fully cooperative state (ccc, ccc, ccc) indeed approaches one as e → 0 without violating the defensibility and distinguishability criteria. It is thus concluded that at least 256 successful strategies exist for the three-person PG game when the memory length is three. One of the successful memory-3 strategies thereby obtained is shown in Table 4.

Table 3. The states (m=3) at which Alice’s move overrides a PS2’s prescription so as to correct two-bit errors.

State              A_t     State              A_t
(ccd, ccd, ccc)    c       (cdc, ccd, ccc)    c
(ccd, ccc, ccd)    c       (cdc, ccc, ccd)    c
(cdc, cdc, ccd)    c       (ccc, cdc, ccd)    c
(cdc, ccd, cdc)    c       (ccc, ccd, cdc)    c
(cdd, ccd, ccd)    c       (dcd, cdc, cdc)    c
(ddc, cdd, cdd)    c       (cdd, dcc, cdc)    c
(cdd, ddc, cdd)    c       (cdd, cdc, dcc)    c
(cdd, cdd, ddc)    c

Fig. 3

Table 4. One of the successful memory-3 strategies. We have picked the strategy having the largest number of c. Each row corresponds to the histories (B_{t−3}B_{t−2}B_{t−1}, C_{t−3}C_{t−2}C_{t−1}) of Bob and Charlie (symmetric pairs merged), each column to Alice’s own history A_{t−3}A_{t−2}A_{t−1}, and each entry is Alice’s next move A_t.

(B, C) \ A           ccc  ccd  cdc  cdd  dcc  dcd  ddc  ddd
ccc,ccc               c    c    d    c    c    c    d    c
ccc,ccd / ccd,ccc     d    c    c    c    d    c    c    c
ccc,cdc / cdc,ccc     c    d    c    d    c    d    c    d
ccc,cdd / cdd,ccc     d    d    d    d    d    d    d    d
ccc,dcc / dcc,ccc     c    c    d    c    c    c    d    c
ccc,dcd / dcd,ccc     d    c    c    c    d    c    c    c
ccc,ddc / ddc,ccc     c    d    c    d    c    d    c    d
ccc,ddd / ddd,ccc     d    d    d    d    d    d    d    d
ccd,ccd               d    c    c    c    d    c    c    d
ccd,cdc / cdc,ccd     c    c    c    c    d    c    d    c
ccd,cdd / cdd,ccd     d    d    d    d    d    d    d    d
ccd,dcc / dcc,ccd     d    c    c    c    d    c    c    c
ccd,dcd / dcd,ccd     d    c    c    d    d    c    c    d
ccd,ddc / ddc,ccd     d    c    d    c    d    c    d    c
ccd,ddd / ddd,ccd     d    d    d    d    d    d    d    d
cdc,cdc               c    d    c    d    c    c    c    d
cdc,cdd / cdd,cdc     d    d    c    d    d    d    c    d
cdc,dcc / dcc,cdc     c    d    c    c    c    d    c    d
cdc,dcd / dcd,cdc     d    c    d    c    d    c    d    c
cdc,ddc / ddc,cdc     c    d    c    d    c    d    c    d
cdc,ddd / ddd,cdc     d    d    c    d    d    d    c    d
cdd,cdd               d    d    c    d    d    d    c    d
cdd,dcc / dcc,cdd     d    d    d    d    d    d    d    d
cdd,dcd / dcd,cdd     d    d    d    d    d    d    d    d
cdd,ddc / ddc,cdd     d    d    c    c    d    d    c    d
cdd,ddd / ddd,cdd     d    d    c    d    d    d    c    d
dcc,dcc               c    c    d    c    c    c    d    c
dcc,dcd / dcd,dcc     d    c    c    c    d    c    c    c
dcc,ddc / ddc,dcc     c    d    c    d    c    d    c    d
dcc,ddd / ddd,dcc     d    d    d    d    d    d    d    d
dcd,dcd               d    c    c    d    d    c    c    d
dcd,ddc / ddc,dcd     d    c    d    c    d    c    d    c
dcd,ddd / ddd,dcd     d    d    d    d    d    d    d    d
ddc,ddc               c    d    c    d    c    d    c    d
ddc,ddd / ddd,ddc     d    d    c    d    d    d    c    d
ddd,ddd               d    d    c    d    d    d    c    d

3.2. Necessary memory length for the n-person game

Generalizing the above impossibility result, we can show that m ≥ n is required for a strategy to be successful in the n-person PG game when n ≥ 3. We have already seen that the fully defective state must be robust against (n−2)-bit error for the strategy to be defensible against AllD, which means that P_esc(d) is of O(e^{n−1}). On the other hand, the efficiency criterion requires that P_esc(c)/P_esc(d) → 0 as e → 0. In other words, we need P_esc(c) to be of O(e^n), which implies that the fully cooperative state has to be robust against (n−1)-bit error. We note the following: if the fully cooperative state of a memory-m strategy is robust against k-bit error, its memory length m must be greater than k for this strategy to be defensible. A rigorous proof of this statement is given below, but a rough explanation goes as follows: suppose that k-bit error happened to a player, say Bob, so that he took the opposite moves k times in a row by mistake. He must have m ≥ k+1 to realize and correct his own mistakes; otherwise, he could not tell whether he has committed the errors. In our case, k equals n−1 in the n-person PG game, which leads to the inequality m ≥ n. In what follows, we show that a memory-k strategy cannot satisfy the defensibility criterion if it makes the fully cooperative state robust against k-bit error. The proof consists of two steps:

First, suppose that there exists a memory-k defensible strategy S whose fully cooperative state is robust against k-bit error. We begin by assuming that the players are in the fully cooperative state with a strategy profile P = {S, S, …, S}. Suppose that errors occur only on the moves of Alice (denoted by A): one surely at t=1, and possibly more for 2 ≤ t ≤ k. Then the fully cooperative state is recovered in a finite time step t = t_rec (> k), because the fully cooperative state of S is assumed to be robust against k-bit error and the total number of errors is less than or equal to k. Depending on how errors occur for 2 ≤ t ≤ k, the sequence of moves taken by A over this period can be arbitrary, so the number of possible patterns is 2^{k−1}. Each such sequence of Alice’s moves is denoted by Γ_i, where i = 1, …, 2^{k−1}, and its move at time t by Γ_i^t, which is either c or d. On the other hand, we suppose that no error occurs for the other players (denoted by A¯ as a collective entity), so A¯ shows an identical, deterministic sequence of moves following S. The sequence of moves by A¯ under noise pattern i is denoted by Δ_i, and its move at time t by Δ_i^t (Fig. 4(a)). It is important that Δ_i^t actually depends only on A’s previous moves Γ_i^1, …, Γ_i^{t−1} because the moves of A¯ are deterministic. Let P(Γ_i) and P(Δ_i) denote the payoffs of A and A¯ accumulated over 1 ≤ t ≤ t_rec, respectively. According to the defensibility of A¯’s strategy S, we have the inequality P(Γ_i) ≤ P(Δ_i), whereas P(Γ_i) ≥ P(Δ_i) is not necessarily true because A made errors in following the strategy S.

Fig. 4

Second, we consider the case where A follows the strategy S but the other players in A¯ make an alliance and move together (Fig. 4(b)). We will show that A¯ can repeatedly exploit A by choosing their moves as follows.

  • 1.

  • Start from full cooperation.

  • 2.

  • A¯ defects first, while A is cooperating.

  • 3.

  • A¯ continues defecting until they reach full defection. A must eventually defect; otherwise, she is exploited. At this stage, A¯ has a higher net payoff than A because they started defecting earlier than A.

  • 4.

  • At a certain point, say t = 1, A¯ returns to c. A is still defecting to defend herself, so we observe (Γ_i^1, Δ_i^1) = (d, c).

  • 5.

  • A¯ then takes Δ_i^2 of Fig. 4(a). Note that Δ_i^2 is independent of i because it depends only on Γ_i^1 and Δ_i^1, which are fixed as d and c, respectively. On the other hand, A’s move Γ_i^2 may be either c or d, depending on S.

  • 6.

  • Find a noise pattern i for which Γ_i^2 equals A’s previous move. A¯ then takes Δ_i^3, which is identical for any Δ_i as long as Γ_i^2 is the same. Once again, A’s move Γ_i^3 may be either c or d.

  • 7.

  • Repeat the above procedure until A¯ takes Δ_i^{t_rec}. In short, A¯ is simulating one of the Δ_i’s to recover full cooperation. Which Δ_i to choose depends on S, but there is always a noise pattern that produces such Γ_i and Δ_i. After this series of moves, the players eventually get back to full cooperation.


In addition to the payoff advantage in step 3, A¯’s net payoff from step 4 to step 7 is always higher than or equal to A’s because P(Δ_i) ≥ P(Γ_i) for every i. In this way, A¯ can repeatedly exploit A, which contradicts our assumption that S is defensible. Hence, there is no memory-k strategy S that is defensible and robust against k-bit error. For this reason, if the fully cooperative state of a memory-m strategy is robust against k-bit error, its memory length m must be greater than k for this strategy to be defensible. If Alice’s memory length is longer than k, on the other hand, the state from Alice’s point of view will be (dΓ_i^1Γ_i^2…Γ_i^k, dΔ_i^1Δ_i^2…Δ_i^k). She can thus recognize that the state started from full defection and that Bob and Charlie initiated the change. In this case, Alice can defend herself by keeping d.

4. Discussion

In summary, we have found that three players can safely maintain full cooperation in the PG game in a noisy environment with e → 0+. From Tables 2 and 3, we see that a successful strategy Ω is characterized by the following rules.

  • 1.

  • Preserve cooperation. If everyone cooperated last time, Alice chooses c according to Ω. In other words, c is the choice at (*c, *0), such as (cc, 00), (cc, 10), (cc, 20), (dc, 10), and (dc, 20) (see Table 1 for the abbreviated notations).

  • 2.

  • Challenge the co-players’ naivety. An exception to the first rule is (dc, 00), at which d is prescribed because of the distinguishability criterion. Due to this prescription, if Bob and Charlie are AllC, they are exploited by Alice, who alternates between (dc, 00) and (cd, 00). This explains why Alice cooperates at (cd, 00).

  • 3.

  • Retreat if the co-players are not naive. If Bob and Charlie are not AllC, on the other hand, they will choose d in response to Alice’s unilateral defection, so the resulting state will be (dc, 02) instead of (dc, 00). If this is the case, Alice cooperates to avoid TFT retaliation.

  • 4.

  • Forgive after responding to provocation. Note that (dc, 02) corresponds to (dc, cd, cd), which Bob or Charlie would interpret as (cd, 11). The strategy prescribes c at (cd, 11) so that full cooperation is recovered quickly when Bob and Charlie also use Ω.

  • 5.

  • Grab the chance to cooperate. If the state was initially full defection, it is definitely safe to defect, so (dc, 21) is accessed by two simultaneous mistakes of Alice and another player, with probability of O(e^2). As a result, (dc, dc, dd) and its permutations form the outermost periphery of the c-cluster in Fig. 2. If Alice reaches (dc, 21) by chance, she chooses c once again to establish full cooperation.

  • 6.

  • Don’t be evil. If everyone uses Ω, (dc, 21) such as (dc, dc, dd) is followed by (cc, cc, dd), which is (dd, 00) from Charlie’s viewpoint. He has been the only defector for two rounds, but he is now supposed to contribute to the public pool. If the co-players are using Ω, they will punish Charlie’s successive defection, which leads the state to (dc, 02), already examined in the third rule.


The above six rules explain the basic behavior of a PS2 (Table 2). To make it fully efficient, we have to add one more rule:

  • 7.

  • Look at the context. By referring to the memory at t−3, one must override some moves of a PS2, as listed in Table 3. The basic recipe is that Alice has to cooperate in the two subsequent rounds if she defected by mistake. In addition, the transition from the state (dd, dc, dd) to (dc, cc, dc) must also be allowed if the former is part of (cdd, ddc, cdd).


We believe that most of these patterns can be extended to n > 3 in principle, although we have not yet completed the search for a successful strategy in these cases. Obviously, the necessary condition m ≥ n means that the strategy space expands super-exponentially: the number of strategies for the n-person game with memory length n is 2^(2^(n^2)), which is far beyond our computational feasibility if we are to enumerate these strategies comprehensively. A possible alternative could be to build a strategy based on a successful strategy of the (n−1)-person game. We are currently working on the four-person game based on a successful strategy for n=3. Since a successful strategy for n=3 has a c-cluster robust against two-bit error, we should be able to construct a PS2 for n=4 based on it, whose c- and d-clusters are two-bit error tolerant. We expect that such a PS2 can be elevated to a successful strategy by extending the memory length and overriding some of the moves, just as we did for n=3. If this attempt succeeds, it will then be possible to apply this procedure iteratively to solve the general n-person game.

Similarly to the ZD strategies, our successful strategies are capable of giving the player control over the payoff differences relative to the co-players’. A nontrivial aspect of our finding is that one may publicly announce his or her deterministic strategy, making the future moves predictable to the co-players. By analyzing the announced strategy, the co-players understand that it is impossible to exploit the focal player. With the knowledge of the focal player’s strategy, the co-players may also safely adopt the same one to enjoy the above properties, by which a cooperative Nash equilibrium is reached. In an actual human population, of course, it would be fair to say that the performance generally depends on the learning dynamics as well as the interaction structure (Battiston, Perc, Latora, 2017, Szolnoki, Perc, 2012), and the precise understanding of their interplay will pose an interesting research question. From a biological point of view, one could also ask whether our computational solution is accessible by means of a certain form of evolutionary dynamics. We believe that it is possible in principle, but it could take an exceedingly long time because the number of strategic possibilities is literally astronomical. As the number of players increases, most of the prescriptions of successful strategies will appear cryptic unless one looks at the whole connection structure of states, just as c sometimes turns out to be the correct choice for Alice at (dc, 22) (Table 3). It suggests that our moral instinct, which has been shaped by evolution, might fail to guide us in dealing with the immense complexity of the game. The important point is that we can nevertheless induce mutual cooperation by advising the players to adopt this strategy, in the light of its efficiency, defensibility, and distinguishability.

Acknowledgment

We gratefully acknowledge discussions with Beom Jun Kim and Hyeong-Chai Jeong. Y.M. acknowledges support from CREST, JST and from MEXT as “Exploratory Challenges on Post-K computer (Studies of multi-level spatiotemporal simulation of socioeconomic phenomena)” and from the Japan Society for the Promotion of Science (JSPS) (JSPS KAKENHI; grant no. 18H03621). S.K.B. was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (NRF grant no. 2017R1A1A1A05001482). This project was supported by JSPS and NRF under the Japan-Korea Scientific Cooperation Program. This work was supported under the framework of the international cooperation program managed by the National Research Foundation of Korea (NRF grant no. 2016K2A9A2A08003695). This research used computational resources of the K computer provided by the RIKEN Center for Computational Science through the HPCI System Research project (Project ID: hp160264).