Selfishness and optimality
Putting some more or less formal game theory basics to informal use (hey, that could be the title of Vitalik Buterin’s memoirs)
The price of anarchy
The price of anarchy is a concept from game theory that quantifies how harmful purely selfish behavior is in a given setting. It compares what happens when everyone acts selfishly to what would happen if some benevolent dictator made everyone’s choices for them.
The Prisoner's Dilemma game
The Prisoner’s Dilemma game is famous because rational, selfish prisoners will both confess and have to spend 5 years each in jail, whereas a benevolent dictator would have them both keep quiet and do only 1 year in jail. So we have selfish behavior costing 5 years per agent and optimal behavior costing 1 year per agent. The ratio of the costs of these outcomes, 5 years to 1 year, is called the price of anarchy. The price of anarchy for the Prisoner’s Dilemma game pictured above is 5.
Another kind of game theory setting where this idea is used (and where very interesting results have been proven) is sending traffic from point A to point B through a network–it could be commuters on a road network or data on a computer network. To create and analyze a problem of this kind, all we need is a network, the amount of traffic needing to get from point A to point B, and, importantly cost functions for each of the links in the network. Intuitively, the cost function for a link in a commuter network is just a way of saying how much longer does it take to travel on this particular link when there’s traffic vs when there’s not.
My commute in Champaign is actually simpler than even this
The cost functions above tell us that the roads from A to D and from C to B are maybe ultra wide. It doesn’t matter how many people travel on them, it will just take 10 minutes to traverse. The other two roads, from A to C and D to B are roads where the travel time is linear in the amount of traffic on that road.
Suppose we have 10 units of traffic that want to get from point A to point B, maybe A is the neighborhood you live in and B is the neighborhood where you work . If everyone has complete knowledge of the cost functions on the graph, is rational, and must commit to their travel plan before departing and not deviate based on what they observe, then selfish drivers will wind up routing 5 units of traffic through the A-C-B path and 5 units through the A-D-B path. With 5 units passing along the road from A to C, the travel time for those units will be 5 minutes and the time from C to D will be an additional 10 minutes no matter how much traffic there is. Similar logic applies for the bottom path. Thus everyone’s travel time when acting selfishly is 15 minutes .
Likewise, even if we had a dictator choosing routes for people, they wouldn’t do any better than sending half of the traffic through the top route and half through the bottom route. The price of anarchy–the ratio of selfish to optimal behavior–is 1 in this case.
This may all seem pretty straightforward. So far modeling traffic as a network with costs might seem like an overly mathy way to tell us the same things that common sense does. But we don’t have to search very far to find interesting (if far-fetched) examples that violate our expectations.
Suppose teleportation gets invented. This can only be good for our daily commute, right? Let’s see.
What will selfish drivers do in this case? It’s maybe not easy to see, but it’s a bit like the prisoner’s dilemma again. All selfish drivers will use the teleporter and travel along the A-C-D-B path, but this technological marvel will have actually added 5 minutes to everyone’s commute. If everyone goes on this route, it takes everyone 20 minutes to get to work!
To see why, suppose as we are eating breakfast we learn that everyone will use the A-C-D-B path that day, does it make any for us to choose an alternate path given this information? If we don’t deviate it’s going to take them 20 minutes as we have said. If we unilaterally take the A-C-B path, it still takes 20 minutes. If we unilaterally take the A-D-B path, it also takes 20 minutes. Likewise, if we weren’t using the A-C-D-B path and learned that we weren’t the only ones doing this, we would have an incentive to start using that path in subsequent commutes because if as long as that path isn’t being utilized by everyone the travel time on it is strictly less than 20 minutes. So everyone gets stuck using the teleporter and extending their commute time.
The optimizing dictator, on the other hand, would have everyone ignore the teleporter with half of everyone going through the A-C-B path and half going through the A-D-B path, giving everyone a travel time of 15 minutes. The price of anarchy–the ratio of performance when everyone is selfish versus performance when people are forced to act in a way that’s optimal–is 20/15 or 4/3.
Higher level, more practical insights
Dictatorships and optimally controlling behavior when the price of anarchy is high is not free. For commuter traffic, steering behavior involves traffic laws, police presence, toll routes, carpool lanes, and all the bureaucratic overhead each of these entail. For computer network traffic, it may or may not be easier to steer behavior depending on who controls the network but it is undoubtedly going to be difficult to respond to upredictable traffic and calculate optimal routing quickly enough to maximize traffic flow.
Trying to control behavior, however, often isn’t going to be the most cost-effective strategy. For one thing, it can be shown in general that so long as the costs on the links in the network aren’t highly nonlinear, the price of anarchy is probably manageable.
Beyond that, providing even modest amounts more capacity than a network actually needs can shrink the difference in costs for optimal and selfish behavior. Even in a network with exponential cost functions, 10% overprovisioning puts an upper bound on the price of anarchy at 2.1. To upper bound the price of anarchy so tightly strictly by trying to control traffic would probably cost a great deal more than adding the network capacity required to overprovision it.
And some speculation
I bothered to write all of this up because a class I’m involved with looks at the interplay of game theory, incentive design, behavioral economics, and the bigger picture of how these relate to desirable outcomes.
In discussing this topic, we wondered whether and how these concepts might be applied to socio-technical systems of a different kind. How would we apply these ideas to get insights on the spread of fake news or user behavior on Reddit, for example?
One idea I came up with centers on the theoretical result regarding nonlinearity above–that a network can only have a high price of anarchy, if it has some highly non-linear cost functions on some of its edges.
Consider a community online like your favorite subreddit and how much damage a motivated user with a single account might do if they wanted to. Which mechanisms or features of the community are more robust to adversarial behavior by the user? The mechanism around the integrity of the top stories feed, I’d argue, is pretty robust. A single user with a single account, no matter how they act, won’t be able to cause their own blog posts to crowd out everyone else’s submissions–at best they can upvote their own posts and downvote everyone else’s.
On the other hand, with respect to the harmony or social capital within the community, we have all experienced a single troll can rapidly erode civility and trust in an online community. This usually happens if there aren’t good moderators. Yaknow, moderators, the optimizing dictators who try to close the gap between selfish behavior and what’s best for the community as a whole.
Without even making this very formal, it leads to some questions. Is there some sense in which perturbations to the social fabric of these communities are highly nonlinear, whereas the integrity mechanisms have a closer to linear response to the behavior of a single user? If so, could this offer a model of why community best practices almost always have a moderator in place? What does this say about when we do and when we don’t need moderators for face to face conversations? What other properties of communities are easier or more difficult to derail for a single individual? What is the effect of collusion among users or sockpuppets on all of the above?
 There are some subtle theoretical reasons why I’m saying 10 units of traffic rather than 10 people. In particular it’s important for the current analysis that these units be divisible, e.g. I could send 9.5 units of traffic in one direction and 0.5 in another.
 What we mean when we say, “if everyone is selfish and rational, this is what will happen” is that the behavior we are describing is a Nash equilibrium.
In the future, maybe I’ll talk more about what a Nash equilibrium is and critique its usefulness in practice. But for now let’s just observe that 1) a Nash equilibrium doesn’t in general correspond to globally optimal behavior, which is really a main point of this post and 2) if you want to see whether a situation is a Nash equilibrium it is sufficient to be able to claim that no one has any incentive to unilaterally deviate from the state of affairs described by the Nash equilibrium. In this case, if drivers were going 5 units through the top path and 5 through the bottom, none of them have any incentive to wish they had unilaterally done something different while everyone else did the same thing they were already doing.
First image from psychologyofgames.com
Content gets a big help from Tim Roughgarden. Twenty lectures on algorithmic game theory. Cambridge University Press, 2016.