Tuomas Sandholm

Professor at Carnegie Mellon University in the Computer Science Department. Founder and director of the Electronic Marketplaces Laboratory. He has published over 450 papers and has 27 years of experience building optimization-powered electronic marketplaces. His team won the Heads-Up No-Limit Texas Hold'em category of the AAAI 2016 Annual Computer Poker Competition.

Keynote Speech

10:40 AM - 11:00 AM Saturday, April 15th, 2017

Ok, thank you so much. It’s a real pleasure being here. I actually just got back from China two days ago; I’ll tell you about those experiences as well. I’ll talk to you about superhuman AI for strategic reasoning under imperfect information. In particular, I’ll talk about how we beat the top players, first here in Pittsburgh, and then I’ll talk about the experiences in China.

I’m a Professor here in Computer Science and also the founder and CEO of Strategic Machine Inc., which is commercializing these technologies. This is joint work with my PhD student.

So in imperfect-information games, the techniques for games like Go, chess, checkers and tic-tac-toe don’t apply, because whenever it’s our turn to move, we don’t know the exact state of the world. And the techniques I’ll talk to you about are application independent. So although the benchmark I’ll talk about is poker, there is really nothing poker-specific in these techniques. They could be applied to basically any incomplete-information game.

The main challenges here are, first of all, uncertainty about what others and chance will do: an unknown state. And because of this unknown state, we have to think about how our actions signal our private information to the opponent or opponents. And conversely, how the opponents’ actions signal their private information to us.
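
To make the signaling idea concrete, here is a toy Bayes-rule update: observing an opponent's raise shifts our belief about their private hand. All the probabilities below are made-up assumptions for illustration, not anything from the talk.

```python
# Toy illustration of signaling: an opponent's observed action updates our
# belief about their private hand via Bayes' rule. Numbers are assumptions.
prior = {"strong": 0.3, "weak": 0.7}           # belief before the action
p_raise_given = {"strong": 0.8, "weak": 0.2}   # assumed opponent tendencies

p_raise = sum(prior[h] * p_raise_given[h] for h in prior)
posterior = {h: prior[h] * p_raise_given[h] / p_raise for h in prior}
print(posterior)  # {'strong': ~0.63, 'weak': ~0.37} -- the raise signals strength
```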

And this is where the concept of Nash equilibrium from game theory comes in. John Nash invented this concept in 1950, and it really transformed economics and many other sciences. In 1994, he got the Nobel Prize for that work. But of course, that is just a definition of what a rational player is when you have more than one player; it doesn’t actually do anything by itself. So to operationalize it, you have to marry it up with algorithms that actually compute strategies according to Nash equilibrium.
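
To make "marrying the definition with algorithms" concrete, here is a minimal sketch of computing a Nash equilibrium of a tiny two-player zero-sum matrix game by linear programming. The example game (matching pennies) and the use of scipy are my assumptions for illustration; real poker solvers work on vastly larger sequential games.

```python
# Minimal sketch: Nash equilibrium of a small zero-sum matrix game via LP.
import numpy as np
from scipy.optimize import linprog

# Payoff matrix for the row player; the column player receives the negative.
A = np.array([[ 1.0, -1.0],
              [-1.0,  1.0]])  # matching pennies (illustrative toy game)

n_rows, n_cols = A.shape

# Variables: x (row player's mixed strategy) and v (game value).
# Maximize v  <=>  minimize -v, subject to A^T x >= v and sum(x) = 1.
c = np.zeros(n_rows + 1)
c[-1] = -1.0  # minimize -v

# For each column j:  v - sum_i A[i,j] * x_i <= 0
A_ub = np.hstack([-A.T, np.ones((n_cols, 1))])
b_ub = np.zeros(n_cols)

# Probabilities sum to one.
A_eq = np.hstack([np.ones((1, n_rows)), np.zeros((1, 1))])
b_eq = np.array([1.0])

bounds = [(0, 1)] * n_rows + [(None, None)]  # v is unconstrained

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x, v = res.x[:-1], res.x[-1]
print("equilibrium row strategy:", x)  # ~[0.5, 0.5]
print("game value:", v)                # ~0.0
```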

All right, and most real-world settings are actually like this: typically there is more than one player, and you don’t have complete information. So that’s why I founded Strategic Machine Inc., to commercialize these technologies; it’s a CMU spin-out. Applications of these include negotiation, both at the consumer level and business-to-business; business strategy optimization; strategic pricing. For example, most pricing, even if it’s optimized today, is optimized as if you were a monopolist, or as if the competitors’ prices were fixed. Here, instead, we take a strategic view. Also areas of finance, like upstairs block trading; auctions; next-generation cybersecurity; political campaigns; military applications; and so forth. And also a kind of radical application we’re working on, which is steering evolution and biological adaptation strategically, for example for treatment planning. There we are looking at viral and bacterial infections and how to fight them, as well as how to steer a patient’s own T-cell population so as to fight cancers and autoimmune diseases. And of course, there are lots of other applications as well. This is in contrast to games of complete information, which really have very few real-world applications.

Now, while these techniques are domain independent, the main benchmark for them is poker. And if you are careful and go back to John Nash’s 1950 PhD dissertation, you will find that the one and only application there was poker. Since then, there has been a lot of work and steady progress on poker. The field really took off around 12 years ago, with the Annual Computer Poker Competition, organized by the Association for the Advancement of Artificial Intelligence, which was announced in 2005 and started in 2006. That was really what enabled the different research groups around the world to compare their results, build on each other’s results, and track progress year to year. And what that has led to is multiple orders of magnitude of scalability improvements in this core technology for strategic reasoning, so much so that it has changed what game theory can do. It largely used to be a thinking tool, where you have small models and try to manually think through them, while now it’s possible to take all of the details of the domain into account and do this computationally.

So Heads-Up No-Limit Texas Hold’em in particular has become the main benchmark and challenge problem in AI for imperfect-information games. It’s a very large game: it has 10 to the 161 different situations a player can face. Let’s stop to think about that number for a second. It’s more than the number of atoms in the universe. And furthermore, if you had a whole other universe for each atom in the universe, and counted the number of those sub-atoms, it would still be more than that too. So there’s no way you are going to store these games or brute-force through them. You need AI techniques to solve them. And no prior AI had been able to beat top humans. If you think about game playing in AI, there have been great successes, for example in Othello, checkers, chess and Go. But Heads-Up No-Limit Texas Hold’em has really remained elusive. It’s a tough game, because it’s both very large and a game of imperfect information.
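
The magnitude comparison is easy to sanity-check, using the commonly cited estimate of roughly 10^80 atoms in the observable universe (an estimate, not an exact figure):

```python
# Sanity check of the magnitude comparison in the talk.
game_situations = 10**161
atoms = 10**80  # commonly cited estimate for the observable universe

print(game_situations > atoms)          # True: more situations than atoms
# One whole universe of atoms for every atom: (10^80)^2 = 10^160 "sub-atoms".
print(game_situations > atoms * atoms)  # True: still more
```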

So this January, I organized the “Brains vs. AI” rematch. The reason I call it a rematch is that I had organized a similar competition in April and May 2015, and we could not beat the best humans at this game back then. This time I invited 4 of the top-10 Heads-Up No-Limit Texas Hold’em specialist pros to Pittsburgh to play: Dong Kim, Jason Les, Jimmy Chou and Daniel McAulay. They are all from different countries, so this is really the international elite of this game. We played 120,000 hands of poker over 20 days, and the players played for a prize pool of $200,000. I raised the money, so we couldn’t win anything: the money was going to go to the humans. But we didn’t just give each of them $50,000; rather, they were paid out based on how well they did against Libratus, which is our AI, so they had the incentive to play very seriously all the way to the end. And more than the money, they were fighting with the pride of mankind on their shoulders, so they took it very seriously. Every night they were studying: they brought in computers, standard programs for studying poker, custom programs they had made for studying poker, consultants, other Texas Hold’em pros in Pittsburgh. So this is what it looked like [slides]. This is Rivers Casino in Pittsburgh [picture], having fun with Jason Les here [picture]. It’s actually great fun to watch how this program plays: it plays very differently from how humans play, because it doesn’t start from learning on human data. It starts from just the rules of the game, and it figures out its own strategy using optimization and AI. So it plays very differently from how humans think poker should be played. Here’s Jason thinking [picture]. And the final result was that Libratus, our AI, beat the top humans in this game by a lot, specifically by 147 milli-big-blinds per hand. This was statistically significant at the 99.98% level. And each individual human also lost to Libratus.
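
For readers unfamiliar with the unit: milli-big-blinds per hand (mbb/hand) is average winnings per hand, expressed in thousandths of a big blind. A minimal sketch follows, using chip totals consistent with the widely reported figures for this match; treat the exact numbers as assumptions.

```python
# Minimal sketch of the mbb/hand win-rate metric used to report the result.
def mbb_per_hand(total_chips_won, big_blind, num_hands):
    """Average winnings per hand, in thousandths of a big blind."""
    return 1000 * total_chips_won / (big_blind * num_hands)

# Assumed inputs, consistent with widely reported figures for this match:
# ~$1,766,250 in chips won over 120,000 hands with a $100 big blind.
print(mbb_per_hand(1_766_250, 100, 120_000))  # ~147.2 mbb/hand
```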

Then, just last week, we did a similar event in China, in Haikou, where we played 36,000 hands against a team of 6 Chinese poker pros. We played for 2 million RMB. We didn’t take it all. The opponents were very well prepared. Not only were they expert poker players, but they were also computer scientists and machine learning experts. [laughter from audience] And one of them was the one and only Chinese World Series of Poker bracelet winner. They had studied Libratus’s hand histories in advance to try to break the AI. We played for four and a half days, in 9 sessions. Well, their strategy didn’t work. Lengpudashi, which means “cold poker master” (it’s Libratus’s Chinese brother, if you will), won by 220 milli-big-blinds per hand, so by even more than in January. And it won each of the 9 sessions, and it also beat each of the individuals, when considering not the team but the individuals.

So how do Libratus and Lengpudashi work? I get this question a lot. So now I’m going to give you a whirlwind tour of what’s behind these AIs. There are three main modules. There’s more, but these are the main pieces. One is what happens before the event: we feed the rules of the game to an abstraction algorithm, which creates an abstraction of the game, and we solve that for a Nash equilibrium, or an approximation thereof. Then there’s an endgame solver, which kicks in during the playing of a hand to refine those strategies in real time. And then there’s a self-improver, which runs in the background. I’ll talk about each one of these three pieces.
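
Here is a structural sketch of how those three modules fit together. This is not the actual Libratus code; every class, function, and method name is illustrative, and the hard parts are stubbed out.

```python
# Structural sketch of the three modules described above (illustrative only).

class Agent:
    def __init__(self, game_rules):
        self.game_rules = game_rules
        self.blueprint = None  # strategy computed before the event

    # Module 1 (offline, before the event): abstraction + equilibrium finding.
    def precompute_blueprint(self):
        abstract_game = build_abstraction(self.game_rules)  # shrink the game
        self.blueprint = approximate_equilibrium(abstract_game)

    # Module 2 (online, during a hand): endgame solving refines the blueprint.
    def act(self, hand_state):
        if in_endgame(hand_state):
            subgame = extract_subgame(hand_state, self.blueprint)
            return solve_subgame_in_real_time(subgame)
        return sample_action(self.blueprint, hand_state)

    # Module 3 (background): patch the blueprint where opponents probe it.
    def self_improve(self, observed_actions):
        for branch in most_attacked_branches(observed_actions):
            self.blueprint = refine_branch(self.blueprint, branch)

# Placeholder helpers so the sketch is importable; the real versions are
# exactly the hard research problems the talk describes.
def build_abstraction(rules):            raise NotImplementedError
def approximate_equilibrium(game):       raise NotImplementedError
def in_endgame(state):                   raise NotImplementedError
def extract_subgame(state, blueprint):   raise NotImplementedError
def solve_subgame_in_real_time(subgame): raise NotImplementedError
def sample_action(blueprint, state):     raise NotImplementedError
def most_attacked_branches(actions):     raise NotImplementedError
def refine_branch(blueprint, branch):    raise NotImplementedError
```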

So ahead of the event, we run on a supercomputer. In the case of Libratus, we ran about 15 million core hours on a supercomputer called Bridges at the Pittsburgh Supercomputing Center. Obviously the game is too big to solve directly, so we ran an abstraction algorithm to come up with a smaller abstract game. Then we used an equilibrium-finding algorithm to find an approximate equilibrium of that game, and then we reverse-mapped that back to the original game. As for the abstraction algorithm, we don’t need to go into the details. It is essentially the same as the abstraction algorithm that we used for a prior bot called Tartanian8 (Tartanian for the Tartans of CMU, and this is the eighth version thereof). The main ideas behind it are imperfect-recall abstraction, potential-aware abstraction, and the earth mover’s distance as a similarity metric, if you know what those mean.
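
The talk only says "equilibrium-finding algorithm"; algorithms in the counterfactual-regret-minimization (CFR) family are the standard choice in this line of work, and their core update is regret matching. Below is regret matching on rock-paper-scissors as a minimal, self-contained illustration of the family, not of Libratus itself.

```python
# Regret matching via self-play on rock-paper-scissors. The time-averaged
# strategies converge to the Nash equilibrium (uniform 1/3 each).
import numpy as np

A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)  # row player's payoffs

def regret_matching_equilibrium(A, iterations=100_000, seed=0):
    rng = np.random.default_rng(seed)
    n, m = A.shape
    regret_row, regret_col = np.zeros(n), np.zeros(m)
    strat_sum_row, strat_sum_col = np.zeros(n), np.zeros(m)

    def strategy(regret):
        # Play in proportion to positive regret; uniform if none yet.
        pos = np.maximum(regret, 0)
        return pos / pos.sum() if pos.sum() > 0 else np.full(len(regret), 1 / len(regret))

    for _ in range(iterations):
        s_row, s_col = strategy(regret_row), strategy(regret_col)
        strat_sum_row += s_row
        strat_sum_col += s_col
        a = rng.choice(n, p=s_row)  # sampled actions
        b = rng.choice(m, p=s_col)
        # Regret: how much better each alternative action would have done.
        regret_row += A[:, b] - A[a, b]
        regret_col += A[a, b] - A[a, :]  # column player's payoff is -A

    return strat_sum_row / iterations, strat_sum_col / iterations

row, col = regret_matching_equilibrium(A)
print(row, col)  # both approach [1/3, 1/3, 1/3]
```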

[We are working on filling the second half of the speech.]