Man vs. Machine, a short history

As a game freak and a data wonk, I find few things more interesting than the ongoing battle between man and machine, with popular games serving as the battlefield.  For each game, there’s only one moment in human history when the machines take us down and never look back, and for almost every game, that moment is in the recent past.  These stories of seemingly unbeatable champions finally meeting their match and conceding defeat give us a glimpse of the unlimited potential of future problem-solving techniques.  Welcome to my short history of Man vs. Machine.

Backgammon (1979) – BKG 9.8

When world champion Luigi Villa lost a backgammon match 7-1 to a program by Hans Berliner in 1979, it was the first time a bot had beaten a reigning world champion in any game.  Later analysis of the games showed that the human was actually the stronger player and only lost due to bad luck, but Pandora’s box had been opened.  In the ’90s, when neural networks revolutionized the bots, machines truly reached the level of top humans.  TD-Gammon appeared in the early ’90s and was followed later in the decade by Jellyfish and Snowie.  There were no databases of moves and no expert advice given to the machines.  They were only taught to track certain features of the board (like the number of consecutive blocking points) and to decide for themselves whether those features were meaningful.  They played against themselves millions of times, following a simple racing strategy at first, but soon learned what maximized wins and began to appropriately value and seek out better positions.  It’s true AI: the bots had taught themselves how to play backgammon!
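To give a flavor of how that self-teaching works, here is a minimal sketch of the temporal-difference self-play idea behind TD-Gammon.  To keep it short and runnable it uses tic-tac-toe and a lookup table in place of backgammon and a neural network; the learning loop, where the program improves purely by playing itself and nudging its value estimates, is the same pattern.  Everything in it is illustrative rather than taken from any actual backgammon bot.

```python
import random

values = {}            # board (a 9-tuple) -> estimated probability that 'X' wins
ALPHA, EPSILON = 0.2, 0.1

def winner(board):
    lines = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
    for a, b, c in lines:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return 'draw' if ' ' not in board else None

def value(board):
    w = winner(board)
    if w == 'X':
        return 1.0
    if w == 'O':
        return 0.0
    if w == 'draw':
        return 0.5
    return values.get(board, 0.5)          # unknown positions start as a coin flip

def choose(board, player):
    moves = [i for i, s in enumerate(board) if s == ' ']
    if random.random() < EPSILON:          # occasional exploration
        return random.choice(moves)
    best = max if player == 'X' else min   # X wants high values, O wants low ones
    return best(moves, key=lambda m: value(board[:m] + (player,) + board[m+1:]))

def self_play_game():
    board, player = (' ',) * 9, 'X'
    history = [board]
    while winner(board) is None:
        m = choose(board, player)
        board = board[:m] + (player,) + board[m+1:]
        history.append(board)
        player = 'O' if player == 'X' else 'X'
    # TD(0) backup: nudge each state's value toward the value of its successor
    for state, nxt in zip(history, history[1:]):
        values[state] = value(state) + ALPHA * (value(nxt) - value(state))

for _ in range(20000):
    self_play_game()
print("positions seen:", len(values))
print("estimated value of the empty board:", round(value((' ',) * 9), 3))
```

No opening book, no expert rules: just self-play and a value estimate that slowly stops being wrong, which is the part that made TD-Gammon feel like magic.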

I asked my friend Art Benjamin (who was the all-time point leader in the American Backgammon Tour at the time) when it became clear that bots were truly superior and he said…

I guess I would say it happened in the mid to late 90s with Jellyfish and then Snowie. Can’t offer an exact date and I can’t think of a specific event. There was a backgammon server called FIBS (First Internet Backgammon Server) that was big in the 90s and the top rated player was Jellyfish. Only later was it revealed that it was a Bot. I think that gave it instant recognition as a force to be reckoned with.

At one of the backgammon tournaments, Art introduced me to Jeremy Bagai, who did something that I think is awesome.  He wrote a book that used Snowie to locate and analyze mistakes in the classic strategy guides.  He basically took the bibles of backgammon and fixed them, according to what the bots had discovered.  How great would it be to have books like that in every field, showing specific cases where objective progress has been made?  I think the toughest program out there these days is eXtreme Gammon, so maybe it’s time for another edition of that book that corrects Snowie’s mistakes?

Checkers (1994) – Chinook

In 1989, a team led by Jonathan Schaeffer from the University of Alberta created a checkers-playing program called Chinook.  By 1990, Chinook had earned the right to take its first crack at the world title, but it fell short against Marion Tinsley in their 1992 match.  Tinsley, who is considered the best checkers player of all time, won 4 games to Chinook’s 2, with 33 draws.  In the 1994 rematch, it seemed that Chinook might actually have a chance against the seemingly unbeatable human champion (to give an idea of his dominance, Tinsley won his 1989 world title with a score of 23 draws, 9 wins, and 0 losses!)  However, after 6 draws, the match came to an unfortunate and premature end: Tinsley had to withdraw due to abdominal pains, later diagnosed as cancerous lumps on his pancreas.

Here’s the whole story.

Using techniques such as depth-first minimax search with a heuristic evaluation function and alpha-beta pruning, in combination with an opening database and a set of solved endgames, Chinook held onto its title with a 20-game draw against the #2 player, Don Lafferty, but it hadn’t yet truly become unbeatable.  During the match, Lafferty broke Chinook’s 149-game unbeaten streak, which I believe earned him the title of “last human to beat a top computer program at checkers.”
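For readers who haven’t seen it, here is a tiny sketch of what alpha-beta pruning buys you.  A real checkers engine would generate moves and score leaf positions with a heuristic evaluation (plus the opening and endgame databases mentioned above); to keep this self-contained, the “game” is just a hand-built tree of scores, and none of it is Chinook’s actual code.

```python
# Internal nodes map move names to child nodes; leaves are heuristic scores
# from the maximizing player's point of view.
TOY_TREE = {
    "a": {"a1": 3, "a2": 12, "a3": 8},
    "b": {"b1": 2, "b2": 4, "b3": 6},
    "c": {"c1": 14, "c2": 5, "c3": 2},
}

visited = []   # leaves actually evaluated, to show how much pruning saved

def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    if not isinstance(node, dict):           # leaf: return its heuristic score
        visited.append(node)
        return node
    best = float("-inf") if maximizing else float("inf")
    for child in node.values():
        score = alphabeta(child, not maximizing, alpha, beta)
        if maximizing:
            best = max(best, score)
            alpha = max(alpha, best)
        else:
            best = min(best, score)
            beta = min(beta, best)
        if beta <= alpha:                    # remaining siblings can't change the result
            break
    return best

print("root value:", alphabeta(TOY_TREE, maximizing=True))
print("leaves evaluated:", len(visited), "of 9")
```

Even on this toy tree the search skips a couple of leaves; at tournament depth, that kind of pruning is the difference between searching a handful of moves ahead and searching dozens.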

After the next match, in 1995, it was official: machine had surpassed man.  Don Lafferty fell by a score of 1-0 with 35 draws.  A couple of years later, Chinook retired, having gone unbeaten for 3 years.  If there were any doubts about whether Tinsley could still have beaten Chinook, they were put to rest in 2007, when it was announced that checkers had been solved.  Schaeffer’s team had done it: they proved that checkers is a draw if both sides play perfectly.

Chess (1997) – Deep Blue

Deep Blue became the first computer program to win a chess game vs. a current world champion when it took a point off of Kasparov on its way to a 4-2 loss in 1996.  However, what most people remember is the rematch in 1997, which “Deeper Blue” actually won, 3.5 to 2.5.  At one point during the match, the program ran into a bug and played a random move, which unnerved Kasparov, since he was familiar enough with computer strategies to know that no machine would have selected the move.  Under huge psychological pressure and suspicion that the other team was cheating, Kasparov blundered away the match against the IBM behemoth, which was capable of evaluating 200 million positions per second.

Several matches followed between human champions and top computer programs that ended in draws, so computer superiority in chess wasn’t clearly established until 2005, when Hydra was unleashed on the world.  It dispatched 7th-ranked Michael Adams by a brutal score of 5.5 to 0.5.  2005 may also go down in history as the last year a human beat a top machine in tournament play (Ruslan Ponomariov).  As of 2008, it was still true that humans playing alongside computers (“centaurs”) were superior to bots playing by themselves, but these days, it looks like even that is no longer the case.  The top commercial chess programs available today include Komodo, Houdini, and Rybka, and they continue to improve, leaving humans far behind.

Chess may never be solved like checkers was, but impressive progress has been made on the endgame, which has now been solved for 7 pieces or fewer on the board.  Similar to the insights in Jeremy Bagai’s backgammon book, there are endgames that were presumed for many years to be draws but turn out to be wins if played perfectly, in one case requiring over 500 perfect moves (good luck learning that one!)  I love this quote from Tim Krabbe about his experience with these solved endgames:

The moves are beyond comprehension. A grandmaster wouldn’t be better at these endgames than someone who had learned chess yesterday. It’s a sort of chess that has nothing to do with chess, a chess that we could never have imagined without computers. The Stiller moves are awesome, almost scary, because you know they are the truth, God’s Algorithm – it’s like being revealed the Meaning of Life, but you don’t understand a word.
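For the curious, those 7-piece tablebases are built by retrograde analysis: start from the terminal positions and work backward, labeling every position with its value and its distance to the end.  Here is a toy version of that labeling on a tiny made-up game graph (the player to move picks an outgoing edge, and a position with no moves is a loss for the side to move); real tablebases apply the same idea, level by level, to trillions of chess positions.

```python
MOVES = {                    # position -> positions reachable in one move
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],                 # no legal moves: the side to move here has lost
    "E": ["F"],
    "F": ["E"],              # E and F can shuttle forever: a draw
}

def solve(moves):
    """Label each position WIN or LOSS for the side to move, with a rough move
    count; anything left unlabeled is a draw.  (Real tablebase builders sweep
    level by level to get exact distances-to-mate.)"""
    label = {p: ("LOSS", 0) for p, ms in moves.items() if not ms}
    changed = True
    while changed:           # keep sweeping until no new labels appear
        changed = False
        for p, ms in moves.items():
            if p in label or not ms:
                continue
            kids = [label.get(m) for m in ms]
            losses = [v[1] for v in kids if v and v[0] == "LOSS"]
            if losses:       # some move hands the opponent a lost position
                label[p] = ("WIN", min(losses) + 1)
                changed = True
            elif all(v and v[0] == "WIN" for v in kids):
                label[p] = ("LOSS", max(v[1] for v in kids) + 1)
                changed = True
    return label

table = solve(MOVES)
for pos in sorted(MOVES):
    print(pos, table.get(pos, ("DRAW", "-")))
```

The 500-move wins Krabbe is marveling at are exactly these labels read back out of the table, which is why they look like “the truth” rather than anything a human would plan.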

Othello (1997) – Logistello

This story is short and sweet.  Computer scientist Michael Buro started developing an Othello-playing bot called Logistello in 1991 and retired it seven years later, after it had dispatched world champion Takeshi Murakami by a score of 6-0.  Othello is so popular in Japan that 9 television stations covered the event.  Afterwards, Murakami said, “I don’t think I made any mistakes in the last game. I believed I could win until the end.”

Scrabble (2006) – Quackle

The next human champion to fall to a computer at his own game was David Boys.  Boys, the 1995 world champion, had qualified for the honor of facing the machine by beating out around 100 humans over 18 rounds.  He looked like he would send the machine back for another development cycle after winning the first 2 games, but Quackle didn’t crack under the pressure and won the remaining games to take the match 3-2.  As usual, beating the world champion wasn’t enough for the game freaks of the world; Mark Richards and Eyal Amir took things to the next level by building a bot that uses the opponent’s plays to predict what tiles are in his rack.  It then selects moves that block the high-scoring opportunities the opponent might have, proving that AI truly is ultimately evil.
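The core of that opponent modeling is just Bayes’ rule: the probability the opponent kept certain tiles, given the play you just saw, is proportional to how likely that play would be if he held them, times how likely he was to hold them in the first place.  Here is a toy sketch of the update; the hypotheses and all of the numbers are invented for illustration and are not from Richards and Amir’s work.

```python
# Prior over what the opponent kept back, e.g. estimated from the unseen tiles.
prior = {"kept a blank": 0.10, "kept an S": 0.25, "kept neither": 0.65}

# Crude play model (an assumption): how likely the observed play (dumping five
# low-value tiles for a small score) would be under each holding.
likelihood = {"kept a blank": 0.60, "kept an S": 0.50, "kept neither": 0.15}

unnormalized = {h: prior[h] * likelihood[h] for h in prior}
total = sum(unnormalized.values())
posterior = {h: round(p / total, 3) for h, p in unnormalized.items()}
print(posterior)   # the "kept something good" hypotheses gain a lot of weight

# A move picker could now discount board openings that a blank or an S would
# punish, in proportion to these posterior weights.
```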

Jeopardy (2011) – Watson

In 2011, IBM got back into the business of high-profile man vs. machine matches when it created Watson and took down two of the best all-time Jeopardy champions.  In the end, it had a higher score than both humans put together, and, as with Deep Blue, the machine itself was a beast: 2,880 parallel processors, each cranking out 33 billion operations per second, backed by 16 terabytes of RAM.  Despite some humorous mistakes, such as the time it considered milk to be a non-dairy powdered creamer, Watson’s victory strikes me as the most impressive on this list.  The difficulty of developing a system that can interpret natural language, deal with puns and riddles, and come up with correct answers in seconds (searching the equivalent of a million books per second) is off the charts.  We’re in the world of sci-fi, folks.

Poker (2015?) – PokerSnowie?

I’m going out on a limb here and predicting that poker is the next game to fall to the bots and that the moment is just about here.  Being a game of imperfect information, poker has been particularly resistant to the approaches that were so successful for games such as backgammon, in which the machine taught itself how to play.  A poker strategy generated that way tends to include betting patterns that an experienced player can recognize and exploit.
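Here is a toy example of what “recognize and exploit” means in practice, using the standard river bluffing model rather than anything from a real bot.  If a bot bets the pot with either the nuts or a bluff, game theory says it should bluff about one bet in three (bluff fraction = bet / (pot + 2 × bet)); any drift from that ratio lets an observant opponent holding a bluff-catcher profit by simply always calling or always folding.

```python
def call_ev(bluff_freq, pot=1.0, bet=1.0):
    """EV of calling a pot-sized river bet with a pure bluff-catcher.
    Folding is the zero point: calling wins pot + bet when the bot is bluffing
    and loses the bet when it has the nuts."""
    return bluff_freq * (pot + bet) - (1.0 - bluff_freq) * bet

for bluff_freq in (0.20, 1.0 / 3.0, 0.45):
    ev = call_ev(bluff_freq)
    best = "call" if ev > 1e-9 else ("fold" if ev < -1e-9 else "indifferent")
    print(f"bot bluffs {bluff_freq:.0%} of its bets -> EV(call) = {ev:+.2f} pots,"
          f" best response: {best}")
```

That gap between a bot’s actual frequencies and the unexploitable ones is exactly the kind of pattern experienced players learn to pick up on, and it’s why poker held out longer than the perfect-information games.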

By the turn of the century, pokerbots had gotten pretty good at Limit Hold ‘Em (eventually winning a high-profile heads-up match against professionals), but the more popular variation, No Limit Hold ‘Em, remained elusive.  The University of Alberta took the first significant step toward changing that when it announced the first-ever No Limit category for its poker bot competition at the 2007 Association for the Advancement of Artificial Intelligence (AAAI) conference.  Coincidentally, shortly after this was announced, a friend of a friend named Teppo Salonen, who had won 2nd place the prior year in the limit competition, came up to my house for a game.  I joined others in pestering him to enter the no-limit category, since the competition would never be softer (if it’s possible to consider competition from universities such as Carnegie Mellon and the University of Alberta “soft”).  I knew a thing or two about beating bots at poker, since I had downloaded and beaten up on the best bots available at the time, so Teppo invited me to serve as his strategic advisor and sparring partner.  Months later, after many iterations (and after Teppo overcame a last-minute technical scare), BluffBot was ready to go and entered the competition.  And WON.  What we had done didn’t really sink in until I read the response from one of the researchers who had been blindsided by BluffBot:

They are going up against top-notch universities that are doing cutting-edge research in this area, so it was very impressive that they were not only competitive, but they won… A lot of universities are wondering, “What did they do, and how can we learn from it?”

The following year, the world made sense again as the University of Alberta team took the title.  Things were pretty quiet for a few years until mid-2013, when PokerSnowie hit the market.  As a former backgammon player, I couldn’t ignore the name “Snowie,” so I was one of the first to buy it and enter the “Challenge PokerSnowie” to establish its playing strength.  PokerSnowie categorizes its opponents based on their error rates, and it was handily beating every class of opponent with the sole exception of “World Class” players Heads Up.  I was one of the few who managed to eke out a victory over it (minimum of 5,000 hands played), but I could tell that it was significantly stronger than any other bot I’d played against.  It was recently announced that the AI has been upgraded, and I suspect that it may be enough to push the bot out of my reach, and possibly anyone else’s.
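I don’t know PokerSnowie’s exact formula for its error rate, but a metric in that spirit is easy to sketch: average the EV given up per decision relative to the engine’s preferred action, and call anything above some threshold a blunder.  The numbers and the threshold below are made up purely for illustration.

```python
# EV given up on each decision, measured against the engine's preferred play.
ev_loss_per_decision = [0.0, 0.0, 1.2, 0.0, 9.5, 0.3, 0.0, 14.0, 0.0, 2.1]
BLUNDER_THRESHOLD = 8.0     # assumed cutoff, in the same (arbitrary) EV units

error_rate = sum(ev_loss_per_decision) / len(ev_loss_per_decision)
blunders = sum(1 for loss in ev_loss_per_decision if loss > BLUNDER_THRESHOLD)
print(f"error rate: {error_rate:.2f}, blunders: {blunders}")
```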

It appears that it’s time for a 5,000-hand rematch against the new AI to find out whether it has passed me up, as I suspect it has.  I’ll periodically post my results and let you know if, at least for a little while longer, the poker machines can still be held at bay.  See results below!

(Comic: xkcd 1002, “Game AIs” – http://xkcd.com/1002/)

Round 1 of 10: after 500 hands (0.5/1.0 blinds, 100 BB buy-in cash game, with auto-rebuy at 70%)…

Jay +$580.50 (+$1.16 per hand)

Error Rate: 7.15 (“world class”)

Blunders: 13

Notes: I’m on a huge card rush; prepare for regression to the mean.


Round 2 of 10: after 1000 hands (0.5/1.0 blinds, 100 BB buy-in cash game, with auto-rebuy at 70%)…

Jay +$609.00 (+$0.61 per hand)

Error Rate: 7.51 (“world class”)

Blunders: 25

Notes: I extended my lead a bit, but not surprisingly, my winrate did regress toward zero.  My error rate also crept higher (despite one fewer “blunder”) and pushed me closer to the threshold for “expert”, which is 8.  I’m including the error rates so that if Snowie makes a sudden comeback, it will be clear whether it was due to the quality of my play suddenly taking a turn for the worse or to Snowie finally getting some cards.


Round 3 of 10: after 1500 hands (0.5/1.0 blinds, 100 BB buy-in cash game, with auto-rebuy at 70%)…

Jay +$322.00 (+$0.21 per hand)

Error Rate: 7.19 (“world class”)

Blunders: 31

Notes: PokerSnowie took a big bite out of my lead in round 3 despite a drop in my error rate as well as my blunder rate (only 6 in the last 500 hands).  As the Swedish proverb says: “luck doesn’t give, it only lends.”


Round 4 of 10: after 2000 hands (0.5/1.0 blinds, 100 BB buy-in cash game, with auto-rebuy at 70%)…

Jay +$362.50 (+$0.18 per hand)

Error Rate: 7.49 (“world class”)

Blunders: 42

Notes: Despite making a slight profit over the last 500 hands, my error rate increased and my average winnings per hand continued to drop.  The match is still a statistical dead heat.
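“Statistical dead heat” can be made concrete with a quick standard-error calculation.  The per-hand standard deviation below (about 12 big blinds) is an assumed ballpark figure for aggressive heads-up no-limit play, not something reported by PokerSnowie; with the $0.50/$1 blinds here, one big blind is one dollar.

```python
import math

hands = 2000
winrate = 362.50 / hands           # observed profit per hand, in big blinds
stdev_per_hand = 12.0              # assumed standard deviation, in big blinds

stderr = stdev_per_hand / math.sqrt(hands)
low, high = winrate - 1.96 * stderr, winrate + 1.96 * stderr
print(f"winrate {winrate:+.2f} bb/hand, 95% interval roughly [{low:+.2f}, {high:+.2f}]")
# The interval easily straddles zero: 2,000 hands can't tell a small edge from luck.
```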


Round 5 of 10: after 2500 hands (0.5/1.0 blinds, 100 BB buy-in cash game, with auto-rebuy at 70%)…

Jay +$497.50 (+$0.20 per hand)

Error Rate: 7.51 (“world class”)

Blunders: 49

Notes: My error rate crept slightly higher, but I was able to raise my winnings per hand for the first time.  Snowie’s going to have to catch some cards in the last half of the match to get that $500 back.


Round 6 of 10: after 3000 hands (0.5/1.0 blinds, 100 BB buy-in cash game, with auto-rebuy at 70%)…

Jay +$40 (+$0.01 per hand)

Error Rate: 7.55 (“world class”)

Blunders: 62

Notes: Wow, what a difference a round of 500 hands can make!  Practically my entire lead was wiped out, despite only a slight uptick in my error rate.  Just as I was starting to write Snowie off as too passive, it handed me a nice beating.  With four rounds left, the match is truly up for grabs.


Round 7 of 10: after 3500 hands (0.5/1.0 blinds, 100 BB buy-in cash game, with auto-rebuy at 70%)…

PokerSnowie +$29.50 (+$0.01 per hand)

Error Rate: 7.88 (“world class”)

Blunders: 72

Notes: Snowie took the lead for the first time in the match, but I’m glad not to be losing by much more.  I was down $400 until about 100 hands ago, but after betting a few big draws that hit, I almost pulled back to even.  More concerning is the fact that my error rate increased by a big amount this round, almost demoting me to “expert” status.  It turns out my biggest blunder led to one of my biggest pots: Snowie says I made a huge mistake betting my small flush on the turn instead of checking.  The outcome was great, however, since Snowie happened to hit a set (pocket pair that matched a card on the board) and check-raised me all-in.  I called and my flush held up.  The last card was a scary fourth heart, so I doubt I would have gotten as much from the hand if I had checked.  I’m not sure why PokerSnowie was so sure betting my flush was a mistake (maybe to control the size of the pot in case my flush was already behind or got counterfeited on the river?)  Could be a sign PokerSnowie knows something I don’t.


Round 8 of 10: after 4000 hands (0.5/1.0 blinds, 100 BB buy-in cash game, with auto-rebuy at 70%)…

Jay +$583 (+$0.15 per hand)

Error Rate: 7.99 (“world class”)

Blunders: 82

Notes:  I hit back with a rush of cards and had my best round so far (in terms of results).  Unfortunately, my error rate crept higher again, putting me at the border between “world class” and “expert” in PokerSnowie’s eyes.  I would hate for the program to lose respect for me, so I’m going to have to start making better decisions.  Of course, PokerSnowie could just be punishing me, since one of my biggest “errors” was calling its pot-sized all-in on the river with one measly pair.  It turned out that I caught PokerSnowie bluffing and won $234 on the hand.


Round 9 of 10: after 4500 hands (0.5/1.0 blinds, 100 BB buy-in cash game, with auto-rebuy at 70%)…

Jay +$869.50 (+$0.19 per hand)

Error Rate: 7.85 (“world class”)

Blunders: 85

Notes:  Only 3 blunders this time and I extended my lead.  It’s looking bad for PokerSnowie, as it will need an epic rush in the last 500 hands to pull out the match.


Round 10 of 10: after 5000 hands (0.5/1.0 blinds, 100 BB buy-in cash game, with auto-rebuy at 70%)…

Jay +$1749 (+$0.35 per hand)

Error Rate: 8.03 (“expert”)

Blunders: 100

Notes:  When I posted that an “epic rush” would be necessary for PokerSnowie to win, I didn’t actually believe it was possible for $870 to change hands between us over the last 500 hands with $0.50/$1 blinds.  Incredibly, it happened, although in my favor.  I hit practically every draw, and if I didn’t know any better, I’d say the machine went on tilt, as it repeatedly bluffed off its chips like it was a fire sale.  The program did get some revenge, however, by demoting my rating into the “expert” range after crediting me with 15 blunders during this round.  Let’s look at a few of the big ones:

1. I raised to $3 with QJ on the button and Snowie re-raised to $9.  I called and my QJ became the best possible hand when the flop came 89T rainbow (no two cards matching suits).  Snowie bet $9 and I only called, reasoning that I didn’t have to worry about any flush draws (and also couldn’t represent them).  I also didn’t want to end the fire sale if Snowie was going to keep bluffing at pots.  My decision was tagged by Snowie as a huge error.  Then, on the turn, a completely harmless offsuit 2 came.  Snowie bet again, this time $18, and again I only called for the same reasons.  This was also flagged as a major blunder.  The river brought another 2, and Snowie continued with a bet of $72.  I conservatively called, thinking that an all-in for its last $77.50 here might only get called by a full house (Snowie finally liked my decision, although it said to mix in an all-in 4% of the time).  It turns out that Snowie had AA (was evidently value-betting and assuming that I would continue calling with only a pair?) and lost the $216 pot.

2. Snowie raised to $3 pre-flop and I called with K8.  The flop came 83T, all diamonds, which was great, since I had 2nd pair and my King was a diamond.  I checked, hoping Snowie would continuation bet, but Snowie checked as well.  The turn card was a 9 of hearts and I bet $3, which Snowie called.  The river card was a 9 of diamonds, giving me the flush, but also pairing the board.  I bet $6 and Snowie raised with a huge overbet of $54.  It was certainly possible that it was trying to get value for its full house or ace-high flush, but it just didn’t smell right.  If it had the ace of diamonds, why hadn’t it bet on the flop or raised on the turn to build a pot?  And with a paired board, what would it do if I re-raised its huge overbet on the river?  On the other hand, if it had flopped a set (which turned into a full house on the river to justify the huge bet), would it really have checked on the flop and only called on the turn and not made me pay to see a fourth diamond appear?  Anyways, I called its bet and won the $120 pot when it flipped over J9 for three of a kind, which it had decided to turn into a massive bluff-raise.  Fire sale!  Snowie labeled my call as a huge blunder (ranking the king-high flush’s showdown strength as only a 0.59 out of 2.00).

3. In this hand, I had AK on the button and raised to $3.  Snowie re-raised to $9, which I re-re-raised to $18.  Snowie re-re-re-raised to $54 and I called.  Snowie flagged my call as a huge mistake, saying I should have raised yet again.  The flop came 658 with two clubs, and we both checked.  The turn brought a third club; Snowie bet $54 and I folded.  Evidently, it had a wider range than I imagined with all of that pre-flop raising, as it turned out to have only KQ (with the king of clubs), which it had turned into a semi-bluff on the turn to take the pot.  I can’t say I’ve seen many people 5-bet before the flop with king-high, so I’m still not sure about the “6-bet with AK” idea.

I’m happy to have defended humanity’s honor once again, but my confidence that PokerSnowie will take over the world was shaken a bit by its performance.  If the “fire sale” strategy is a big part of its game plan, it may still take a few years and a few more AI upgrades before it can take down a top human.