End In Sight?

So first of all, some good news: it looks like the daily death rates in Italy and Spain have finally reached a peak.

I’m not an epidemiologist, but just eye-balling the chart above, it looks like Italy took about a month to get there. So the estimate that the United States will hit its peak around April 15, a month after deaths started taking off here, looks very reasonable.

Here’s Italy by itself, reaching 100 total deaths on March 5 and seemingly hitting its peak now.

Here’s the United States, hitting 100 deaths 13 days later on March 18…

Below are the total confirmed COVID-19 deaths by country, with the United States skyrocketing and on its way to catch up to and surpass Spain and Italy (more on that below)…

The bad news is that, in terms of things getting worse in the last week, the United States is on top of the world. Below is a chart showing the increase in total deaths relative to one week ago (100% means the number of deaths has doubled since then). The United States had only 801 total deaths by March 25 and reached 4,079 today, so the relative increase is (4,079 – 801) / 801 = 409%.

The appearance of Brazil and India on this list is also very concerning, given their huge populations. While India’s share of population over 70 is only 3.41% compared to the United States’s 9.73%, there are four times as many people living there. So, while it looks like the United States is on track to easily set the record in terms of the total number of deaths in the world, India could certainly surpass it if things get out of control.

The reason I expect the United States to set the bar for total death count is that its daily death count is not only increasing the fastest, it also posted the largest absolute number of deaths in the world yesterday, surpassing even Spain and Italy at their peaks.

So if we’re already losing 900 per day, how many people will be dying in the United States when we reach our peak around mid-April? At first, the estimate of around 2,000 daily deaths in the United States seemed low to me, since we have five times the population of Italy and it is losing 800 people per day right now. However, the experts are probably taking into account the fact that 16.24% of Italy’s population is over 70 compared to the United States’ 9.73%. You’ve probably seen why that fact is so important…
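Here’s a rough back-of-the-envelope version of that adjustment, just to see whether the 2,000 figure is in the right ballpark. The population ratio is my own round number; everything else comes from the figures above, and this is a crude sanity check, not an epidemiological model:

```python
# Crude sanity check, not an epidemiological model.
italy_peak_daily_deaths = 800       # Italy's current daily toll, from above
population_ratio = 5                # U.S. population is roughly 5x Italy's
us_over_70 = 0.0973                 # share of U.S. population over 70
italy_over_70 = 0.1624              # share of Italy's population over 70

naive = italy_peak_daily_deaths * population_ratio
age_adjusted = naive * (us_over_70 / italy_over_70)
print(round(naive), round(age_adjusted))   # roughly 4,000 vs. ~2,400
```

Adjusting for the age difference alone knocks the naive 4,000 down to roughly 2,400, which is at least in the neighborhood of the experts’ estimate.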

Also, the chart above shows that Italy’s elderly death rates are higher than anywhere else, presumably due to the over-crowded hospitals. It’s hard to believe that losing 2,000 people to COVID-19 on Easter could be a good thing for the United States, but it would probably be a sign that we’ve successfully stocked our hospitals with ventilators and will soon be on our way to recovery. It may be a long tunnel, but there does appear to be a light at the end of it.

The Scientific Mindset

Did you miss my rousing keynote at the Claremont Data Science conference? You’re in luck! Below are my slides, and every somewhat witty and thought-provoking thing I said…

Hi, I’m Jay. I’m super excited to be here, especially because this is the first time I’ve ever done a talk like this! My friend told me that this experience would be good for my next book, the 9 pitfalls of public speaking. Funny guy! My co-author Gary is sitting over there. You do know he’s the one with a PhD from Yale, right? You had a 50/50 shot at hearing from a genius today! Don’t worry, though, I’m not a total clown. I graduated from Pomona College with a degree in math and worked as a software developer for 11 years before following my inner data-wonk to the Analytics department of a booming Internet company. I would take the Metrolink from Claremont to downtown L.A. every day and those years of grand successes and epic failures taught me the value of scientific rigor.

I had a manager once who liked to say “up is up”, which I took to mean that data speaks for itself. I strongly disagree. Data needs an interpreter. One who knows things. And machines don’t know anything. They’ll crunch random data and find statistical significance everywhere.

On top of that, as you can see here, it’s not even always clear what “up” is! We had all the data you could ever want at my work. Billions of rows piled into a Netezza database. So we tried to use that wealth of data to answer a simple question: which is more profitable, 1-click pages or 2-click pages? The answer we got back was that overall, 2-click pages are better. Then we asked, okay, what about in the U.S.? One-clicks are better there. How about outside of the U.S.? One-clicks are better there too. Which “up” are we supposed to believe? This is Simpson’s Paradox in all of its glory. In this case, the weighted average looks like that because we had U.S. traffic mostly on 2-clicks and International traffic mostly on 1-clicks. It’s even worse than that! The reason most U.S. traffic was on 2-clicks was that we had run randomized A/B tests showing 2-clicks are better here, so even that top line is backwards! We decided to stick with the experiments so we wouldn’t get fooled by confounding variables.
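Here’s a minimal sketch of how that flip happens. The numbers are made up for illustration (our real traffic and revenue figures were obviously different):

```python
# Hypothetical numbers showing how Simpson's Paradox arises:
# 1-click pages earn more per visit in BOTH regions, yet 2-click pages
# look better overall because most 2-click traffic came from the
# higher-earning U.S. market.
visits = {  # (layout, region): (number of visits, revenue per visit)
    ("1-click", "US"):   (1_000, 1.20),
    ("1-click", "Intl"): (9_000, 0.30),
    ("2-click", "US"):   (9_000, 1.00),
    ("2-click", "Intl"): (1_000, 0.20),
}

def avg_revenue(layout, region=None):
    rows = [(n, r) for (l, reg), (n, r) in visits.items()
            if l == layout and (region is None or reg == region)]
    total_visits = sum(n for n, _ in rows)
    total_revenue = sum(n * r for n, r in rows)
    return total_revenue / total_visits

for region in ("US", "Intl", None):
    label = region or "Overall"
    print(label, round(avg_revenue("1-click", region), 2),
          round(avg_revenue("2-click", region), 2))
# US      1.2  1.0   <- 1-click wins
# Intl    0.3  0.2   <- 1-click wins
# Overall 0.39 0.92  <- 2-click "wins" (the weighted average flips)
```

Within each region, 1-click pages earn more per visit; the overall average only favors 2-click pages because they happened to get most of the lucrative U.S. traffic.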

Data science is often said to be about extracting knowledge from data. Well, as you can see from this example, if you’re talking about historical data as opposed to data produced by an A/B test, you need to be very careful to ensure that what you’ve extracted is knowledge and not nonsense. Data science is less about extracting knowledge than creating useful data that can provide knowledge. Up is definitely not always up.

I hope to convince you today that for data science to work, you need to work like a scientist.

When people talk about what data scientists do, they always mention statistics and computer programming and also might say how important it is to have domain or subject knowledge. What they tend to forget is the “science” part. I’m here to tell you that the scientific mindset is essential: the critical thinking, the skepticism, the willingness to put your predictions to the test and make sure you’re not fooling yourself.

Rather than just go through all the pitfalls of data science, I’d like to talk about four ways that a scientific mindset can avoid them. (1) You can effectively interpret data (does it mean what you think it means?). (2) You can identify which features might be useful for making predictions. Machines can’t really do that for you, because if you include too many nonsense variables, they crowd out the real ones. (3) You can evaluate evidence with a well-developed Spidey Sense and avoid being fooled by the “silent evidence of failures”. Are you seeing the whole picture, or is someone showing you the statistical hits and hiding the misses? (4) You run experiments whenever possible, because experiments are the strongest evidence out there.

Okay, so let’s put your critical thinking to the test. What’s this data saying? This is earthquake data from the United States Geological Survey showing an alarming increase in the number of major earthquakes worldwide over the last century. Is the apocalypse approaching? Is the earth breaking apart? Or is something wrong with this data?

Don’t worry. These are the earthquakes that were recorded each year, not the number that occurred. There is now a far more extensive network of seismometers than in the past, so many earthquakes that went unnoticed decades ago now get monitored and logged.

If the data tells you something crazy, there’s a good chance you would be crazy to believe it.

Too easy? Give this one a shot. At Berkeley, I was part of a group that analyzed data for over 7,000 patients with sepsis at a Chicago hospital to find a way to predict the chances of being readmitted to the hospital after being discharged. You can see here that we found a strong relationship between the pH level of the patient’s blood (normally between 7.35 and 7.45) and the hospital readmission rate.

There is a clear positive relationship, indicating that patients with high pH levels are more likely to return to the hospital soon after being discharged. A low pH signals that a discharged patient is unlikely to be readmitted. The correlation is 0.96 and data clowns would call it a day. “Up is up”!

However, my teammates and I were not clowns, so we made sure to run this by a doctor to see if it made sense. When he saw this figure, a puzzled look came across his face: “That’s strange; the relationship is backwards. If you have a low pH level, you’re probably dead,” but the chart implied that having a very low pH level was a sign of health. This stumped us until we realized that the data included patients who died during their hospital stay! We had simply found that the patients least likely to be readmitted are the ones who were discharged to the mortuary.

This figure shows that, once we removed the deceased patients, the pattern reversed. Now there is a negative relationship, just as the doctor expected.

This one shows the clear danger of acidic blood by comparing pH level with the likelihood of death. Patients with pH values below 7.2 are not in good health; they are in serious danger. In this case, the data spoke, but it was talking about something else.

In this case, only the scientific mindset saved us from embarrassment.

It gets even trickier. How can you dismiss patterns that repeat consistently? After Oklahoma won 47 straight college football games, Sports Illustrated ran a 1957 cover story proclaiming, “Why Oklahoma is Unbeatable.” Oklahoma lost its next game and people started noticing that other athletes or teams who appear on the cover of Sports Illustrated tend to perform worse afterward. The Sports Illustrated Jinx was born. More recently, we have the Madden Curse, which says that the football player whose picture appears on the cover of Madden NFL, a football video game, will not perform as well the next season. The Sports Illustrated Jinx and the Madden Curse are extreme examples of regression toward the mean. When a player or team does something exceptional enough to earn a place on the cover of Sports Illustrated or Madden NFL, there is essentially nowhere to go but down. To the extent luck plays a role in athletic success, the player or team that stands above all the rest almost certainly benefited from good luck—good health, fortunate bounces, and questionable officiating. Good luck cannot be counted on to continue indefinitely, and neither can exceptional success. There’s a Swedish proverb that states “Luck doesn’t give, it only lends.”
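If you want to see how little it takes to manufacture a jinx, here’s a toy simulation (my own illustration, not from the talk slides): performance is just skill plus luck, and we check how the season’s top performer does the following season.

```python
import random

# Toy model of the cover jinx: performance = skill + luck.
# Find the best performer in season one, then see how they do in season two.
random.seed(1)
players, trials, declined = 500, 1_000, 0

for _ in range(trials):
    skill = [random.gauss(0, 1) for _ in range(players)]
    season1 = [s + random.gauss(0, 1) for s in skill]
    season2 = [s + random.gauss(0, 1) for s in skill]
    best = max(range(players), key=lambda i: season1[i])
    if season2[best] < season1[best]:
        declined += 1

print(f"The top performer declined the next season {declined / trials:.0%} of the time")
# No curse required: the league leader's lucky season usually doesn't repeat,
# so the decline shows up well over 90% of the time.
```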

We ran into this at work. My company specialized in maximizing profit for domainers, or people who collect websites in order to show ads. We designed and conducted experiments to find the most profitable page design. So, for example, a web visitor comes in and we generate a random number to determine which page they go to. When we then compared how the various pages performed, we knew we could trust the results because no possible confounding variable could be correlated with a random number. If we had just used a different layout each day, the results might be muddled by the nature of web traffic on the different days—for instance, people are typically more likely to click on ads on a Monday than over the weekend. So anyway, we knew which design was the most profitable and used it all over the place.
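A stripped-down sketch of that setup (hypothetical code, not our production system) shows why the random draw does all the work: nothing about the visitor or the day can be correlated with a coin flip.

```python
import random

random.seed(0)

def simulate_visit(layout):
    # Stand-in for real ad revenue per visit; assume layout B earns a bit more.
    base = 0.55 if layout == "B" else 0.50
    return max(0.0, base + random.gauss(0, 0.25))

revenue = {"A": [], "B": []}
for _ in range(100_000):
    layout = "A" if random.random() < 0.5 else "B"   # the random assignment
    revenue[layout].append(simulate_visit(layout))

for layout, r in revenue.items():
    print(layout, round(sum(r) / len(r), 3))
# Because assignment is decided by a random number, any difference between A
# and B can be credited to the pages themselves, not to who happened to visit
# or when they visited.
```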

Anyway, our customers had collections of domain names and, of course, some names did better than others. Some would ask us to work on their “underperformers” and see if we could get revenue up. So my friend in Analytics would change the web page design or the keywords and, every single time, revenue would go up by 20% the next day. He was a hero and they were like “you should do this full time!” We made the point that, to be scientifically rigorous, we should really only be working on a random half of the names in order to have a control for comparison, but they thought we were being ridiculous. A twenty percent revenue lift the day after he made changes and we’re nitpicking the process?

Well, one day he forgot to make the changes and the next day, revenue for the names went up by 20% just like it always did. It was like an anti-jinx! Instead of the best performers getting on the cover of Sports Illustrated, this was the worst performers getting emailed to Analytics. Someone came by his desk to congratulate him again and he said “I didn’t get around to it yet” and they said “well, whatever you did worked!” Now we knew for sure that we had to hold back a control, because there was no way to know if what he was doing was helping or hurting!

It turns out that regression toward the mean is everywhere. Let’s quickly go through a few more examples…

Why is there a sophomore slump? It’s because you’re looking at the best freshmen. Whether you’re looking at the best batting averages or any other statistic, the top performers will almost always do worse the next time.

Why are movie sequels typically worse than the originals? Well, if you’ve been paying attention, you know how to fix this one. If you want sequels that are better than the originals, make sequels to the worst movies!

Why does punishment seem to work better than reward? Johnny does something exceptionally well and you give him a treat. Then, he does worse. Johnny does something exceptionally badly and you whack him. Then, he does better. The same thing would have happened without the treats or the whacking.

There was a study showing that kids who underperformed on the SAT did better the next time if they were on a drug called propranolol to relax them. As soon as you heard “underperformed”, I hope your Spidey Sense tingled. They’re expected to do better if they underperformed! The kids almost certainly did worse on the drug than they would have without it, but you’d never know, because they didn’t use a randomized control.

Now, to be clear, this is not the even-steven theory from Seinfeld. Your luck will not reverse, you will just become less lucky.

So you can see why people believe in jinxes! By the way, if you think this talk is exceptionally interesting right now, I have bad news about the rest of the presentation (I knock on wood).

So interpreting data requires the scientific mindset, but what about finding good predictive features? Here’s a tough one for you. What is this thing?

Did anyone get it? Of course! This is easy for humans. Even if the sign were bent, rusty, or had a peace sticker on it, we would still know what it is. Not so with image-recognition software. During their training sessions, Deep Neural Net algorithms learn that the words “stop sign” go with images of many, many stop signs. Because they look at individual pixels, computer programs can be led astray by trivial variations. People can exploit this and intentionally cause a misidentification with tiny changes, called an adversarial attack.

Gary did a quick-and-dirty test by putting a peace sign on an image of a stop sign to see what a DNN would conclude. It misidentified the image as a first-aid kit. In 2018, the organizers of a machine-learning conference announced that they had accepted 11 papers proposing ways to thwart adversarial attacks like this. Three days later, an MIT graduate student, a Berkeley graduate student, and a Berkeley professor reported that they had found ways to work around 7 of these defense systems. There is clearly an AI arms race going on.

So how can thinking like a scientist possibly help a neural net work better? I talked to one to find out! “The Retinator”, Dr. Michael Abramoff, invented an autonomous AI system to diagnose diabetic retinopathy (DR), which is the leading cause of blindness in working-age adults. He said it took him decades, but he eventually succeeded in building a neural net AI system that performed as well as a doctor.

So, if neural nets can be confused about stop signs, how did the Retinator keep his AI from being fooled?

His approach was to classify images of the eye the same way retinal experts do, by looking for specific signs of DR. He developed multiple detectors to look for known predictive features such as hemorrhages and other biomarkers. He also wanted his results to be comprehensible so that doctors and patients could understand the diagnosis. If his system failed to recognize a case of DR, he wanted to know why it failed. He said “If I give clinicians an image with a bunch of hemorrhages, they’ll say ‘This is likely DR.’ If I start taking those hemorrhages away, eventually they’ll say ‘there’s no disease here.’” His biomarker AI system works the same way.

He wouldn’t settle for a black box, blank slate approach, because he knew that would risk catastrophic failure. In theory, letting computers teach themselves which characteristics are useful might find important features that clinicians didn’t know about. However, much of the data is irrelevant, so many features found to be correlated statistically with a DR diagnosis will be spurious. As with the stop sign detector, errors can arise when algorithms are put on auto-pilot. In the case of DR, there might be a bias due to the color of the retina, a different kind of background, or even part of the border around the image. A black-box model can fail with new images, with no one knowing why it failed.

Here you can see an example of a catastrophic failure of a black box algorithm. Its diagnosis is so fragile that you don’t even need a peace sign; changes in pixels that humans can’t even perceive can completely change the prediction. The Retinator’s system wasn’t tricked because it only considers the small number of features that make sense. In healthcare, the possibility of these adversarial images is particularly concerning because of the ability to make fraudulent claims by exploiting automated diagnoses.

In April 2018, Dr. Abramoff’s IDx-DR system became the first FDA-approved autonomous AI diagnosis system.

Feature selection isn’t just a problem with neural nets. In 2011, Google created a program called Google Flu that used search queries to predict flu outbreaks. They reported that their model had a correlation of 0.975 with the actual number of flu cases from the CDC. Their data-mining program looked at 50 million search queries and identified the 45 queries that were most closely correlated with the incidence of flu. It was pure-and-simple data-mining. A valid study would use medical experts to specify a list of relevant query phrases in advance, and then see if there was an uptick in these queries shortly before or during flu outbreaks. Instead, Google’s data scientists had no control over the selection of the optimal search terms. The program was on its own, with no way of telling whether the search queries it found were sensible or nonsense. Google Flu may have been simply a winter detector. When it went beyond fitting historical data and began making real predictions, Google Flu was far less accurate than a simple model that predicted that the number of flu cases tomorrow will be the same as the number today. After issuing its report, Google Flu overestimated the number of flu cases by an average of nearly 100 percent. Google Flu no longer makes flu predictions.
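Here’s a scaled-down sketch of the trap, with random numbers standing in for queries and flu counts: pick the search terms that correlate best with past flu levels, and you get a gaudy in-sample correlation that vanishes on new data.

```python
import random

random.seed(42)

def corr(x, y):
    # Pearson correlation, written out to keep the sketch self-contained
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sxy / (sx * sy)

weeks = 52
flu = [random.gauss(0, 1) for _ in range(weeks)]              # "flu cases"
queries = [[random.gauss(0, 1) for _ in range(weeks)]          # 10,000 random
           for _ in range(10_000)]                             # "search queries"

best = max(queries, key=lambda q: corr(q, flu))
print(f"in-sample correlation of the best query: {corr(best, flu):.2f}")

next_year = [random.gauss(0, 1) for _ in range(weeks)]
print(f"same query against next year's flu:      {corr(best, next_year):.2f}")
# The "best" query correlates impressively with the data it was chosen on
# and not at all with anything else, because it never meant anything.
```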

Now I want to be clear: this type of automated data-mining is not what helped Google take over the world. It’s the thousands of rigorous A/B tests that they run that allowed them to do that. Having a huge amount of data to analyze for patterns is not enough, and Google knows that.

Compare Google Flu with how Wal-Mart stocks its shelves when a hurricane is on the way. Customers don’t just buy water and flashlights; they also buy strawberry Pop-Tarts and beer. Since historical data was analyzed, this appears at first glance to be more mindless data mining. However, it is actually more like a controlled experiment! Recall that one major downside of data mining is the possibility of confounding variables. However, since hurricanes only affect a few stores out of many, Wal-Mart had a natural experiment that eliminates confounding influences like the day of the week or the season of the year. This is almost as good as letting mad scientists randomly choose cities to be blasted by hurricanes and then comparing the shopping habits of the lucky and unlucky residents. The scientific method is alive and well at Wal-Mart. Another problem with data mining is that correlation can get confused with causation. It is highly unlikely that customers stocked up on Pop-Tarts in threatened cities for some reason other than the hurricanes. Also, unless buying Pop-Tarts causes hurricanes, the relationship clearly goes in the other direction. We might not know exactly why people buy these products, but we do know that hurricanes caused the increase in demand. An additional reason to believe in the Pop-Tart / hurricane connection is that the association makes sense. Pop-Tarts don’t have to be cooked and last practically forever. Taking advantage of natural experiments like this is something a scientist would think of.

So data doesn’t speak for itself and features don’t select themselves. Let’s shift a bit and talk about the “silent evidence of failures”. There are actually two versions of what’s called the Texas Sharpshooter Fallacy. The first one is that, in order to prove what a great shot I am, I paint a thousand targets on a barn wall, fire my gun at the wall, and what a surprise, I hit a target! And then I go and erase all the other targets. And, of course, it’s meaningless, because, with so many targets, I’m bound to hit something… So Texas Sharpshooter Fallacy #1 is testing lots and lots of different theories and reporting the one that seems confirmed by the data, without telling anybody that you tested many other theories. This fallacy contributes to the replication crisis in science, because there’s a publication bias towards significant findings. You’ll see the hits but not the misses.

Texas Sharpshooter Fallacy #2 is the picture here. You just fire your gun blindly at the wall and then you go and draw a target around the bullet hole and pretend that’s what you were aiming for. That’s like looking at the data, finding some coincidental little thing in there, and pretending that’s what you were looking for in the first place. I think there should be a third one, where there’s only one target, but you just keep shooting until you hit it. Then, you hide all of the bullet holes outside of the target and show what a good shot you are. This is what they call p-hacking, which is a reference to testing again and again until hitting that magical statistically significant p-value of 0.05.
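A quick simulation of that third kind of shooting (my own illustration): the “treatment” below does nothing at all, yet if you keep running new studies until one clears p &lt; 0.05, you won’t have to wait long.

```python
import random
from math import sqrt, erf
from statistics import mean, stdev

random.seed(7)

def p_value(a, b):
    # two-sample z-test approximation; fine for groups of 50
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = abs(mean(a) - mean(b)) / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

attempts = 0
while True:
    attempts += 1
    control = [random.gauss(0, 1) for _ in range(50)]
    treated = [random.gauss(0, 1) for _ in range(50)]   # same population: no real effect
    if p_value(control, treated) < 0.05:
        break

print(f"'Statistically significant' on study number {attempts}")
# About 1 study in 20 of a nonexistent effect clears p < 0.05,
# so a determined shooter eventually hits the target.
```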

Here’s a simple example: Derren Brown is a mentalist who said he could flip ten heads in a row with a fair coin. This is an astonishing claim since there is only a 1 in 1024 chance of doing that. Brown backed up his claim with a video filmed from two angles. There were no cuts in the video, it wasn’t a trick coin, and there were no magnets or other trickery involved. Is your Spidey Sense tingling?

In a later video, he gave away his secret: he had simply filmed himself flipping coins for nine hours until he got ten heads in a row. The video seemed magical, but it was a tedious trick. Brown’s prank is a clear example of how our perception can be distorted by what Nassim Taleb called the “silent evidence” of failures. If we don’t know about the failures, how can we evaluate the successes? As you develop your scientific Spidey Sense, you’ll notice that a lot of evidence starts to look like videotaped hot streaks.
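How tedious, exactly? A quick simulation (using the fair-coin assumption from his video) gives a sense of how many flips a nine-hour session would need:

```python
import random

random.seed(3)

def flips_until_ten_heads():
    flips = streak = 0
    while streak < 10:
        flips += 1
        streak = streak + 1 if random.random() < 0.5 else 0
    return flips

trials = [flips_until_ten_heads() for _ in range(1_000)]
print(f"average flips needed: {sum(trials) / len(trials):,.0f}")
# The expected count is 2^11 - 2 = 2,046 flips, which is a lot of
# silent failure hiding behind one magical-looking clip.
```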

So that was about interpreting evidence. Now I’d like to talk about creating compelling evidence. Suppose I want to convince you that poker is a game of skill and that you’re skeptical of my claim. Let’s say you think poker is a game of chance because the cards are randomly dealt and poker professionals often go broke. What evidence would convince you?

Well, regarding pros going broke, let me tell you about an interesting gambling experiment. Participants were given $25 and challenged to make as much money as they could in 30 minutes. They would do this by betting on a virtual coin that lands on heads 60 percent of the time. Clearly, betting on heads is a winning strategy, but how much should you bet? It turns out that something called the Kelly criterion gives an elegant answer: bet the “edge.” The edge is the difference between winning and losing chances so the edge for this game is 60–40, or 20 percent. If you bet 20 percent of your money on each flip, you can expect to make more money than you would by following any other strategy. I still remember the day Art Benjamin taught me that (he was in the audience). Most people in the study bet much more than this, and 28 percent lost all their money, even though they were winning 60% of the time. Despite the fact that the results of this experiment depend on coin tosses, I would consider this a game of skill because people who know what they are doing can expect to make more money than those who don’t. I would argue that broke poker pros are like the 28% who lost their money here. This experiment shows that betting on the right outcome is a different skill than knowing what stakes you can afford.
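Here’s a rough simulation of that game (the number of flips and the bust threshold are my own choices, not the study’s exact protocol) comparing the Kelly bet with the kind of over-betting that sank a quarter of the participants:

```python
import random

random.seed(11)

def play(bet_fraction, flips=100, bankroll=25.0):
    for _ in range(flips):
        bet = bankroll * bet_fraction
        bankroll += bet if random.random() < 0.6 else -bet   # heads 60% of the time
        if bankroll < 0.01:          # effectively broke
            return 0.0
    return bankroll

for fraction in (0.20, 0.60):
    finals = sorted(play(fraction) for _ in range(10_000))
    busted = sum(f == 0.0 for f in finals) / len(finals)
    median = finals[len(finals) // 2]
    print(f"betting {fraction:.0%} each flip: median ${median:,.2f}, busted {busted:.0%}")
# Betting the 20% edge compounds steadily and essentially never goes broke;
# betting 60% occasionally wins big but busts a large share of the time.
```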

To create the strongest evidence that poker is a game of skill, I ran my own experiment. I predicted I could beat online poker and I put my own money on the line to prove it. I invested a whopping $50 into an online poker site and gathered data on how players respond to an immediate but small all-in bet. I downloaded the data from these experimental games and analyzed it. It turns out that people call too often. I used the data to determine which hands would win the maximum amount of money per hour assuming my opponents didn’t adjust. I called my strategy “the System”.
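The core calculation behind a strategy like that is simple expected value. The numbers below are made up for illustration; the real work was estimating call frequencies and hand equities from the downloaded game data.

```python
def shove_ev(dead_money, shove_size, call_prob, equity_when_called):
    """Expected blinds won by going all-in, in a simplified heads-up spot."""
    fold_branch = (1 - call_prob) * dead_money
    call_branch = call_prob * (equity_when_called * (shove_size + dead_money)
                               - (1 - equity_when_called) * shove_size)
    return fold_branch + call_branch

# e.g. shoving 20 blinds over 1.5 blinds of dead money, against an opponent
# who calls 40% of the time with hands we beat 55% of the time:
print(round(shove_ev(1.5, 20, 0.40, 0.55), 2))   # ~2.03 blinds per shove
```

The more often opponents call with hands they shouldn’t, the better the shove looks, which is exactly what the data showed.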

As you can see, when I started using the System, my opponents were no match for data science and they probably wish luck played a bigger role in poker than it does.

This is not to say that the element of luck doesn’t shake things up once in a while. It may not look like much on the chart, but at one point, my bankroll dropped by $1,800. Fortunately, I knew about the Kelly Criterion, so I never played at stakes that were so high that I could go broke. Of course, it was possible that my opponents had finally adjusted to my strategy, so I analyzed the data again. I found that it was just bad luck that had taken a bite out of my bankroll. I continued playing the System and my luck turned around as expected. After seeing results that would be exceedingly unlikely to happen by chance, you almost have to believe my claim that poker is a game of skill. Data science isn’t just about analyzing data correctly, it’s about presenting your findings in a compelling way. And nothing is more compelling than experimental results.

You’ve waited long enough, so here are the 9 Pitfalls of Data Science. I want you to look down the list and notice something they have in common: these are problems that can’t be solved by automation; they’re job security for data scientists! The Google Flu and Wal-Mart Pop-Tarts stories describe different ways of analyzing historical data and show that the less mindless the approach, the better. Analysis on auto-pilot doesn’t work because data doesn’t speak for itself and up is not always up.

Similarly, the Retinator’s autonomous AI system got approved by the FDA because it was more than a black box stop sign identifier. People still have an important role in focusing computers on the features that matter.

The final take-away is that the way around the pitfalls is to follow the path of scientists. Be critical thinkers because data and features don’t always make sense. Be skeptics, because anyone can torture data until it backs them up. Be experimenters, because the strongest evidence is the evidence that could have gone against you. Put the science into data science. Be scientists.

Thank you so much for listening!



Quiz Time! Count the Pitfalls

I recently received an email from a financial advice firm about “rational decision making” that had a promising intro: “We’ve discovered five common investor biases and emotions that can lead to below-average returns.” Biases, I don’t like those. Emotions hurting returns, those aren’t good either. I’m listening!

It describes loss aversion (irrationally holding a losing security in hopes that it will recover) and anchoring (relying too much on an initial piece of information) before offering this description of hindsight bias…

Hindsight bias – Many investors believe that an asset will perform well in the near future because it has performed well in the recent past. As a result, some investors are constantly chasing returns. If we reflect on this, it is actually counterintuitive. The number one rule of investing is buy low and sell high. Take a look at the S&P 500 chart above. If you have not owned the S&P 500 Index over the last nine years, is now the time to buy after it is 300 percent more expensive than it was nine years ago?

Okay, how many problems can you find in that last sentence?

I count three!

  • “If you have not owned the S&P 500 Index …” Why mention this? It is a sunk cost fallacy to consider whether you bought something in the past. It’s either a good investment or it’s not.
  • “…over the last nine years…” This is classic cherry-picking. Where did the number nine come from? You can bet it came from the last time the S&P hit a low point.
  • “…is now the time to buy after it is 300 percent more expensive than it was nine years ago?” This is the gambler’s fallacy. It’s rational to expect something that’s done extremely well to do less well (regression toward the mean), but it’s not rational to imply that it’s now a bad investment due to its recent history. There is no force of nature that requires all good returns to be balanced out by bad returns. There is irony in providing this comment after explaining the anchoring bias to the reader.

Beware of people advising you to “buy low and sell high” as if they know what low and high are. If it were that easy, the firm should just send out an email that says “LOW” or “HIGH” in the subject line so its customers can act accordingly and beat the market.

If you spotted the data science pitfalls in that financial advice, congratulations, you’re well on your way to becoming a skeptical and savvy consumer of data!

It’s Normal to be Irrational

In response to my blog on Roundup, I received this email (and no, it wasn’t addressed “dear Wonk”)…

“This is an excellent article and presentation of how the desired outcome affects the stated conclusion of a study. Based on news headlines, I have vilified Roundup. This is a convincing reminder to dig deeper before taking a position. 

What I see as a similar debate is the one about the risks and efficacy of vaccines. It is a particularly hot topic here in Oregon as the state legislature is working on a bill that removes the option for a non-medical exemption from vaccination for school children. If the child is not vaccinated and does not have a medical exemption, that child will not be allowed to attend public school. 

I find this similar to the Roundup issue because I have been told that there are studies that support both conclusions: vaccines do cause autism and other auto-immune disease and vaccines do not cause these conditions. I have not done any research myself. I understand that the linchpin study supporting the harmfulness of vaccines has been retracted. What is the truth?

I have friends looking to move out of the state if this bill becomes law. 

I would like to understand the science before addressing the personal liberty issue of such a law. 

Thanks for waking up my critical thinking skills.”

The study referenced was indeed retracted, but only after 12 long years. Even after it became known that the lead author failed to inform anyone that the study “was commissioned and funded for planned litigation” and that he had falsified data, his report continued to cause a decline in vaccination rates. While there is always a slight risk of some severe allergic reaction, there is no evidence of a link between vaccines and autism or other auto-immune disease, and the dangers vaccines prevent are far greater than any they cause. By creating belief in a false link between vaccines and autism that has directly led to lost lives, the retracted study may go down as one of the most damaging cases of fraud in history.

By the way, if you’re ever in need of evidence for the efficacy of vaccines, look no further than these visualizations.

From a scientific standpoint, this issue looks like a slam dunk, so why are there still so many people trying to get out of vaccinations? For one, many are well aware of the evils of Big Pharma and profit-driven healthcare: the $30 billion budget for medical marketing in the US has brought us the opioid crisis, unaffordable drugs, and medication overload (40% of the elderly are taking five or more medications). It’s hard to imagine that public health is valued nearly as much as profit in this country. However, given the phony autism study above, created in the hope of winning lawsuits, maybe people who are watching out for Big Pharma simply need to learn to also watch out for Big Lawya.

I’m sure that awareness wouldn’t hurt, but it isn’t enough. Debunking studies misrepresenting the dangers of vaccines and ushering in piles of evidence about their benefits will probably have little effect on someone who wants to opt out. So what is it that’s actually causing these people to leave their kids and others vulnerable to potentially deadly diseases?

I’m thinking it’s the misguided fear of regret. In Michael Lewis’s book The Undoing Project, he mentions a paper about decision analysis called “The Decision to Seed Hurricanes.” There was a new technique available to the government (dumping silver iodide into storms) which could reduce the amount of damage done by hurricanes or alter their paths. However, while the government would not be given credit for what had been avoided (since nobody would know for sure), it would certainly be blamed for the damage the storm inflicted on its new path. This asymmetry between credit and blame causes a bias towards non-intervention, which struck me as similar to a parent’s decision for or against a vaccination. Their child may or may not have been on a path towards an infectious disease and if the vaccine turned out to be a life-saving factor later on, nobody would know.

Economists have traditionally modeled people as rational decision-makers who don’t seek risk or avoid it; they weigh it. They are expected to maximize their utility, which is a clever metric used to standardize the value of various outcomes so that math can be used to find the choices with the best expected value. However, psychologists Kahneman and Tversky found that rather than maximizing utility, people seek to minimize regret. In a memo to Tversky, Kahneman wrote “The pain that is experienced when the loss is caused by an act that modified the status quo is significantly greater than the pain that is experienced when the decision led to the retention of the status quo. When one fails to take action that could have avoided a disaster, one does not accept responsibility for the occurrence of the disaster.”

If you point out that people are irrational, they take it personally in a way that they wouldn’t if you pointed out that they’d been fooled by an optical illusion. What psychologists have discovered is that it’s normal to be irrational when faced with particular types of problems, so we shouldn’t take it personally. We should just learn when those times are, and resist the pull towards bad decision-making. We shouldn’t be angry when governments require us to become vaccinated, we should be thankful. They are saving us from our tendency to make bad decisions.

You may have read about the recent public health concern over moviegoers at a theater showing the Avengers a few weeks ago who may have been exposed to someone with measles. Here’s my takeaway idea to help people overcome their cognitive blind-spots on this: those who were at that theater and remained healthy should publicly thank their parents for vaccinating them. “I didn’t get measles the other day, thanks mom!” When we can, let’s make a point to acknowledge the moments when disaster didn’t strike and give credit where it’s due.

Reasonable Doubt: The Roundup Debate

A little bird sent me this request: “How about an analysis of the Roundup thing?”

I’d read about the multimillion-dollar lawsuits recently won against Monsanto, an agricultural biotechnology company best known for its public relations challenges and questionable ethics. However, I wasn’t aware that Bayer had purchased the company last year and can now look forward to over 11,000 lawsuits! It certainly appeared as if the verdict was in about the dangers of Roundup, but jurors aren’t known for their ability to evaluate scientific evidence. While Monsanto has a financial incentive to try to influence studies, lawyers also do well in times of public hysteria (“mesothelioma” was the top-paying Internet ad at my last job). So let’s take a crack at getting beyond all of the perverse incentives at play and focus on the main question: does Roundup cause cancer?

With controversial topics like these, it’s important to first look for a scientific consensus. In this case, the EPA, European Food Safety Authority, Food and Agriculture Organization, European Chemicals Agency, Health Canada, German Federal Institute for Risk Assessment and others have concluded that, at the levels people are exposed to glyphosate, the active ingredient in Roundup, it does not pose a risk of cancer. However, the consensus on glyphosate is not unanimous; there is one organization, the International Agency for Research on Cancer (IARC), which classified glyphosate as a “probable carcinogen.” Is this the only agency to escape Monsanto’s influence or is there another explanation?

It turns out that the IARC evaluates risk in a different way than the other agencies. It determines whether the substance can cause cancer at exposure levels far more extreme than any that would be found in the real world. Practically everything is dangerous in high amounts (including water), and the IARC, accordingly, has found only one of the hundreds of agents it has evaluated to be “probably not carcinogenic.” I’m not accusing the IARC of practicing pseudoscience, but let’s just say that I’m sleeping better now that I know they’re the ones behind the California Prop 65 cancer warnings at fast food restaurants. I figure that as long as I don’t ingest 400 chalupas per day, I’ll probably be okay.

Due to the consensus of worldwide regulatory agencies (and with the IARC’s conclusion put into context), I would already feel comfortable concluding that there is not sufficient evidence showing that Roundup causes cancer. However, let’s go down a level to the studies themselves and see what we find. The reason I didn’t start here is that individual studies can be very unreliable, especially when it comes to epidemiological studies (as opposed to controlled experiments). That said, one of the strongest experimental designs for these types of studies is the “prospective cohort study”, which follows a population of people with various exposure levels to the chemical over time and, only later, determines whether or not the groups show significant differences in health. While they can have their conclusions reversed due to unconsidered confounding variables (“Oops, people living close to power lines tend to be poorer and have less access to healthcare”), these types of studies at least avoid the problem of selective recall that plagues case-control studies (“Hmm, I didn’t know what caused my tumor, but now that you mention it, I DO spend a lot of time on the cell phone!”). Following the surprising IARC conclusion, a study revisited and followed up on data from the large Agricultural Health Study (AHS). It found, in agreement with earlier conclusions, “no association was apparent between glyphosate and any solid tumors or lymphoid malignancies overall, including NHL and its subtypes.”

It certainly is looking like the evidence against Roundup is weak. However, a recent study in the journal Mutation Research threw a monkey wrench into things and associated weed killing products with non-Hodgkin lymphoma (NHL). It used the same AHS data above and combined it with a few less reliable case-control studies to conclude that people exposed to glyphosate have a 41% higher likelihood of developing NHL.

I’m a bit uncomfortable with the fact that it used the same data from a study that found no significant risk, added in less reliable data, and then concluded that there IS a risk. That seems like taking advantage of post-hoc wiggle-room. Another problem is that the 20-year time lag is the only one mentioned in the analysis. Why not report the results of the 15-year or 10-year lag since exposure? The 20-year lag was the only one that showed any increase in relative risk. Coincidence? Read my upcoming book and you’ll suspect that this is Pitfall #5: Torturing Data. The study reports a 95% confidence interval as if they had a hypothesis, tested it, and found an increase in risk that would be unlikely if Roundup weren’t dangerous. In reality, when they skipped over data points that didn’t support their conclusion before landing on the one that did, the likelihood that they would find something increased significantly. I can’t help but wonder if they would have even bothered to combine data from the less reliable studies if the AHS data had shown significance on its own. I get the impression they found the result they did because they went looking for it, and accordingly, their conclusion should be taken with a grain of salt. It would be analogous to asking “what are the chances I found an Easter egg in this particular place, given there were 20 possible places to search?” and not mentioning that you had searched a few other places before you found it. This may seem nit-picky when only a few results weren’t mentioned, but their whole conclusion of “statistical significance” hinges on it!
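To see how much wiggle-room that buys, here’s a simplified simulation. It treats the lag windows as independent, which real lags on the same cohort are not, but the direction of the problem is the same: the exposure below has no effect at all, yet reporting only the best of five lags “finds” something far more often than the advertised 5 percent.

```python
import random

random.seed(20)

def nominally_significant(n=200):
    # two groups drawn from the SAME population: any "effect" is pure chance
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    diff = abs(sum(a) / n - sum(b) / n)
    return diff > 1.96 * (2 / n) ** 0.5      # ~95% threshold for a mean difference

trials = 1_000
one_lag = sum(nominally_significant() for _ in range(trials)) / trials
best_of_five = sum(any(nominally_significant() for _ in range(5))
                   for _ in range(trials)) / trials

print(f"false positives, one pre-specified lag: {one_lag:.1%}")
print(f"false positives, best of 5 lags tried:  {best_of_five:.1%}")
# Roughly 5% versus over 20%: the "significance" came with the search.
```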

Observational studies like this are unreliable in the best of circumstances. They have the burden of showing that higher doses lead to a higher likelihood of illness (dose-response relationship). They have the burden of controlling for variables such as age, family history, body weight, and other things that may bias the results (confounding variables). For an extreme example, suppose there were a study that was much more compelling because it took blood samples of thousands of people and everyone with cancer had Roundup in their blood and everyone without cancer did not. A slam dunk case! Until later they find out that everyone with cancer also had DDT or some other chemical in their blood due to the fact that they were all farmers using a variety of insecticides. Suddenly, the case could fall apart.

Even if this study had carefully done everything possible and found that higher exposure to Roundup led to higher chances of developing NHL, and also had a strong reason ahead of time to expect the effect to show up only after a 20-year lag, it would still be one observational study up against a consensus of health agencies around the world. People want to believe that science can confidently answer questions like “what causes cancer” by simply analyzing data. The truth is that, without randomized testing, science is severely hobbled. At best, it can draw tentative conclusions when data is carefully collected and analyzed by scientists who are trained in not fooling themselves and who haven’t peeked at the results before forming their hypotheses. Before you vote “guilty” on your next jury, remember that scientific consensus among multiple scientific organizations represents our best understanding of the world. In general, if you rely on “worldwide conspiracy and bribery” as the explanation for why scientific organizations are wrong, your argument is in trouble. No matter how compelling a conspiracy theory may be, the weight of scientific consensus should provide you with more than a reasonable doubt.

Disagree? Let me know what I got wrong and I’ll post updates. And keep those ideas coming for future blog entries!

https://www.motherjones.com/environment/2019/03/glyphosate-roundup-cancer-non-hodgkin-lymphoma-epa-panel-hardeman-lawsuit-jury-verdict/

https://www.skepticalraptor.com/skepticalraptorblog.php/glyphosate-linked-non-hodgkin-lymphoma-analysis/

https://www.science20.com/geoffrey_kabat/paper_claims_a_link_between_glyphosate_and_cancer_but_fails_to_show_evidence-236698

Artificial Unintelligence

The AI Delusion
by Gary Smith
Oxford University Press, 256 pp., USD $27.95.

“I for one welcome our new computer overlords” – Ken Jennings, Jeopardy champion

In The AI Delusion, economist Gary Smith provides a warning for mankind. However, it is not a warning about machines, it is about ourselves and our tendency to trust machines to make decisions for us. Artificial Intelligence is fantastic for limited and focused tasks but is not close to actual general intelligence. Professor Smith points out that machines, for which all patterns in data appear equally meaningful, have none of the real-world understanding required to filter out nonsense. Even worse is the fact that many of the new algorithms hide their details so we have no way of determining if the output is reasonable. Even human beings, when not engaging their critical thinking skills, mistakenly draw conclusions from meaningless patterns. If we blindly trust conclusions from machines, we are falling for the AI delusion and will certainly suffer because of it.

The Real Danger of Artificial Intelligence

Speculators about the future of artificial intelligence (AI) tend to fall into one of two camps. The first group believes that, when hardware reaches the same level of complexity and processing speed as a human brain, machines will quickly surpass human-level intelligence and lead us into a new age of scientific discovery and inventions. As part of his final answer in the man vs. machine match against IBM’s Watson, former Jeopardy! champion Ken Jennings seemed to indicate that he was in this first camp by welcoming our computer overlords. The impressive AI system, which beat him by answering natural language questions and appeared to understand and solve riddles, made fully intelligent machines seem to be right around the corner.[1]

The second camp dreads an AI revolution. Having grown up on sci-fi movies, like the Matrix and the Terminator, they worry that superior intelligence will lead machines to decide the fate of mankind, their only potential threat, in a microsecond. Alternatively, and more realistically, they see a risk that AI machines may simply not value or consider human life at all and unintentionally extinguish us in their single-minded pursuit of programmed tasks. Machines may find a creative solution that people did not anticipate and endanger us all.

Gary Smith convincingly presents his belief that neither of these views is correct. If achieving true AI is like landing on the moon, all of the impressive recent advances are more like tree-planting than rocket-building. New advancements are akin to adding branches to the tree, and getting us higher off the ground, but not on the path towards the moon.

Humanity has turned away from the exceedingly difficult task of trying to mimic the way the brain works and towards the easier applications (such as spell-checkers and search engines) that leverage what computers do well. These new applications are useful and profitable but, if the goal is for machines to be capable of understanding the world, we need to start over with a new approach to AI. Machines gaining human-like intelligence is not something around the corner unless we start building rockets. 

The AI Delusion warns us that the real danger of AI is not that computers are smarter than we are but that we think computers are smarter than we are. If people stop thinking critically and let machines make important decisions for them, like determining jail sentences or hiring job candidates, any one of us may soon become a victim of an arbitrary and unjustifiable conclusion. It is not that computers are not incredibly useful; they allow us to do in minutes what might take a lifetime without them. The point is that, while current AI is artificial, it is not intelligent.

The Illusion of Intelligence

Over the years I have learned a tremendous amount from Gary Smith’s books and his way of thinking. It seems like a strange compliment but he is deeply familiar with randomness. He knows how random variables cluster, how long streaks can be expected to continue, and what random walks look like. He can examine a seemingly interesting statistical fluke in the data and conclude “you would find that same pattern with random numbers!” and then prove it by running a simulation. He uses this tactic often in his books and it is extremely effective. How can you claim that a pattern is meaningful when he just created it out of thin air?

The AI Delusion begins with a painful example for the political left of the United States. Smith points a finger at the over-reliance on automated number-crunching for the epic failure of Hillary Clinton’s presidential campaign in 2016. Clinton had a secret weapon: a predictive modeling system. Based on historical data, the system recommended campaigning in Arizona in an attempt for a blowout victory while ignoring states that Democrats won in prior years. The signs were there that the plan needed adjusting: her narrow victory over Bernie Sanders, the enthusiastic crowds turning out for Trump, and the discontent of blue-collar voters who could no longer be taken for granted. However, since her computer system did not measure those things, they were considered unimportant. Clinton should have heeded the advice of sociologist William Bruce Cameron: “not everything that can be counted counts, and not everything that counts can be counted.” Blindly trusting machines to have the answers can have real consequences. When it comes to making predictions about the real world, machines have blind spots, and we need to watch for them.

In contrast, machines are spectacular at playing games; they can beat the best humans at practically every game there is. Games like chess were traditionally considered proxies for intelligence, so if computers can crush us, does that mean that they are intelligent? As Smith reviews various games, he shows that the perception that machines are smart is an illusion. Software developers take advantage of mind-boggling processing speed and storage capabilities to create programs that appear smart. They focus on a narrow task, in a purified environment of digital information, and accomplish it in a way that humans never would. Smith points out the truth behind the old joke that a computer can make a perfect chess move while it is in a room that is on fire; machines do not think, they just follow instructions. The fact that they’re good at some things does not mean they will be good at everything.

In the early days of AI, Douglas Hofstadter, the author of the incredibly ambitious book Gödel, Escher, Bach: An Eternal Golden Braid, tackled the seemingly impossible task of replicating the way a human mind works. He later expressed disappointment as he saw the development of AI take a detour and reach for the tops of trees rather than the moon:

To me, as a fledgling [artificial intelligence] person, it was self-evident that I did not want to get involved in that trickery. It was obvious: I don’t want to be involved in passing off some fancy program’s behavior for intelligence when I know that it has nothing to do with intelligence.

A New Test for AI

The traditional test for machine intelligence is the Turing Test. It essentially asks the question: “Can a computer program fool a human questioner into thinking it is a human?” Depending on the sophistication of the questioner, the freedom to ask anything at all can pose quite a challenge for a machine. For example, most programs would be stumped by the question “Would flugly make a good name for a perfume?” The problem with this test is that it is largely a game of deception. Pre-determined responses and tactics, such as intentionally making mistakes, can fool people without representing any useful advance in intelligence. You may stump Siri with the ‘flugly’ question today, but tomorrow the comedy writers at Apple may have a witty response ready: “Sure, flidiots would love it.” This would count as the trickery Hofstadter referred to. With enough training, a program will pass the test, but it would not be due to anything resembling human intelligence; it would be the result of a database of responses and a clever programmer who anticipated the questions.

Consider Scrabble legend Nigel Richards. In May 2015, Richards, who does not speak French, memorized 386,000 French words. Nine weeks later he won the first of his two French-language Scrabble World Championships. This can provide insight into how computers do similarly amazing things without actually understanding anything. Another analogy is the thought experiment from John Searle in which someone in a locked room receives and passes back messages under the door in Chinese. The person in the room does not know any Chinese; she is just following computer code that was created to pass the Turing Test in Chinese. If we accept that the person in the room following the code does not understand the questions, how can we claim that a computer running the code does?

A tougher test to evaluate machine intelligence is the Winograd Schema Challenge. Consider what the word ‘it’ refers to in the following sentences:

I can’t cut that tree down with that axe; it is too thick.

I can’t cut that tree down with that axe; it is too small.

A human can easily determine that, in the first sentence, ‘it’ refers to the tree while, in the second, ‘it’ is the axe. Computers fail these types of tasks consistently because, like Nigel Richards, they do not know what words mean. They don’t know what a tree is, what an axe is, or what it means to cut something down. Oren Etzioni, a professor of computer science, asks “how can computers take over the world, if they don’t know what ‘it’ refers to in a sentence?”

One of my favorite surprises from the book is the introduction of a new test (called the Smith Test of course) for machine intelligence:

Collect 100 sets of data; for example, data on U.S. stock prices, unemployment, interest rates, rice prices, sales of blue paint in New Zealand, and temperatures in Curtin, Australia. Allow the computer to analyze the data in any way it wants, and then report the statistical relationships that it thinks might be useful for making predictions. The computer passes the Smith test if a human panel concurs that the relationships selected by the computer make sense.

This test highlights the two major problems with unleashing sophisticated statistical algorithms on data. One problem is that computers do not know what they have found; they do not know anything about the real world. The other problem is that it is easy, even with random data, to find associations. That means that, when given a lot of data, what computers find will almost certainly be meaningless. Without including a critical thinker in the loop, modern knowledge discovery tools may be nothing more than noise discovery tools.
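A miniature version of the Smith Test scenario makes the second problem concrete. Here, pure noise stands in for the 100 datasets, and the correlation function is the one built into Python 3.10+:

```python
import random
from statistics import correlation   # requires Python 3.10+

random.seed(5)

# 100 unrelated series of pure noise, 20 observations each
series = [[random.gauss(0, 1) for _ in range(20)] for _ in range(100)]

strong = sum(1 for i in range(100) for j in range(i + 1, 100)
             if abs(correlation(series[i], series[j])) > 0.5)

print(f"{strong} of 4,950 random pairings have |r| > 0.5")
# A pattern-hunting machine would dutifully report every one of these
# "relationships"; a human panel would recognize that none of them mean anything.
```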

It is hard to imagine how a machine could use trickery to fake its way through a test like this. Countless examples in the book show that even humans who are not properly armed with a sense of skepticism can believe that senseless correlations have meaning:

  • Students who choose a second major have better grades on average. Does this mean a struggling student should add a second major?
  • Men who are married live longer than men who are divorced or single. Can men extend their lifespans by tying the knot?
  • Emergency room visits on holidays are more likely to end badly. Should you postpone emergency visits until the holidays are over?
  • Freeways with higher speed limits have fewer traffic fatalities. Should we raise speed limits?
  • Family tension is strongly correlated with hours spent watching television. Will everyone get along better if we ditch the TV?
  • People who take driver-training courses have more accidents than people who do not. Are those courses making people more reckless?
  • Students who take Latin courses score higher on verbal ability. Should everyone take Latin?

Many people incorrectly assume causal relationships in questions like these, and unthinking machines would certainly do so as well. Confounding variables only become clear when a skeptical mind is put to use. Only after thinking carefully about what the data is telling us, and considering alternative reasons why there might be an association, can we come to reasonable conclusions.
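
To make the confounding problem concrete, here is a minimal sketch of the married-men example above, with an invented confounder (overall health); the numbers and mechanism are mine, not the book’s.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Invented confounder: overall health (higher is better).
health = rng.normal(size=n)

# In this toy world, healthier men are more likely to be married...
married = rng.random(n) < 1 / (1 + np.exp(-health))

# ...and live longer, but marriage itself has zero causal effect on lifespan.
lifespan = 78 + 3 * health + rng.normal(scale=5, size=n)

print(f"married:   {lifespan[married].mean():.1f} years")
print(f"unmarried: {lifespan[~married].mean():.1f} years")
# Married men "live longer" in this data purely because of the confounder,
# so tying the knot would not add a day to anyone's life here.
```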

Gary Smith’s specialty is teaching his readers how to spot nonsense. I’m reminded of a memorable speech from the movie My Cousin Vinny[2]:

Vinny: The D.A.’s got to build a case. Building a case is like building a house. Each piece of evidence is just another building block. He wants to make a brick bunker of a building. He wants to use serious, solid-looking bricks, like, like these, right? [puts his hand on the wall]
Bill: Right.
Vinny: Let me show you something.
[He holds up a playing card, with the face toward Billy]
Vinny: He’s going to show you the bricks. He’ll show you they got straight sides. He’ll show you how they got the right shape. He’ll show them to you in a very special way, so that they appear to have everything a brick should have. But there’s one thing he’s not gonna show you. [He turns the card, so that its edge is toward Billy]
Vinny: When you look at the bricks from the right angle, they’re as thin as this playing card. His whole case is an illusion, a magic trick…Nobody – I mean nobody – pulls the wool over the eyes of a Gambini.

Professor Smith endeavors to make Gambinis out of us all. After reading his books, you know how to look at claims from the right angle and see for yourself whether they are paper thin. In the case of The AI Delusion, the appearance of machine intelligence is the magic trick being exposed. True AI would be a critical thinker, able to separate the meaningful from the spurious, the sensible from the senseless, and causation from correlation.

Data-Mining for Nonsense

The mindless ransacking of data, looking for patterns and correlations, which is what AI does best, is at the heart of the replication crisis in science. Finding an association in a large dataset just means that you looked, nothing more. Professor Smith writes about a conversation he had with a social psychologist at Sci Foo 2015, an annual gathering of scientists and writers at the Googleplex. She expressed admiration for Daryl Bem, a social psychologist who openly endorsed blindly exploring data to find interesting patterns. Bem is known, not surprisingly, for outlandish claims that have been refuted by other researchers. She also praised Diederik Stapel, who has admitted outright that he made up data. Smith changed the subject. The following day, a prominent social psychologist said that his field is the poster child for irreproducible research and that his default assumption is that every new study is false. That sounds like a good bet. Unfortunately, adding more data and high-tech software that specializes in discovering patterns will make the problem worse, not better.

To support the idea that computer-driven analysis is trusted more than human-driven analysis, Smith recounts a story about an economist who, in 1981, was being paid by the Reagan administration to develop a computer simulation predicting that tax revenue would increase if tax rates were reduced. No matter how much he tortured the data with the computer, he was unsuccessful. He approached Professor Smith for help and was not happy when Smith advised him to simply accept that reducing tax rates would reduce tax revenue (which is, in fact, what happened). The effort to find a way to get a computer program to provide the prediction is telling; even back in the 80s, people considered computers to be authoritative. If the machine says it, it must be true.

Modern-day computers can torture data like never before. A Dartmouth graduate student named Craig Bennett used an MRI machine to search for brain activity in a salmon as it was shown pictures and asked questions. The sophisticated statistical software identified some areas of activity! Did I mention that the fish was dead? Craig grabbed it from a local market. So many areas (voxels) were being examined that the machine would inevitably find some false positives. That was the point of the study: people should be skeptical of findings that come from a search through piles of data. Craig published his research and won the Ig Nobel Prize, which is awarded each year to “honor achievements that first make people laugh, and then make them think.” The lesson for readers of The AI Delusion is that anyone can read the paper and chuckle at the absurdity of the idea that the brain of a dead fish would respond to photographs, but the most powerful and complex neural net in the world, given the same data, would never question it.
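
The arithmetic behind that result is just the multiple-comparisons problem, which a minimal sketch (my own toy numbers, not Bennett’s data) makes plain.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Pretend 10,000 voxels each produce 20 "activity" measurements containing
# no signal at all, and test every voxel against zero.
n_voxels, n_obs = 10_000, 20
noise = rng.normal(size=(n_voxels, n_obs))
t, p = stats.ttest_1samp(noise, popmean=0.0, axis=1)

print("voxels 'active' at p < 0.001:", int((p < 0.001).sum()))
# Roughly 0.001 * 10,000 = 10 false positives are expected by chance alone,
# which is how uncorrected voxel-wise tests find "activity" in a dead fish.
```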

One of the biggest surprises in the book was the effective criticism of popular statistical procedures including stepwise regression, ridge regression, neural networks, and principal components analysis. Anyone under the illusion that these procedures will protect them against the downsides of data-mining is disabused of that notion. Professor Smith knows their histories and technical details intimately. Ridge regression, in particular, takes a beating as a “discredited” approach. Smith delivers the checkmate, in true Smithian style, by sending four equivalent representations of Milton Friedman’s model of consumer spending to a ridge regression specialist to analyze:

I did not tell him that the data were for equivalent equations. The flimsy foundation of ridge regression was confirmed in my mind by the fact that he did not ask me anything about the data he was analyzing. They were just numbers to be manipulated. He was just like a computer. Numbers are numbers. Who knows or cares what they represent? He estimated the models and returned four contradictory sets of ridge estimates.
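
I don’t have the four Friedman specifications Smith sent, but a small sketch shows the underlying issue he is pointing at: ridge estimates change when you rewrite the same model in an algebraically equivalent form, because the penalty depends on how the regressors are parameterized.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 * x1 + 2.0 * x2 + rng.normal(scale=0.5, size=n)

# Version A: regress y on (x1, x2) directly.
XA = np.column_stack([x1, x2])
# Version B: the algebraically equivalent regressors (x1 + x2, x1 - x2).
XB = np.column_stack([x1 + x2, x1 - x2])

a = Ridge(alpha=100.0).fit(XA, y).coef_
b = Ridge(alpha=100.0).fit(XB, y).coef_
# Translate version B's coefficients back to the (x1, x2) scale:
# y = b0*(x1+x2) + b1*(x1-x2) = (b0+b1)*x1 + (b0-b1)*x2.
implied = np.array([b[0] + b[1], b[0] - b[1]])

print("ridge estimates, version A:", a.round(2))
print("implied by version B:      ", implied.round(2))
# Ordinary least squares would give identical answers for both versions;
# ridge does not, because the penalty treats the two forms differently.
```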

Smith played a similar prank on a technical stock analyst. He sent the analyst fictional daily stock prices generated from student coin flips and asked whether it would be a good time to invest. The analyst never asked which companies the price histories came from, but became very excited about the opportunity to invest in a few of them. When Smith informed him that they were only coin flips, he was disappointed. Not disappointed that his approach had found false opportunities in noise, but that he could not bet on his predictions. He was such a firm believer in his technical analysis that he actually believed he could predict future coin flips.
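
The exact recipe Smith used isn’t spelled out here, but a toy version of those coin-flip “price histories” is easy to generate, and the results look remarkably like charts full of trends, support levels, and breakouts.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)

# Fake daily prices: start each "stock" at 50 and move it up or down half a
# point per coin flip (the details are my guess, not Smith's actual setup).
n_days, n_stocks = 250, 4
flips = rng.choice([-0.5, 0.5], size=(n_days, n_stocks))
prices = 50 + flips.cumsum(axis=0)

for k in range(n_stocks):
    plt.plot(prices[:, k], label=f"fake stock {k + 1}")
plt.legend()
plt.title("Coin-flip price histories, ready for 'technical analysis'")
plt.show()
```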

Automated stock-trading systems, like AI generally, are not concerned with real-world companies; the buy and sell decisions are based entirely on transitory patterns in the price, and the algorithms are tuned to the mostly meaningless noise of historical data. I wondered why, if stock-trading systems are garbage, investment companies spend billions of dollars on trading centers as close to the markets as possible. Smith explains this as well: they want to exploit tiny price discrepancies thousands of times per second, or to front-run orders from investors and effectively pickpocket them. This single-minded pursuit of a narrow goal without concern for the greater good is, unfortunately, also a feature of AI. The mindless world of high-frequency trading, both when it is profitable (exploiting others) and when it is not (making baseless predictions from spurious patterns), serves as an apt warning about the future that awaits other industries if they automate their decision-making.

Gary Smith draws a clear distinction between post-hoc justification for patterns found rummaging through data and the formation of reasonable hypotheses that are then validated or refuted based on the evidence. The former is unreliable and potentially dangerous while the latter was the basis of the scientific revolution. AI is built, unfortunately, to maximize rummaging and minimize critical thinking. The good news is that this blind spot ensures that AI will not be replacing scientists in the workforce anytime soon.

There Are No Shortcuts

If you have read other books from Gary Smith, you know to expect many easy-to-follow examples that demonstrate his ideas. Physicist Richard Feynman once said “If you cannot explain something in simple terms, you don’t understand it.” Smith has many years of teaching experience and has developed a rare talent for boiling ideas down to their essence and communicating them in a way that anyone can understand.

Many of the concepts seem obvious after you have understood them. However, do not be fooled into believing they are self-evident. An abundance of costly failures has resulted from people carelessly disregarding them. Consider the following pithy observations…

We think that patterns are unusual and therefore meaningful.

Patterns are inevitable in Big Data and therefore meaningless.

The bigger the data, the more likely it is that a discovered pattern is meaningless.

You see at once the danger that Big Data presents for data-miners. No amount of statistical sophistication can separate out the spurious relationships from the meaningful ones. Even testing predictive models on fresh data just moves the problem of finding false associations one level further away. The scientific way is theory first and data later.
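
A minimal sketch (my own toy setup, not Smith’s) shows both halves of that warning: the best of thousands of pure-noise predictors looks great in-sample and collapses on fresh data, yet with enough candidates a few still sneak through the fresh data by luck.

```python
import numpy as np

rng = np.random.default_rng(4)
n_obs, n_predictors = 100, 10_000

def corrs(X, y):
    # Pearson correlation of every column of X with y.
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    return (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    )

# Training data and fresh data, all of it pure noise.
y_train, y_test = rng.normal(size=n_obs), rng.normal(size=n_obs)
X_train = rng.normal(size=(n_obs, n_predictors))
X_test = rng.normal(size=(n_obs, n_predictors))

r_train, r_test = corrs(X_train, y_train), corrs(X_test, y_test)
best = int(np.abs(r_train).argmax())

print(f"best predictor, in-sample:     |r| = {abs(r_train[best]):.2f}")
print(f"same predictor, out-of-sample: |r| = {abs(r_test[best]):.2f}")
print("predictors that 'work' in both samples by luck:",
      int(((np.abs(r_train) > 0.2) & (np.abs(r_test) > 0.2)).sum()))
```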

Even neural networks, the shining star of cutting-edge AI, are susceptible to being fooled by meaningless patterns. Their hidden layers make the problem even worse, hiding the features the network relies on inside a black box that is practically impossible to scrutinize. They remind me of the witty answer a family cook gives a child asking about dinner options: “You have two choices: take it or leave it.”

The risk that the data used to train a neural net is biased in some unknown way is a common problem. Even the most sophisticated model in the world can latch onto an incidental feature, like the type of frame around the pictures it is meant to categorize, and become completely lost when new pictures with different frames are presented to it. Neural nets can also fall victim to adversarial attacks designed to derail them by tweaking small details that no thinking entity would consider important. Because of the hidden layers, the programmers may never figure out what went wrong.
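
Here is a minimal sketch of that picture-frame failure mode, using an invented two-feature dataset instead of real images, with a logistic regression standing in for the network: the “frame” feature happens to track the label perfectly in training, the model leans on it, and accuracy collapses as soon as new data breaks the coincidence.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)

def make_data(n, frame_matches_label):
    label = rng.integers(0, 2, size=n)
    # One genuinely informative (but noisy) feature of the picture itself...
    signal = label + rng.normal(scale=1.5, size=n)
    # ...and a "frame" feature that has nothing to do with the subject.
    frame = label if frame_matches_label else rng.integers(0, 2, size=n)
    return np.column_stack([signal, frame]), label

# In the training set, the frame happens to track the label perfectly.
X_train, y_train = make_data(2_000, frame_matches_label=True)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# On new pictures whose frames are unrelated to the label, accuracy collapses.
X_new, y_new = make_data(2_000, frame_matches_label=False)
print("training accuracy:", round(clf.score(X_train, y_train), 3))
print("new-data accuracy:", round(clf.score(X_new, y_new), 3))
```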

A paper published just a couple of days ago acknowledges that current approaches to AI have failed to come close to human cognition. Authors from DeepMind, Google Brain, MIT, and the University of Edinburgh write that “many defining characteristics of human intelligence, which developed under much different pressures, remain out of reach for current approaches.”[3] They conclude that “a vast gap between human and machine intelligence remains, especially with respect to efficient, generalizable learning.”

The more we understand about how Artificial Intelligence currently works, the more we realize that ‘intelligence’ is a misnomer. Software developers and data scientists have freed themselves from the original goal of AI and have created impressive software capable of extracting data with lightning speed, combing through it and identifying patterns, and accomplishing tasks we never thought possible. In The AI Delusion, Gary Smith has revealed the mindless nature of these approaches and made the case that they will not be able to distinguish meaningful from meaningless any better than they can identify what ‘it’ refers to in a tricky sentence. Machines cannot think in any meaningful sense so we should certainly not let them think for us.


[1] Guizzo, Erico. “IBM’s Watson Jeopardy Computer Shuts Down Humans in Final Game.” IEEE Spectrum: Technology, Engineering, and Science News. February 17, 2011. Accessed November 05, 2018. https://spectrum.ieee.org/automaton/robotics/artificial-intelligence/ibm-watson-jeopardy-computer-shuts-down-humans.

[2] My Cousin Vinny. Directed by Jonathan Lynn. Written and produced by Dale Launer. Performed by Joe Pesci and Fred Gwynne. 1992.

[3] Peter W Battaglia, Jessica B Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, et al. “Relational inductive biases, deep learning, and graph networks.” arXiv preprint arXiv:1806.01261, 2018.

True Crime

There have been conflicting reports in the media about the US crime rate. On one hand, President Obama said last year that “there’s been an incredible drop in violent crime” and “our crime rate today is substantially lower than it was five years ago, 10 years ago, 20 years ago, 30 years ago.” On the other hand, then-presidential candidate Donald Trump stated that “violent crime has increased in cities across America. The New York Times described ‘the startling rise in murders’ in our major cities.” So which is it?

It is true that the New York Times reported that “data … based on reports from more than 60 cities,  showed notable increases in murders in about two dozen cities in the first three months of the year compared to last year and about a 9 percent increase nationwide.”  However, the long-term trend is undeniable.  The murder rate is practically at an all-time low.

Crime Trends Chart

As the final project for my Data Visualization and Communication class in the Berkeley MIDS program, my team came up with some useful charts, fueled by data from the Chicago Police Department’s CLEAR system, that let users see for themselves what the trend is for a wide variety of crimes. We categorized the types of crimes according to what we felt would be in the public interest and adjusted for the recent decline in Chicago’s population by calculating the crime rate per 100,000 residents per year. This also provides context, so people can understand the average yearly risk of victimization for any particular crime.
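
For anyone curious about the per-100,000 adjustment, here is a minimal sketch; the counts, populations, and column names below are invented for illustration, not taken from the actual CLEAR data.

```python
import pandas as pd

# Invented yearly counts and populations, purely to show the arithmetic.
crimes = pd.DataFrame({
    "year":     [2014, 2014, 2015, 2015],
    "category": ["Theft", "Homicide", "Theft", "Homicide"],
    "count":    [61_000, 420, 57_000, 480],
})
population = pd.Series({2014: 2_720_000, 2015: 2_715_000})

# Rate per 100,000 residents per year = count / population * 100,000.
crimes["rate_per_100k"] = crimes["count"] / crimes["year"].map(population) * 100_000
print(crimes)
```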

In addition to addressing the misleading statements made by politicians, we also wanted to provide a counterpoint to news broadcasting in this country, which has a financial incentive to focus on the sensational crimes that bring in viewers. News watchers receive a steady diet of stories about homicide and home invasions without proper context. Our chart aims to dispel the perception that those are common types of crime and to put them in perspective. The news is entertaining but not very informative when it comes to communicating the likelihood of various crimes taking place.

Interactive Crime Maps

While the Crime Trends chart is of interest to the general public, our interactive crime maps are of particular use to someone, such as a home-buyer, interested in crime rates in specific community areas within Chicago.

Users can also select and isolate statistics for communities if they want to quantify the safety of nearby areas or determine the relative safety of traveling through a particular series of communities.  By toggling the selection of areas they wish to compare, they will be able to determine which places are best to avoid.

Enjoy!

–> Interact with our awesome charts here <–

Greatest Hits of RDADA

I just completed Berkeley’s enjoyable DATASCI W201 class “Research Design and Applications for Data Analysis” and figured I might as well share a few of my essays since I crafted them with such tender-loving care.  Enjoy!

 

Week 5 – Up Is Up by Jay Cordes – Tuesday, 07 June 2016, 09:22 AM

At my last job, there were many examples where decision makers took questionable facts for granted. “Up is up” was even a favorite saying of one of the executives, and I would often have back and forth exchanges with him in meetings when I thought he was being duped by data.

In one case, practically everyone was fooled by a case of regression towards the mean. My business unit designed and conducted experiments to determine which web page designs maximized ad revenue for our customers, who collect domain names. Invariably, some of their domains would perform worse than others, and it was decided that we should try to optimize those “underperformers” by hand (typically, we’d use a generic design across the board). Being scientifically rigorous and only working on a random half was considered overkill.

My friend Brad was the one tasked with actually making the modifications and was given the list of domain names each week that could use help. Every single time he hand-optimized a set of domains, the revenue on those sites would significantly increase the following day (around 30-40%, if I recall). He was praised by customers and managers alike, who started to wonder why he wasn’t hand-optimizing all of the domains on our system. Brad was savvy enough to be skeptical of the results and smiled knowingly when I pointed out that there was practically no way for him to fail, since the worst domains would be expected to improve anyway just by chance.

Well, one time he forgot to work on the set of domains given to him. The next day, their total revenue skyrocketed by about 40% and the customer wrote “whatever you did worked!” Since the increase was at least as large as what we had seen with his hand-optimizing, we realized he could actually have been harming revenue. If we hadn’t assumed that things could only get better and had used a randomized control (selected from the domains he was told to optimize), we would have clearly seen whether or not his work was having a benefit. The moral of the story: up is definitely not always up.
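
Here is a minimal simulation of that effect (invented numbers, not the company’s): every domain keeps its usual revenue process, nobody touches anything, and yesterday’s worst performers still “improve” dramatically.

```python
import numpy as np

rng = np.random.default_rng(6)
n_domains = 1_000

# Each domain has a stable underlying revenue level plus day-to-day noise.
baseline = rng.lognormal(mean=3.0, sigma=0.5, size=n_domains)
day1 = baseline * rng.lognormal(mean=0.0, sigma=0.4, size=n_domains)
day2 = baseline * rng.lognormal(mean=0.0, sigma=0.4, size=n_domains)

# "Hand-optimize" the 50 worst performers from day 1 by doing nothing at all.
worst = np.argsort(day1)[:50]
change = (day2[worst].sum() / day1[worst].sum() - 1) * 100
print(f"revenue change for yesterday's worst 50 domains: {change:+.0f}%")
# The "underperformers" bounce back on their own; without a randomized
# control group, whoever touched them last gets the credit.
```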

 

Week 7 – Phones Don’t Cause Cancer, The World is P-Hacking by Jay Cordes – Tuesday, 21 June 2016, 06:46 PM

The article, “Study finds cell phone frequency linked to cancer,” (http://www.mysuncoast.com/health/news/study-finds-cell-phone-frequency-linked-to-cancer/article_26411b44-290a-11e6-bffc-af73cd8bda5f.html) reports that a recent study shows that cell phone radiation increases brain tumors in rats. It also provides safety tips, such as keeping phones away from your ear.

The problem with the article is subtle: it fails to recognize the likelihood of spurious results due to repeated testing. “P-hacking” refers to researchers hiding or ignoring their negative results and reporting only the positive ones. It’s similar to editing a video of yourself shooting baskets and only showing a streak of 10 consecutive makes. This study wasn’t p-hacked, but the result should still be regarded with skepticism, given the number of other studies that did not find similar evidence. In other words, a kind of global p-hacking occurs when articles focus on this one experiment in isolation.

Another way to think about p-hacking is through “regression toward the mean,” which occurs because (1) luck almost always plays some role in results, and (2) it tends to play a bigger role at the extremes. A small amount of luck will lead to future results that are slightly less extreme; if the results are due entirely to luck, they will evaporate completely upon re-testing. With many failed attempts comes a higher likelihood that luck played a big role in the successes.

The article doesn’t recognize that the results of other studies should weaken your confidence in this one. It also mentions that the radiation is non-ionizing, but doesn’t note that only ionizing radiation (high-frequency electromagnetic radiation, including ultraviolet, x-rays, and gamma rays) has been shown to cause mutations. From a purely physical standpoint, cancer-causing cell phone radiation is very unlikely. Also, no mention is made of the fact that cancer rates haven’t skyrocketed in response to widespread cell phone usage around the world.

The only criticism I have of the experiment itself is that the results were broken down by sex. What explanation could there be for why females were not affected by the radiation? Would the result have been significant if the sexes were combined? A couple of “misses” do seem to have been edited out.

I chose this blind spot to discuss because I believe it scores a perfect 10 in terms of its pervasiveness and negative impact on interpreting results. At my last job, we ran a randomized controlled test on web pages of a few different colors and found no significant difference in revenue. My boss then recommended that we look at the results by country, which made me wince and explain why that’s a data foul. An analyst did look at the results and found that England significantly preferred teal during the experiment. Due to my objections, we thankfully did not roll out the teal web page in England, but kept the experiment running. Over the next few weeks, England appeared to significantly hate teal. It pays to be skeptical of results drawn from one of many experiments.
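
A toy version of that by-country breakdown shows how easily it happens (the numbers below are invented): run a null A/B test, slice it into 30 countries, and a “significant” color preference or two will usually appear.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_countries, visitors = 30, 500

# No true difference anywhere: both page colors have identical revenue.
significant = []
for country in range(n_countries):
    control = rng.normal(loc=1.0, scale=0.3, size=visitors)
    teal = rng.normal(loc=1.0, scale=0.3, size=visitors)
    _, p = stats.ttest_ind(control, teal)
    if p < 0.05:
        significant.append((country, round(p, 3)))

print("countries with a 'significant' color preference:", significant)
# With 30 looks at pure noise, one or two countries will usually seem to
# love (or hate) teal, and the effect will evaporate on re-testing.
```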

 

Week 10 – Jay Cordes: Visualizations Can Affect Your Intelligence by Jay Cordes – Monday, 11 July 2016, 11:33 PM

The worst visualization I’ve ever seen actually makes you dumber for having seen it. It’s this one from Fox News evidently attempting to show that global warming is not a thing…

[Image: Fox News graphic comparing two temperature maps]

Figure 1 – Wow, this is nonsense.

It’s not bad in the sense of Edward Tufte’s “chartjunk” (there’s no graphical decoration at all). Nor does it violate the principle of maximizing data-ink (if anything, there’s too little ink, since there’s no legend explaining what the colors represent). It’s bad because:

  1. It’s flat-out nonsense. It turns out that the map on the left shows whether temperatures were higher or lower than normal the prior March, while the map it’s being compared to shows the absolute minimum temperature on a particular day.
  2. Even if the chart were actually comparing what it pretends to compare, it would still be meaningless. Can a one-month exception cast doubt on a decades-long trend?
  3. If the colors on the left-hand map actually meant what they appear to mean, we would have to believe that Minnesota hit 100 degrees in March of 2012. That March did set the record for the hottest average March temperature in Minnesota’s history, but the average was only 48.3 degrees.

On the other hand, one of the best charts I’ve seen is this one from the Economist that we used in our presentation on Brexit.

[Image: The Economist chart on which EU countries think they would be better off outside the European Union]

Figure 2 – Now we’re learning something

There’s a density of information that is presented very simply and efficiently: Which European countries believe they’d be better off outside the European Union? Is that belief driven more by economics and commerce or by migration controls? What are their population sizes? Why did England vote to leave? Unlike the Fox News chart, the data is presented truthfully and clearly, and answers to all of these questions can be quickly drawn from it. I also like the subtle details, such as making it clear what the colors of the circles represent by simply coloring the font of the axis labels. The grid lines are white on a light gray background to avoid cluttering up the chart, and the population sizes are rounded to one of three values to simplify comparison. This chart definitely follows the idea of data-ink maximization.

Sources:
  • http://mediamatters.org/blog/2013/03/25/another-misleading-fox-news-graphic-temperature/193247
  • http://www.economist.com/news/briefing/21701545-britains-decision-leave-european-union-will-cause-soul-searching-across-continentand
  • http://files.dnr.state.mn.us/natural_resources/climate/twin_cities/ccmarch1981_2010.html
  • http://shoebat.com/wp-content/uploads/2016/06/20160702_FBC814.png
  • http://m5.paperblog.com/i/48/483755/every-day-is-april-fools-at-fox-news-L-byleIS.jpeg

Relax, the Crime Rate Isn’t Spiking

These days, if you listen to presidential candidates of various hand-sizes or watch a lot of news, you probably think that a crime wave is overtaking the United States.  However, if you happen to be a data science student who’s looking at crime data and working on homework assignments like these, you probably know better.

Here’s some recent data about crime in Chicago…

[Chart: reported crimes in Chicago by day]

Pretty much every type of crime is dropping year after year, and it’s been happening for quite a while. If you insist on worrying about stuff, you should know that theft, battery, and criminal damage are the most common types of crime and make up about half of all arrests. Also, in case you’re curious, those cycles in the figure above are due to the fact that more crime occurs when it’s hot.

[Chart: Chicago crime rate vs. temperature]

It’s incredibly consistent: as the temperature goes up, so does the crime rate. So, if you’re really worried about crime, stay in your safe room on hot days. Actually, it turns out that safe rooms aren’t that necessary either. Home invasions, while they make good news stories, made up only 0.085% of all crime in Chicago. That’s not 8.5%, that’s 8.5 one-hundredths of a percent. Similarly, homicide, while making up about a quarter of the crime stories on the news, represents less than 0.2% of all crime. There should probably be some kind of warning label on newscasts: “We’ve chosen to show you tons of murders only because it’s exciting and brings in viewers. We are in no way implying that the Zombie Apocalypse is upon us.”

So, rest easy kids!  Put down the guns, because you’ll probably just hurt yourselves.  And whatever you do, don’t travel into the past, because that’s where the real danger lies.

Back to school! “Please don’t force push”

While I couldn’t have been more excited to start the MIDS program (Master of Information and Data Science) at UC Berkeley, the very first class just happened to conflict with the opportunity to meet this Guy…

[Photo: Bill Nye the Science Guy]

I guess that proves that even more than a “data wonk,” I am at heart a “science freak.”  We ended up even getting to walk with Mr. Guy on the way to go see his buddy Neil deGrasse Tyson give an entertaining lecture about Sci-Fi movies and their scientific accuracy and/or lack of it, so it was a full-on geek-out that night.  But I digress!

Prior to the first live classes, we had access to a killer application loaded with tons of pre-recorded lectures. I have it on my desktop, my iPhone, and my iPad…

[Screenshot: the lecture app on multiple devices]

You can stream the lectures or download them for offline viewing, and you can adjust the speed of the video to really plow through them in record time. If you go fast enough, you feel like you’re Neo downloading knowledge directly into your brain. Before long, your screen has check marks by everything, and then you look up and say to yourself, “I know Python.” You know you’re addicted to the app when you grab your iPad and are genuinely not sure whether you’d rather check what your classmates and professor are posting on the wall or play Plants vs. Zombies. Admit it, just by reading this you’re already starting to think about studying for those GREs and going back to school!

Finally, it was time for live classes to begin…

[Screenshot: the first live MIDS class]

Not your gramma’s graduate school! It was a lively and interesting discussion, and nobody sang the Brady Bunch theme song. There also wasn’t any singing when the 2nd homework assignment was handed out; it was surprisingly challenging: writing python files from within a bash script and getting up to speed on a version control system called git. Being pretty new to git, I once forgot to add a message at the end of a “git commit” command and was thrown into a nightmare of a text editor called vim that I COULD NOT ESCAPE FROM without doing web research. Turns out that the command I was looking for was “:wq” to save the file and quit. As a former software developer, stuff like that drives me crazy. Even if a programmer was sick enough to think that that was a good command for something as common as quitting the editor, did he/she not have a manager to say “Okay, you know that someone other than you might end up using this program, right?”

A friend of mine backed up my initial impressions of git’s user-friendliness: “Yes, git is overwhelmingly cryptic.  A basic workflow can be described in a more straightforward way, but God forbid anything go wrong.  I’ve spent many half-days trying to fix a mistake in git.  It’s kind of amazing to me that it has caught on as much as it has.”  He also supplied me with this helpful advice…

[Image: a friend’s helpful git advice]

I ended up getting through the assignment without too much trouble, but evidently others went a bit astray, because the following day, the professor posted a message on the wall: “…Please don’t force push to the github playground … if you do that, that means you’re going to overwrite someone else’s changes.  If you do that, we can’t keep track of who all made changes.  That means we’ll also be modifying the assignment – we just want you to have one change in the github playground.”

After reading that, I naturally had to post the following to the wall…

[Images: the force-push jokes I posted to the wall]

Good times!