
John Cook is an applied mathematician working in Houston, Texas. His career has been a blend of research, software development, consulting, and management. John is a DZone MVB and is not an employee of DZone and has posted 171 posts at DZone. You can read more from them at their website.

The Limits of Statistics


When statisticians analyze data, they don’t just look at the data you bring to them. They also consider hypothetical data that you could have brought. In other words, they consider what could have happened as well as what actually did happen.
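The p-value is one familiar place where this happens. As a minimal sketch, assume a hypothetical experiment in which a coin is tossed 10 times and lands heads 8 times. A one-sided p-value for the hypothesis that the coin is fair adds up the probabilities of 8, 9, and 10 heads, so two of the three terms come from outcomes that never occurred. The numbers are illustrative only, not from any real data set.

```python
from math import comb


def one_sided_binomial_p_value(heads: int, n: int) -> float:
    """P(X >= heads) for X ~ Binomial(n, 1/2), i.e. a fair coin.

    The sum runs over every outcome at least as extreme as the one
    observed, including outcomes that did not actually happen.
    """
    return sum(comb(n, k) for k in range(heads, n + 1)) / 2 ** n


# Hypothetical data: 8 heads in 10 tosses.
# Only k = 8 was observed; k = 9 and k = 10 are "data you could have brought".
p = one_sided_binomial_p_value(8, 10)
print(p)  # 0.0546875, i.e. (45 + 10 + 1) / 1024
```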

This may seem strange, and sometimes it does lead to strange conclusions. But often it is undeniably the right thing to do. It also leads to endless debates among statisticians. The cause of the debates lies at the root of statistics.

The central dogma of statistics is that data should be viewed as realizations of random variables. This has been a very fruitful idea, but it has its limits. It’s a reification of the world. And like all reifications, it eventually becomes invisible to those who rely on it.

Data are what they are. In order to think of the data as having come from a random process, you have to construct a hypothetical process that could have produced the data. Sometimes there is near-universal agreement on how this should be done. But often different statisticians create different hypothetical worlds in which to place the data. This lies at the root of arguments such as how to handle multiple testing.
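Multiple testing shows how the choice of hypothetical world changes the conclusion while the data stay fixed. The sketch below uses the Bonferroni correction, one of several common adjustments, chosen here only because it is the simplest. The observed p-value of 0.01 and the family sizes are hypothetical values picked for illustration.

```python
def bonferroni_significant(p_value: float, num_tests: int, alpha: float = 0.05) -> bool:
    """Bonferroni rule: reject the null if p <= alpha / m, where m is the
    number of tests in the family being corrected for."""
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    return p_value <= alpha / num_tests


# The data, and hence the p-value, are identical in every case below.
# Only the size of the hypothetical family of tests changes.
p = 0.01  # hypothetical observed p-value
for m in (1, 2, 10, 100):
    print(m, bonferroni_significant(p, m))
# 1 True
# 2 True
# 10 False
# 100 False
```

Whether this result counts as "one test" or "one of ten" is not something the data themselves can settle; it depends on which hypothetical family the analyst places them in.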

You can debunk any conclusion by placing the data in a large enough hypothetical model. Suppose it’s Jake’s birthday, and when he comes home, there are Scrabble tiles on the floor spelling out “Happy birthday Jake.” You might conclude that someone arranged the tiles to leave him a birthday greeting. But if you are so inclined, you could attribute the apparent pattern to chance. You could argue that many people around the world have dropped bags of Scrabble tiles, and eventually something like this was bound to happen. If that seems inadequate, you could take a “many worlds” approach and posit entire new universes. Not only are people dropping Scrabble tiles in this universe, they’re dropping them in countless other universes too. We’re only remarking on Jake’s apparent birthday greeting because we happen to inhabit the universe in which it happened.
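The arithmetic behind "bound to happen" is simple: if an event has probability p on each of n independent tries, the chance it happens at least once is 1 - (1 - p)^n, which approaches 1 as n grows, no matter how small p is. The sketch below evaluates this formula. The per-drop probability and the numbers of drops are hypothetical figures, not estimates for real Scrabble tiles, and independence between drops is an assumption.

```python
import math


def prob_at_least_once(p: float, n: int) -> float:
    """Chance that an event with probability p per trial occurs at least
    once in n independent trials, i.e. 1 - (1 - p)**n.

    Evaluated as -expm1(n * log1p(-p)) so that tiny p and huge n do not
    lose accuracy to rounding 1 - p near 1.
    """
    return -math.expm1(n * math.log1p(-p))


p = 1e-12  # hypothetical chance that one dropped bag spells the greeting
for n in (10**6, 10**12, 10**13):  # hypothetical numbers of drops
    print(n, prob_at_least_once(p, n))
```

Enlarging the hypothetical model, more drops or more universes, simply means increasing n until the "coincidence" becomes likely.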

Published at DZone with permission of John Cook, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)