Scientific research and false positives

I just read the following interesting article regarding the importance of replicating experiments to be sure that a result is scientifically valid: http://www.slate.com/articles/health_and_science/science/2014/07/replication_controversy_in_psychology_bullying_file_drawer_effect_blog_posts.single.html. This strikes me as an engaging way to introduce the importance of P-values to a statistics class. Among the many salient quotes:

Psychologists are up in arms over, of all things, the editorial process that led to the recent publication of a special issue of the journal Social Psychology. This may seem like a classic case of ivory tower navel gazing, but its impact extends far beyond academia. The issue attempts to replicate 27 “important findings in social psychology.” Replication—repeating an experiment as closely as possible to see whether you get the same results—is a cornerstone of the scientific method. Replication of experiments is vital not only because it can detect the rare cases of outright fraud, but also because it guards against uncritical acceptance of findings that were actually inadvertent false positives, helps researchers refine experimental techniques, and affirms the existence of new facts that scientific theories must be able to explain…

Unfortunately, published replications have been distressingly rare in psychology. A 2012 survey of the top 100 psychology journals found that barely 1 percent of papers published since 1900 were purely attempts to reproduce previous findings…

Since journal publications are valuable academic currency, researchers—especially those early in their careers—have strong incentives to conduct original work rather than to replicate the findings of others. Replication efforts that do happen but fail to find the expected effect are usually filed away rather than published. That makes the scientific record look more robust and complete than it is—a phenomenon known as the “file drawer problem.”

The emphasis on positive findings may also partly explain the fact that when original studies are subjected to replication, so many turn out to be false positives. The near-universal preference for counterintuitive, positive findings gives researchers an incentive to manipulate their methods or poke around in their data until a positive finding crops up, a common practice known as “p-hacking” because it can result in p-values, or measures of statistical significance, that make the results look stronger, and therefore more believable, than they really are.

I encourage teachers of statistics to read the entire article.

Leave a comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: