Preserving the Integrity of Statistics

November 1, 2009 (Vol. 29, No. 19)

Zachary N. Russ, Bioengineering graduate student, UC Berkeley

Students and Novice Researchers Need Guidance on How and When to Use This Essential Tool

Ronald Coase, the Nobel Prize-winning economist, is commonly credited with the adage, “If you torture the data long enough, it will confess.” I was recently reminded of Dr. Coase’s astute observation while relaxing with a group of bio-buddies after a long day of tending liquid cultures. The conversation worked its way around to common frustrations in the lab—dying colonies, promiscuous antibodies, and, surprisingly, the question of statistics.

One colleague complained, “I told my PI that we shouldn’t just average the data with an arithmetic mean, that you can’t use ANOVA for everything. But she said, ‘Statistics are misrepresentations anyway.’ And I shot back, ‘No they’re not! I took statistics. There is a proper way to report this data, and we’re not doing it.’”

I agreed with my friend. I too had taken a statistics course, and I chafe at some of the shortcuts used to convert piles of data into succinct numbers and explanations. Sometimes we forget that statistics is a distinct science: one with its roots in mathematics, but a science nonetheless. As a science, it requires the same integrity that we would bring to any scientific inquiry.

This is what Richard Feynman discussed in his 1974 commencement address at Caltech, the famous “cargo cult science” speech. In what could be considered a companion piece to Coase’s quote, Feynman challenged anyone presuming to engage in scientific inquiry: “The first principle is that you must not fool yourself—and you are the easiest person to fool.”

And isn’t that the gold standard to which we aspire? What distinguishes scientists, mathematicians, physicians, engineers, and other fact-based professionals from the rest of humanity is our relentless attention to detail. It’s the exhausting, even loving care that we give to a real-world challenge to make sure that we are not fooling that first person, the easiest person to fool, the person with the highest risk exposure relating to job, reputation, and ego.

I find it odd that one would go through the intricate demands of the scientific method, only to drop the ball at the end when statistics come into the picture.

Biology further complicates life for the conscientious statistician. Oftentimes we have a shortage of data, or we can’t use a direct indicator of our variable of interest, so we compromise. We use biomarkers, such as fluorescent tags to detect molecules in a pathway we’re measuring or optical density to measure growth. But optical density can also reflect contaminants or different protein products; there’s the rub. The hard part of biostatistics is finding an appropriate metric, the yardstick that allows reasonable comparison across different data types or populations.

The paucity of data is another problem, one I have personally faced: When can you pool data to drag the best conclusions out of a limited number of trials and samples? One actually needs to analyze the variables to see if there are significant differences across trials, and sometimes there might even be significant differences within trials that may force you to discard more data. (Think edge effects of evaporation from a 96-well plate.) Even so, biological statistics are still statistics, and their problems can be solved with careful attention to the definition of variables, methodology, and assumptions.
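One way to check whether two trials differ before pooling them is sketched below. This is only an illustration, not a method the article prescribes: it uses an exact permutation test on the difference of means, the function name `exact_permutation_p` and the plate readings are hypothetical, and the test only compares means, so it says nothing about within-trial problems such as edge effects.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(a, b):
    """Two-sided exact permutation test on the difference of means.

    Enumerates every way to split the pooled values into two groups of
    the original sizes, so it is only practical for small samples.
    """
    pooled = a + b
    observed = abs(mean(a) - mean(b))
    n = len(pooled)
    extreme = 0
    total = 0
    for idx in combinations(range(n), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # Small tolerance so floating-point ties count as "at least as extreme".
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical readings from two trials of the same assay (made-up numbers).
trial_1 = [1.00, 1.10, 0.90, 1.05, 0.95]
trial_2 = [1.50, 1.60, 1.40, 1.55, 1.45]

p_value = exact_permutation_p(trial_1, trial_2)
print(f"p = {p_value:.4f}")
```

A small p-value here would suggest the trials differ systematically, in which case pooling them would hide a real source of variation. Whatever threshold is chosen for "small" is itself an assumption that should be reported.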

Every statistic has at its core these three critical components: a definition, a methodology, and a set of assumptions. They correspond closely with the components of an experiment: a hypothesis, a method, and background. Just as in scientific inquiry, carelessness with any one component can create confusion and mistakes. That’s why we share these components in papers—so that others can check our work to see whether what we did was reasonable and recreate it if necessary. Sharing these components distinguishes a statistic from a pile of numbers, and an experiment from a magic trick. What makes these components so important?

The definition of a statistic explains what the number represents and how it can be used. Is it a probability, one suggesting that results like yours would occur less than 5% of the time if the null hypothesis were true? Or perhaps it is a quantity that can be used to compare the properties of similar entities, like the binding constants of different antibodies or the efficacy of new drugs.

Methodology is equally important: How did you arrive at the end result? There are a variety of statistical tools to choose from. The simplest, averaging all the data points, can produce a highly misleading statistic when some of the samples have been inadvertently adulterated. Cherry-picking, the practice of selecting only the samples that support one’s hypothesis, is just as questionable.
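The risk of averaging over an adulterated sample can be seen in a small sketch. The optical-density readings below are invented for illustration, and the last well is assumed to be contaminated; the median is shown only as one alternative summary, not as a universal fix.

```python
from statistics import mean, median

# Hypothetical optical-density readings; the last well is assumed contaminated.
readings = [0.52, 0.49, 0.51, 0.50, 0.48, 1.90]


def summarize(values):
    """Return the arithmetic mean and the median of a list of readings."""
    return mean(values), median(values)


avg, mid = summarize(readings)
print(f"mean = {avg:.3f}, median = {mid:.3f}")
```

With these numbers the mean lands above every uncontaminated reading, while the median stays among them. Reporting which summary was used, and why, is part of the methodology.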

The only real way to address these problems is to share the details of how the data was collected (essentially an experimental method), how the data was manipulated (what mathematical operations were done to it, such as normalizing to a baseline), and what information was kept or set aside (such as throwing out bad samples). The methodology can be the greatest challenge because data points are often scarce, making pooling an attractive, though dangerous, option.

The final segment of a statistic is the set of assumptions, which form the foundation of a paper, the reasons why you did things the way you did. If you assumed that cells growing in one well would be completely independent of cells growing in another well (e.g., that they all had the same light and humidity and environment) and used this independence to state that it wouldn’t matter whether the cells were in the top right of the plate or the center, you made an important assumption—one which may or may not be valid.

Treating the data as if they are distributed on a bell curve rather than, say, an exponential curve is a critical assumption, one that determines which statistical tests are appropriate. Finally, the most basic assumptions are those you make when stating your conclusions: that you have controlled for the greatest sources of error in your experiment, that you have stomped out the lurking and confounding variables (those unseen specters that often mislead researchers), and that your statistic contributes relevant information to what you are trying to figure out. Otherwise, there would be no sense in reporting the statistic at all.
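A rough way to probe the bell-curve assumption is to look at how lopsided a sample is. The sketch below uses simulated data, not real measurements, and a simple moment-based skewness; the helper `sample_skewness` and the distribution parameters are assumptions made for illustration. A skewness near zero is consistent with a symmetric bell shape but does not prove one.

```python
import random
from statistics import mean


def sample_skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)  # fixed seed so the sketch is repeatable
bell = [rng.gauss(10.0, 2.0) for _ in range(5000)]      # simulated bell curve
skewed = [rng.expovariate(0.1) for _ in range(5000)]    # simulated exponential

skew_bell = sample_skewness(bell)
skew_exp = sample_skewness(skewed)
print(f"bell-shaped sample skewness: {skew_bell:.2f}")
print(f"exponential sample skewness: {skew_exp:.2f}")
```

The exponential sample should show a clearly positive skewness, a sign that tests built on a normality assumption may not suit such data without a transformation or a different test.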

When done correctly, statistics are the ultimate realization of the scientific method: A hypothesis is established with the selection of a variable to examine, a methodology is established to collect data on this variable, and the entire experimental design and scope of the results depend on the assumptions used to collect and interpret the data. Controls are established to reduce the effect of hidden variables, and every effort is taken to remove confounding variables from the experiment (or statistic).

Certainly, biology is a complicated beast to measure, but the biosciences are not a place for those intimidated by challenges. Abusing statistics, either through carelessness or misapplication, is the exact opposite of the scientific method—more misleading than not using them at all. Perhaps our problem is that, without much emphasis on biostatistics in curricula or the workplace, students and novice researchers are not given an adequate understanding of when to use statistics, how to use them, and what they mean, yet are encouraged to use statistics whenever possible to lend credence to their conclusions.

My hope is that if we apply statistics appropriately and transparently, as a science applied in the service of other sciences, then statistics will never again be grouped in the company of lies and damned lies.

Zachary N. Russ is a student at the University of Maryland. He is a Goldwater scholar and has done summer research at Rice University and UC-Berkeley.