
Publication Bias and the File Drawer Problem

The studies that get published are not a random sample of all studies conducted. Positive findings are published; negative findings are buried. The result is a scientific literature systematically tilted toward conclusions that support the people funding the research — and a public making decisions based on a distorted picture.

Time: 15 minutes
Requires: Unit 3.10

Opening Hook

Between 1987 and 2004, pharmaceutical companies registered 74 clinical trials with the United States Food and Drug Administration (FDA) to test twelve antidepressant drugs. Those trials were legally required to be registered before they began. The FDA received the data from all of them.

If you had gone to the published scientific literature to find out what those 74 trials showed, you would have found an overwhelmingly positive picture: 94 percent of the trials that appeared there reported positive results. The drugs worked. The evidence was compelling. Case closed.

Except it was not. When Erick Turner, a former FDA reviewer, and his colleagues obtained the data from all 74 registered trials and analysed them together, the picture looked very different. Only 51 percent of the trials, by the FDA’s own assessment, showed a positive result. The other 49 percent were either negative or too ambiguous to support a conclusion. Of the 36 non-positive trials, 22 were simply never published. They sat in file drawers. Most of the remaining 14 were written up in a way that made the results sound positive when they were not.

The published literature said 94 percent positive. The full picture said 51 percent.

That gap is not a minor rounding error. It is the difference between a drug class with strong evidence of efficacy and a drug class with results that hover just above chance. It is the difference between clinical guidelines built on solid ground and guidelines built on a carefully selected half of the evidence. Millions of prescriptions, hundreds of millions of pounds and dollars, and the treatment decisions of a generation of patients rested on a literature that was not a fair sample of what the research had actually found.

This is publication bias. It is not a theoretical concern. It is happening right now, across multiple fields, and the people who stand to benefit from the distortion are usually the same people funding the research.

The Concept

Publication bias is the tendency for studies with positive or statistically significant results to be published in journals, while studies that find no effect or a negative result are not. Journals prefer positive findings because they are more interesting to readers. Researchers know this and, consciously or not, pursue research that is more likely to find effects. Funders, many of whom are companies with a financial stake in the outcome, have an obvious interest in which results reach the public.

The file drawer problem, named in a 1979 paper by psychologist Robert Rosenthal, is the concrete mechanism through which publication bias operates. Imagine a research area where the true effect of some intervention is zero. Researchers around the world run studies testing this intervention. By chance alone, some of those studies will find a statistically significant positive result (we explored why in Unit 3.10 on p-hacking: even with a perfectly honest process, about one in twenty studies will clear the significance threshold just by chance). Those studies get written up, submitted to journals, and published. The studies that found nothing interesting get written up, rejected for being “boring,” and filed away. Over time, the published literature fills up with positive results that are mostly statistical noise, while the filing cabinets fill up with null results that were the honest signal.
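The file-drawer dynamic is easy to demonstrate. The following is a minimal sketch in Python, with made-up study sizes and a true effect fixed at zero, showing that a publication filter keeping only "significant" results still fills the literature with roughly one false positive per twenty studies run:

```python
import random
import statistics

random.seed(42)

def study_is_significant(n=40, true_effect=0.0):
    """Simulate one two-arm study of an intervention with no real effect
    and return True if it clears p < 0.05 (approximate two-sided z-test)."""
    treated = [random.gauss(true_effect, 1.0) for _ in range(n)]
    control = [random.gauss(0.0, 1.0) for _ in range(n)]
    diff = statistics.mean(treated) - statistics.mean(control)
    se = (statistics.variance(treated) / n + statistics.variance(control) / n) ** 0.5
    return abs(diff / se) > 1.96

results = [study_is_significant() for _ in range(2000)]
published = sum(results)               # only "significant" studies survive the filter
filed_away = len(results) - published  # the honest null results go in the drawer
print(f"published: {published}, filed away: {filed_away} "
      f"({published / len(results):.1%} of null studies reach the literature)")
```

Every published study in this simulation is a false positive, and every filed-away study was the honest signal; a reader who sees only the published pile sees an effect that does not exist.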

The distortion compounds through systematic reviews and meta-analyses. A systematic review is a study of studies: a researcher collects all available evidence on a question and synthesises it into an overall conclusion. A meta-analysis does the same thing quantitatively, pooling results from many individual studies to calculate an overall effect size. Both techniques were designed precisely to overcome the limitations of any single small study. But if the pile of studies they are pooling is not a representative sample of all the studies ever run on the question, the synthesis inherits the bias of the underlying literature. Garbage in, garbage out at industrial scale.
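The pooling step itself is simple arithmetic. A minimal inverse-variance (fixed-effect) sketch, using made-up effect estimates and standard errors, shows how each study is weighted by its precision, so that large, tight studies dominate the pooled result:

```python
# Inverse-variance (fixed-effect) pooling: each study's effect estimate is
# weighted by 1/SE^2. The numbers below are illustrative, not from any real
# meta-analysis.
studies = [
    (0.40, 0.30),  # (effect estimate, standard error) -- small, noisy study
    (0.35, 0.25),  # small, noisy study
    (0.10, 0.08),  # large study, tight standard error
    (0.05, 0.06),  # large study, tight standard error
]

weights = [1 / se**2 for _, se in studies]
pooled = sum(w * eff for (eff, _), w in zip(studies, weights)) / sum(weights)
pooled_se = (1 / sum(weights)) ** 0.5
print(f"pooled effect = {pooled:.3f} ± {1.96 * pooled_se:.3f}")
```

Here the pooled estimate lands near the two large studies rather than splitting the difference, and the pooled standard error is smaller than any single study's. That is the strength of meta-analysis, and also why it cannot rescue a biased sample: the weights are statistical, and a missing study contributes a weight of zero no matter how informative it was.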

The funnel plot is the main graphical tool for detecting publication bias in a meta-analysis. To understand it, you need to understand one key relationship: larger studies are more precise than smaller ones. A trial with 2,000 participants is less susceptible to random variation than a trial with 40. When you plot all the studies in a meta-analysis, with effect size on the horizontal axis and study size (or precision) on the vertical axis, a characteristic shape should appear in the absence of bias. Large, precise studies cluster near the top of the plot, close to the true effect size. Small, imprecise studies scatter more widely at the bottom, some a little above the true effect and some a little below, because small studies are noisier. The resulting shape, when the evidence is unbiased, looks like a symmetric inverted funnel: narrow at the top, wide at the bottom, with the average effect at the centre.

When publication bias is present, the funnel loses its symmetry. Small studies with negative results are missing: they never got published, so they are absent from the plot. The small studies that do appear are the ones with large positive effects, because those are the ones that made it through the publication filter. The left side of the bottom of the funnel is empty. The funnel looks like it has been truncated on one side.
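The truncation can be seen numerically without drawing anything. In this hypothetical simulation (true effect again fixed at zero), small studies survive a publication filter only when their estimate is positive and "significant", and the mean effect among published small studies drifts well above the truth:

```python
import random

random.seed(7)
TRUE_EFFECT = 0.0  # the intervention actually does nothing

def simulate_study(n):
    """Return (effect estimate, standard error); bigger n means a smaller SE."""
    se = 1.0 / n ** 0.5
    return random.gauss(TRUE_EFFECT, se), se

all_run = [simulate_study(random.choice([25, 50, 400, 1600]))
           for _ in range(2000)]

# Publication filter: precise studies (small SE) always publish; small studies
# publish only with a positive, "significant" result (z > 1.96).
published = [(eff, se) for eff, se in all_run if se < 0.1 or eff / se > 1.96]

small_all = [eff for eff, se in all_run if se >= 0.1]
small_pub = [eff for eff, se in published if se >= 0.1]
print(f"small studies run:       {len(small_all)}, "
      f"mean effect {sum(small_all) / len(small_all):+.3f}")
print(f"small studies published: {len(small_pub)}, "
      f"mean effect {sum(small_pub) / len(small_pub):+.3f}")
```

Plotted as a funnel, the published subset would show the classic truncated shape: the top of the funnel intact, the bottom populated only on the positive side.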

Pre-registration is one structural response to this problem. A researcher pre-registers a study by publicly recording, before data collection begins, exactly what they intend to study, what their primary outcome measure is, and how they will analyse the results. This record is time-stamped and publicly accessible. It creates accountability: if the study is later written up and published, readers can compare what was promised against what was reported. More importantly, it creates a public trail of all registered studies, so even if the results are negative and the paper is never published, the study’s existence is known. Platforms like ClinicalTrials.gov in the United States and the ISRCTN registry in the UK now host hundreds of thousands of pre-registered trials.

Registered reports go further. In a registered report, a journal reviews and provisionally accepts a study based on the research question and methodology alone, before any data has been collected. The journal commits to publishing the results regardless of what those results turn out to be. This removes the publication filter entirely: the decision to publish is made before anyone knows whether the finding is positive or negative.

Why It Matters

The antidepressant case is the most carefully documented instance of publication bias in medicine, but it is far from the only one. The same pattern has been found in trials of statins, weight-loss drugs, and antipsychotic medications. In each case, the published literature overestimates efficacy because the studies showing the drugs do not work are missing from the picture the public sees.

The dietary supplement industry is, in some ways, an even clearer example. Studies of supplements are typically smaller, less regulated, and more often funded by manufacturers with a direct commercial interest in positive results. Meta-analyses of supplement trials consistently show asymmetric funnel plots, the statistical fingerprint of publication bias. The body of published evidence for many popular supplements looks far more persuasive than the full evidence base would justify.

The mechanism is not always overt dishonesty. Most of the researchers involved are not deliberately concealing results. Researchers submit papers with null results to journals and are told the work is not interesting enough to publish. Journals face their own incentive structure: positive findings attract more citations, which improve a journal’s impact factor, which determines where it sits in academic hierarchies. The individual decisions that produce publication bias can each seem reasonable. The systemic effect is that the published literature becomes a systematically biased sample of reality.

If you are making a decision based on the scientific consensus in a field, and that consensus is based on a literature that shows only the positive results, you are making your decision on incomplete evidence. You do not know it is incomplete, because the missing studies are invisible. This is why publication bias matters more than almost any other statistical manipulation covered in this curriculum: it distorts the entire information environment, not just one number in one article.

How to Spot It

The asymmetric funnel plot is the primary tell. When a published meta-analysis includes a funnel plot, look at it. If the small studies cluster mainly on the positive side of the effect, with few or no small negative-effect studies visible in the bottom left, you are looking at the fingerprint of publication bias. A symmetric funnel, with small studies scattered on both sides, is what honest evidence looks like.

Not all meta-analyses include funnel plots. If there is no funnel plot, ask whether there is any discussion of publication bias at all, and whether the authors made any effort to locate unpublished studies. Systematic reviews that take publication bias seriously will describe a search of clinical trial registries as well as the published literature, attempt to contact researchers in the field for unpublished data, and may report a statistical test for funnel plot asymmetry such as Egger’s test (a regression of the standardized effect against its precision, in which an intercept significantly different from zero suggests asymmetry). If a meta-analysis mentions none of these things, treat its conclusions with appropriate scepticism.
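The core of Egger’s test can be sketched in a few lines of plain Python (the full test also computes a significance level for the intercept, omitted here). This hypothetical example simulates an evidence base around a true effect of 0.2, then mimics a publication filter by dropping small studies with near-null estimates; the intercept of the regression moves away from zero for the filtered set:

```python
import random

def egger_intercept(effects, ses):
    """Egger's regression: standardized effect (effect/SE) on precision (1/SE).
    Returns the intercept; values far from zero suggest funnel asymmetry."""
    y = [e / s for e, s in zip(effects, ses)]
    x = [1.0 / s for s in ses]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx

random.seed(3)
true_effect = 0.2
studies = []
for _ in range(200):
    se = random.uniform(0.03, 0.3)  # a mix of large and small studies
    studies.append((random.gauss(true_effect, se), se))

# "Published" subset: small studies with near-null results go in the drawer.
biased = [(e, s) for e, s in studies if s < 0.1 or e > 0.05]

full_int = egger_intercept(*zip(*studies))
pub_int = egger_intercept(*zip(*biased))
print(f"Egger intercept, full evidence base: {full_int:+.2f}")
print(f"Egger intercept, 'published' subset: {pub_int:+.2f}")
```

With the full evidence base the intercept sits near zero; filtering out small null studies pushes it upward, which is exactly the pattern an asymmetric funnel plot shows visually.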

The antidepressant case gives us a clear documented example. Turner et al., published in the New England Journal of Medicine in January 2008, compared the published literature on twelve antidepressants against the complete set of trials registered with the FDA. The effect sizes reported in the published literature were inflated by 32 percent on average compared with the effect sizes when all FDA-registered trials were included; for individual drugs, the inflation ranged from 11 to 69 percent. A trial the FDA had judged positive was roughly twelve times as likely to be published in a way that agreed with that judgement as a trial the FDA had judged non-positive. The funnel plot for the published data alone showed clear asymmetry; the funnel plot including all FDA-registered trials was considerably more symmetric.

Irving Kirsch’s parallel work, using data obtained from the FDA under freedom-of-information law, reached a similar conclusion about the magnitude of the distortion: when all submitted data was included rather than only published data, the overall drug-versus-placebo effect for antidepressants was below the threshold that most clinical guidelines would consider clinically meaningful. The published literature had been telling clinicians and patients one story. The complete evidence base told a different one.

Your Challenge

A meta-analysis combines 30 studies examining the effect of a dietary supplement on cognitive performance. The overall result is positive: pooled across all 30 studies, the supplement shows a statistically significant improvement in test scores. The funnel plot shows that the 15 largest, most precise studies are symmetrically distributed around the overall effect estimate. But among the 15 smaller studies, almost all appear on the positive side of the effect. Only two small studies show a near-zero or slightly negative effect; the rest show large positive effects.

What does the asymmetric funnel plot suggest about the published evidence? If the missing small studies were recovered and included in the analysis, in which direction would you expect the pooled effect estimate to move, and why? What would your overall level of confidence in the supplement’s effectiveness be, compared to a naive reading of the published result? What additional steps would you take before accepting the positive conclusion?

There is no answer on this page. That is the point.

References

Turner, E.H., Matthews, A.M., Linardatos, E., Tell, R.A., and Rosenthal, R. “Selective Publication of Antidepressant Trials and Its Influence on Apparent Efficacy.” New England Journal of Medicine 358 (2008): 252–260. https://www.nejm.org/doi/full/10.1056/NEJMsa065779

Kirsch, I., Deacon, B.J., Huedo-Medina, T.B., Scoboria, A., Moore, T.J., and Johnson, B.T. “Initial Severity and Antidepressant Benefits: A Meta-Analysis of Data Submitted to the Food and Drug Administration.” PLOS Medicine 5, no. 2 (2008): e45. https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0050045

Rosenthal, R. “The File Drawer Problem and Tolerance for Null Results.” Psychological Bulletin 86, no. 3 (1979): 638–641. The foundational paper naming the file drawer problem.

Egger, M., Davey Smith, G., Schneider, M., and Minder, C. “Bias in Meta-Analysis Detected by a Simple, Graphical Test.” BMJ 315 (1997): 629–634. The Egger test for funnel plot asymmetry. https://www.bmj.com/content/315/7109/629

Sterne, J.A.C., and Egger, M. “Funnel Plots for Detecting Bias in Meta-Analysis: Guidelines on Choice of Axis.” Journal of Clinical Epidemiology 54 (2001): 1046–1055. Methodology for interpreting funnel plots.

Centre for Open Science. “Registered Reports.” https://www.cos.io/initiatives/registered-reports. Overview of the registered reports model and participating journals.

Chambers, C.D. “Registered Reports: A New Publishing Initiative at Cortex.” Cortex 49 (2013): 609–610. The paper launching the first registered reports programme at a major journal.