Most Scientific Papers Are Wrong: The Need for Unit Testing, Live Papers and Ongoing Peer Review by a Paper's Users

Lessons from Industry

 

Of the scientific papers I remember reading in depth, I think I have found errors more often than not. And this is without counting errors that cannot be detected from the information available in the paper itself. Most of these errors are not easy to spot: they have come up when I have studied a paper to present at journal club, for example. I don't mean typos. The errors I am talking about include conceptual errors that make the conclusions of a paper wrong. And I don't mean papers published in unknown journals, but ones published in some of the most well-respected ones, like Nature.

 

Why are so many papers flawed? As scientific methods grow more complex, most contemporary papers rely on a host of measures, and on software programs that call other software programs to compute those measures. Errors compound along this chain, which increases the probability of errors in the final conclusions. In brains, propagation of errors is prevented by a host of active mechanisms of error correction (see The Digital Brain, coming soon). Science and industry offer a number of interesting contrasts in this domain, and several of the mechanisms used to correct errors in industry could do much for science:

 

  1. Unit testing: most complex programs rely on other programs, which themselves rely on others. Error propagation is prevented by a process called unit testing, whereby each module, or unit, is tested with simulated inputs whose expected outputs are known. In science, however, this practice is not widely used: many procedures are developed for one-time use in a single paper and never properly tested, and many are carried out on data for which no expected result is known.
  2. Ongoing review: scientific papers today rely on a single round of peer review at the time of publication for quality control. In contrast, commercial products are subject to ongoing testing.
  3. Incentives for error correction: referees of scientific papers have little or no incentive to spend time finding errors in a manuscript. A recent experiment by Nature with voluntary referees for manuscripts posted online was terminated after finding that few scientists took the time to post unsolicited reviews, much less meaningful ones. But this was *before* publication in the journal, and thus before any of the papers in question had acquired a real user base qualified and motivated to evaluate its correctness and significance. In industry, testing is carried out by QA experts, who get paid to detect errors, and by clients, who have a strong incentive to get correct results.
  4. Patches and updates: when errors *are* detected in science, the corrections often get lost in the literature, and readers of the original paper continue to read the version with the error(s). In industry, in contrast, users receive updates and patches that bring them the latest and best known version.
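
To make the first point concrete, here is a minimal sketch of a unit test for one analysis step, written in Python. The function `pearson_r` is a hypothetical stand-in for whatever measure a paper computes; it is not taken from any particular paper or library. Each test feeds it simulated inputs whose correct output is known in advance.

```python
import math
import unittest


def pearson_r(xs, ys):
    """Hypothetical analysis step: Pearson correlation of two samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


class TestPearsonR(unittest.TestCase):
    """Simulated inputs for which the expected output is known exactly."""

    def test_perfect_positive_relation(self):
        xs = [1.0, 2.0, 3.0, 4.0]
        ys = [2 * x + 1 for x in xs]  # exact increasing line
        self.assertAlmostEqual(pearson_r(xs, ys), 1.0)

    def test_perfect_negative_relation(self):
        xs = [1.0, 2.0, 3.0, 4.0]
        ys = [-3 * x for x in xs]  # exact decreasing line
        self.assertAlmostEqual(pearson_r(xs, ys), -1.0)

    def test_symmetric_data_has_zero_correlation(self):
        # y depends on x only through x squared, so the linear
        # correlation over a symmetric range is zero.
        self.assertAlmostEqual(pearson_r([-1.0, 0.0, 1.0], [1.0, 0.0, 1.0]), 0.0)

    def test_constant_sample_is_rejected(self):
        with self.assertRaises(ValueError):
            pearson_r([1.0, 1.0, 1.0], [1.0, 2.0, 3.0])
```

Saved in a file, the tests can be run with `python -m unittest`. Only once they pass would the function be applied to real data, where no expected result is available.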

 

To address this problem, I propose that scientists adopt some of these practices that have worked so well in industry:

  1. Each method used in a paper should first be run on unit tests with known expected results, and shown to reproduce those results, before being applied to situations where there is no expected result. Such unit tests should form part of the supplementary material presented online for publication.
  2. Review/refereeing should switch from a one-time event at the time of publication to publication without censorship, followed by continuous evaluation, reviews and comments for the life of the paper. In an age of free electronic dissemination, information overload and search engines, what matters is not dictating which papers can get published, but determining which papers should rank highest in response to any given query.
  3. Reviewing and evaluation should be opened not just to 2 or 3 competitors of the authors, but to everyone. In particular, the 'users' of a scientific work, i.e. those who try to use it for subsequent work, should be allowed to comment on the work's validity, since they have the greatest understanding of it and the greatest incentive to ensure its correctness.
  4. Static papers should be abandoned in favor of live documents that contain the latest version of a work, including errata, comments by others and the most up-to-date evaluation by the community.
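
As a rough illustration of the last point, a live paper could be modeled as a record that keeps every released version together with the comments it has received. The sketch below is in Python, and all of its names (`LivePaper`, `Revision`, `Comment` and their fields) are hypothetical, chosen only for this example. It is one possible shape for such a record, not a description of any existing system.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Revision:
    version: int
    released: date
    summary: str  # what changed in this version, e.g. an erratum


@dataclass
class Comment:
    author: str
    text: str
    on_version: int  # the version the commenter was reading
    reports_error: bool = False


@dataclass
class LivePaper:
    title: str
    revisions: list = field(default_factory=list)
    comments: list = field(default_factory=list)

    def publish(self, released, summary):
        """Release a new version; earlier versions stay in the history."""
        revision = Revision(len(self.revisions) + 1, released, summary)
        self.revisions.append(revision)
        return revision

    def latest(self):
        """Return the version a reader should be shown by default."""
        if not self.revisions:
            raise LookupError("nothing has been published yet")
        return self.revisions[-1]

    def comment(self, author, text, reports_error=False):
        """Attach a reader's comment to the current version."""
        note = Comment(author, text, self.latest().version, reports_error)
        self.comments.append(note)
        return note

    def error_reports(self):
        """Return all comments that flag an error, in the order received."""
        return [c for c in self.comments if c.reports_error]
```

A reader who opens the paper is served `latest()`, while the revision history and the error reports remain attached to the same document instead of being scattered across the literature.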

 

Alex Bäcker, Jan. 1st, 2007