Author: Dr Matt Alderdice, Head of Data Science | Reading time: 5 Minutes
What do we mean by reproducibility?
Have you ever been unable to reproduce a finding from another researcher's study? Reproducibility in the context of scientific experimentation is the ability of independent researchers to arrive at the same conclusions from an experiment by following the documentation provided by the researchers of the original study. Replicability and repeatability are two related terms which are often used in the context of biomarker discovery and development. For this article, we will focus on reproducibility as defined above.
Reproducibility is a key driver of scientific progress, as when a scientific discovery can be reproduced, it enables a new concept or paradigm to be accepted within the community. Unfortunately, when scientists do not provide enough information for others to reproduce their findings to an acceptable level, we can end up with scientific stasis.
This has now become a systemic problem and is known by many as the scientific reproducibility crisis. This article aims to explore the causes, impact and potential solutions to the scientific reproducibility crisis with a particular focus on scenarios involving biomarker discovery and algorithm development.
What is the reproducibility crisis?
It is widely accepted that we are experiencing a reproducibility crisis across many core scientific fields. Dr John Ioannidis is often credited with raising awareness of the community's systemic failure to replicate published findings, most famously in his 2005 paper "Why Most Published Research Findings Are False".
The problem first gained widespread attention in psychological studies within the social sciences, but the term has since gained traction elsewhere. There are now major efforts to rethink how scientific discovery is performed across all scientific fields.
A 2016 survey published in Nature found that "more than 70% of researchers have tried and failed to reproduce another scientist's experiments, and more than half have failed to reproduce their own experiments". More recently, in 2021, The Reproducibility Project: Cancer Biology reported that of 193 experiments from 53 high-impact cancer papers published between 2010 and 2012, only 50 experiments from 23 papers could be replicated.
Moreover, among the experiments that were replicated, effect sizes were on average 85% smaller than originally reported. These findings may go some way towards explaining why so many drugs never make it to market, and why so many biomarkers fail to reach the implementation stage.
What does the crisis mean for healthcare organisations?
For biotechnology and pharmaceutical companies, it is essential to reproduce their scientific discoveries. Without reproducible scientific studies throughout the discovery and validation phases, how can we be sure a drug is genuinely having the desired effect or whether a companion diagnostic is selecting the correct patients for therapy?
When clinical trials fail, share prices often plummet, media scrutiny intensifies, and confidence in the product is lost. It is worth asking what could have been done earlier to prevent such losses. Companies like Amgen Inc. have helped lift the curtain by publishing their own unsuccessful attempts to replicate published studies, including replication work their scientists had carried out years earlier.
Analysing replication failures is one way of understanding and learning from our mistakes. In the next section, we explore some of the potential causes of the crisis.
What is causing the reproducibility crisis in biomarker discovery?
In this section, we will highlight some of the common pitfalls of scientific discovery leading to irreproducibility.
- Sample size - Biomarkers for Precision Medicine are often discovered and validated in small, selected patient cohorts. With a small sample size, statistical power is lacking, and the resulting findings tend to overfit the discovery data and fail to generalise.
- Sampling Technique - The sample of the population you have taken may not be representative. Thus, the discoveries in the sample set may not hold in the actual population.
- P-hacking - a questionable research practice, also known as inflation bias or selective reporting. It may be intentional or unintentional, and is characterised by the misuse of statistical tests to suggest a relationship that is weaker than reported, or that does not exist at all.
- Pressure to publish - academic supervisors and eager stakeholders can pressure scientists to publish data that is ‘undercooked’. This pressure can produce findings that are mere artefacts and can never be reproduced.
- Data and Documentation - Perhaps the biggest cause is the lack of documentation and data access from the discovery phase. Without them, successful replication studies are all but impossible.
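To make the p-hacking pitfall concrete, here is a small stdlib-only Python simulation (the specific numbers, 20 candidate biomarkers and 20 patients per arm, are invented for illustration). Each simulated study measures 20 markers in a world where none are truly associated with outcome; reporting only the single "best" p-value produces a false positive in roughly two-thirds of studies, even though every individual test is run at the 5% level:

```python
import math
import random

def two_sample_p(a, b):
    """Two-sample t statistic with a normal approximation to the p-value
    (adequate here, since each group has 20 observations)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    t = (ma - mb) / math.sqrt(va / na + vb / nb)
    # Two-sided p-value from the standard normal CDF
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(t) / math.sqrt(2.0))))

random.seed(42)
trials, n_markers, n_per_group = 1000, 20, 20
false_positive_trials = 0
for _ in range(trials):
    # Null world: no marker truly differs between the two patient groups
    ps = []
    for _ in range(n_markers):
        a = [random.gauss(0, 1) for _ in range(n_per_group)]
        b = [random.gauss(0, 1) for _ in range(n_per_group)]
        ps.append(two_sample_p(a, b))
    if min(ps) < 0.05:          # report only the "best" marker
        false_positive_trials += 1

print(f"Studies with at least one 'significant' marker: "
      f"{false_positive_trials / trials:.0%}")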
How can we help drive reproducible scientific discoveries?
- At the beginning of your project, perform a power analysis to determine whether you have enough samples to statistically power a validation study.
- Consult statisticians to understand and verify your choice of statistical analyses. Having your analysis plans reviewed by others adds another layer of rigour to your study.
- Define your target population and sampling technique to collect the right samples and validate the algorithm's intended use.
- Provide documentation for your data so that others understand why it was collected, who it was collected for, and its known limitations. "Datasheets for Datasets", a seminal 2018 paper co-authored by former Google AI ethicist Timnit Gebru, highlights the need to provide documentation alongside your data. Although written for machine learning practitioners, its recommendations hold true for biomarker discovery, and doubly so for projects spanning both.
- Pick the right metrics for success. In Precision Medicine, the right metric comes from speaking with clinicians and subject-matter experts, those who understand the disease and where the risk lies for the patient. Don't just pick the default metric: the accuracy of your classifier might look good on the surface, but what does it mean in the context of your data? Understanding the concepts and application of statistical analyses, such as response and survival analyses, allows the impact of your study to be modelled in the clinical context.
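The power-analysis advice above can be sketched in a few lines of Python. This is an illustrative simulation-based approach under invented assumptions (a two-arm comparison of a normally distributed biomarker with a standardised effect size of 0.5), not a substitute for consulting a statistician: simulate many studies at a given sample size and count how often the true difference is detected.

```python
import math
import random

def estimate_power(effect_size, n_per_group, alpha=0.05, sims=2000, seed=0):
    """Estimate the power of a two-sample comparison by simulation:
    the fraction of simulated studies whose p-value falls below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        # Group b genuinely differs from group a by `effect_size` SDs
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        n = n_per_group
        ma, mb = sum(a) / n, sum(b) / n
        va = sum((x - ma) ** 2 for x in a) / (n - 1)
        vb = sum((x - mb) ** 2 for x in b) / (n - 1)
        t = (mb - ma) / math.sqrt(va / n + vb / n)
        # Two-sided p-value via a normal approximation to the t distribution
        p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(t) / math.sqrt(2.0))))
        if p < alpha:
            hits += 1
    return hits / sims

# How many patients per arm to detect a moderate effect (Cohen's d = 0.5)?
for n in (20, 40, 64, 100):
    print(f"n = {n:3d} per arm -> estimated power {estimate_power(0.5, n):.2f}")
```

Running this shows that around 64 patients per arm are needed for the conventional 80% power target at this effect size; a 20-per-arm discovery cohort is badly underpowered, which is exactly the first pitfall listed above.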
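The warning about default metrics can be illustrated with a toy example (the 5% responder prevalence and the degenerate classifier are invented for illustration). A classifier that never predicts "responder" achieves 95% accuracy on this cohort while missing every patient who would actually benefit:

```python
# Hypothetical validation set: 5% of patients respond to therapy
labels      = [1] * 5 + [0] * 95    # 1 = responder, 0 = non-responder
predictions = [0] * 100             # classifier predicts "non-responder" for everyone

tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)

accuracy    = (tp + tn) / len(labels)   # 0.95 -- looks excellent on the surface
sensitivity = tp / (tp + fn)            # 0.00 -- every responder is missed
specificity = tn / (tn + fp)            # 1.00

print(accuracy, sensitivity, specificity)
```

On imbalanced clinical data, sensitivity and specificity (or metrics derived from them) reveal what accuracy hides, which is why the choice of metric should come from the clinical question rather than a software default.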
By addressing the known issues contributing to the reproducibility crisis, we can drive breakthroughs in Precision Medicine: more drugs reaching the market, more advanced molecular stratification strategies for patients and, in the long term, better outcomes.
References
- The reproducibility "crisis" (PMC)
- John Ioannidis (Wikipedia)
- Why Most Published Research Findings Are False (Wikipedia)
- Replication crisis (Wikipedia)
- 1,500 scientists lift the lid on reproducibility (Nature)
- The fundamental principles of reproducibility (Philosophical Transactions of the Royal Society A)
- A manifesto for reproducible science