16 Free Bioinformatics Tools for Biomarker Discovery

Author: Joy Kavanagh, Diagnostic Programme Manager | Reading time: 8 Minutes

In this article, we will explore:

The widening skills gap among research scientists
The reality for research scientists today
Open-source bioinformatics tools list

The Widening Skills Gap Among Research Scientists

Advances in molecular technologies and high throughput instrumentation have opened a vast array of possibilities for life scientists to generate large and complex data sets as they pursue a deeper understanding of human disease biology and the discovery of safer and more effective treatments.

At the same time, such advances sparked a data revolution, inexorably changing the requirements for biological data storage, management, analysis and interpretation. While technology has raced ahead, the training of enough skilled data scientists to meet global demand has lagged behind, resulting in a widening skills gap among life scientists [1], [2].

Observing this growing trend, several groups conducted surveys to better understand the widening skills gap among research scientists and to find viable solutions to the problem. All found that respondents commonly lacked confidence in their ability to use bioinformatics tools and statistical methods [3].

The resounding opinion, when faced with this evidence, was that meeting the ever-increasing demand for STEM researchers with sufficient depth of knowledge and skill required a fundamental shift in training, starting at bachelor level (or earlier) and continuing upward, so that such skills develop in tandem with learning in the underlying biology. Other high-profile articles highlighted the issue, stressing that, to prevent research from stalling, biologists needed to learn bioinformatics skills, thus widening the pool of applied bioinformaticians [4].

Five years on, a shortfall of ~250,000 data scientists [5] means almost 50% of biomedical data scientist positions go unfilled in academia and industry. The reality is that creating or adapting training programs that integrate data science with biomedical sciences to narrow that skills gap is a long-term strategy that has yet to bear fruit [6].

So, where does that leave researchers today?

The reality for many, who seek to break new ground in the understanding or treatment of human diseases, is a requirement for complex data analysis that is beyond their skill set or available resources. Sadly for some, this barrier can feel insurmountable, while for others with time and funding on their side, the hurdle can be overcome. The scenarios below will be all too familiar:

A. A data scientist colleague is available to collaborate on the project; however, translating your research objectives into tangible outputs requires significant time investment on both sides and, in some cases, many iterations of analysis methods to obtain the crucial results.
B. The data scientist in your group is already overwhelmed with projects and unable to assist (and would most likely face challenge A above).
C. The budget required to fund the appropriate support is not available and the research is shelved.

How can we address the immediate problem? What if more life scientists were empowered to meet their own data analysis needs independently? Examples of the types of multimodal data processing and analysis you might need to perform:

Secure clinical data storage
Data cleaning, data wrangling, data visualization
Analysis of processed data
Differential gene expression
Pathway Analysis
Machine Learning (Clustering and Classification)
Survival analysis e.g. Kaplan Meier
Multiomics analysis
Digital pathology analysis

Not surprisingly, many data analytics tools have emerged that automate one or many of these analyses, helping researchers to navigate data analysis independently. Each table below provides an overview of some open-source tools and their main capabilities.

Researchers: With access to such powerful tools, a large and multifaceted raw data set can be transformed into presentation-ready outputs in minutes to hours. This fast turnaround could make the difference in meeting an abstract or grant deadline that helps fund the next stage of the research, or bring an important discovery one step closer to the clinic.

Academic and industry directors: Access to powerful, user-friendly, and secure data analysis tools removes barriers and enables groundbreaking research. Imagine the impact that fast, cost-effective multi-omics data analysis would have on your department’s outputs. 

Sonrai Can Help

If your team does not have the time or expertise to use the tools in the tables below - get in touch with Sonrai; we can perform all the functions in one easy-to-use cloud-based Biomarker Discovery Platform.

Selected open-source bioinformatics tools and their main capabilities


Data cleaning & Visualization

Tool | Summary | Skill Requirement
OpenRefine | Previously Google Refine, OpenRefine is a well-known open-source data tool. Its main benefit is being open source. It lets you transform data between different formats and ensure that data is cleanly structured. | While OpenRefine streamlines many complex tasks (e.g. using clustering algorithms), it does require a little technical know-how.
Tidyverse | The tidyverse is a comprehensive collection of R packages which enables the generation of tidy data. | A powerful toolkit, but one that requires programming skills.
Pandas | Similar to the R Tidyverse, this Python package is a mainstay for data cleaning and transformation tasks. | Requires knowledge of Python to use.
matplotlib | Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. | Technical users only.
R Shiny | A web application framework for creating interactive dashboards and data visualizations. | Programming skills needed.
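
As a rough illustration of the kind of cleaning step Pandas is typically used for, the sketch below tidies a small, invented sample sheet. The column names, the values and the `clean_samples` helper are all hypothetical and chosen only for this example; a real project would adapt each step to its own data.

```python
import pandas as pd

# Hypothetical sample sheet: names and values are invented for illustration.
raw = pd.DataFrame({
    "Sample ID": ["S1", "S2", "S2", "S3", "S4", "S5"],
    "Group": [" tumour", "Normal", "Normal", "TUMOUR ", None, "Tumour"],
    "Marker Level": ["1.5", "0.7", "0.7", "n/a", "2.1", "2.5"],
})


def clean_samples(df):
    """Return a tidied copy with consistent names, labels and numeric types."""
    out = df.copy()
    # Normalise column names to lower case with underscores.
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]
    # Harmonise free-text group labels (whitespace and letter case).
    out["group"] = out["group"].str.strip().str.lower()
    # Convert measurements to numbers; unparseable entries become NaN.
    out["marker_level"] = pd.to_numeric(out["marker_level"], errors="coerce")
    # Drop exact duplicate rows, then rows missing a group or a measurement.
    out = out.drop_duplicates().dropna(subset=["group", "marker_level"])
    return out.reset_index(drop=True)


cleaned = clean_samples(raw)
# Mean marker level per group, computed on the cleaned table.
summary = cleaned.groupby("group")["marker_level"].mean()
print(cleaned)
print(summary)
```

Dropping incomplete rows is only one possible choice here; depending on the study, imputing or flagging missing values may be more appropriate.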

Bioinformatics & Machine Learning

Analysis Type | Tool | Summary | Skill Requirement
Differential gene expression | DESeq2 | This tool enables users to perform differential gene expression analysis on count data from high-throughput sequencing assays such as RNA-seq. | A hugely popular tool, but it requires basic programming skills to use.
Survival Analysis | Log Rank Test | Compare survival in different groups of patients. The Log-Rank test is automatically performed when you compare groups. The p-value and the survivals of each group are calculated and provided for inclusion in your article. | Easy-to-use web interface, but technical skills are required for advanced survival analysis alongside molecular data.
Pathway Analysis | GSEA | Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states. | Another hugely popular tool with a basic user interface. R and Python packages are available for technical users.
Machine Learning | scikit-learn | A Python toolkit for performing classification, regression and dimensionality reduction techniques. | Technical users only.
Data Mining | Orange | Open-source software for data mining and machine learning workflows. | Programming skills needed.
Deep Learning | PyTorch | PyTorch is an open-source machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing, primarily developed by Facebook’s AI Research lab. | Requires in-depth knowledge of Python and concepts such as neural networks.
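
To make the survival analysis row more concrete, here is a minimal sketch of what a Kaplan-Meier estimate and a two-group log-rank test compute, written in plain Python. It is not the code behind any of the tools in the table; the function names and the follow-up data are invented for illustration, and the p-value uses the standard chi-square approximation with one degree of freedom.

```python
import math


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events: 1 = event observed, 0 = censored.
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        risk_a = sum(1 for ti, _, g in pooled if ti >= t and g == 0)
        risk_b = sum(1 for ti, _, g in pooled if ti >= t and g == 1)
        d_a = sum(1 for ti, e, g in pooled if ti == t and e == 1 and g == 0)
        d_b = sum(1 for ti, e, g in pooled if ti == t and e == 1 and g == 1)
        n, d = risk_a + risk_b, d_a + d_b
        observed_a += d_a
        # Events expected in group A if both groups shared the same hazard.
        expected_a += d * risk_a / n
        if n > 1:  # the variance term is undefined when one subject remains
            variance += risk_a * risk_b * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        raise ValueError("not enough events to compare the two groups")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical follow-up times in months (1 = event, 0 = censored).
control_t, control_e = [5, 8, 12, 14, 20, 21], [1, 1, 1, 1, 0, 1]
treated_t, treated_e = [9, 16, 22, 25, 30, 34], [1, 0, 1, 0, 1, 0]

print(kaplan_meier(control_t, control_e))
print(kaplan_meier(treated_t, treated_e))
print(logrank_test(control_t, control_e, treated_t, treated_e))
```

For real studies, an established and validated survival analysis package is a safer choice than hand-written code like this.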


Tool | Summary | Skill Requirement
cBioPortal | Enables interactive exploration of multidimensional cancer genomics data sets. The goal of cBioPortal is to significantly lower the barriers between complex genomic data and cancer researchers by providing rapid, intuitive, and high-quality access to molecular profiles and clinical attributes from large-scale cancer genomics projects, and therefore to empower researchers to translate these rich data sets into biologic insights and clinical applications. | Easy-to-use graphical interface; however, using the API for advanced queries requires technical skills.
FireBrowse | Developed at the Broad Institute of MIT and Harvard, FireBrowse was created to cull and analyze data generated by The Cancer Genome Atlas (TCGA), which characterizes and identifies genomic patterns in human cancer models. FireBrowse provides access to a variety of cancer genomics data, such as clinical annotations, DNA copy number, miR, miRseq, mRNA and mRNAseq, as well as a comprehensive suite of more than 100 interdependent analyses of those data, including correlations, clustering, GISTIC and MutSigCV. | Technical skills are required to make the most of the data on this platform.

Digital and Computational Pathology

Tool | Summary | Skill Requirement
QuPath | Designed by Pete Bankhead at Queen’s University Belfast, QuPath is built specifically to analyze whole slide images (WSI). Its primary use is biomarker analysis/IHC quantification (whole slides and tissue microarrays), but it has also been used for tumour analysis on H&E. | Easy to use for image viewing; however, technical skills are required for computational pathology.
ImageJ | ImageJ is probably the best-known and longest-lived open-source software for biomedical image analysis. | Even though the program is so widely used, ImageJ is an experimental system and NIH does not assume any responsibility for its use by other parties. For use in a regulated environment, separate validation and verification would be required by the end user.
ilastik | ilastik helps users perform segmentation and classification of 2D, 3D and 4D images in a unified way. Through a random forest classifier, ilastik learns from labels provided by the user through a convenient GUI. Based on these labels, ilastik applies problem-specific segmentation. | Easy to use. Created for users without expertise in image processing.

[1] BBSRC and MRC Review of vulnerable skills and capabilities, 2017.

[2] ‘Bridging the skills gap in the biopharmaceutical industry’, ABPI, 2015.

[3] Attwood, T.K. et al. Briefings in Bioinformatics, 20(2), 2019, 398-404.

[4] Chang, J. Core services: reward bioinformaticians. Nature 2015;520:151–2.

[5] The Data Scientist Shortage in 2020

[6] Bourne, P.E. PLoS Biol. 2021 Mar; 19(3): e3001165. 

[7] National Careers Service Data Scientist

