AI-powered Single-cell analysis & automated cell annotations

Author: William Guesdon, Senior Bioinformatician at Sonrai Analytics

Updated: 26/02/2024

William joined Sonrai Analytics as a Senior Bioinformatician in October 2022, bringing with him a wealth of expertise in bioinformatics. In his current role, he aids clients in designing studies with a focus on qPCR, Single-cell, and Flow cytometry data. He is also responsible for developing scripts and algorithms that deepen customer understanding by providing advanced biological data analysis. He contributes to innovation at Sonrai Analytics by developing proof-of-concept scripts for new applications on the Sonrai Discovery platform. Before joining Sonrai, William served in various capacities including as a Scientist/Bioinformatician at 4D Pharma, where he specialized in flow cytometry analysis; as a Senior Scientist at Pencil Biosciences; and as a Research Associate at King's College London. He earned his PhD in Immunology from the University François Rabelais of Tours and has amassed 12 years of experience in the field of biological data analysis.

Data types in this use case: Single-Cell RNA-Seq

Single-Cell Analysis in Drug Discovery and Development

Single-cell analysis is a pivotal innovation with high-impact potential for drug discovery and development. High-throughput single-cell sequencing makes it possible to profile thousands of individual cells, revealing the complexity of cellular function and heterogeneity and the roles that diverse cell types play in health and disease. This approach supports a deeper understanding of disease mechanisms at the level of the individual cell and contributes to the drug discovery process at several stages: biomarker discovery, mode-of-action studies, clinical trial design through patient stratification, and treatment response monitoring.

Challenges of Single-Cell Analysis

Single-cell analysis poses different analytical challenges from bulk sequencing because the data are sparse and subject to specific sources of technical noise. There is also a risk of interpretation bias, which can make results less reliable and less reproducible.

The main challenges of single-cell analysis are presented below.

  • High Dimensionality: Single-cell datasets measure thousands to tens of thousands of genes per cell.
  • Technical Noise: Single-cell data can have a higher level of technical noise compared to bulk RNA-seq data.
  • Batch Effects: Variations in sample preparation and data generation across different batches can introduce unwanted variations that can confound analysis.
  • Cell Assignment and Clustering: Determining the identity of cells and clustering them into meaningful groups based on their gene expression profiles can be challenging, especially given the high dimensionality and noise inherent in single-cell data. Annotation of rare cell populations is also a specific challenge.

Accelerating Single-Cell Analysis with Sonrai's AI Technology

Single-cell analysis and cluster annotation can be challenging and time-consuming. Sonrai leveraged the latest advances in single-cell databases and large language models (LLMs) to automate this process and generate reports for easy validation by domain experts.

Sonrai's Strategy

Using Scanpy, LLMs and single-cell databases, we provided our customer with a workflow to normalize, cluster and annotate single-cell data. This workflow can substantially reduce the time the customer spends on analysis, leading to faster insight generation.
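As an illustration of the normalization step, here is a minimal, standard-library-only sketch of library-size normalization followed by a log transform. It is not Sonrai's implementation: the function name `normalize_and_log`, the `target_sum` default and the toy counts are assumptions made for this example. In Scanpy, the analogous operations are available as `sc.pp.normalize_total` and `sc.pp.log1p`.

```python
import math


def normalize_and_log(counts, target_sum=10_000):
    """Scale each cell's counts to `target_sum`, then apply log(1 + x).

    `counts` is a list of cells, each a list of raw gene counts.
    Hypothetical helper for illustration only.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # An empty cell stays all-zero rather than dividing by zero.
            normalized.append([0.0] * len(cell))
            continue
        scale = target_sum / total
        normalized.append([math.log1p(value * scale) for value in cell])
    return normalized


# Toy matrix (made-up values): 3 cells x 4 genes.
# Cells 0 and 1 have the same composition but a 10x difference in depth.
toy_counts = [
    [10, 0, 5, 5],
    [100, 0, 50, 50],
    [0, 0, 0, 0],
]
normalized = normalize_and_log(toy_counts)
```

After normalization, the first two toy cells have identical profiles, because the difference in sequencing depth has been removed. Clustering would then be run on the normalized values.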

Automated Cluster Annotation

Using LLMs, we generated automated annotations and reports for review by domain experts. The automated process enabled faster cluster annotation, reducing the time needed for single-cell analysis and minimizing interpretation bias.
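One way such an annotation request could be assembled is sketched below: the top marker genes of a cluster are formatted into a text prompt that can be sent to an LLM. This is a hypothetical sketch, not Sonrai's prompt or pipeline. The function `build_annotation_prompt`, its parameters and the prompt wording are assumptions, and the call to the LLM itself is deliberately left out.

```python
def build_annotation_prompt(cluster_id, marker_genes, tissue="unknown tissue", max_genes=30):
    """Format a cluster's marker genes into an annotation prompt.

    Hypothetical helper: the wording and parameters are illustrative only.
    """
    # Remove duplicate genes while keeping their ranking order.
    genes = list(dict.fromkeys(marker_genes))[:max_genes]
    gene_list = ", ".join(genes)
    return (
        "You are assisting with single-cell RNA-seq cluster annotation.\n"
        f"Tissue: {tissue}\n"
        f"Cluster: {cluster_id}\n"
        f"Top marker genes (ranked): {gene_list}\n"
        "Suggest the most likely cell type and explain which genes support it. "
        "Flag any uncertainty so a domain expert can review the call."
    )


# A few of the marker genes discussed in this case study, with one
# deliberate duplicate to show the de-duplication step.
example_markers = ["MS4A1", "CD79A", "CD79B", "CD19", "CD22", "CD37", "IGHM", "CD79A"]
prompt = build_annotation_prompt("brown", example_markers, tissue="melanoma tumor")
print(prompt)
```

Asking the model to justify its call gene by gene, and to flag uncertainty, keeps the output in a form that a domain expert can check quickly.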

Based on the gene list provided, the brown cell cluster can be labeled as “Mature B Cells”. The following report was generated automatically by the LLM.


MS4A1 (CD20), CD79A, CD79B, CD19, CD22, CD37: These genes encode proteins that are part of the B cell receptor complex or are involved in B cell signaling, supporting the conclusion that these are B cells.

IGHM: This gene encodes the mu heavy chain of the B cell receptor, again suggesting that these cells are B cells.

HLA-DRA, HLA-DQB1, HLA-DPA1, HLA-DRB1, HLA-DQA1, HLA-DRB5, HLA-DPB1, HLA-DQA2, HLA-DQB2: These genes encode components of the MHC II complex, which is characteristically expressed on antigen-presenting cells, including mature B cells.

CD74: This gene is involved in antigen presentation and is typically expressed in B cells.

PAX5, BCL11A, IRF8: These are transcription factors involved in B cell development, consistent with the identity of mature B cells.

BANK1, FGD2, ADAM28, FCRL1, CD83: These genes are associated with B cell signaling and function, further supporting the conclusion of these cells being B cells.

Cluster Analysis: Responder vs Non-responder

Using our automated cluster annotation workflow, we could quickly extract biological insights into patients’ response to immune checkpoint inhibitor therapy by comparing the proportions of key cell types and the expression of key genes between responding and non-responding tumors.
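The proportion comparison can be sketched in a few lines once every cell carries a response label and an annotated cell type. The example below is a simplified illustration, not the analysis performed on the dataset: the function `cell_type_proportions`, the placeholder labels `type_A` and `type_B`, and the toy counts are all made up for the example.

```python
from collections import Counter, defaultdict


def cell_type_proportions(cells):
    """Return the fraction of each cell type within each response group.

    `cells` is an iterable of (response_group, cell_type) pairs, one per cell.
    Hypothetical helper for illustration only.
    """
    counts = defaultdict(Counter)
    for group, cell_type in cells:
        counts[group][cell_type] += 1

    proportions = {}
    for group, type_counts in counts.items():
        total = sum(type_counts.values())
        proportions[group] = {t: n / total for t, n in type_counts.items()}
    return proportions


# Toy input with placeholder labels (not values from the dataset).
toy_cells = (
    [("responder", "type_A")] * 3
    + [("responder", "type_B")] * 1
    + [("non-responder", "type_A")] * 1
    + [("non-responder", "type_B")] * 3
)
props = cell_type_proportions(toy_cells)
print(props["responder"]["type_A"], props["non-responder"]["type_A"])
```

This sketch pools all cells within a group. In a real analysis, proportions would usually be computed per sample first, so that a single large sample does not dominate the comparison.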


Applying AI to single-cell analysis can accelerate the road to discovery by automating data processing and interpretation and by improving the reproducibility of results. In addition, data can be easily shared and visualized, helping researchers interpret it correctly and reliably.


Dataset from: Sade-Feldman, M., et al. (2018)

Sade-Feldman, Moshe, et al. “Defining T Cell States Associated with Response to Checkpoint Immunotherapy in Melanoma.” Cell, vol. 175, no. 4, 2018, pp. 998-1013.e20, doi:10.1016/j.cell.2018.10.038.
