AI-Powered Single-Cell Analysis & Automated Cell Annotations

Author: William Guesdon, Bioinformatician at Sonrai Analytics

Single-Cell Analysis in Drug Discovery and Development

Single-cell analysis provides insight into cellular mechanisms, processes, and heterogeneity within cell populations. It is an important analytical tool throughout drug development: target identification and validation, mechanism of action (MoA) studies, biomarker discovery, clinical trial design through patient stratification, and treatment response monitoring.

Accelerating Single-Cell Analysis with Sonrai's AI Technology

Single-cell analysis and cluster annotation can be challenging and time-consuming. Sonrai leveraged the latest advances in single-cell databases and large language models (LLMs) to automate this process and generate reports that domain experts can easily validate.

Challenges of Single-Cell Analysis

Single-cell analysis poses many challenges. Throughput is low, and errors introduce inconsistency that can lead to important omissions. There is also a risk of interpretation bias, which makes results less reliable and less reproducible.

The main challenges of single-cell analysis are presented below.

  • High Dimensionality: Single-cell datasets measure thousands to tens of thousands of genes per cell.
  • Technical Noise: Single-cell data can have a higher level of technical noise than bulk RNA-seq data.
  • Batch Effects: Differences in sample preparation and data generation across batches can introduce unwanted variation that confounds analysis.
  • Cell Assignment and Clustering: Determining the identity of cells and clustering them into meaningful groups based on their gene expression profiles can be challenging, especially given the high dimensionality and noise inherent in single-cell data.

Do These Challenges Sound Familiar?

If you need advice on the challenges you're facing, our team are ready to help.

Sonrai's Strategy

Using Scanpy, LLMs and single-cell databases, we provided our customer with a workflow to normalise, cluster and annotate single-cell data. The workflow can save the customer a significant amount of analysis time, leading to faster insight generation.
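To make the normalisation step concrete, here is a minimal sketch of library-size normalisation followed by a log transform, written in plain Python. It mirrors the idea behind Scanpy's `sc.pp.normalize_total` and `sc.pp.log1p`, but the choice of this method, the function name `normalise_counts`, the target of 10,000 counts per cell and the toy matrix are all assumptions for illustration, not a description of the workflow delivered to the customer.

```python
import math


def normalise_counts(counts, target_sum=10_000):
    """Scale each cell to `target_sum` total counts, then apply log(1 + x)."""
    normalised = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # An empty cell carries no signal; keep zeros and avoid dividing by zero.
            normalised.append([0.0] * len(cell))
            continue
        scale = target_sum / total
        normalised.append([math.log1p(value * scale) for value in cell])
    return normalised


# Toy matrix (made-up values): rows are cells, columns are genes.
toy_counts = [
    [10, 0, 90],     # 100 counts in total
    [100, 0, 900],   # 1,000 counts in total, same relative composition
]
toy_normalised = normalise_counts(toy_counts)
print(toy_normalised)
```

Both toy cells have the same relative expression profile but different sequencing depth, so after normalisation their rows match. Removing that depth difference is the reason this step usually precedes clustering.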

Automated Cluster Annotation

Using LLMs, we generated automated annotations and reports for review by domain experts. The automated process enabled faster cluster annotation, reducing the time needed for single-cell analysis and minimising interpretation bias.

Based on the gene list provided, the brown cell cluster can be labelled as “Mature B Cells”. The report below was generated automatically by the LLM.


MS4A1 (CD20), CD79A, CD79B, CD19, CD22, CD37: These genes encode proteins that are part of the B cell receptor complex or are involved in B cell signalling, supporting the conclusion that these are B cells.

IGHM: This gene encodes the mu heavy chain of the B cell receptor, again suggesting that these cells are B cells.

HLA-DRA, HLA-DQB1, HLA-DPA1, HLA-DRB1, HLA-DQA1, HLA-DRB5, HLA-DPB1, HLA-DQA2, HLA-DQB2: These genes encode components of the MHC II complex, which is characteristically expressed on antigen-presenting cells, including mature B cells.

CD74: This gene is involved in antigen presentation and is typically expressed in B cells.

PAX5, BCL11A, IRF8: These are transcription factors involved in B cell development, consistent with the identity of mature B cells.

BANK1, FGD2, ADAM28, FCRL1, CD83: These genes are associated with B cell signalling and function, further supporting the conclusion of these cells being B cells.
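A request that could lead to a report like the one above might be assembled as in the following sketch. The source does not describe the prompts or the model Sonrai used, so the function name `build_annotation_prompt`, the prompt wording, the `top_n` cut-off and the tissue description are hypothetical. The sketch only builds the text and does not call any LLM service.

```python
def build_annotation_prompt(cluster_label, marker_genes, tissue, top_n=10):
    """Assemble a plain-text cell-type annotation request for one cluster."""
    if not marker_genes:
        raise ValueError("marker_genes must contain at least one gene symbol")
    # Keep only the highest-ranked markers so the prompt stays short.
    genes = ", ".join(marker_genes[:top_n])
    return (
        f"You are assisting with single-cell RNA-seq annotation of {tissue}.\n"
        f"Cluster: {cluster_label}\n"
        f"Top marker genes: {genes}\n"
        "Suggest the most likely cell type and justify it gene by gene, "
        "so that a domain expert can review the reasoning."
    )


# Marker genes taken from the report above; the tissue description is an assumption.
b_cell_markers = ["MS4A1", "CD79A", "CD79B", "CD19", "CD22", "CD37", "IGHM", "CD74", "PAX5"]
prompt = build_annotation_prompt("brown", b_cell_markers, "a tumour sample", top_n=5)
print(prompt)
```

Asking for a gene-by-gene justification keeps the output in a form that a domain expert can check line by line, which matches the review step described earlier.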

Cluster Analysis: Responder vs Non-responder

Using our automated cluster annotation workflow, we could quickly extract biological insights into patients’ response to immune checkpoint inhibitor therapy by comparing the proportions of key cell types and the expression of key genes between responding and non-responding tumours.
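The proportion comparison can be sketched in a few lines of plain Python once every cell carries a response group and an annotated cell type. The function name `cell_type_proportions` and the toy cells below are made up for illustration; they are not values from the Sade-Feldman dataset, and the source does not state how the comparison was implemented.

```python
from collections import Counter, defaultdict


def cell_type_proportions(cells):
    """Return {group: {cell_type: fraction of that group's cells}}."""
    counts = defaultdict(Counter)
    for group, cell_type in cells:
        counts[group][cell_type] += 1
    proportions = {}
    for group, type_counts in counts.items():
        total = sum(type_counts.values())
        proportions[group] = {ct: n / total for ct, n in type_counts.items()}
    return proportions


# Made-up example: (response group, annotated cell type) for each cell.
toy_cells = (
    [("responder", "Mature B Cells")] * 3
    + [("responder", "T Cells")] * 7
    + [("non-responder", "Mature B Cells")] * 1
    + [("non-responder", "T Cells")] * 9
)
proportions = cell_type_proportions(toy_cells)
for group, fractions in proportions.items():
    print(group, fractions)
```

In practice, proportions are often computed per sample before groups are compared, so that one sample with many cells does not dominate the result.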


Applying AI to single-cell analysis can accelerate the road to discovery by automating data processing and interpretation, as well as improving the reproducibility of results. In addition, data can be easily shared and visualised, helping researchers interpret the data correctly and reliably.


Dataset from: Sade-Feldman, M., et al. (2018)

Sade-Feldman, Moshe, et al. “Defining T Cell States Associated with Response to Checkpoint Immunotherapy in Melanoma.” Cell, vol. 175, no. 4, 2018, pp. 998-1013.e20, doi:10.1016/j.cell.2018.10.038.
