Advanced Data Management and Analytics in Clinical Trials


The use case below showcases the capabilities of Sonrai Discovery through a synthetic, first-in-human Phase I clinical trial of Tyzolimab, a hypothetical anti-PD-L1 antibody, for treating advanced esophageal squamous cell carcinoma (ESCC). Designed to mimic a real-world clinical trial, it involves 40 patients split into dose escalation and dose exploration groups, testing Tyzolimab alongside standard chemotherapy.

The primary objectives are assessing the drug’s safety, tolerability, maximum tolerated dose, pharmacodynamics, and pharmacokinetics. This synthetic trial not only highlights the platform’s ability to handle complex clinical data but also serves as an educational example of efficient trial design and data analysis using modern technology. It underscores the importance of advanced data management in medical research, offering insights for researchers and clinicians in developing new therapeutic strategies.

Synthetic Test Case:

 First-in-human, phase I study of Tyzolimab in patients with advanced esophageal squamous cell carcinoma (ESCC)


The drug, data, and study design in this workflow are purely fictitious and have been developed to demonstrate the capability of the Sonrai Discovery platform to handle equivalent real-world clinical data.

Introduction: Study Design

This synthetic dataset represents a first-in-human Phase I study of Tyzolimab, a novel subcutaneous single-domain anti-PD-L1 antibody, in patients with advanced esophageal squamous cell carcinoma (ESCC). The trial consisted of 2 groups totalling 40 patients. All patients were on the standard of care for ESCC, which included a cisplatin (CP) and 5-fluorouracil (5FU)-based regimen. A potential dosage regimen for these could be cisplatin (80 mg/m²) administered on day 1 and 5-fluorouracil (800 mg/m²) administered continuously on days 1-5, every 4 weeks. The median age was 60.5 years (range: 36-79 years).

Our endpoints included safety and tolerability, maximum tolerated dose, pharmacodynamics, and pharmacokinetics.

Dose escalation (20 patients):

The dose escalation group followed a modified version of a traditional “3+3” design. This was a single ascending dose trial, with each set of patients receiving the dose listed below only once. If, however, a grade 2 drug-related adverse event was observed, an additional 2 patients would be enrolled and administered the same dose. The doses, given once every 4 weeks, were as follows:

Dose (mg/kg)

Table 1. Theoretical breakdown of our dose escalation patient numbers
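The cohort expansion rule described above can be sketched in a few lines of Python. This is only an illustration: the base cohort size of 3 is an assumption borrowed from the traditional “3+3” design, the dose-level labels and observations are hypothetical, and `cohort_size` is not a Sonrai Discovery function.

```python
def cohort_size(grade2_drug_related_ae: bool, base: int = 3, extra: int = 2) -> int:
    """Number of patients dosed at one level under the modified rule.

    base=3 is an assumption taken from the traditional "3+3" design;
    extra=2 is the expansion stated in the study design text.
    """
    return base + extra if grade2_drug_related_ae else base


# Hypothetical observations per dose level
# (True = a grade 2 drug-related adverse event was seen at that level).
observations = {"level_1": False, "level_2": True, "level_3": False}

enrolled = {level: cohort_size(flag) for level, flag in observations.items()}
total = sum(enrolled.values())
print(enrolled, "total patients:", total)
```

Keeping the rule in a single function makes it easy to check how many patients a given pattern of adverse events would require.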

Dose Exploration (20 patients):

The dose exploration group was a fixed-dose arm in which 300 mg was administered subcutaneously (SC) to 20 patients every 4 weeks to determine the impact of consistent administration of Tyzolimab via subcutaneous injection, whereas the escalation group can be used to determine the impact of increasing the dose.


This tutorial is not intended as an introduction to the other functionalities of the platform (e.g. role-based access controls, the concept of projects). However, this section is designed to inform the user of the different elements needed throughout the workflow, such as specific tabs on apps and the selection of different datasets. Firstly, the data in this project will belong to a “dataset”, which in turn belongs to a parent “project”. We will launch analytics from a project and, from here, select different datasets. This is illustrated below:

Figure 1. The project page for the phase I synthetic data, which is associated with a dataset. The data within the dataset associated with the project can be interrogated by clicking on the Analytics button on the top right portion of the projects page.

Having selected our project, in this case “Phase I synthetic datasets”, we can click on “Analytics” in the top right corner, which will bring us to a blank storyboard webpage. From here, we can click on the blue button in the top right corner titled “Manage Settings” and, once in here, navigate to the “Data” tab (fig. 2). This will allow us to select the right dataset prior to running our analysis. Each dataset required is mentioned in the respective section of the workflow.

Figure 2. Selection of the necessary datasets by navigating to the “Manage Settings” portion of the analytics web page and then to the Data tab within it. The user will use 4 different datasets in the course of the main workflow.

Once we have selected our dataset, we can click anywhere on the dark portion of the webpage to close the settings, and then click on “Build Storyboard”. From here we can select different apps by checking the box for each app in the order we would like them presented (or click the image of an app for an explanation of its function). We can then click the dark green “launch” button at the bottom right of our panel.

Figure 3. The storyboard that the user is greeted with upon clicking “Build Storyboard”, having navigated to the analytics page from the original project. Each app desired (up to 4) should be checked in the order the user wants them to appear, after which the user clicks the launch button to start the storyboard.

When we launch these apps, there are a variety of tabs at the top right of each app which separate different functionalities to improve usability. These are generally shared across the apps and often have tooltips to remind the user of their name and function. This tutorial will often refer to these tabs. Fig. 4 shows the ones used in this tutorial.

Figure 4. The different tabs that we may see at the top right corner of each app. Note that the number of tabs available will vary depending on the app, and that the tabs will be adjacent to each other horizontally rather than vertically as presented here.

Part 1: Example exploration of Pharmacokinetic Profiles

One of the main elements of a phase I clinical trial dataset that a researcher may want to investigate is how the pharmacokinetic profiles differ between patients or doses. Pharmacokinetic data provides valuable insight into optimizing the dose and route of administration, measuring excretion, individualizing treatment, and understanding safety.

We can obtain an initial overview of our data using the table viewer app in our storyboard application to understand its contents. In this case, we can see that we have individual patients whose blood concentration of Tyzolimab is measured every 0.17 days (4 hours), the dose administered, whether they belong to the escalation or exploration cohort and, if they belong to the exploration cohort, which group within it (1 or 2).

Figure 5. Visualization of the top 10 rows of our pharmacokinetic data using the table viewer app.

After an initial look at the tabular data, we can now deploy the line chart app to help visualize this data. Choosing the variables tab (denoted by an asterisk), we can then select Time_days as the X-axis and Concentration_ugL as the Y-axis. After clicking apply, we obtain the following:

Figure 6. Initial visualization of the overall pharmacokinetic profile of our 40 patients in our clinical trial using the line chart app.

Although this plot gives us an initial understanding of the pharmacokinetic profile (for example, a visual approximation of Cmax at ~4 days and a half-life of ~7 days), the researcher may want to perform more in-depth analysis using Sonrai Discovery, such as:

  1. Investigate the pharmacokinetic profiles depending on dose: By navigating to the variables tab, we can select “Plot individual traces” under additional options and, in the new dropdown, select “Dose” before clicking apply.
  2. Better determine the Cmax and half-life: Due to the interactive nature of these plots, we can easily hover over the portions of the curves which correspond to their Cmax and half-life values.
  3. Add additional context to the plot (X and Y labels, and headings): By navigating to the Customize Tab, we can select a wide range of features to alter the aesthetics of our plots, such as opacity, the point markers themselves, and axes. To add titles and labels, simply fill in the “Title and Axis Labels” section to include headers and axis labels.
  4. Apply a visual representation of the error or uncertainty: A key consideration in working with pharmacokinetic data is understanding the spread of the data (standard deviation) and the variation of the sample mean from the true population mean (standard error). By going to the Additional Variables tab, we can select “Plot Error bars” under the Error Analysis section, and then choose to visualize sample standard deviation, population standard deviation, or standard error. In this case, I have opted to visualize standard error to understand how my curves might differ from the population mean.
  5. Perform statistical analysis on a subset of the dose groups: From the curves, we can see a distinction between the 2.5, 5, and 10 mg/kg groups in the escalation group. To determine whether there is a statistically significant difference between their curves, we can use the Statistical Tests tab and select, in this case, an ANOVA, since there are 3 groups (assuming the statistical assumptions underlying this test are met). As we are interested in the dose, we choose this in the “select columns” box and fill it in with our 3 dose groups. Lastly, as we want to compare their concentrations, we choose “Concentration_ugL” in “Select Numeric Value” and click apply to get our ANOVA results, generating a table of key statistics such as the p-value, F-statistic, and interquartile ranges.
  6. Save this report as a transportable, reproducible, and interactive report: Lastly, we want to export the plot we’ve generated for publication, or to share amongst other clinicians. To do this, we can click on our Download tab and select “HTML”, which will download this plot locally. We can also download the data behind it by clicking either the CSV or Excel buttons, or save it to a particular project with the “Save Report” button.
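For readers who want to see what the error options in step 4 and the Cmax readout in step 2 correspond to numerically, the following is a minimal Python sketch using only the standard library. The concentration values are hypothetical and the `summarise` helper is illustrative; it is not how the line chart app is implemented.

```python
import math
import statistics

# Hypothetical concentrations (ug/L) for one dose group:
# time in days -> one measurement per patient.
profile = {
    0.0: [0.0, 0.0, 0.0],
    2.0: [12.0, 15.0, 18.0],
    4.0: [18.0, 21.0, 24.0],
    8.0: [9.0, 10.0, 14.0],
}


def summarise(values):
    """Return the mean, sample SD, population SD and standard error of the mean."""
    n = len(values)
    sample_sd = statistics.stdev(values)       # divides by n - 1
    population_sd = statistics.pstdev(values)  # divides by n
    return {
        "mean": statistics.mean(values),
        "sample_sd": sample_sd,
        "population_sd": population_sd,
        "se": sample_sd / math.sqrt(n),
    }


summary = {t: summarise(v) for t, v in profile.items()}

# Cmax is the highest mean concentration; Tmax is the time at which it occurs.
tmax = max(summary, key=lambda t: summary[t]["mean"])
cmax = summary[tmax]["mean"]
print(f"Cmax = {cmax:.2f} ± {summary[tmax]['se']:.2f} ug/L at {tmax} days")
```

The standard error shrinks as the number of patients per time point grows, whereas the standard deviation describes the spread between patients regardless of sample size.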

The output of the workflow we’ve used above is illustrated below:

Figure 7. The output from our workflow above: a line chart illustrating the PK profiles for all dosage groups in our trial, with the hover functionality used to determine key metrics such as Cmax (e.g. 20.72 ± 6.60 ug/L at 3.5 days for 10 mg/kg). The figure also shows the use of descriptive titles and labels, the standard error for each dosage group, and statistical test metrics on a filtered subset of dosage groups to determine whether they are significantly different. Lastly, the option to download this plot via the Download tab can be seen in @downloading pk report.webm
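To make the ANOVA step more concrete, here is a minimal sketch of the one-way ANOVA F-statistic written in plain Python. The concentrations are hypothetical and `one_way_anova_f` is an illustrative helper rather than the platform's implementation. A p-value would additionally require the F distribution, which is available through, for example, `scipy.stats.f_oneway`.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical Concentration_ugL values for the three dose groups compared above.
dose_groups = {
    "2.5 mg/kg": [4.0, 5.0, 6.0],
    "5 mg/kg": [9.0, 10.0, 11.0],
    "10 mg/kg": [19.0, 20.0, 21.0],
}
f_stat, df_b, df_w = one_way_anova_f(list(dose_groups.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

A large F-statistic means the differences between dose-group means are large relative to the variation within each group.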

Part 2: Example exploration of Pharmacodynamic data

Another key component of our clinical trial is our pharmacodynamic data. Although there are a variety of ways to measure this, we can use immune-cell mediated tumor control to understand which populations exist in the tumor microenvironment (TME), both as biomarkers and as indicators of how efficacious the treatment is. In this case, using flow cytometry, we can identify different subsets whose increase or decrease can be positively or negatively correlated with patient outcomes. For the sake of demonstration, these will include:

Increase in: 

  • NK cells
  • CD8 + cells
  • CD4 T Cells
  • M1 Macrophages

Decrease in: 

  • Tregs
  • M2 Macrophages 
  • Myeloid-derived suppressor cells (MDSCs)

As with the pharmacokinetic data, an initial exploration via the table viewer may help to understand the data (however, as we have already demonstrated this, we will not visualize it for now). In order to plot the change in the proportion of immune cells over the course of 9 months (the length of our study), we can open the boxplot app, select “months” as our X-axis and “proportion” as our Y-axis, and check the “group” checkbox in Single/Group plot. Lastly, as we want to look at each cell type, we select “cell_type” under “Color by variable” to group by each population, and then select apply.

This plot provides us an initial overview of how our 12 measured populations change over the course of 9 months, with each individual point representing a cell population in a particular patient at a particular time.

Figure 8. Overall boxplot of flow cytometry data for 12 immune cell populations in the tumor microenvironments of 40 ESCC patients undergoing treatment with Tyzolimab. This plot demonstrates the boxplot app's ability to separate out the cell populations, automatically perform an ANOVA to determine if there are significant differences between these populations, and allow interactive exploration of these different subsets of cells.

Although useful for an overview, we are hoping to answer a particular question: “how do the populations of particular cell types change between the start and end of our study?”. Firstly, we can clean up our plot by navigating to the filter tab, selecting “include”, and then including only the months corresponding to the start and end of the study (1 and 9) before clicking apply. We will then need to navigate back to the variables tab and select “change numbers to categories” before hitting apply again (this will also need to be done when we apply a second filter in the next step).

Having now got the data for the first and last months (1 and 9 respectively), we can click on the cell types on the right hand side of the plot which we don’t wish to appear. This is the quickest way of obtaining a view of our data without modifying the underlying data. However, to have our filtering persist, we can go back to the filter tab, click “add additional filter”, select “cell_type”, and then enter the cell types we listed above. Lastly, to fully exploit the boxplot’s utility, we can navigate to the chart icon and select “hovertool display” as dose, which will tell us the proportion of immune cells and the dosage group each datapoint came from.

Figure 9. Fine-grained exploration of flow cytometry data for 12 immune cell populations in the tumor microenvironments of 40 ESCC patients undergoing treatment with Tyzolimab in the first and last months of treatment. This plot is an extension of Figure 8, and only includes the cell types and time points we are interested in.

From these results, we can see an overall positive trend, as indicated by an increase in CD4, CD8 and NK cells, which in this context, are correlated to good prognosis, and an approximately steady population of M1 macrophages. Conversely, we can see decreases in both M2 macrophages and MDSCs, which are associated with poor prognosis, as well as decrease of Tregs into two seemingly distinct, but smaller populations, indicating an additional avenue of exploration as to what these subgroups of patients with differing Treg populations might be.
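The start-versus-end comparison read off the boxplots can also be expressed as a simple calculation. The sketch below uses the column names mentioned in the text (cell_type, months, proportion) with hypothetical values and cell type labels, and reports the change in median proportion between month 1 and month 9 for each cell type.

```python
import statistics
from collections import defaultdict

# Hypothetical flow cytometry rows; values and labels are illustrative only.
rows = [
    {"cell_type": "CD8", "months": 1, "proportion": 0.10},
    {"cell_type": "CD8", "months": 1, "proportion": 0.12},
    {"cell_type": "CD8", "months": 1, "proportion": 0.14},
    {"cell_type": "CD8", "months": 9, "proportion": 0.20},
    {"cell_type": "CD8", "months": 9, "proportion": 0.22},
    {"cell_type": "CD8", "months": 9, "proportion": 0.24},
    {"cell_type": "Tregs", "months": 1, "proportion": 0.28},
    {"cell_type": "Tregs", "months": 1, "proportion": 0.30},
    {"cell_type": "Tregs", "months": 1, "proportion": 0.32},
    {"cell_type": "Tregs", "months": 9, "proportion": 0.10},
    {"cell_type": "Tregs", "months": 9, "proportion": 0.15},
    {"cell_type": "Tregs", "months": 9, "proportion": 0.20},
]

# Keep only the first and last months, then group by (cell type, month).
grouped = defaultdict(list)
for row in rows:
    if row["months"] in (1, 9):
        grouped[(row["cell_type"], row["months"])].append(row["proportion"])

# Change in median proportion from month 1 to month 9 per cell type.
cell_types = sorted({cell for cell, _ in grouped})
change = {
    cell: statistics.median(grouped[(cell, 9)]) - statistics.median(grouped[(cell, 1)])
    for cell in cell_types
}
for cell, delta in change.items():
    print(f"{cell}: {delta:+.2f}")
```

A positive change corresponds to an increase over the study and a negative change to a decrease, mirroring the direction of the boxes in the filtered plot.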

Part 3: Example exploration of survival curves

Disclaimer: This dataset has an artificially inflated number of patients (~300) to allow more robust Kaplan-Meier curves and log-rank statistics for the sake of demonstration. This number of patients, however, might be reflective of a Phase II or Phase III clinical trial.

Oftentimes in phase I immuno-oncology trials such as this, patients have high-grade (III or IV), late-stage cancers and have exhausted other treatment options. Considering this, understanding how the administration of our drug influences patient survival is critical. To this end, a Kaplan-Meier curve can help us visualize and distinguish the impact of our different drug doses on our patients.

We can select the survival app from our storyboard and, from here, fill in “Select event variable” with Event_Overall_Survival (this is the binary outcome on a per-patient basis, where 0 is death and 1 is survival). We can then set “Select time variable” to Overall_Survival, which is the time measurement that tells us when our patients died or were last known to be alive. Lastly, we can set “Select grouping variable” to “Dose”, once again allowing us to delineate how our dose might affect survival. We can then switch “Confidence Interval” to “on”, which shades each curve with its 95% confidence interval (the range within which we are 95% confident the true survival probability lies).

Although the 95% confidence interval may provide a useful visualization to understand the spread of our data, we may also wish to get a statistical measure of how different our predicted survivals are. To do this, we can use the log-rank test by switching “Log Rank” to on. Although we can compare any set of survival curves using this test, testing two similar doses could help to determine if an increased dose with a greater number of side effects (for example) is worth administering. For now, we can select the 2.5 mg/kg and 5 mg/kg doses, and leave the Log Rank weightings and Significant Figures at their defaults (Wilcoxon and 3 respectively). Following this, we can click apply to produce the following plot:

Figure 10. Kaplan-Meier survival curves of the different dosage groups in our clinical trial, along with 95% confidence intervals, using the survival app and a log-rank test between the 2.5 mg/kg and 5 mg/kg doses. This data demonstrates that predicted survival improves with increasing dosage, with the difference between the 2.5 mg/kg and 5 mg/kg groups being significant (p = 6.06e-08). From the 95% CIs, the latter portions of the 300 mg, 5 mg/kg and 10 mg/kg curves do not overlap with those of the lower doses (2.5 mg/kg and below), suggesting that although these higher doses are not significantly different from each other, they are significantly different from the lower doses, and therefore that these higher doses improve predicted survival. To confirm this, the log-rank test could be reapplied to the other pairs of curves.
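For readers unfamiliar with how a Kaplan-Meier curve is built, the sketch below implements the standard product-limit estimator in plain Python. It follows the event coding described above (0 is death, 1 is survival, i.e. censored); the follow-up times and the `kaplan_meier` helper are hypothetical and are not the survival app's own code.

```python
def kaplan_meier(times, died):
    """Return [(time, survival_probability)] at each time where a death occurred.

    times: follow-up time per patient.
    died:  True if the patient died at that time, False if censored
           (still alive at last follow-up).
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, d in zip(times, died) if ti == t and d)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical Overall_Survival times (months) and Event_Overall_Survival flags.
overall_survival = [2, 3, 3, 5, 8, 8]
event_overall_survival = [0, 0, 1, 0, 1, 0]  # 0 = death, 1 = survival

died = [event == 0 for event in event_overall_survival]
curve = kaplan_meier(overall_survival, died)
for t, s in curve:
    print(f"month {t}: S(t) = {s:.3f}")
```

Censored patients do not lower the curve themselves, but they do reduce the number at risk for later time points, which is why each later death produces a larger step.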

In this context, our survival curves show that an ascending dose is well correlated with improved survival of our patients, especially at the higher doses of 5 mg/kg, 10 mg/kg, and 300 mg. With respect to the fixed 300 mg dose, the average patient may weigh anywhere from 40-80 kg (likely lower than average, all factors considered, due to the difficulty of retaining weight in late-stage cancer), so this corresponds to a 3.75-7.5 mg/kg dose depending on patient weight, which is broadly in line with the higher doses of 5 and 10 mg/kg in the dose escalation.
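The weight-based equivalent of the fixed dose is a simple division, shown here as a short Python check. The 60 kg entry is an added illustrative midpoint; the 40 kg and 80 kg bounds are those quoted in the text.

```python
def mg_per_kg(fixed_dose_mg: float, weight_kg: float) -> float:
    """Convert a fixed dose into its weight-normalised equivalent."""
    return fixed_dose_mg / weight_kg


fixed_dose_mg = 300
equivalents = {weight: mg_per_kg(fixed_dose_mg, weight) for weight in (40, 60, 80)}
for weight, dose in equivalents.items():
    print(f"{weight} kg patient: {dose:.2f} mg/kg")
```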

Part 4: Exploration of adverse events

The primary objective of phase I clinical trials is to explore the safety and tolerability of a drug. In order to interrogate this, understanding the breakdown of what adverse events occurred, in which patients at what time, and their severity helps reveal the side effects our drugs may generate and how intense they are. 

From our data table, we can select a variety of columns which contain data collected during the clinical trial, and quickly visualize this to summarize the above. By selecting the variables plot app from the storyboard, we can navigate to the Variables Tab, and from the dropdown “Select column to plot”, we can select the column “name” (i.e. the type of adverse event that occurred) and click add graph. Add another graph, and select “ae_or_sae” (which distinguishes if an event was an adverse event – grade 2 or lower, or a serious adverse event – grade 3 or higher). 

We can also modify how this data is presented, using either a histogram, bar chart or boxplot, by clicking on the icons above the graph. As we’ve used our boxplot earlier in this workflow, we will choose to visualize the cumulative sum of adverse events by grade in our 3rd chart via a bar chart (fig. 11: bottom left). Lastly, we can see the “attribution”, i.e. a clinician's judgement of whether the adverse event was caused by the clinical trial treatment.

Figure 11. A breakdown of adverse event data collected during our clinical trial using the variables plot app. Top left: a pie chart representing the different types of adverse events recorded. Top right: the classification of events as either adverse events (grade 2 or less) or serious adverse events (grade 3 or greater). Bottom left: a bar chart representing the sum of adverse events in each grading band. Bottom right: the attribution of our different AEs, i.e. how many events were thought to be unrelated, unlikely, possibly, or probably caused by treatment.
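The counts behind these charts amount to tallying a column. The sketch below uses hypothetical adverse event rows; the “name” column comes from the text, while the “grade” column name, the event names, and the `classify` helper are assumptions made for illustration.

```python
from collections import Counter


def classify(grade: int) -> str:
    """AE for grade 2 or lower, SAE for grade 3 or higher, as defined in the text."""
    return "SAE" if grade >= 3 else "AE"


# Hypothetical adverse event records.
events = [
    {"name": "Nausea", "grade": 1},
    {"name": "Fatigue", "grade": 2},
    {"name": "Nausea", "grade": 3},
    {"name": "Rash", "grade": 1},
]

by_name = Counter(e["name"] for e in events)             # types of adverse event
by_class = Counter(classify(e["grade"]) for e in events)  # AE versus SAE
by_grade = Counter(e["grade"] for e in events)            # events per grade

print(by_name.most_common())
print(dict(by_class))
print(sorted(by_grade.items()))
```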

The most important metrics with respect to the adverse events are likely their cause and their severity. By hovering over our bottom two plots, we can determine that there is 1 event designated as probably caused by our treatment, and investigating this first is a high priority. By launching our table viewer app, navigating to the filter tab, and selecting “attribution_” as the filter variable, we can choose to look only at entries where the probability is high. Having applied this filter, we can see in the result below that the adverse event occurred at month 3 (from the column “month_ae_occured”), whereas the treatment was only administered at the start of month 4 (treatment_adminstered_start_of), thereby revealing that this event could not have been caused by Tyzolimab administration.

Figure 12. A table viewer interrogation building on our visual exploration of the adverse events data. This filtering of our original data allows us to identify the patient who had an adverse event attributed to the treatment whilst simultaneously disproving the attribution, due to the event occurring prior to the start of Tyzolimab treatment.
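The same check can be written as a two-step filter. In this sketch the column names attribution_, month_ae_occured and treatment_adminstered_start_of are taken from the text as written, while the “patient” column, the attribution labels, and all values are hypothetical.

```python
# Hypothetical adverse event rows.
rows = [
    {"patient": "P01", "attribution_": "Unrelated",
     "month_ae_occured": 2, "treatment_adminstered_start_of": 1},
    {"patient": "P02", "attribution_": "Probable",
     "month_ae_occured": 3, "treatment_adminstered_start_of": 4},
    {"patient": "P03", "attribution_": "Possible",
     "month_ae_occured": 5, "treatment_adminstered_start_of": 1},
]

# Step 1: keep only events a clinician judged as probably treatment-related.
probable = [r for r in rows if r["attribution_"] == "Probable"]

# Step 2: flag events that occurred before treatment started,
# which therefore cannot have been caused by the treatment.
implausible = [
    r for r in probable
    if r["month_ae_occured"] < r["treatment_adminstered_start_of"]
]

for r in implausible:
    print(f"{r['patient']}: event in month {r['month_ae_occured']}, "
          f"treatment from month {r['treatment_adminstered_start_of']}")
```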

Part 5: Code based exploration of additional data

Although parts 1 to 4 provide an initial overview of the capacity of Sonrai Discovery’s storyboards and workflows, the amount and variety of clinical data and the means to visualize it are essentially limitless, and may include custom visualizations such as waterfall plots, spider plots, and swimmer plots. To this end, Sonrai Discovery also supports the option of creating custom Jupyter notebooks in both Python and R, with up to 16 TB of storage, 768 GB of memory, and 192 CPUs, which enable these visualizations to be created. Below is an example workflow that could be undertaken by an individual who has coding knowledge, has been given a script to execute, or wishes to clone a GitHub repository.

Notebooks can be launched either from a project or from the home page using the lightning bolt icon on the right hand menu. For this demonstration, we will assume the data exists within the notebook repository, which acts as a self-contained directory for storing our files. By clicking on the notebook, we enter the working directory of that particular notebook, which includes all the .csv files and Python .ipynb notebook(s).

An example of launching a notebook, entering JupyterLab (the improved user interface for Jupyter notebooks in the context of scientific workflows), importing data, executing our scripts to generate interactive Plotly plots, and then exporting these as interactive HTML plots is demonstrated in the @phase 1 clinical trial jupyter notebook.webm.

This entire workflow has so far either been no-code or used Python. Arguably, however, the most common language used in biological statistical analysis is R. To this end, we can also run a notebook in R to demonstrate R functionality via Jupyter. In this script, we can follow the above process, using the data frame from Part 4: Exploration of adverse events, and use the R package TableOne to create a summary table of the adverse events based on the different dosages. This video captures the process of generating and writing out this summary data frame: @Creating a TableOne summary of adverse events in R.webm

Figure 13. The output from the execution of the R script, creating a summarized view of the adverse events based on the dosage. This portion of the TableOne output shows us the month in which the adverse event occurred (i.e. cycle 1 = month 1) as well as the type of adverse event which occurred at each dosage.
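For readers working in Python rather than R, a comparable per-dose summary can be approximated with a simple cross-tabulation. This is a rough sketch using only the standard library and hypothetical events; it is not the TableOne package and does not reproduce its statistics.

```python
from collections import Counter, defaultdict

# Hypothetical adverse events with the dose group in which they occurred.
events = [
    {"dose": "2.5 mg/kg", "name": "Nausea"},
    {"dose": "2.5 mg/kg", "name": "Fatigue"},
    {"dose": "5 mg/kg", "name": "Nausea"},
    {"dose": "5 mg/kg", "name": "Nausea"},
    {"dose": "10 mg/kg", "name": "Rash"},
]

# Count each adverse event type within each dose group.
table = defaultdict(Counter)
for event in events:
    table[event["dose"]][event["name"]] += 1

# Print one row per dose and one column per adverse event type.
names = sorted({event["name"] for event in events})
print("dose".ljust(12) + "".join(name.rjust(10) for name in names))
for dose, counts in table.items():
    print(dose.ljust(12) + "".join(str(counts[name]).rjust(10) for name in names))
```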


The synthetic Phase I trial of Tyzolimab in Advanced Esophageal Squamous Cell Carcinoma (ESCC), as demonstrated through Sonrai Discovery, exemplifies the profound impact of advanced data management and analytics in clinical research. While this trial is a fictional construct, it effectively showcases how comprehensive data analysis and visualization tools can streamline complex clinical trials, enhancing the efficiency and accuracy of data interpretation.

Throughout the trial, from dose escalation to pharmacokinetic and pharmacodynamic analysis, and even in monitoring adverse events, Sonrai Discovery exhibited an exceptional ability to handle multifaceted clinical data. This approach not only facilitates a deeper understanding of the drug’s efficacy and safety profile but also underscores the importance of robust data management systems in the realm of clinical research.

The use case serves as a testament to the potential of sophisticated platforms like Sonrai Discovery in revolutionizing clinical trials. By providing a detailed, structured, and user-friendly environment for data analysis, it can significantly aid researchers and clinicians in making informed decisions, ultimately accelerating the development of new therapeutics and improving patient outcomes.

In conclusion, while Tyzolimab and the trial itself are hypothetical, the insights gained from this exercise are real and valuable. They highlight the necessity for and the benefits of integrating advanced data management and analytical tools in clinical research, paving the way for more innovative and effective healthcare solutions in the future.
