Author: Dr Paul O'Reilly | Reading time: 7 minutes
Artificial Intelligence (AI) has captured imaginations across a wide range of disciplines and attracted funding from enterprises, funders, and investors. Nowhere is this more apparent than in healthcare, where barely a day goes by without the publication or announcement of an AI-based application promising to simplify, speed up, augment or otherwise improve medical practice, and with it patient outcomes and quality of life.
The success or failure of any development project is contingent on mitigating risks, and that applies to AI development and data science as much as to any other technical or software project. Since AI systems are largely software, many of these risks are shared with conventional software engineering.
In this article, we will focus on the risks which apply specifically to the development and deployment of AI systems. Given the relative immaturity of building production AI systems, understanding these risks is key to delivering value and avoiding failed AI projects.
Risks to AI Projects
Problem Domain Understanding
One of the main risks to AI projects is common to any engineering or software project – lack of, or incomplete, understanding of the problem domain or the business problem which needs to be addressed. Often AI developers and data scientists take a known problem in a domain and attempt to solve it, without understanding the full implications of what they are trying to achieve.
Whilst there is benefit in taking a fresh approach to a long-standing problem, tackling it without understanding the problem domain can result in solutions with limited application in the real world. An example of this is the many research projects applying AI to Covid-19 datasets early in the pandemic; subsequent research has suggested that the majority of these solutions were unsuitable for clinical use.
It is imperative that the correct stakeholders are identified early, and that they work collaboratively during all phases of an AI project.
It is our experience that this is best done within an agile process (subject to specific regulatory requirements, which may mandate otherwise) – allowing more fruitful interactions between all stakeholders and decreasing the risk that the solution does not adequately address the real problems. Specific attention should be paid to collaboratively agreeing on the scope of the project and its intended purpose.
If the scope is too ambitious this may have a severe impact on all ‘downstream’ phases of the project and unduly increase the time and cost of data acquisition, validation and regulatory approval (if required). If it is too limited, the solution may not provide adequate benefits to the users and be commercially unviable in the long term. Hence, spending some time and effort in this phase can determine the ultimate success or failure of the AI project.
Data
Since most modern AI systems learn patterns and meaning from data, it is critical to understand and manage that data. Just as with the problem domain, the AI team must know its data landscape intimately.
This includes understanding the quantity and quality of the data available, how that data is labelled and annotated, and any trade-offs regarding the cost of obtaining and managing it.
One aspect of building AI systems is the amount of data needed to train and validate them. There are a number of extremely large open data sets available in many of the standard AI domains, e.g. ImageNet and the Open Images Dataset, and if the problem being addressed maps onto those domains, then these can be used. Often, however, the problem domain is sufficiently different that data sets need to be identified or built from scratch.
It is not just a matter of indiscriminately acquiring as much data as can be found: the cost and effort of acquiring data, and the need to assess its quality and to preprocess, label, and annotate it (for supervised learning approaches), may make that approach unviable.
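As a concrete illustration, simple automated checks can surface data problems before any training effort is spent. The dataset, labels, and threshold below are entirely hypothetical; this is a minimal sketch of the idea, not a complete data-quality pipeline:

```python
from collections import Counter

# Hypothetical labelled data set: (sample_id, label) pairs.
records = [
    ("s1", "tumour"), ("s2", "tumour"), ("s3", "stroma"),
    ("s4", None), ("s5", "tumour"), ("s6", "stroma"),
]

def audit_labels(records, max_imbalance=0.8):
    """Flag missing labels and severe class imbalance before training."""
    missing = [sid for sid, label in records if label is None]
    counts = Counter(label for _, label in records if label is not None)
    total = sum(counts.values())
    dominant = max(counts.values()) / total if total else 0.0
    return {
        "missing": missing,
        "class_counts": dict(counts),
        "imbalanced": dominant > max_imbalance,
    }

report = audit_labels(records)
print(report)
```

Checks like these cost little to write and run, and catch issues (unlabelled samples, a dominant class) at the point where fixing them is cheapest.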
This is where an agile, iterative approach can de-risk an AI project. Data sets can be built iteratively and incrementally as the solution develops from the proof-of-concept (PoC) stage through to full maturity. Using approaches such as transfer learning from similar domains, a PoC can be built with a surprisingly small amount of data, which in consultation with stakeholders can be used to scope the data required to build and validate the final system.
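The shape of a transfer-learning PoC can be sketched as follows: a frozen, pretrained feature extractor with only a small linear "head" trained on the limited data available. Here a fixed random projection stands in for the pretrained backbone (in a real project this would be, say, a network trained on a large public data set), and the data and hyperparameters are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pretrained backbone: a fixed projection from
# raw inputs to a small feature space. Illustrative only.
W_backbone = rng.normal(size=(16, 4))

def extract_features(x):
    return np.tanh(x @ W_backbone)

# A deliberately small labelled set, as might exist at PoC stage.
X = rng.normal(size=(40, 16))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Train only the lightweight linear head on top of frozen features,
# using plain gradient descent on the logistic loss.
feats = extract_features(X)
w = np.zeros(feats.shape[1])
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    w -= 0.5 * (feats.T @ (p - y) / len(y))
    b -= 0.5 * np.mean(p - y)

preds = 1.0 / (1.0 + np.exp(-(feats @ w + b))) > 0.5
accuracy = np.mean(preds == (y == 1))
print(f"training accuracy: {accuracy:.2f}")
```

Because only the small head is trained, far less data is needed than for training a full model, which is precisely what makes an early PoC feasible before committing to large-scale data acquisition.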
We have found this approach is preferable to, and less expensive than, gathering all data up-front and then using this data to build the system, as it allows early identification of data issues, and subsequent data cleaning, annotation protocol improvement, and enrichment during the project, rather than at the end.
As mentioned above, annotation/labelling of data is a key part of building AI systems, particularly when applying supervised learning, although even when using unsupervised approaches, validation will typically be done against an annotated or labelled ground truth.
The majority of AI systems obtain labels and annotations from human experts, and as such may be prone to human errors and/or biases, and if multiple sources are used, may be inconsistently applied. These introduce risk in the form of error or bias in the final validated system, and so care must be taken in obtaining the ground truth data and controlling the quality thereof.
We have found benefit in working with stakeholders and domain experts to develop annotation protocols, which can then be used to train annotators and attempt to ensure consistency.
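One simple, widely used check of annotator consistency is Cohen's kappa, which measures agreement between two annotators corrected for chance agreement. The class names and labels below are hypothetical; a minimal sketch:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    classes = set(labels_a) | set(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in classes) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two annotators following the same protocol.
a = ["tumour", "tumour", "stroma", "stroma", "tumour", "stroma"]
b = ["tumour", "tumour", "stroma", "tumour", "tumour", "stroma"]
kappa = cohens_kappa(a, b)
print(f"kappa = {kappa:.2f}")
```

Tracking a metric like this as annotation proceeds gives early warning that the protocol is ambiguous or that annotators need retraining, before inconsistent labels contaminate the ground truth.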
One subtle point is that the annotation protocol should be specifically directed towards building AI models and the solution scope. For example, when working with pathologists, we have found that their understandable tendency is to annotate classes which have clinical importance and identify morphologies on this basis. Whilst these can be used to develop computational pathology algorithms, we have found benefit in machine learning experts and pathologists working together to define the classes appropriate to the problem at hand.
Putting processes in place for data and annotation quality (and the people and/or systems to manage them) is obviously more 'costly' from a superficial perspective, but having the appropriate processes de-risks delivery from both a technical and a business point of view. Ultimately, it is a business decision whether this benefit is worth the added cost and complexity.
Processes & Development
As mentioned above, there are a number of process aspects to AI projects which may mitigate risks to the success of the project.
One of these is the use of ‘waterfall’-like processes vs more agile iterative ones. In addition to the advantages of building data sets mentioned above, the use of an iterative approach allows new architectures and approaches to solving the problem to be tried and validated on an ongoing basis.
Given the evolution of state-of-the-art models and architectures, the ability to use new approaches throughout the project is paramount, and so the ability to make changes within an iteration can bring big benefits.
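In practice, this can be as simple as scoring every candidate architecture against the same fixed, held-out validation set at the end of each iteration, and promoting only what measurably improves. The candidate names and toy validation data below are illustrative:

```python
def select_model(candidates, validate):
    """Pick the candidate scoring best on a fixed validation set.

    `candidates` maps a model name to a prediction function;
    `validate` returns a score for that function.
    """
    scores = {name: validate(fn) for name, fn in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Toy validation set of (input, label) pairs and two hypothetical candidates.
val = [(0, 0), (1, 1), (2, 0), (3, 1)]
accuracy = lambda fn: sum(fn(x) == y for x, y in val) / len(val)

best, scores = select_model(
    {"baseline": lambda x: 0, "new_arch": lambda x: x % 2},
    accuracy,
)
print(best, scores)
```

Keeping the validation set fixed across iterations is the key discipline: it makes results comparable between iterations and stops architecture changes being justified by shifting benchmarks.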
Whilst the ability to change architectures of the AI models within an iteration is valuable, it is important that this is not the be-all and end-all – the project should not be about ‘chasing the state-of-the-art’. Very often, new results are incremental improvements on existing results, and any incremental improvements may not be seen outside the exact domain and/or benchmark data.
It is more important to take a data-centric approach – controlling, managing and understanding existing and new data will often add more value than using new approaches on the data blindly. Hence data management is at the core of AI (as described above) and the processes should be set up to ensure this.
One aspect that will drive the processes within a project is its context: is this a 'research' project, aimed at producing a proof of concept or a publication, or a development/engineering project whose end goal is a product delivered to real users in a real-world workflow?
If the former, the processes may be more flexible; if the latter, especially in medical AI or other regulated areas, processes will necessarily be stricter and more detailed. Even so, where possible there should be an element of agile, iterative working, which is typically difficult within the more traditional waterfall methodologies preferred by regulators.
This requires AI teams to follow Good Machine Learning Practice, such as that outlined in guidance from the FDA and the UK MHRA, and to engage with quality and regulatory experts, both inside and external to their organisations, to arrive at the best, most flexible processes that can be applied to the project.
Bias, Fairness & Reproducibility
The subject of bias, fairness, and reproducibility in AI systems, specifically as applied to medical AI, is important enough to warrant an article of its own, so it won't be elaborated on here. It is important, however, that AI developers are aware that these concerns should be addressed and monitored at all stages of the development process. Failure to do so can result in a product which is not fit for use, and is a risk to the success of the project.
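Without elaborating on the topic here, even a single monitoring metric tracked throughout development is better than none. The sketch below computes a demographic parity gap (the difference in positive-prediction rates between subgroups) on hypothetical predictions; real fairness assessment requires far more than one number:

```python
def demographic_parity_gap(preds, groups):
    """Largest difference in positive-prediction rate between subgroups."""
    rates = {}
    for g in set(groups):
        idx = [i for i, gg in enumerate(groups) if gg == g]
        rates[g] = sum(preds[i] for i in idx) / len(idx)
    vals = sorted(rates.values())
    return vals[-1] - vals[0]

# Hypothetical binary predictions and subgroup membership.
preds = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_gap(preds, groups)
print(f"parity gap = {gap:.2f}")
```

A metric like this, recomputed at every iteration, at least makes drift in subgroup behaviour visible rather than discovered after release.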
Perception & Acceptance
Even when an AI project is delivered successfully from a technical and scientific point of view, there is still a risk to its success in the form of resistance from stakeholders and/or users. In fairness, AI practitioners have not been their own best advocates.
Statements such as Hinton's infamous "We should stop training radiologists now, it's just completely obvious within five years deep learning is going to do better than radiologists." inevitably trigger resistance, especially since that prediction has demonstrably not come true. As ever wilder claims are made about AI (and particularly AGI), fear of replacement or redundancy creates resistance to the use of AI solutions among the 'incumbents'.
One understandable factor which increases resistance to the adoption of AI solutions is the perception of AI as a black box, with little explanation of why it makes the decisions it does. Efforts must be made to provide visualisations and other information so that those decisions can be interrogated, and users can gain confidence in them.
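One simple, model-agnostic way to interrogate a decision is occlusion sensitivity: mask each region of the input in turn and record how much the model's score drops, yielding a heat map of which regions drove the decision. The toy "model" and image below are illustrative:

```python
import numpy as np

def occlusion_map(model, image, patch=2):
    """Score drop when each patch is masked: a crude explanation map."""
    base = model(image)
    h, w = image.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            masked = image.copy()
            masked[i:i + patch, j:j + patch] = 0.0
            heat[i // patch, j // patch] = base - model(masked)
    return heat

# Toy "model": responds only to the top-left quadrant of a 4x4 image.
toy_model = lambda img: float(img[:2, :2].sum())
img = np.ones((4, 4))
heat = occlusion_map(toy_model, img)
print(heat)
```

The resulting map correctly highlights only the top-left patch. Techniques in this spirit (occlusion, saliency maps, and similar) give users something concrete to inspect when deciding whether to trust a prediction.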