Time series data is ubiquitous. The availability of this type of data is increasing, as is the need for automated analysis tools that can extract interpretable and actionable knowledge from it.

This data can be modeled by AI technologies to build diagnostic or predictive tools. Still, adopting AI technologies as black-box tools is problematic in many applied contexts, which is one reason well-established, more interpretable time-series approaches remain competitive for many tasks.

Researchers from the University of Geneva (UNIGE), the University Hospitals of Geneva (HUG) and the National University of Singapore (NUS) have developed an entirely new technique to assess the interpretability of artificial intelligence (AI) technologies, opening the door to more transparency and trust in AI-driven diagnostic and predictive tools.

The new method elucidates the mysterious inner workings of so-called “black box” AI algorithms, helping users understand how AI results are influenced and whether those results can be trusted. This is crucial in circumstances where human health and life are significantly impacted, such as when AI is used in healthcare.

Professor Christian Lovis, director of the Department of Radiology and Medical Informatics at UNIGE’s Faculty of Medicine and Head of the Department of Medical Information Science at HUG, who co-led this work, said: “The way these algorithms work is opaque to say the least. Of course, the stakes, especially financial, are extremely high. But how can we trust a machine without understanding the basis of its reasoning? These questions are essential, especially in industries such as medicine, where AI-driven decisions can affect people’s health and even lives; and finance, where they can lead to massive capital loss.”

Assistant Professor Gianmarco Mengaldo, director of the MathEXLab at the National University of Singapore’s College of Design and Engineering, who co-led the work, said: “Interpretability methods try to answer these questions by deciphering why and how an AI arrived at a certain decision and the reasons behind it. Knowing which elements tipped the scales for or against a solution in a specific situation, allowing for some transparency, increases the trust that can be placed in them.”

“However, current interpretability methods commonly used in practical applications and industrial workflows yield tangibly different results when applied to the same task. This raises an important question: which interpretability method is correct, given that there should be a unique, correct answer? Evaluating interpretability methods therefore becomes just as important as interpretability itself.”

Hugues Turbé, PhD student in the lab of Prof. Lovis and first author of the study, explains: “Discerning data is critical when developing interpretable AI technologies. When an AI analyzes images, for example, it focuses on a few characteristic attributes, which is how it can distinguish an image of a dog from an image of a cat. The same principle applies to analyzing time series: the machine must be able to select elements – for instance, peaks that are more pronounced than others – on which to base its reasoning. With ECG signals, this means reconciling signals from the different electrodes to evaluate possible dissonances that could indicate a particular heart disease.”

It can be difficult to select an interpretability approach from the many available for a particular purpose. Even when applied to the same dataset and task, different AI interpretability algorithms often generate substantially different results. To meet this challenge, the researchers created two new evaluation methods to help understand how an AI makes its decisions: one for identifying the most relevant parts of a signal and another for assessing their relative importance to the final prediction.

To assess interpretability, they hid a portion of the data to see whether it was necessary for the AI’s decision-making. This approach, however, occasionally led to inaccurate results. To account for this, they trained the AI on an augmented dataset that includes masked data, preserving the data’s accuracy and balance. The team then developed two metrics to assess the effectiveness of interpretability approaches, showing whether the AI was using the right data to make decisions and whether all available data was weighted fairly.
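The intuition behind this masking-based evaluation can be shown in a minimal sketch (the toy model, synthetic series, and `mask_top_k` helper below are all hypothetical illustrations, not the authors’ implementation): an attribution map is considered faithful if hiding the time steps it flags as relevant degrades the model’s confidence more than hiding randomly chosen steps.

```python
import numpy as np

def model_confidence(x):
    # Toy stand-in for a trained classifier's confidence score:
    # it responds to the magnitude of the largest peaks in the series.
    # (The study evaluated real trained networks, not this surrogate.)
    return 1.0 / (1.0 + np.exp(-(np.sort(x)[-3:].sum() - 3.0)))

def mask_top_k(x, relevance, k, fill=0.0):
    """Replace the k time steps deemed most relevant with a neutral value."""
    masked = x.copy()
    masked[np.argsort(relevance)[-k:]] = fill
    return masked

rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.1, 100)
x[[20, 50, 80]] += 2.0            # three pronounced peaks carry the signal

relevance = np.abs(x)             # stand-in attribution map

base = model_confidence(x)
drop_relevant = base - model_confidence(mask_top_k(x, relevance, k=3))
drop_random = base - model_confidence(mask_top_k(x, rng.normal(size=100), k=3))

# A faithful attribution: masking the steps it marks as relevant should
# reduce confidence far more than masking random steps.
print(drop_relevant, drop_random)
```

Note that masking with a constant can push inputs off the training distribution, which is the inaccuracy the researchers addressed by retraining the model on data that already contains masked segments.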

Hugues Turbé said: “In general, our method aims to evaluate the model that will be used within its operational domain, thus ensuring its reliability.”

Journal reference:

  1. Turbé, H., Bjelogrlic, M., Lovis, C. et al. Evaluation of post-hoc interpretability methods in time-series classification. Nat Mach Intell 5, 250–260 (2023). DOI: 10.1038/s42256-023-00620-w