AI healthcare research is prone to numerous flaws, pitfalls

By Moein Shariatnia | Jun 09, 2023

Image courtesy of and under license from Shutterstock.com

The ever swifter, wider use of AI in healthcare research has brought many mistakes along the way and often yields inherently flawed studies, explains ML developer and medical student Moein Shariatnia, who builds computer vision and NLP applications for healthcare using DL models.

TEHRAN -

Introduction

AI in healthcare research is evolving rapidly, with new studies in the field published every day. Unfortunately, many of these projects also contain mistakes and pitfalls in the datasets they use or the methods they implement. These flaws limit their clinical usefulness regardless of their novelty or the amount of money invested in them. Identifying these common pitfalls may go a long way toward avoiding them in future projects and building more reliable intelligent systems for clinical use.

Much of this article is based on the findings of an influential paper, published in the journal Nature Machine Intelligence, that examined machine learning (ML) models designed to detect or prognosticate COVID-19 from chest X-rays or computed tomography (CT) images.¹ The authors found that none of the 320 papers they studied, of which 62 were included in their detailed analysis, were of potential clinical use. This is an eye-opening finding and an important reminder for all researchers in ML and the life sciences to watch for the biases and mistakes that, if left uncorrected, can render their research partially or even completely useless.

Main findings

One of the study's most striking points is how many papers it excluded from its detailed review, either because they failed to meet certain mandatory criteria or because they did not document those critical aspects sufficiently. A common cause of exclusion was insufficient documentation of how the authors selected their final model. In a typical ML or deep learning (DL) project, there is a validation set (or, when using cross-validation, several validation sets), and the best-performing model is chosen based on a pre-defined metric.
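To make that selection step concrete, here is a minimal sketch in Python. It assumes a toy one-feature synthetic dataset, a hand-written k-nearest-neighbour classifier, and accuracy as the pre-specified selection metric. The helper names (`make_toy_data`, `knn_predict`, `select_k_by_cross_validation`) are hypothetical illustrations, not the method of any paper discussed here. The point is the order of operations. A test set is set aside first. The candidate model is then chosen purely on mean validation performance across cross-validation folds, and the test set is used exactly once, after the choice has been made.

```python
import random
import statistics


def make_toy_data(n, seed=0):
    """Hypothetical synthetic dataset: one numeric feature and a binary label."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Positive cases are drawn from a shifted feature distribution.
        x = rng.gauss(1.0 if label else 0.0, 0.8)
        data.append((x, label))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (odd k avoids ties)."""
    neighbours = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if votes * 2 > k else 0


def accuracy(train, evaluation, k):
    """Fraction of evaluation rows predicted correctly; the assumed selection metric."""
    correct = sum(knn_predict(train, x, k) == y for x, y in evaluation)
    return correct / len(evaluation)


def select_k_by_cross_validation(dev, candidate_ks, n_folds=5):
    """Return the candidate k with the best mean validation accuracy across folds."""
    # Interleaved fold assignment; the data were generated in random order.
    folds = [set(range(i, len(dev), n_folds)) for i in range(n_folds)]
    mean_scores = {}
    for k in candidate_ks:
        scores = []
        for fold in folds:
            train = [row for j, row in enumerate(dev) if j not in fold]
            val = [row for j, row in enumerate(dev) if j in fold]
            scores.append(accuracy(train, val, k))
        mean_scores[k] = statistics.mean(scores)
    # Tie-break rule (worth documenting): the first candidate in list order wins.
    best_k = max(candidate_ks, key=lambda k: mean_scores[k])
    return best_k, mean_scores


data = make_toy_data(300, seed=42)
# Hold out the test set BEFORE any model selection; it never influences the choice of k.
test, dev = data[:60], data[60:]
best_k, cv_scores = select_k_by_cross_validation(dev, candidate_ks=[1, 3, 5, 7, 9])
# Single, final evaluation of the already-selected model on untouched data.
test_acc = accuracy(dev, test, best_k)
print(f"selected k={best_k}, mean CV accuracy={cv_scores[best_k]:.3f}, test accuracy={test_acc:.3f}")
```

Writing down which metric was used, how ties were broken, and that the test set played no part in the choice is exactly the kind of documentation whose absence caused many papers to be excluded from detailed review.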

The content herein is subject to copyright by The Yuan. All rights reserved. The content of the services is owned or licensed to The Yuan. Such content from The Yuan may be shared and reprinted but must clearly identify The Yuan as its original source. Content from a third-party copyright holder identified in the copyright notice contained in such third party’s content appearing in The Yuan must likewise be clearly labeled as such.
