Multimodal Model Architectures May Enhance Clinical AI Performance
By George Mastorakos  |  Feb 07, 2022
George Mastorakos believes combining data types into what are called "multimodal models" may be the key to moving clinical artificial intelligence into its next phase: better performance and broader applicability in clinical decision-making.

SCOTTSDALE, ARIZONA - Healthcare encompasses a plethora of data types and sources: demographic data, lab values, scans, videos, speech studies, medication dosages, insurance coverage data, and data from wearables such as Fitbit and Apple Watch, to name just a few.

Despite this diversity, most machine learning models in healthcare incorporate only one type of data, whether image data, e.g., labeled magnetic resonance imaging (MRI) scans used to detect brain tumors, or time series data, such as electrocardiograms used to detect arrhythmias. Research on multimodal models, i.e., models that incorporate multiple data types (or, mathematically speaking, multiple modes), is sparse. Why aren’t multimodal models the standard for clinical artificial intelligence (AI)?

Key Challenges

For multimodal models to work properly, several pre-processed, cleanly labeled, and relevant datasets must be readily accessible. This prerequisite unearths a few key obstacles. Firstly, compiling even a single new database, let alone several, can be a challenge. Much patient data is scattered across the electronic medical record, image storage systems, e.g., picture archiving and communication systems, and other clinical data stores. It is difficult to parse, sort, and organize, and it is usually scraped manually by medical students or assistants conducting clinical research, an error-prone process. Even if the proper care and energy were put into organizing a single database, it might not contain all the data types necessary for training a multimodal model. A typical cancer registry, for example, may include patient characteristics, chemotherapy regimens, and treatment outcomes, but likely doesn’t include X-ray image files or specific lab values over time.
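
To make the coverage problem concrete, here is a minimal Python sketch, assuming three separate data stores keyed by a shared patient ID. The store names, patient IDs, field names, and file names are all hypothetical and invented for illustration; real clinical systems are far messier. The sketch only counts which patients have every data type a multimodal model would need.

```python
# Hypothetical records from three separate clinical data stores,
# each keyed by a shared patient ID. All values are made up.
SOURCES = {
    "registry": {
        "p1": {"regimen": "A", "outcome": "remission"},
        "p2": {"regimen": "B", "outcome": "relapse"},
        "p3": {"regimen": "A", "outcome": "remission"},
    },
    "imaging": {
        "p1": "p1_chest_xray.dcm",
        "p3": "p3_chest_xray.dcm",
    },
    "labs": {
        "p1": [4.1, 3.9, 4.4],
        "p2": [5.0, 5.2],
    },
}


def modality_coverage(sources):
    """Map each patient ID to the set of source names holding data for it."""
    coverage = {}
    for name, records in sources.items():
        for patient_id in records:
            coverage.setdefault(patient_id, set()).add(name)
    return coverage


def complete_cases(sources):
    """Return the sorted patient IDs that appear in every source."""
    required = set(sources)
    coverage = modality_coverage(sources)
    return sorted(pid for pid, have in coverage.items() if have == required)


if __name__ == "__main__":
    for pid, have in sorted(modality_coverage(SOURCES).items()):
        missing = sorted(set(SOURCES) - have)
        print(pid, "missing:", missing or "nothing")
    print("usable for multimodal training:", complete_cases(SOURCES))
```

In this toy setup only one of three patients has registry, imaging, and lab data together, which illustrates how quickly a usable multimodal cohort can shrink once every data type is required.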

Secondly, labeling multiple data types requires extreme consideration toward the end use case scenario. A mo

The content herein is subject to copyright by The Yuan. All rights reserved. The content of the services is owned or licensed to The Yuan. Such content from The Yuan may be shared and reprinted but must clearly identify The Yuan as its original source. Content from a third-party copyright holder identified in the copyright notice contained in such third party’s content appearing in The Yuan must likewise be clearly labeled as such.