Should We Be More Open to Sharing Medical Data for AI Development?


By Martin Willemink | Dec 23, 2021


Although many healthcare professionals understand that data sharing offers opportunities to improve healthcare, regulatory, ethical, and technological challenges prevent many institutions from sharing their data. Yet certain initiatives offer opportunities to tackle these issues.

PALO ALTO, CALIFORNIA - Patients’ health can potentially be improved by leveraging the healthcare data that accumulates over time within hospitals. This can be done with novel technologies such as deep learning (DL) and machine learning (ML), but also with more traditional real-world evidence (RWE) studies. Both RWE research and DL and ML model development require large amounts of data. The accuracy of DL and ML models depends heavily on the quality and amount of data that the algorithms are exposed to, and accuracy generally improves with larger numbers of training cases. Another important aspect is the heterogeneity of the data, which improves the generalizability of DL and ML models. Heterogeneity can be achieved by gathering data from multiple sources in different geographical locations, resulting in a diverse research population with varying disease prevalence.
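To make the point about heterogeneity concrete, the following is a minimal sketch, not a description of any real study. It assumes a toy dataset in which each record carries a hypothetical site label and a disease flag; the site names and values are invented purely for illustration. It compares disease prevalence per site with the prevalence in the pooled data, which is one simple way to see how combining sources changes the population a model would learn from.

```python
from collections import defaultdict

# Hypothetical records: (site, has_disease). All values are invented
# for illustration and do not come from any real hospital.
records = [
    ("site_a", True), ("site_a", False), ("site_a", False), ("site_a", False),
    ("site_b", True), ("site_b", True), ("site_b", False), ("site_b", False),
    ("site_c", True), ("site_c", False), ("site_c", False), ("site_c", False),
    ("site_c", False),
]


def prevalence_by_site(rows):
    """Return {site: fraction of that site's records with the disease}."""
    counts = defaultdict(lambda: [0, 0])  # site -> [positives, total]
    for site, has_disease in rows:
        counts[site][0] += int(has_disease)
        counts[site][1] += 1
    return {site: pos / total for site, (pos, total) in counts.items()}


def pooled_prevalence(rows):
    """Return the fraction of all records, across sites, with the disease."""
    rows = list(rows)
    return sum(int(has_disease) for _, has_disease in rows) / len(rows)


per_site = prevalence_by_site(records)
pooled = pooled_prevalence(records)

for site, value in sorted(per_site.items()):
    print(f"{site}: {value:.2f}")
print(f"pooled: {pooled:.2f}")
```

In this toy example the sites differ in prevalence, so a model trained on any single site would see a different case mix than one trained on the pooled data. Real multi-site work would also need to account for differences in equipment, protocols, and demographics, which this sketch ignores.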

Data usage in the healthcare setting can be divided into primary use and secondary use. Using data to provide care in the routine clinical setting is considered primary data use. Using data for optimization purposes within the healthcare setting, e.g., research studies, is considered secondary data use. In general, hospitals and clinics collect and store data for primary use. Because these data are already available, however, they can potentially serve secondary purposes such as developing DL and ML models and conducting RWE studies. For secondary healthcare data usage, it is important that the data is representative of the whole target population. If a DL or ML model is developed for European healthcare systems, it is essential that the model is trained, validated, and tested in a diverse European population. Otherwise, there is a risk that the DL or ML model will not perform accurately across the whole target population. On

The content herein is subject to copyright by The Yuan. All rights reserved. The content of the services is owned or licensed to The Yuan. Such content from The Yuan may be shared and reprinted but must clearly identify The Yuan as its original source. Content from a third-party copyright holder identified in the copyright notice contained in such third party’s content appearing in The Yuan must likewise be clearly labeled as such.
