Dataset’s Critical Role in Creating Correct Predictive Models
By Jan Sevcik  |  Oct 21, 2021
The digitization of electronic medical records, new machine learning techniques, and more robust hardware are creating opportunities to answer many open questions in healthcare, but using the correct dataset is critical to creating predictive models that draw the correct conclusions. Even the most advanced AI techniques will yield poor results if the wrong dataset is used. Jan Sevcik discusses the variables conundrum and the problems scientists face.

CHATTANOOGA, TENNESSEE - Two important considerations when choosing a dataset are the number of variables and the volume of data. Because current artificial intelligence (AI) techniques allow more efficient analysis of datasets with many variables, data from electronic health records (EHRs) are becoming the norm for creating clinical models in healthcare, in preference to less robust datasets such as claims databases.

A dataset has to contain a sufficient volume of data to support a predictive decision, but it is generally more important that the dataset contain robust variables. If the data do not contain the correct variables, a higher volume will not solve the problem. It is also important for a team to have clinical expertise in addition to data sources.
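The point about volume versus variables can be illustrated with a small simulation. The sketch below is not from the article: it assumes a synthetic outcome driven by two hypothetical variables, `x1` and `x2`, with coefficients and noise chosen purely for illustration. It fits a least-squares model that sees only `x1` and compares it with one that sees both.

```python
import random

def simulate(n, seed=0):
    """Synthetic rows (x1, x2, y); the coefficients are illustrative assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = rng.gauss(0, 1)
        y = 2.0 * x1 + 3.0 * x2 + rng.gauss(0, 0.1)
        rows.append((x1, x2, y))
    return rows

def mse_one_variable(rows):
    """Least-squares fit of y on x1 only; returns the mean squared error."""
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    my = sum(r[2] for r in rows) / n
    s11 = sum((r[0] - m1) ** 2 for r in rows)
    s1y = sum((r[0] - m1) * (r[2] - my) for r in rows)
    a = s1y / s11
    b = my - a * m1
    return sum((r[2] - (a * r[0] + b)) ** 2 for r in rows) / n

def mse_two_variables(rows):
    """Least-squares fit of y on x1 and x2 via the 2x2 normal equations."""
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    my = sum(r[2] for r in rows) / n
    s11 = sum((r[0] - m1) ** 2 for r in rows)
    s22 = sum((r[1] - m2) ** 2 for r in rows)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in rows)
    s1y = sum((r[0] - m1) * (r[2] - my) for r in rows)
    s2y = sum((r[1] - m2) * (r[2] - my) for r in rows)
    det = s11 * s22 - s12 ** 2
    a1 = (s1y * s22 - s2y * s12) / det
    a2 = (s2y * s11 - s1y * s12) / det
    b = my - a1 * m1 - a2 * m2
    return sum((r[2] - (a1 * r[0] + a2 * r[1] + b)) ** 2 for r in rows) / n

small = simulate(200)
large = simulate(20000)

# Error of the model that is missing x2, with little and with much data.
missing_small = mse_one_variable(small)
missing_large = mse_one_variable(large)
# Error of the model that has both variables, even on the small sample.
complete_small = mse_two_variables(small)

print(missing_small, missing_large, complete_small)
```

In this toy setting the model without `x2` keeps a large error however many rows it receives, because the variation caused by the omitted variable cannot be recovered from `x1`, while the model with both variables fits well even on the small sample.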

Anyone who has used a navigation application in a large metropolitan area during rush hour is familiar with decisions made on incomplete data. A navigation app relies on many different data points, such as the distance between points, travel speed, accidents, and construction projects, to suggest the most efficient route. Data points like weather are not included, however.

If a large storm with heavy rain is set to cross an interstate in a metropolitan area, it will reduce traffic speed and statistically result in more accidents, both of which will increase travel time. Suppose a navigation app’s user reaches a decision point where a local road is an alternative to the interstate, and the storm has not yet crossed the interstate or slowed traffic. The app will still consider the interstate the most efficient route, because at that moment this is the correct decision based on the available data.
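The navigation scenario can be sketched in a few lines of code. This is a simplified illustration, not how any real navigation app works: the `Route` class, the distances, the speeds, and the `storm_speed_factor` values are all hypothetical. The same route chooser is run once with current speeds only and once with an assumed storm slowdown applied.

```python
from dataclasses import dataclass

@dataclass
class Route:
    name: str
    distance_km: float
    current_speed_kmh: float
    # Hypothetical: fraction of current speed expected once the storm
    # arrives (1.0 means the storm has no effect on this route).
    storm_speed_factor: float = 1.0

def travel_minutes(route, use_weather):
    """Estimated travel time, optionally applying the storm slowdown."""
    speed = route.current_speed_kmh
    if use_weather:
        speed *= route.storm_speed_factor
    return route.distance_km / speed * 60

def best_route(routes, use_weather):
    """Pick the route with the lowest estimated travel time."""
    return min(routes, key=lambda r: travel_minutes(r, use_weather))

# Illustrative numbers only: the storm is assumed to hit the interstate hard
# and the local road lightly.
interstate = Route("interstate", 30.0, 100.0, storm_speed_factor=0.4)
local_road = Route("local road", 25.0, 60.0, storm_speed_factor=0.9)
routes = [interstate, local_road]

without_weather = best_route(routes, use_weather=False)
with_weather = best_route(routes, use_weather=True)
print(without_weather.name, with_weather.name)
```

With these assumed numbers the chooser prefers the interstate when it sees only current speeds and the local road once the storm slowdown is included. The decision logic is identical in both runs; only the available variables differ.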

If the app had incorporated a weather radar into the decision, it might have suggested a more efficient route base

The content herein is subject to copyright by The Yuan. All rights reserved. The content of the services is owned or licensed to The Yuan. The copying or storing of any content for anything other than personal use is expressly prohibited without prior written permission from The Yuan, or the copyright holder identified in the copyright notice contained in the content.