What does it mean to be healthy? Currently, health is measured by an absence of visits to healthcare providers. That approach leaves gaps in the data on patients, which is a problem for building machine learning for the healthcare system. This is the challenge facing Marzyeh Ghassemi, a professor of computer science and medicine at the University of Toronto.

Healthcare may not be the most conducive environment for machine learning. After all, it is an industry where only highly specialized professionals are employed.

“Doctors are hyper specialists; they train for a decade—or more—to become proficient in what they do with reasonable accuracy,” says Professor Ghassemi. “Even with all the fantastic advances machine learning has made in other domains, we don’t have another domain where you have to get [this degree of training].”

While building machine learning for healthcare poses challenges, it also offers many opportunities to improve efficiency, from speeding up workflows and automating administrative tasks to finding trends across thousands of records of treatment techniques and patient outcomes.

Currently, Professor Ghassemi is working to implement machine learning for health information systems, a massively disruptive task for settings with long-established, heterogeneous procedures. Hospitals and physicians have long relied on pagers and faxes for want of technology that is both better and secure. She hopes machine learning algorithms can speed up processing times: AI can assist doctors in finding information, allowing them to focus more on validating it, which will also increase the accuracy of tasks such as diagnosis.

Another concern is how well a health-focused AI would perform on the general population when its data can only reflect those who enter the healthcare system. Professor Ghassemi, who studies both unsupervised learning and representation, pointed to examples of how machine learning can go wrong outside of healthcare, such as the misclassification of gender or race by commercial facial recognition packages that are widely used today.

“When you learn something in an unsupervised manner, the representation matters and robustly inspecting the representation also really matters. If you’re not aware of these [nuances], the AI can be deployed [with misrepresentations]. We’re all being graded by algorithms that have the sheen of objectivity, but it is all biased.”

“We need to learn how to work with ‘missingness’ in our model. You can’t just compute it away. Often, records have missing data for societally biased reasons. You need to be able to calibrate your model probabilities equally within people of different backgrounds. You don’t want your model to be poorly calibrated.”
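
One way to make the calibration point concrete is to measure, separately for each group, how far a model's predicted probabilities sit from the outcomes actually observed. The sketch below is only an illustration and not a method described in the talk: it assumes binary outcomes, uses expected calibration error (ECE) as the measure, and the function names `expected_calibration_error` and `calibration_by_group`, the group labels, and the toy numbers are all hypothetical.

```python
import numpy as np


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Weighted average gap between predicted probability and observed rate."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    # Equal-width bins on [0, 1]; the interior edges assign each prediction a bin.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(y_prob, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # skip empty bins
        gap = abs(y_prob[mask].mean() - y_true[mask].mean())
        ece += mask.mean() * gap  # weight by the share of samples in the bin
    return ece


def calibration_by_group(y_true, y_prob, groups, n_bins=10):
    """Compute the calibration error separately for each group label."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    groups = np.asarray(groups)
    return {
        g.item(): expected_calibration_error(
            y_true[groups == g], y_prob[groups == g], n_bins
        )
        for g in np.unique(groups)
    }


# Toy, made-up data: the model is well calibrated for group "A"
# but overconfident for group "B".
y_true = [1, 0, 1, 0, 1, 0, 0, 0]
y_prob = [0.5, 0.5, 0.5, 0.5, 0.9, 0.9, 0.9, 0.9]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

for group, ece in calibration_by_group(y_true, y_prob, groups).items():
    print(f"group {group}: calibration error = {ece:.2f}")
```

A large difference between groups in a check like this would suggest the model's probabilities mean different things for different populations, which is the kind of uneven calibration the quote warns against.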

Also missing from modern healthcare and patient data are all the aspects of a person’s lifestyle that can affect their health. Operations, recovery times, and prescribed medications are not the only determinants of health, after all.

“There are many ways for a person to be healthy; people are so dissimilar and that’s okay,” says Professor Ghassemi. “We don’t have any data to say whether it matters if you walked a mile this week or that you just walked at all. Does it matter that you talked to many people this week, or that you spoke to one person you’re close with?”

“In a modern context, there are so many possible things a [doctor] could spend their time on, and they don’t have an understanding of all of that yet. Being able to tell someone ‘you seem to do best when you sleep early three nights a week and call your mother once a week’—that would be really valuable. But we need to learn all these things.”

Professor Marzyeh Ghassemi spoke about “Learning ‘Healthy’ Models in Machine Learning for Health” at the 2018 Machine Learning and the Market for Intelligence conference at the Rotman School of Management at the University of Toronto. A full recording of her talk is available.