April 3, 2023 in Viewpoint
How Federated Learning Is Helping to Overcome Machine Learning Obstacles
https://doi.org/10.1287/LYTX.2023.02.10
Researchers are constantly devising novel applications for machine learning. Yet industries that deal with sensitive or protected data have encountered seemingly insurmountable hurdles to implementing machine learning on a large scale.
That’s because machine-learning models require access to huge, disparate data sets for training and validation – and therein lies the problem. Much of the necessary data is coupled with rigorous privacy regulations, operating complexities and interoperability woes.
Consider the massive space and bandwidth required for storing and analyzing data gathered from multiple organizations – and then imagine the lack of data standardization. Data sourced from different institutions is often not entered in a common format, which is a nightmare for artificial intelligence (AI) developers and academics trying to combine disparate data sets.
These problems are compounded for industries such as medical research and healthcare, in which patient data must also be properly deidentified before it can be shared with researchers.
But a new approach to machine learning offers a solution to these myriad obstacles: federated learning. This new subset of machine learning allows developers and researchers to train and validate AI models on decentralized data, eliminating the need to collate all that data on a single server.
But how exactly does federated learning work? And how is it able to bypass all of these complex hurdles?
Developing a Workaround
Federated learning helps academic researchers improve the quality of the data sets on which they train and validate their machine-learning models, all while protecting sensitive data. To maintain privacy, researchers and developers simply deploy copies of an AI model to the data, as opposed to moving the data itself.
The model then analyzes the data right where it is and reports only the results of that analysis, typically updated model parameters rather than raw records, back to a central server to which collaborating researchers are granted access. This means models can be trained on diverse data collected from global entities – boosting the quantity and quality of data available – which will improve the model’s accuracy when generalized to larger populations.
This also enables the development of a privacy-preserving data pipeline between partners and research institutions: Data can be exchanged between a multitude of entities without being migrated or replicated. The data stays protected behind the safety of its original firewalls.
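The train-locally, report-centrally loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular product or study: it assumes federated averaging (one common way to combine site updates), a toy linear model and synthetic data, and every function name in it is hypothetical.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Train a copy of the model on one site's private data.

    Only the updated (slope, intercept) pair is returned; the raw
    (x, y) records never leave this function.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in data) / n
        grad_b = sum((w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_average(updates, sizes):
    """Combine site models, weighting each by its number of records."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)

def train_federated(sites, rounds=200):
    """Central coordinator: send the model out, average what comes back."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates, [len(d) for d in sites])
    return global_weights

def make_site_data(n, lo, hi, rng):
    """Synthetic stand-in for one institution's data: y = 2x + 1 plus noise."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.01)) for x in xs]

# Three hypothetical sites with different sizes and value ranges.
rng = random.Random(0)
sites = [
    make_site_data(40, 0.0, 1.0, rng),
    make_site_data(25, 0.5, 1.5, rng),
    make_site_data(60, -1.0, 0.5, rng),
]
w, b = train_federated(sites)
print(f"learned slope={w:.3f}, intercept={b:.3f}")
```

In this sketch the coordinator only ever sees two numbers per site per round. Real deployments usually layer further safeguards on top, such as secure aggregation, which this example does not attempt to show.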
In terms of interoperability – another hurdle in data exchange – recent standards, in particular the U.S. Core Data for Interoperability (USCDI), have standardized data entry so that institutions can more easily share data sets with other research entities. However, such standards still need to be enforced and also applied at the international level.
Use Cases
Federated learning has huge implications for medical research, perhaps more than for any other industry. Medical research is often obstructed by a web of patient privacy regulations. Federated learning allows researchers to work within these regulations because no external researcher ever views or handles the raw data – only the model touches and analyzes the underlying data.
The use of federated learning is already being validated in medical research. Rhino Health cofounder and CEO Ittai Dayan coauthored a study that explored the creation of an AI algorithm capable of predicting the supplemental oxygen needs of people coming to the emergency room with symptoms of COVID-19. In the study, a model trained with federated learning proved proficient at predicting the supplemental oxygen needs of patients based on a distributed data set. The model analyzed data from 20 participating hospitals, examining vital signs, laboratory data and chest X-rays to better forecast patient outcomes.
In this case, federated learning facilitated a rapid data science collaboration without any actual data exchange and produced a model that could be generalized across heterogeneous, unharmonized data sets.
That said, federated learning is not only being applied in healthcare. It’s also gaining ground in industries under fire for selling or marketing user data, particularly among the tech giants. Google currently uses federated learning in Gboard, its predictive keyboard for mobile devices. What a user types never leaves the phone; instead, the model predicts the next word based on word associations it has learned across many other users’ devices.
With this approach, federated learning is providing a bridge for AI developers to seamlessly connect diverse data pools from different sources into one model – something that traditional machine learning was not able to do.
Mathias C. Blom, M.D., Ph.D., M.Sc., is the partnerships vice president at Rhino Health and a graduate of Lund University and Harvard University. Mathias’ career has focused on innovating and advancing the technologies shaping the future of healthcare. From developing machine-learning pipelines to leveraging predictive analytics for accelerating innovation in healthcare applications and medical research, Mathias is passionate about making a positive change in all life science sectors to improve patient care.