Most AI models in medical machine learning are trained on labeled or annotated datasets, which teach them to recognize diseases accurately. This approach depends on extensive annotation by human clinicians, which is expensive and time-consuming, so it is especially challenging for tasks that involve interpreting medical images. For example, to label a chest X-ray collection, skilled radiologists would have to examine hundreds of thousands of images one at a time and annotate each with the conditions found. More recent AI models have tried to ease this labeling problem by learning from unlabeled data during a “pre-training” stage, but they still need fine-tuning on labeled data to achieve high performance.
To address this issue, researchers at Stanford University and Harvard Medical School have collaborated to create an artificial intelligence diagnostic tool that can identify diseases on chest X-rays directly from the natural-language descriptions found in accompanying clinical reports. Because most current AI models require laborious human annotation of large amounts of data before training can begin, this is seen as a significant step forward in clinical AI design. According to a study published in Nature Biomedical Engineering, the model, called CheXzero, was as effective as human radiologists at spotting abnormalities on chest X-rays. The team has open-sourced its code so that other researchers can use it.
The researchers regard their work as one of the first examples of a next generation of flexible medical AI models that learn directly from text. The new model is self-supervised, meaning it learns on its own without explicit disease annotations. It relies only on chest X-rays and the English-language notes in the accompanying reports. Existing AI models need enormous volumes of manually annotated data to perform well, whereas CheXzero requires none of these disease-specific annotations. During training, the model receives a chest X-ray together with the associated radiologist report as input. It then learns on its own to match each chest X-ray with its corresponding report. In the process, it learns which concepts in the unstructured text correspond to visual patterns in the image.
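The matching step described above can be illustrated with a small sketch. The article does not spell out the training objective, so the code below assumes a CLIP-style symmetric contrastive loss, a common choice for pairing images with text. The function names, the temperature of 0.07, and the three-dimensional toy embeddings are all hypothetical stand-ins for the outputs of real image and text encoders.

```python
import math

def normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def cross_entropy_at(logits, target):
    """Negative log-softmax of logits[target], computed in a numerically stable way."""
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[target]

def contrastive_matching_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss: pair i (image i, report i) is the correct match."""
    images = [normalize(v) for v in image_embs]
    texts = [normalize(v) for v in text_embs]
    n = len(images)
    # sim[i][j] is the scaled cosine similarity between image i and report j.
    sim = [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
           for img in images]
    # Each image should pick out its own report (rows) ...
    image_to_text = sum(cross_entropy_at(sim[i], i) for i in range(n)) / n
    # ... and each report should pick out its own image (columns).
    text_to_image = sum(
        cross_entropy_at([sim[r][j] for r in range(n)], j) for j in range(n)
    ) / n
    return (image_to_text + text_to_image) / 2

# Toy embeddings (hypothetical): each X-ray lies close to its own report.
report_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
xray_embs = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]]
shuffled_xrays = xray_embs[1:] + xray_embs[:1]  # every X-ray paired with the wrong report

loss_matched = contrastive_matching_loss(xray_embs, report_embs)
loss_shuffled = contrastive_matching_loss(shuffled_xrays, report_embs)
print(loss_matched < loss_shuffled)
```

Under this assumed objective, correctly paired X-rays and reports give a much lower loss than mismatched pairs, and that difference is the signal that would push a model to align visual patterns with the text that describes them.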
The model was trained on a publicly accessible dataset containing more than 377,000 chest X-rays and more than 227,000 corresponding clinical notes. Its performance was then assessed on two separate datasets of chest X-rays and associated notes gathered from two different institutions. To minimize bias, the institutions were located in different countries. Partly because of this diversity, the model performed equally well on clinical notes that used different wording to describe the same observation. During testing, CheXzero identified pathologies that human clinicians had not explicitly annotated. It outperformed other self-supervised AI tools and achieved accuracy comparable to that of radiologists. According to the researchers, the approach could be applied to imaging modalities beyond X-rays, including CT scans, MRIs, and echocardiograms.
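How a model trained this way can flag a condition without disease-specific labels can be sketched as follows. The article does not describe the inference procedure, so this sketch assumes the common zero-shot approach for image-text models: the X-ray embedding is compared with the embeddings of a positive and a negative text prompt, and a softmax over the two similarities gives a probability. The function names, the prompts, the temperature, and the toy embeddings are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_probability(image_emb, positive_emb, negative_emb, temperature=0.07):
    """Softmax over similarity to a positive prompt versus a negative prompt."""
    pos = cosine(image_emb, positive_emb) / temperature
    neg = cosine(image_emb, negative_emb) / temperature
    top = max(pos, neg)  # subtract the max for numerical stability
    exp_pos = math.exp(pos - top)
    exp_neg = math.exp(neg - top)
    return exp_pos / (exp_pos + exp_neg)

# Toy embeddings (hypothetical) standing in for encoder outputs.
xray_emb = [0.8, 0.2, 0.1]          # embedding of one chest X-ray
pneumonia_emb = [1.0, 0.0, 0.0]     # embedding of the prompt "pneumonia"
no_pneumonia_emb = [0.0, 1.0, 0.0]  # embedding of the prompt "no pneumonia"

probability = zero_shot_probability(xray_emb, pneumonia_emb, no_pneumonia_emb)
print(probability > 0.5)
```

Because the condition is named only through a text prompt, the same scoring function could in principle be pointed at a finding that was never explicitly labeled, simply by changing the prompts.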
CheXzero demonstrates that high accuracy in complex medical image interpretation can be achieved without large annotated datasets. The model shows the potential to sidestep the labeling bottleneck that has long constrained medical machine learning. Although chest X-rays served as the driving example, CheXzero's approach could be applied in a wide range of medical contexts where unstructured data is the norm.
This Article is written as a research summary article by Marktechpost Staff based on the research paper 'Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning'. All Credit For This Research Goes To Researchers on This Project.
Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Goa. She is passionate about the fields of Machine Learning, Natural Language Processing and Web Development. She enjoys learning more about the technical field by participating in several challenges.