Evaluating Human-in-the-loop Strategies for Artificial Intelligence-enabled Translation of Patient Discharge Instructions: a Multidisciplinary Analysis
According to the United States (U.S.) Census, more than 25 million individuals in the U.S. (≈ 7.5% of the U.S. population) speak English less than “very well.” Patients who use languages other than English have worse clinical and patient-centered outcomes, including lower healthcare utilization and more adverse events. A key driver of these inequities is barriers to high-quality translation.
An article published in npj Digital Medicine described the findings of a study that compared the quality of free-text hospital discharge instruction translations produced by ChatGPT-4o alone (Artificial Intelligence (AI)-generated) and by ChatGPT-4o with human oversight (human-in-the-loop) against professional human translations across six languages (Arabic, Armenian, Bengali, simplified Chinese, Somali, and Spanish), as assessed by linguists, clinicians, and family caregivers.
The results revealed that ChatGPT-4o performed inconsistently relative to professional translations, with the poorest ratings for digitally underrepresented languages. In contrast, human-in-the-loop translations achieved outcomes comparable to, and often better than, professional translations across all six languages, including higher translation quality. The authors concluded that human-in-the-loop strategies could enable safe, efficient, and equitable applications of AI-powered translation in clinical practice.
Download the PDF below.

