Main Session
Sep 30
PQA 09 - Hematologic Malignancies, Health Services Research, Digital Health Innovation and Informatics

3613 - Implementing Generative AI in Radiation Oncology: What Could Possibly Go Wrong?

04:00pm - 05:00pm PT
Hall F
Screen: 4
POSTER

Presenter(s)

Austin Faught, PhD - The Ohio State University Wexner Medical Center, Columbus, OH

A. M. Faught, P. E. Klages, P. Sadeghi, J. M. Pakela, E. Lee, and A. S. Ayan; The Ohio State University Wexner Medical Center, Columbus, OH

Purpose/Objective(s): Rapid gains in generative artificial intelligence (AI) have opened up myriad potential uses in the radiation oncology domain. We have been interested in using a large language model (LLM) as an advisor on group, departmental, and industry policies and guidelines. We sought to identify the risks associated with this tool through a failure modes and effects analysis (FMEA) study.

Materials/Methods: Six medical physicists were instructed to generate a list of failure modes (FMs) associated with using an LLM to offer guidance in clinical decision making for the medical physics group based on adopted standard operating procedures, departmental policies, and American Association of Physicists in Medicine (AAPM) guidance documents. The FMs were then redistributed to the physicists for independent scoring of probability of occurrence (O), severity (S), and lack of detectability (D). Average scores for each of the three metrics were calculated along with a final risk priority number (RPN). As an example of its proposed use, and as a method for further evaluating the risk, the LLM was asked to perform an FMEA on its own list of ten FMs. The LLM was pointed to a directory of 32 AAPM task group (TG) reports, including TG-100, the report on risk analysis methods in radiation therapy, as a knowledge base for the task.
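The scoring procedure above can be sketched as follows. This is a minimal illustration, not the study's analysis code, and the example scores are hypothetical:

```python
# Sketch of the FMEA scoring described above: average the six physicists'
# O, S, and D scores for a failure mode, then RPN = O_avg * S_avg * D_avg.
# The scores below are made-up examples, not data from the study.

def rpn(scores):
    """scores: list of (O, S, D) tuples, one per scorer."""
    n = len(scores)
    o_avg = sum(s[0] for s in scores) / n
    s_avg = sum(s[1] for s in scores) / n
    d_avg = sum(s[2] for s in scores) / n
    return o_avg * s_avg * d_avg

# Hypothetical (O, S, D) scores from six physicists for one failure mode.
example = [(5, 6, 4), (4, 7, 5), (6, 5, 5), (5, 6, 6), (4, 6, 4), (5, 7, 5)]
value = rpn(example)
needs_attention = value >= 125  # AAPM TG-100 cutoff for added scrutiny
```

With these hypothetical scores the averaged RPN lands above the 125 cutoff, flagging the failure mode for additional attention under the TG-100 criterion.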

Results: The human scorers identified 22 unique FMs. Only one FM, a limited ability to handle updates to policies or procedures, was independently identified by all six physicists. A total of 11 of the 22 FMs were identified by at least two physicists. Fourteen of the 22 FMs had an RPN greater than or equal to 125, the cutoff at which AAPM TG-100 suggests more attention is warranted for an FM. Of the ten FMs generated by the AI, six overlapped with FMs identified by the human scorers, including the FM identified by all six physicists. Nine of the ten AI-generated FMs were scored with RPNs greater than 125, and the lone FM below 125 had a severity of 10. The root-mean-square error between AI and human-scored RPNs for the overlapping FMs was 66, with differences as small as 0 and as large as 144. There were no instances of an overlapping human- and AI-generated FM having an RPN greater than 125 from one methodology and not the other.
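The RMSE statistic reported above can be illustrated as follows; the paired RPN values are hypothetical placeholders, not the study's data:

```python
import math

def rmse(ai_rpns, human_rpns):
    """Root-mean-square error between paired AI and human-scored RPNs."""
    diffs = [a - h for a, h in zip(ai_rpns, human_rpns)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical RPN pairs for six overlapping failure modes (made-up values
# chosen only to mimic the reported range of differences, 0 to 144).
ai_rpns = [150, 200, 125, 300, 180, 160]
human_rpns = [150, 170, 140, 156, 120, 190]
error = rmse(ai_rpns, human_rpns)
```

This is the standard RMSE over the six overlapping FMs; a single large disagreement (here the 144-point difference) dominates the statistic because differences are squared before averaging.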

Conclusion: Generative AI has the potential to be an invaluable tool in the field of radiation oncology. Specifically, LLMs can help enforce consistent departmental practice, adherence to industry standards, and efficient referencing of departmental guidelines. Even as an advisor, and not as a decision-making entity directly touching patients, the implementation of generative AI is not without risk. This work outlined those risks and highlighted the utility of the AI by having it perform its own risk assessment through an FMEA. Prospective risk mitigation strategies such as this exercise can be invaluable both in understanding risk and in implementing preventions and barriers to reduce the risk associated with adopting new technologies.