Purpose: The goal of breast cancer screening is to maximize early detection of breast cancer while minimizing the number of patients recalled. Either goal can generally be achieved at the expense of the other, but achieving both simultaneously is challenging. Here, we tested the hypothesis that an AI-guided expert review process could lower practice recall rates (RR) with minimal effect on cancer detection rate (CDR) by focusing the additional review only on the subset of exams where a recall was initially proposed and cancer is very rare.
Materials and Methods: An FDA-cleared AI was run on 194,268 eligible screening mammograms at a large outpatient screening mammography practice (69 sites, 55 radiologists) from 03/2023-10/2023. An AI-guided expert review process (ERP) was initiated, whereby exams in the least suspicious 75% of AI scores that were recalled by the initial interpreting radiologist were reviewed by an expert breast imaging specialist. The expert decided whether to consult with the initial interpreting radiologist to reconsider the recall decision. Two approaches were tested: from 03/2023-06/2023, recalls by all radiologists were eligible for review (“practice-wide ERP”), and from 06/2023-10/2023, only recalls by interpreting radiologists with a baseline RR above 12% were eligible (“targeted ERP”). To determine how effective each approach was, practice RR and its 95% confidence interval were calculated for the year before ERP (03/2022-03/2023) and compared with those for the practice-wide ERP and targeted ERP periods. An analysis was also performed using only exams from radiologists with a baseline RR above 12%.
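The review-eligibility rule and the RR interval described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation: the `Exam` fields, the percentile representation of the AI score, and the function names are hypothetical, and the abstract does not state which confidence interval method was used, so a Wilson score interval is assumed here.

```python
import math
from dataclasses import dataclass


@dataclass
class Exam:
    # Hypothetical fields, for illustration only.
    ai_percentile: float       # assumed scale: 0 = least suspicious, 100 = most suspicious
    recalled: bool             # initial interpreting radiologist's recall decision
    reader_baseline_rr: float  # interpreting radiologist's baseline recall rate (0-1)


def eligible_for_review(exam, targeted, ai_cutoff=75.0, rr_cutoff=0.12):
    """Return True if a recalled exam would be routed to expert review.

    Practice-wide ERP (targeted=False): any recalled exam in the least
    suspicious 75% of AI scores. Targeted ERP (targeted=True): additionally
    requires the reader's baseline RR to be above 12%.
    """
    if not exam.recalled:
        return False
    if exam.ai_percentile > ai_cutoff:
        return False
    if targeted and exam.reader_baseline_rr <= rr_cutoff:
        return False
    return True


def recall_rate_ci(recalls, exams, z=1.96):
    """Recall rate with a Wilson score interval (assumed method; z=1.96 for 95%)."""
    if exams <= 0:
        raise ValueError("exams must be positive")
    p = recalls / exams
    denom = 1 + z * z / exams
    center = (p + z * z / (2 * exams)) / denom
    half = z * math.sqrt(p * (1 - p) / exams + z * z / (4 * exams * exams)) / denom
    return p, center - half, center + half
```

For example, `eligible_for_review(Exam(40.0, True, 0.10), targeted=False)` is eligible under the practice-wide rule, but the same exam is not eligible when `targeted=True`, because the reader's baseline RR is not above 12%. Any counts passed to `recall_rate_ci` here would be illustrative inputs, not study data.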
Results: Implementation of ERP resulted in a decrease in RR, from 11.5% (11.4-11.6%) in the year prior to the program to 9.6% (9.4-9.8%) during practice-wide ERP and to 8.3% (8.1-8.4%) during targeted ERP. The benefit was larger among the 28 radiologists with RRs above 12%, whose RR decreased from 16.0% (15.8-16.2%) to 12.8% (12.5-13.1%) during practice-wide ERP and to 9.0% (8.7-9.3%) during targeted ERP. The decrease in RR was driven by reduced recall of exams with low AI suspicion of cancer, which made up 51% of recalls before ERP vs. 32% during targeted ERP. Experts reviewed 4.0% of exams during practice-wide ERP, and just 1.2% of exams during targeted ERP.
Conclusion: ERP was effective at lowering practice-level RR while requiring minimal additional review of exams by expert radiologists. It was particularly effective when focused on radiologists whose RR was above clinical guidelines.
Clinical Relevance Statement: ERP may be an efficient way to direct efforts in practices looking to reduce RRs without reducing quality of care.