Purpose: Reading screening digital breast tomosynthesis (DBT) exams with AI support can improve the cancer detection rate (CDR) and reduce false-positive findings. The purpose of this study is to evaluate the effect of AI on the interpretation of screening mammograms in a double-reader study.
Materials and Methods: Our IRB-approved prospective study used an AI system trained with model ensembling, with models based on 486,383 full-field digital mammography (FFDM)/DBT exams. This AI-DBT model produces a high-specificity threshold score for each breast. Screening mammograms interpreted as BI-RADS 1/2 by Reader 1 but flagged as suspicious by the AI model (the top 10% of suspicious cases, and cases with a >50% increase in AI score compared with the prior mammogram) were selected for a second read. Ten fellowship-trained breast radiologists served as second readers (mean experience = 10.7 years, range 2-33 years). Only Reader 2 had access to the AI score (percentage of suspicion for malignancy) for each breast and to the bounding boxes highlighting suspicious findings.
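The selection rule for the second read can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation: the `Exam` record, the function names, and the parameter names are hypothetical. The abstract does not say how the two criteria were computed, so the sketch assumes one exam-level score (for example, the higher of the two per-breast scores), a top-10% cutoff taken among Reader 1 negatives, a relative (not absolute) increase over the prior score, and selection when either criterion is met.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Exam:
    """Hypothetical record for one screening exam."""
    exam_id: str
    birads_reader1: int                     # Reader 1 assessment category
    ai_score: float                         # assumed exam-level AI score in [0, 1]
    prior_ai_score: Optional[float] = None  # AI score on the prior exam, if any


def top_fraction_cutoff(scores: List[float], top_fraction: float) -> float:
    """Return the lowest score that still falls in the top `top_fraction`."""
    ranked = sorted(scores, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return ranked[k - 1]


def select_for_second_read(exams: List[Exam],
                           top_fraction: float = 0.10,
                           relative_rise: float = 0.50) -> List[Exam]:
    """Pick Reader 1 negatives (BI-RADS 1/2) that the AI flags as suspicious."""
    negatives = [e for e in exams if e.birads_reader1 in (1, 2)]
    if not negatives:
        return []
    cutoff = top_fraction_cutoff([e.ai_score for e in negatives], top_fraction)

    selected = []
    for e in negatives:
        in_top = e.ai_score >= cutoff
        # Assumption: ">50% increase" means relative change versus the prior score.
        rose = (e.prior_ai_score is not None and e.prior_ai_score > 0
                and (e.ai_score - e.prior_ai_score) / e.prior_ai_score > relative_rise)
        if in_top or rose:
            selected.append(e)
    return selected
```

Exams that Reader 1 already recalled (BI-RADS 0) are excluded before the cutoff is computed, which mirrors the design in which only negative first reads were eligible for a second read.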
Results: From 05/2023 to 09/2023, 15,820 screening DBT mammograms were interpreted at a health enterprise that includes academic and private practices. Of these, 2,278 (14.4%) were recalled (BI-RADS 0) and 66 cancers were diagnosed, yielding a CDR of 4.2/1000. Of the 13,542 exams initially assessed as BI-RADS 1/2, 1,643 (10.4% of all 15,820 exams) were deemed suspicious by AI and underwent a second read. Of the 1,643 exams interpreted by a second reader, 200 (12.2%) received BI-RADS 0, and 47 of these 200 (23.5%) were recommended for biopsy (BI-RADS 4/5). To date, 37 of the 47 recommended ultrasound core/stereotactic biopsies have been performed, yielding 14 cancers in 13 patients. Seven of the 13 patients had heterogeneously dense breasts and 6 had scattered fibroglandular tissue. The AI system correctly localized the cancer in all cases. Of the 14 cancers, 6 presented as architectural distortion, 2 as focal asymmetry, 1 as a mass, and 5 as calcifications. On biopsy, 6 were IDC (3 Grade 2, 3 Grade 1), 5 were ILC (5 Grade 3), and 3 were DCIS (2 intermediate nuclear grade, 1 high nuclear grade). All invasive cancers were ER+/PR+/HER2- and all DCIS were ER+/PR+. The CDR among the 1,643 exams double-read and interpreted with AI was significantly higher, at 8.5/1000 (p < 0.001).
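The reported rates follow directly from the counts above. The short Python check below reproduces them. The variable and function names are illustrative, and the counts are the ones stated in the Results. It does not reproduce the significance test, whose method is not described here.

```python
def per_thousand(cancers: int, exams: int) -> float:
    """Cancer detection rate expressed per 1000 exams."""
    return 1000.0 * cancers / exams


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


# Counts taken from the Results paragraph.
total_exams = 15820
recalled_first_read = 2278
cancers_first_read = 66
second_read_exams = 1643
second_read_recalled = 200
biopsy_recommended = 47
cancers_second_read = 14   # 14 cancers found in 13 patients

negatives_first_read = total_exams - recalled_first_read

print(f"Reader 1 negatives:        {negatives_first_read}")
print(f"Recall rate, first read:   {percent(recalled_first_read, total_exams):.1f}%")
print(f"CDR, first read:           {per_thousand(cancers_first_read, total_exams):.1f}/1000")
print(f"Second reads / all exams:  {percent(second_read_exams, total_exams):.1f}%")
print(f"Second reads / negatives:  {percent(second_read_exams, negatives_first_read):.1f}%")
print(f"Recall rate, second read:  {percent(second_read_recalled, second_read_exams):.1f}%")
print(f"Biopsy among recalled:     {percent(biopsy_recommended, second_read_recalled):.1f}%")
print(f"CDR, second read:          {per_thousand(cancers_second_read, second_read_exams):.1f}/1000")
```

Two details are worth noting: the 10.4% figure is the share of all 15,820 exams (the share of the 13,542 Reader 1 negatives is about 12.1%), and the 8.5/1000 rate counts the 14 cancers, not the 13 patients.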
Conclusion: By deploying our AI model into the clinical screening mammography workflow and performing a second read on a small fraction (10.4%) of exams, CDR can be significantly increased and more aggressive, high-grade tumors can be detected.
Clinical Relevance Statement: Our AI model can identify subtle, high-grade breast cancers that may otherwise be overlooked by experienced breast radiologists during routine screening mammography interpretation.