Purpose: Many artificial intelligence (AI) systems have historically been validated in Europe, which has a less racially diverse population than the United States. In this study, we evaluated the performance of an AI system on screening mammograms performed in a racially diverse patient population in New Jersey.
Materials and Methods: This retrospective study included 28,278 DBT screening exams and their diagnostic workups, collected from February to July 2022 at outpatient imaging centers within a private practice. The screening exams (Hologic) were interpreted by MQSA radiologists. BIRADS 0 cases were recalled for additional evaluation and potential biopsy. All DBT screening exams were analyzed by an AI system (Transpara 1.7.1, ScreenPoint Medical), which assigned each exam a score from 1 to 10, with higher scores indicating a greater likelihood of malignancy. The performance of the AI system was evaluated overall and for each racial group.
Results: Of 28,278 screening exams, 4,170 were labeled BIRADS 0, a recall rate of 14.7%. Of the recalled patients, 3,531 returned to our facilities for diagnostic imaging. There were 581 diagnostic exams labeled BIRADS 4-5 and recommended for biopsy. Biopsy results were available for 331 cases at the time of analysis, revealing 70 cancers. Of the 4,170 patients whose studies were labeled BIRADS 0, 2,211 (53.0%) were white, 743 (17.8%) were Asian, 682 (16.4%) were African American, 439 (10.5%) were Hispanic, and 89 (2.1%) were of another race. For 6 patients (0.1%), race was not available.
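The proportions in this paragraph can be recomputed from the reported counts. The following is a minimal sketch, not the authors' analysis code; the variable names and the `percent` helper are illustrative assumptions, and the counts are copied from the text above.

```python
# Counts copied from the Results paragraph; all names here are illustrative.
screening_exams = 28_278
recalled = 4_170  # screening exams labeled BIRADS 0

recalled_by_race = {
    "White": 2_211,
    "Asian": 743,
    "African American": 682,
    "Hispanic": 439,
    "Other": 89,
    "Not available": 6,
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The subgroup counts should account for every recalled exam.
assert sum(recalled_by_race.values()) == recalled

recall_rate = percent(recalled, screening_exams)
composition = {race: percent(n, recalled) for race, n in recalled_by_race.items()}

print(f"Recall rate: {recall_rate}%")
for race, share in composition.items():
    print(f"{race}: {share}% of recalled patients")
```

The sum check is a cheap way to catch a transcription error in any single subgroup count before the percentages are reported.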
Across all racial groups, AI identified 59/70 cancers (84%) with an exam score of 10 and 65/70 cancers (93%) with an exam score of 8-10. In white women, AI identified 37/45 cancers (82%) with a score of 10 and 42/45 cancers (93%) with a score of 8-10. In Asian women, AI identified 9/11 cancers (82%) with a score of 10 and 10/11 cancers (91%) with a score of 8-10. In African American women, AI identified 8/9 cancers (89%) with a score of 10 and 8/9 cancers (89%) with a score of 8-10. In Hispanic women, AI identified 5/5 cancers (100%) with a score of 10. No cancers were detected in women who identified as another race or for whom race was not available.
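The per-group detection fractions above can be tabulated with a short script. This is a hedged sketch rather than the study's method: the dictionary layout and the `detection_percent` helper are assumptions made for illustration, and the Hispanic count at scores 8-10 is inferred as 5/5 because all five cancers already scored 10.

```python
# (cancers, detected at score 10, detected at score 8-10), copied from the text.
# The Hispanic 8-10 value is inferred: every cancer scoring 10 also scores 8-10.
cancers_by_group = {
    "White": (45, 37, 42),
    "Asian": (11, 9, 10),
    "African American": (9, 8, 8),
    "Hispanic": (5, 5, 5),
}
overall = (70, 59, 65)

# Subgroup counts should add up to the overall totals, column by column.
totals = tuple(sum(column) for column in zip(*cancers_by_group.values()))
assert totals == overall


def detection_percent(detected, cancers):
    """Fraction of biopsy-proven cancers flagged, as a whole-number percentage."""
    return round(100 * detected / cancers)


rows = {"All": overall, **cancers_by_group}
summary = {
    group: (detection_percent(at_10, n), detection_percent(at_8_to_10, n))
    for group, (n, at_10, at_8_to_10) in rows.items()
}

for group, (pct_10, pct_8_to_10) in summary.items():
    print(f"{group}: score 10 -> {pct_10}%, score 8-10 -> {pct_8_to_10}%")
```

These fractions describe how many confirmed cancers received a high score; they are computed only over the 70 biopsy-proven cancers, so they say nothing about how many high-scoring exams were benign.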
Conclusion: An AI exam score of 8-10 identified 93% of the biopsy-proven cancers in this cohort. The system performed similarly across racial groups, identifying 89-100% of cancers in each group at scores of 8-10.
Clinical Relevance Statement: The results of our study suggest that the AI system we evaluated could serve as a helpful tool for indicating the likelihood of cancer in a racially diverse screening population.