The illusory correlation between geometrical similarity and physician satisfaction in heart auto-segmentation revealed by a physician-blind test.

📚 Journal: Journal of applied clinical medical physics 📅 Published: 0000-00-00 🔬 PMID: 42298799 🔗 DOI: 10.1002/acm2.70642

👤 Authors: Lee JC, Heo EJ, Cho SH, Lee DY, Chang KH, Shim JB, Lee NK, Lee S

Cardiovascular

📝 Abstract

BACKGROUND: Auto-segmentation tools are essential in adaptive radiation therapy (ART). While evaluation typically relies on geometric metrics such as the Dice similarity coefficient (DSC), high scores do not always translate into clinical acceptability.

PURPOSE: This study investigated the "illusory correlation" between geometric indices and physician satisfaction and aimed to identify the critical anatomical substructures that dictate clinical judgment.

METHODS: Heart auto-segmentation was performed for 30 left-sided breast cancer patients using pre-built and on-site-trained fully convolutional dense network (FCDN) models. Geometric similarity was assessed via DSC, mean surface distance (MSD), and 95th percentile Hausdorff distance (HD95). Clinical satisfaction was quantified through a physician-blind test involving 17 anatomical items. Pearson correlation coefficients (PCC) and conditional probabilities (P(B|A)) were used to analyze the relationship between substructure accuracy and overall clinical acceptance.

RESULTS: Both models showed high geometric similarity (mean DSC ∼ 0.95), but clinical satisfaction differed drastically: the pre-built model had a 3.3% acceptance rate, while the on-site model achieved 93.3%. PCC analysis did not show significant correlations between geometric metrics and satisfaction after correction for multiple testing. However, conditional probability analysis revealed that the cranial/caudal borders and the superior vena cava (SVC) were the primary determinants of satisfaction, with P(B|A) values up to 0.95. Conversely, although the coronary arteries showed the greatest geometric improvement in the on-site model, their individual success did not guarantee overall clinical acceptance (P(B|A) ∼ 0.42–0.52), explaining the disconnect between standard metrics and clinical judgment.

CONCLUSIONS: Geometric metrics alone are insufficient to validate auto-segmentation for clinical use. Clinical acceptance is driven by specific critical substructures, particularly boundary regions, rather than by global volumetric similarity. We propose a "two-guideline strategy" that prioritizes these high-impact regions and establishes quantitative gateways for model commissioning, providing a more robust framework for quality assurance (QA) in ART.
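
The abstract names three geometric indices (DSC, MSD, HD95) and a conditional-probability analysis without spelling out how they are computed. The following is a minimal NumPy sketch, not the authors' code, assuming binary voxel masks on a shared grid with non-empty contours. MSD and HD95 definitions vary between toolkits; here, as an assumption, both are taken over the pooled symmetric set of surface-to-surface distances. P(B|A) is read, also as an assumption, as the probability that a case is accepted overall (B) given that one anatomical item was rated satisfactory (A). All function names and the toy ratings are hypothetical illustrations.

```python
import numpy as np


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    a, b = a.astype(bool), b.astype(bool)
    denom = int(a.sum() + b.sum())
    if denom == 0:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def surface_points(mask: np.ndarray, spacing) -> np.ndarray:
    """Physical coordinates of boundary voxels: voxels inside the mask
    with at least one face-neighbour outside it."""
    m = np.pad(mask.astype(bool), 1)  # pad with False so grid edges count as outside
    interior = m.copy()
    for axis in range(m.ndim):
        interior &= np.roll(m, 1, axis) & np.roll(m, -1, axis)
    boundary = m & ~interior
    return (np.argwhere(boundary) - 1) * np.asarray(spacing, dtype=float)


def directed_distances(p: np.ndarray, q: np.ndarray) -> np.ndarray:
    """For each point in p, Euclidean distance to the nearest point in q.
    Brute force with O(len(p) * len(q)) memory; fine for a toy example."""
    diff = p[:, None, :] - q[None, :, :]
    return np.linalg.norm(diff, axis=-1).min(axis=1)


def surface_metrics(a, b, spacing=(1.0, 1.0, 1.0)):
    """Return (MSD, HD95) over the pooled symmetric surface distances
    (one of several definitions in use; assumed here)."""
    pa, pb = surface_points(a, spacing), surface_points(b, spacing)
    pooled = np.concatenate([directed_distances(pa, pb), directed_distances(pb, pa)])
    return float(pooled.mean()), float(np.percentile(pooled, 95))


def conditional_probability(item_ok, overall_ok) -> float:
    """P(B|A) = P(A and B) / P(A), with A = 'item rated satisfactory' and
    B = 'whole contour accepted' (interpretation assumed here)."""
    a = np.asarray(item_ok, dtype=bool)
    b = np.asarray(overall_ok, dtype=bool)
    if not a.any():
        return float("nan")  # undefined when the item never succeeds
    return float((a & b).sum() / a.sum())


if __name__ == "__main__":
    # Toy 3D masks: two 4x4x4 cubes offset by one voxel along the first axis.
    ref = np.zeros((10, 10, 10), dtype=bool)
    ref[2:6, 2:6, 2:6] = True
    auto = np.zeros_like(ref)
    auto[3:7, 2:6, 2:6] = True
    print("DSC:", dice(ref, auto))  # 0.75
    msd, hd95 = surface_metrics(ref, auto)
    print(f"MSD: {msd:.3f}, HD95: {hd95:.1f}")  # MSD: 0.357, HD95: 1.0
    # Hypothetical per-patient ratings (1 = satisfactory), NOT study data.
    item = [1, 1, 1, 0, 1]
    overall = [1, 0, 1, 0, 1]
    print("P(B|A):", conditional_probability(item, overall))  # 0.75
```

Because P(B|A) is computed item by item against the overall verdict, an item can improve geometrically and still have a modest P(B|A) when the cases in which it succeeds are often rejected for other reasons. This is the pattern the abstract reports for the coronary arteries.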
