Methodological quality of cardiac CT and MRI radiomics studies assessed using METRICS and RQS by human readers and ChatGPT 5.1 Thinking.

📚 Journal: European radiology experimental 📅 Published: 0000-00-00 🔬 PMID: 42319678 🔗 DOI: 10.1186/s41747-026-00756-5

👤 Authors: Garello LF, Giannini V, Gatti M, Defeudis A, Cafaro D, Nicoletti G, Culasso NC, Faletti R, Veltri A, Cuocolo R

Cardiovascular

📝 Abstract

OBJECTIVE: To assess the methodological quality of cardiac CT and MRI radiomics studies using the METhodological RadiomICs Score (METRICS) and the Radiomics Quality Score (RQS), and to evaluate the inter-rater reliability (IRR) of both scoring tools among human readers and ChatGPT 5.1 Thinking.

MATERIALS AND METHODS: Cardiac CT and MRI radiomics studies published up to 28 February 2025 were scored with both systems by human readers with complementary expertise in cardiac imaging and radiomics. IRR was evaluated in 30 randomly selected studies by two independent groups of secondary readers and by ChatGPT 5.1 Thinking.

RESULTS: Of 781 screened records, 154 were included. The overall median METRICS was 0.60 (IQR, 0.52-0.68), and the median percentage RQS was 0.36 (IQR, 0.19-0.42), corresponding to a median absolute RQS of 13 (IQR, 8-15). The scoring systems highlighted several methodological limitations, such as the absence of external validation, prospective study design, and open data availability. Between human readers, IRR was good for METRICS (ICC, 0.77-0.88) and moderate to good for RQS (ICC, 0.59-0.82). Between human readers and ChatGPT 5.1 Thinking, IRR was moderate to good for METRICS (ICC, 0.70-0.85) but only poor to moderate for RQS (ICC, 0.46-0.56).

CONCLUSIONS: Cardiac CT and MRI radiomics research quality was rated as good by METRICS, whereas RQS yielded lower scores. Human readers showed good reproducibility with METRICS and moderate to good reproducibility with RQS. ChatGPT 5.1 Thinking showed potential for automating the evaluation process, but its use requires caution due to potential discrepancies with human evaluations.

RELEVANCE STATEMENT: Research quality in cardiac CT and MRI radiomics still suffers from substantial limitations. The application of METRICS and RQS using LLMs requires caution, given the limited reproducibility when compared with human assessments.

KEY POINTS: According to METRICS and RQS, radiomics-based cardiac CT and MRI studies remain affected by substantial methodological limitations. Human readers achieved good reproducibility with METRICS and moderate to good reproducibility with RQS. ChatGPT 5.1 Thinking may be helpful for scoring radiomics research quality, but its results should be interpreted with caution due to potential discrepancies with human evaluations.
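
The verbal labels attached to the ICC ranges above (poor, moderate, good) can be reproduced with a small lookup. The sketch below is illustrative only: the abstract does not say which cut-offs the authors used, so it assumes the commonly cited convention of below 0.50 poor, 0.50 to 0.75 moderate, 0.75 to 0.90 good, and above 0.90 excellent. The function names `icc_label` and `range_label` are hypothetical helpers, not part of the study.

```python
# Assumed cut-offs (not stated in the abstract): <0.50 poor, 0.50-0.75 moderate,
# 0.75-0.90 good, >0.90 excellent. Handling of exact boundary values is a choice
# made here for illustration.

def icc_label(icc: float) -> str:
    """Map a single ICC value to a verbal reliability label."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


def range_label(low: float, high: float) -> str:
    """Describe an ICC range by the labels of its two endpoints."""
    lo, hi = icc_label(low), icc_label(high)
    return lo if lo == hi else f"{lo} to {hi}"


# ICC ranges as reported in the abstract.
reported = {
    "METRICS, human vs human": (0.77, 0.88),
    "RQS, human vs human": (0.59, 0.82),
    "METRICS, human vs ChatGPT 5.1 Thinking": (0.70, 0.85),
    "RQS, human vs ChatGPT 5.1 Thinking": (0.46, 0.56),
}

for comparison, (low, high) in reported.items():
    print(f"{comparison}: ICC {low:.2f}-{high:.2f} -> {range_label(low, high)}")
```

Under these assumed thresholds, the four ranges map to the same wording the abstract uses, which suggests (but does not confirm) that a similar convention was applied.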
