Title
Performance of a Deep Learning Algorithm Compared with Radiologic Interpretation for Lung Cancer Detection on Chest Radiographs in a Health Screening Population
Background
The performance of a deep learning algorithm for lung cancer detection on chest radiographs in a health screening population is unknown.
Method
Out-of-sample testing of a deep learning algorithm was retrospectively performed using chest radiographs from individuals undergoing a comprehensive medical check-up between July 2008 and December 2008 (validation test). To evaluate the algorithm performance for visible lung cancer detection, the area under the receiver operating characteristic curve (AUC) and diagnostic measures, including sensitivity and false-positive rate (FPR), were calculated. The algorithm performance was compared with that of radiologists using the McNemar test and the Moskowitz method. Additionally, the deep learning algorithm was applied to a screening cohort undergoing chest radiography between January 2008 and December 2012, and its performance was calculated.
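To make the two named measures concrete, here is a minimal Python sketch of an exact (binomial) McNemar test on paired reads and of the AUC computed as a rank probability. It is an illustration only, not the study's analysis code: the function names are hypothetical, the source does not say which McNemar variant was used, and the discordant-pair counts in the usage line are an assumption (three cancers found only by the algorithm, none found only by the radiologists), chosen because they are consistent with the reported nine versus six detections out of 10.

```python
from math import comb


def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact McNemar P value from the two discordant-pair counts.

    b: cases detected by reader A only; c: cases detected by reader B only.
    Under the null hypothesis each discordant pair is a fair coin flip.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def auc_from_scores(pos_scores, neg_scores) -> float:
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as half a win. This brute-force form is O(n*m) and is meant
    for illustration, not for cohorts with 100 000 radiographs.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Assumed discordant counts (hypothetical, see the lead-in).
print(exact_mcnemar_p(3, 0))  # 0.25
```

Under that assumption the exact test gives 0.25, which matches the P value reported for the sensitivity comparison, although the actual pairing of detections is not stated in the abstract.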
Results
In a validation test comprising 10285 radiographs from 10202 individuals (mean age, 54 years ± 11 [standard deviation]; 5857 men) with 10 radiographs of visible lung cancers, the algorithm's AUC was 0.99 (95% confidence interval: 0.97, 1), and it showed comparable sensitivity (90% [nine of 10 radiographs]) to that of the radiologists (60% [six of 10 radiographs]; P = .25) with a higher FPR (3.1% [319 of 10275 radiographs] vs 0.3% [26 of 10275 radiographs]; P < .001). In the screening cohort of 100525 chest radiographs from 50070 individuals (mean age, 53 years ± 11; 28090 men) with 47 radiographs of visible lung cancers, the algorithm's AUC was 0.97 (95% confidence interval: 0.95, 0.99), and its sensitivity and FPR were 83% (39 of 47 radiographs) and 3% (2999 of 100478 radiographs), respectively.
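The percentages above follow directly from the reported counts. The short Python check below recomputes them; the counts are taken from the text, while the helper name `rate` and the dictionary layout are just illustrative choices.

```python
def rate(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage."""
    return 100.0 * numerator / denominator


# (numerator, denominator) pairs as reported in the Results.
reported = {
    "validation, algorithm sensitivity": (9, 10),
    "validation, radiologist sensitivity": (6, 10),
    "validation, algorithm FPR": (319, 10275),
    "validation, radiologist FPR": (26, 10275),
    "screening, algorithm sensitivity": (39, 47),
    "screening, algorithm FPR": (2999, 100478),
}

for label, (num, den) in reported.items():
    print(f"{label}: {rate(num, den):.1f}% ({num} of {den})")

# The FPR denominators are the radiographs without visible lung cancer.
assert 10285 - 10 == 10275
assert 100525 - 47 == 100478
```

Sensitivity uses the radiographs with visible lung cancer as its denominator, and the FPR uses the remaining radiographs, which is why the two denominators in each cohort add up to the cohort size.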
Conclusion
A deep learning algorithm detected lung cancers on chest radiographs with a performance comparable to that of radiologists, which will be helpful for radiologists working with healthy populations that have a low prevalence of lung cancer.
Figure
Figure 1: Flowchart of (a) validation test cohort and (b) screening cohort.
Figure 2: Receiver operating characteristic curves of deep learning algorithm for (a) detection of visible lung cancer on chest radiographs and (b) cancer-positive chest radiographs compared with board-certified radiologists in validation test. In validation test composed of 10285 chest radiographs (a), including 10 chest radiographs with visible lung cancer, the algorithm had an area under the receiver operating characteristic curve (AUC) of 0.99 (95% confidence interval [CI]: 0.97, 1), and the radiologists showed a sensitivity of 60% and a specificity of 100%. In magnified illustration, red dot that represents radiologists' performance is below receiver operating characteristic curve of algorithm. In validation test composed of 10289 chest radiographs (b), including 14 cancer-positive chest radiographs, the deep learning algorithm had an AUC of 0.89 (95% CI: 0.79, 0.99). In comparison, three board-certified radiologists showed a sensitivity of 43% and a specificity of 100% for this task. In magnified figure, red dot that represents radiologists' performance is below the receiver operating characteristic curve of the algorithm.
Figure 3: Representative case of deep learning algorithm correctly detecting visible lung cancer on a chest radiograph in a health screening. Images in a 56-year-old woman who underwent chest radiography as part of a comprehensive health check-up and screening. (a) Chest radiograph shows ill-defined lesion with a diameter of 3.5 cm (arrowhead) faintly identified in right upper lung apex, which is obscured by bony thorax. (b) Axial unenhanced chest CT scan taken on same day as chest radiograph shows 4.1-cm lung mass (arrowhead) with spiculated margin in right upper lobe apex. Right upper lobe lobectomy was performed, and the mass was pathologically proven to be invasive adenocarcinoma with an acinar and bronchioloalveolar pattern. (c) Deep learning algorithm provided a probability value of 0.85 for this being a positive case and correctly localized lesion in right upper lung apex (arrowhead). This lung mass was missed by a board-certified radiologist in the reader study.
Figure 4: Receiver operating characteristic curves of deep learning algorithm for detection of lung cancer on chest radiographs in a health screening cohort. (a) Receiver operating characteristic curve of deep learning algorithm for classification of cancer-positive chest radiographs in a health screening. Area under the receiver operating characteristic curve (AUC) was 0.78 (95% confidence interval [CI]: 0.73, 0.83). (b) Receiver operating characteristic curve of deep learning algorithm for visible lung cancers on chest radiographs, with an AUC of 0.97 (95% CI: 0.95, 0.99). (c) Receiver operating characteristic curve of deep learning algorithm for detection of clearly visible lung cancers on chest radiographs. AUC of algorithm was 0.99 (95% CI: 0.99, 0.99).
Figure 5: Representative case of deep learning algorithm detecting clearly visible lung cancer on a chest radiograph in a health screening. Images in a 67-year-old man who underwent chest radiography as part of a comprehensive health check-up and screening. (a) Chest radiograph shows faintly visible lung mass (arrowhead) with diameter of 3.5 cm in left middle lung field. (b) Unenhanced chest CT scan taken on same day as chest radiograph shows 3.3-cm lung mass (arrowhead) with spiculated margin and air bronchogram in left lower lobe on axial plane. Patient underwent left lower lobe lobectomy, and this mass was pathologically proven to be squamous cell carcinoma. (c) Deep learning algorithm provided a probability value of 0.91 for patient having lung cancer and correctly localized lesion in left middle lung field (arrowhead).
Table
Table 1: Baseline Clinical Characteristics of Individuals and Chest Radiographs in Validation Test and Screening Cohorts
Table 2: Comparison between Diagnostic Performance of Deep Learning Algorithm and That of Three Board-certified Radiologists for Detection of Visible Lung Cancers on Chest Radiographs in Validation Test Cohort
Table 3: Comparison between Diagnostic Performance of the Deep Learning Algorithm and That of Three Board-certified Radiologists for Detection of Cancer-Positive Chest Radiographs in Validation Test
Table 4: Diagnostic Performance of Deep Learning Algorithm for Detection of Lung Cancers on Health Screening Cohort Chest Radiographs