Document title and first author:
Radiomics: Data Are Also Images
Objectives: Radiomics is the high-throughput analysis of medical images for treatment individualization. It conventionally involves quantifying characteristics of a region of interest, such as a tumor delineated in the image. These characteristics can be intensity measurements (such as mean SUV), volume, geometrical shape and textural features. The lack of standardization of image features and the use of different software implementations limit the reproducibility of radiomics studies and are thus a major hurdle for the potential clinical translation of radiomics applications. To address this limitation, an international collaboration of 19 teams from 8 countries (Image Biomarkers Standardization Initiative, IBSI, see https://arxiv.org/abs/1612.07003) was initiated to i) establish a comprehensive radiomics workflow description, ii) provide verified definitions of commonly used features and iii) provide benchmarks for feature extraction and image processing steps, as well as reporting guidelines.
Material and Methods: Phase 1 of the initiative consisted of specifying and benchmarking, across all participants, more than 350 statistical, morphological and textural features (in both 2D and 3D) using a very simple digital phantom that does not require any image pre-processing. In phase 2, we added image pre-processing steps, and features were benchmarked on 5 different configurations of a lung cancer patient CT image. The configurations differed in their image processing workflow, i.e. how the image stack is analyzed (2D: cases 1 and 2; 3D: cases 3 to 5), the interpolation method (none: case 1; bi/trilinear: cases 2 to 4; tri-cubic: case 5) and the grey-level discretization approach (fixed bin size: cases 1 and 3; fixed number of bins: cases 2, 4 and 5). Both phases were iterative: participants could compare their results with those of the other teams and update their workflow implementation accordingly. The most frequently contributed value of each feature was selected as its benchmark value. Agreement on a benchmark value was considered to exist if the value was produced by at least 50% of contributing teams (minimum 3), weighted by their overall accuracy in reproducing the benchmark values.
Results: Twenty different software implementations across the 19 teams provided feature values. In both phases, only a limited number of features were initially in agreement (phase 1: 12.3%; phase 2: 0.5 (0.0-1.4)%). The number of reliable features increased over time as problems were identified and solved, and agreement was eventually achieved for most features (phase 1: 99.4%; phase 2: 96.4 (94.0-97.7)%). The remaining features, for which no agreement could be reached, were not commonly implemented.
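The agreement rule described in the Methods (most frequently contributed value, produced by at least 50% of contributing teams and by no fewer than 3) can be illustrated with a minimal Python sketch. It is not the initiative's actual procedure: the function name `benchmark_value`, the rounding used to group near-identical values and the team labels are assumptions made for illustration, and the weighting of teams by their overall accuracy is deliberately left out, so every team counts equally here.

```python
from collections import Counter


def benchmark_value(contributions, min_teams=3, min_fraction=0.5, ndigits=3):
    """Return (candidate_value, agreed) for a single feature.

    contributions: dict mapping a team label to the value it submitted.
    ndigits: rounding applied before counting (an assumption of this
             sketch, so that values differing only by numerical noise
             are grouped together).
    Simplification: teams are NOT weighted by accuracy, unlike the
    rule described in the text.
    """
    if not contributions:
        return None, False
    rounded = [round(v, ndigits) for v in contributions.values()]
    # Most frequently contributed value and the number of teams behind it.
    value, count = Counter(rounded).most_common(1)[0]
    agreed = count >= min_teams and count / len(rounded) >= min_fraction
    return value, agreed


# Hypothetical submissions for one feature (placeholder numbers).
submitted = {"team_a": 0.1234, "team_b": 0.1234, "team_c": 0.12341, "team_d": 0.5}
value, agreed = benchmark_value(submitted)
```

With these placeholder submissions, three of four teams share the same rounded value, so both conditions are met. With only two contributing teams the minimum of 3 cannot be reached, and the sketch reports no agreement even when the two values coincide.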
Conclusions: We addressed the lack of standardization in radiomics feature definition, implementation and image pre-processing by providing a digital phantom and reliable benchmark values for most features. We recommend using this standard to validate the radiomics software used in future studies, in order to increase the reproducibility of such studies.
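Validating a piece of software against benchmark values, as recommended above, amounts to comparing each computed feature with its reference within some tolerance. The following Python sketch shows one possible way to do this. The function name `validate_against_benchmark`, the feature names, the tolerance settings and all numbers are hypothetical placeholders, not values published by the initiative.

```python
import math


def validate_against_benchmark(computed, benchmark, rel_tol=1e-3, abs_tol=1e-12):
    """Compare computed feature values with benchmark values.

    Returns a dict mapping each benchmark feature name to one of
    'match', 'mismatch' or 'missing'. The tolerances are assumptions
    of this sketch and should be chosen per feature in practice.
    """
    report = {}
    for name, expected in benchmark.items():
        if name not in computed:
            report[name] = "missing"
        elif math.isclose(computed[name], expected, rel_tol=rel_tol, abs_tol=abs_tol):
            report[name] = "match"
        else:
            report[name] = "mismatch"
    return report


# Placeholder values only; real benchmark values come from the published standard.
benchmark = {"feature_mean": 10.0, "feature_volume": 250.0, "feature_energy": 1.0}
computed = {"feature_mean": 10.0005, "feature_volume": 200.0}
report = validate_against_benchmark(computed, benchmark)
```

A report of this kind makes it easy to list which features of an implementation reproduce the benchmark and which still need investigation, mirroring the iterative compare-and-fix process the participating teams followed.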