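These notes are based on the radiomics tutorial video series by the Bilibili uploader 有Li.
This section (29) covers computing the ICC with the pingouin package.

1. Wikipedia definition of the ICC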

In statistics, the intraclass correlation, or the intraclass correlation coefficient (ICC), is a descriptive statistic that can be used when quantitative measurements are made on units that are organized into groups. It describes how strongly units in the same group resemble each other. While it is viewed as a type of correlation, unlike most other correlation measures it operates on data structured as groups, rather than data structured as paired observations.
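
For the simplest case, the one-way random-effects model, the definition above can be written as a formula. The notation below is the usual textbook one and is an assumption of this note, not something taken from the video: $\alpha_j$ is the random effect shared by all measurements in group $j$, and $\varepsilon_{ij}$ is the residual.

```latex
Y_{ij} = \mu + \alpha_j + \varepsilon_{ij}, \qquad
\mathrm{ICC} = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\varepsilon^2}
```

Here $\sigma_\alpha^2$ is the variance of the group effects and $\sigma_\varepsilon^2$ is the residual variance, so the ICC is the share of the total variance that is due to differences between groups.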

 

2. Import the packages

pip install pingouin # for the first time

import pingouin as pg
import pandas as pd
import numpy as np
import os

 

3. Load the built-in dataset

data = pg.read_dataset('icc')
print(data)

 
The data is in long format: one row per rating, with the columns Wine, Judge and Scores.
 

 
 
4. Compute the ICC

icc = pg.intraclass_corr(data = data, targets = "Wine", raters = "Judge",
                         ratings = "Scores")
print(icc)

 
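The output is a table with one row per ICC type (ICC1, ICC2, ICC3, ICC1k, ICC2k, ICC3k) and the columns Type, Description, ICC, F, df1, df2, pval and CI95%.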

 
 

5. Hands-on practice

folderPath = "C:/Users/RONG/Desktop/ICC_calculation/"

data_1 = pd.read_excel(os.path.join(folderPath,"ICC_reader_1.xlsx"))
data_2 = pd.read_excel(os.path.join(folderPath,"ICC_reader_2.xlsx"))

# Label each row with the reader who produced it (1 or 2)
data_1.insert(0,"reader",np.ones(data_1.shape[0]))
data_2.insert(0,"reader",np.ones(data_2.shape[0])*2)

# Number the targets; row i must be the same target in both files
data_1.insert(0,"target",range(data_1.shape[0]))
data_2.insert(0,"target",range(data_2.shape[0]))

data = pd.concat([data_1,data_2]) # long format, like the built-in test data

icc = pg.intraclass_corr(data = data, targets = "target", raters = "reader",ratings = "featureA")
print(icc)

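ICC_reader_1.xlsx holds one row per target and one column per feature (such as featureA). After the processing above, the combined data frame has two extra leading columns, target and reader, followed by the feature columns.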

 

 

 

In real radiomics work, a for loop is used to compute the ICC of every feature in one batch.

 
 

Author: 北欧森林
Link: https://www.jianshu.com/p/9e79ac76fed6
Source: Jianshu, reposted with permission

    5 months later

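    强璐 Hello OP, when I used pingouin to compute the ICC of each feature, some of the resulting ICC values were below 0. I thought the ICC could only lie between 0 and 1, so I am confused and would like to ask for your advice.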

      强璐 Here are my code and raw data:
      import pingouin as pg
      import pandas as pd
      import numpy as np
      import os
      folderPath = r'C:\Users\ltp-0810\Desktop\Radiomics\ICC'
      data1 = pd.read_csv(os.path.join(folderPath, 'reader1.csv'))
      data2 = pd.read_csv(os.path.join(folderPath, 'reader2.csv'))

      data1.insert(0, 'reader', np.ones(data1.shape[0]))
      data2.insert(0, 'reader', np.ones(data2.shape[0])*2)

      data1.insert(0, 'patient', range(data1.shape[0]))
      data2.insert(0, 'patient', range(data2.shape[0]))

      data_inter = pd.concat([data1, data2])  # inter-observer data

      # Loop over the features and compute the agreement for each one
      ICC_inter = []  # inter-observer ICC
      for colName in data_inter.columns[3:]:
          ICC = pg.intraclass_corr(data=data_inter, targets='patient', raters='reader', ratings=colName)
          ICC = ICC.iloc[2, 2]  # select ICC3
          ICC_inter.append(ICC)
      min(ICC_inter)  # the minimum ICC turns out to be below 0

      reader1.txt
      1MB
      reader2.txt
      1MB

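        ltp0810 I can't figure it out. I suspect a bug; you could pick out the problematic groups and run them through other software, such as SPSS, to compare.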

          Here are my code and raw data; running it raises an error.
          Error:
          File "<stdin>", line 2
          ICC = pg.intraclass_corr(data=data, targets='R', raters='PAT', ratings=colName)
          ^
          IndentationError: expected an indented block

          ICC.apend(ICC)
          Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
          AttributeError: 'list' object has no attribute 'apend'

          Code:
          import pingouin as pg
          import pandas as pd
          import numpy as np
          import xlrd
          import os
          fPath = '/Users/caiqian890307/Downloads'
          data = pd.read_excel(os.path.join(fPath, 'phantom1_6.xlsx'))
          ICC = []
          for colName in data.columns[3:]:
          ICC = pg.intraclass_corr(data=data, targets='R', raters='PAT', ratings=colName)
          ICC.apend(ICC)
          print(ICC)

          phantom1-6.xlsx
          70kB

          Could Teacher Li please help me find the cause of the error in this code? Thank you.


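          [Unknown user] I also get values below zero in SPSS. This may be related to the notions of positive and negative correlation, but some people online say you only need to look at the absolute value, that is, use the absolute value of the ICC to judge stability.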
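
A negative value is not necessarily a software bug. When ICC3 is estimated from the two-way ANOVA mean squares, it equals (MSR - MSE) / (MSR + (k - 1) * MSE), where MSR is the between-target mean square, MSE is the residual mean square and k is the number of raters. The estimate therefore drops below 0 whenever MSR is smaller than MSE. The sketch below is a hand-written pure-Python version of that formula; the function name icc3 and the two toy tables are illustrative and are not part of pingouin.

```python
def icc3(ratings):
    """ICC(3,1) for a complete table: one row per target, one column per rater."""
    n = len(ratings)       # number of targets
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_err = ss_total - ss_rows - ss_cols                   # residual

    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


# Reader 2 = reader 1 + 1: a constant offset, identical ranking of targets.
agree = [[1, 2], [2, 3], [3, 4]]
# Reader 2 ranks the targets in exactly the opposite order.
opposite = [[1, 3], [2, 2], [3, 1]]

icc_agree = icc3(agree)
icc_opposite = icc3(opposite)
print(icc_agree, icc_opposite)
```

In the second table the readers disagree systematically, so the targets' mean values carry no signal and the estimate becomes negative. A negative ICC is usually read as poor agreement for that feature; treating its absolute value as a measure of stability would rate such a feature as reliable.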


          import pingouin as pg
          import pandas as pd
          import numpy as np
          import xlrd
          import os
          fPath = '/Users/caiqian890307/Downloads'
          data = pd.read_excel(os.path.join(fPath, 'phantom1_6.xlsx'))
          ICC = []
          for colName in data.columns[3:]:
          ICC = pg.intraclass_corr(data=data, targets='R', raters='PAT', ratings=colName)
          ICC.append(ICC)
          print(ICC)

          Error:
          File "<stdin>", line 2
          ICC = pg.intraclass_corr(data=data, targets='R', raters='PAT', ratings=colName)
          ^
          IndentationError: expected an indented block

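            Hukui In your data, some of the ratings from raters G and m are exactly identical, so a warning is raised at run time. Here is the code I put together:

            ICC_ = []  # empty list named ICC_, distinct from ICC below, so the results are not overwritten
            for colName in data.columns[2:]:  # in your data the columns to compare start at the 3rd column
                ICC = pg.intraclass_corr(data=data, targets='R', raters='PAT', ratings=colName)
                ICC = ICC.iloc[0, 2]  # [0, 2] selects ICC1; use [1, 2] for ICC2, and so on
                ICC_.append(ICC)
            print(ICC_)
            If that still fails, check whether comparing a single column on its own raises an error.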

              ltp0810 I modified the code as follows:
              import pingouin as pg
              import pandas as pd
              import numpy as np
              import os

              data_inter = pd.read_excel('/Users/caiqian890307/Downloads/pythonforicc/phantom1_6.xlsx')

              icc_inter = []
              for colName in data_inter.columns[2:]:
                  ICC = pg.intraclass_corr(data=data_inter, targets="R", raters="PAT", ratings=colName)
                  ICC = ICC.iloc[2, 2]  # select ICC3
                  icc_inter.append(ICC)
              print(icc_inter)

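              It still raises an error when run. The error comes mainly from the line ICC = pg.intraclass_corr(data = data_inter, targets = "R", raters = "PAT", ratings = colName).

              When I run a single call instead, changing the line to ICC = pg.intraclass_corr(data = data_inter, targets = "R", raters = "PAT", ratings = "LongRunLowGrayLevelEmphasis.3"), it succeeds. When I print colName and its type, I get:
                "LowGrayLevelEmphasis.3"
                <class 'str'>
                "JointAverage.3"
                <class 'str'>
                "SumAverage.3"
                <class 'str'>
                "JointEntropy.3"
                <class 'str'>
                "ClusterShade.3"
                <class 'str'>
                "MaximumProbability.3"
                <class 'str'>
                "Idmn.3"
                <class 'str'>
                "JointEnergy.3"
                <class 'str'>
                "Contrast.6"
                <class 'str'>
                "DifferenceEntropy.3"
                I wonder whether the data type was converted when the column names were read, so that a "." was automatically inserted between the letters and the digits of the feature names, and the loop then fails because the names contain the invalid character ".". I don't have permission to download the attachment you uploaded, so I can't tell whether the feature names in your file end in digits. I will keep troubleshooting on my own.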

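              ltp0810 After I renamed every feature to "num", print(colName) shows:
              num.1
              num.2
              num.3
              num.4
              num.5
              num.6
              num.7
              num.8
              num.9
              num.10 and so on through the last feature, num.419
              So I wonder whether colName is not a plain string but also carries index information, so that passing colName directly does not match the string form that pg.intraclass_corr() expects.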

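              [Unknown user]
              [Unknown user] I wrote the data I had read back out to a CSV file and found that some feature names had suffixes such as ".2" appended, for example "ZoneEntropy.3 SmallAreaLowGrayLevelEmphasis.3 Coarseness.3 Complexity.3 Strength.3 Contrast.7 Busyness.3". Please print the column names you read and check whether similar suffixes were appended.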
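
One behaviour that produces suffixes of this kind: when pandas reads a file whose header repeats a name, it keeps the first occurrence and renames the later ones by appending .1, .2 and so on. Whether that is what happened with the spreadsheet in this thread is an assumption; the CSV text below is made up to show the effect.

```python
import io
import pandas as pd

# Hypothetical CSV in which the header "Contrast" occurs three times.
csv_text = "R,PAT,Contrast,Contrast,Contrast\n1,1,0.5,0.7,0.9\n"
df = pd.read_csv(io.StringIO(csv_text))

columns = list(df.columns)
print(columns)

# The renamed labels are ordinary strings and valid column names.
print(all(isinstance(c, str) for c in columns))
```

Printing data.columns right after reading the file, as suggested above, is a quick way to see which feature names were duplicated in the source.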

                Hukui Add me on QQ: 3043663828, and let's learn together.

                  The problem is solved. Thanks to both of you for your guidance, and especially to ltp0810.

                    6 months later

                    ltp0810 Thanks, it works; the quotation marks need to be changed to straight (ASCII) quotes. Building on your code, here is my version:

                    import pingouin as pg
                    import pandas as pd
                    import numpy as np
                    import os

                    folderPath = r"E:\CT_ICGR15\feeature_selection\delete"
                    data1 = pd.read_csv(os.path.join(folderPath, "10.csv"))
                    data2 = pd.read_csv(os.path.join(folderPath, "11.csv"))

                    data1.insert(0, "reader", np.ones(data1.shape[0]))
                    data2.insert(0, "reader", np.ones(data2.shape[0])*2)

                    data1.insert(0, "patient", range(data1.shape[0]))
                    data2.insert(0, "patient", range(data2.shape[0]))

                    data_inter = pd.concat([data1, data2])  # inter-observer data
                    print(data_inter.columns)

                    ICC_inter = []  # inter-observer ICC
                    for colName in data_inter.columns[3:]:
                        ICC = pg.intraclass_corr(data=data_inter, targets="patient", raters="reader", ratings=colName)
                        ICC = ICC.iloc[2, 2]  # [2, 2] selects ICC3; use [0, 2] for ICC1 or [1, 2] for ICC2
                        ICC_inter.append(ICC)
                    print(ICC_inter)

                    df = pd.DataFrame(ICC_inter)
                    df.to_csv('E:\\ICC_inter.csv')

                    2 months later

                    Hello OP, after my for loop only a single result is output. Is that because each iteration overwrites the previous result? How can I fix it? My code is below; thanks for any advice.

                    import pingouin as pg
                    import pandas as pd
                    import numpy as np
                    import os

                    folderPath = "C:\\Users\\HP15-ab006TX\\Desktop\\ICC"
                    data1 = pd.read_csv(os.path.join(folderPath, "AA.csv"))
                    data2 = pd.read_csv(os.path.join(folderPath, "BB.csv"))

                    data1.insert(0, "reader", np.ones(data1.shape[0]))
                    data2.insert(0, "reader", np.ones(data2.shape[0])*2)

                    data1.insert(0, "patient", range(data1.shape[0]))
                    data2.insert(0, "patient", range(data2.shape[0]))

                    data_inter = pd.concat([data1, data2])
                    print(data_inter.columns)

                    icc_inter = []
                    for colName in data_inter.columns[2:]:
                    ICC = pg.intraclass_corr(data=data_inter, targets="patient", raters="reader", ratings=colName)
                    ICC = ICC.iloc[0, 2]
                    icc_inter.append(ICC)
                    print(icc_inter)
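
One possible cause, assuming the forum stripped the indentation and the original script had the append call outside the loop body: only the value from the last iteration then reaches the list. The sketch below shows the difference using a hypothetical stand-in function, summarize, in place of the per-feature ICC call; the feature names and values are made up.

```python
def summarize(values):
    """Hypothetical stand-in for the per-feature ICC computation."""
    return sum(values) / len(values)


features = {"featureA": [1.0, 2.0, 3.0], "featureB": [4.0, 5.0, 6.0]}

# Variant 1: append sits outside the loop body, so it runs once,
# after the loop, and only the last result is kept.
last_only = []
for name, values in features.items():
    result = summarize(values)
last_only.append(result)

# Variant 2: append is indented inside the loop body,
# so one result is stored per feature.
all_results = []
for name, values in features.items():
    result = summarize(values)
    all_results.append(result)

print(last_only)
print(all_results)
```

If the indentation in the real script is already correct, printing len(icc_inter) and data_inter.columns[2:] would help to check how many columns the loop actually visits.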
