pandas query and filtering (tcy)

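1.1. Method:
1) df[condition]  # with several conditions, wrap each one in parentheses and combine them with & / | (the keywords and / or cannot be used)
    e.g. df[df.A < 5] for a single condition; (df.A < 5) & (df.A … for several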
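A minimal sketch of the rule above on a small hypothetical frame. Each comparison is parenthesized and joined with & or |. Using `and`, or dropping the parentheses, makes Python ask for the truth value of a whole Series, and pandas raises ValueError. The helper `raises_value_error` is hypothetical and exists only for this illustration.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({"A": [0, 3, 6, 9], "B": [1, 1, 8, 8]})

# Correct: wrap every comparison in parentheses, combine with & (and) or | (or)
both = df[(df.A > 1) & (df.B > 5)]      # rows 2 and 3
either = df[(df.A < 1) | (df.A > 8)]    # rows 0 and 3


def raises_value_error(func):
    """Hypothetical helper: return True if calling func() raises ValueError."""
    try:
        func()
    except ValueError:
        return True
    return False


# `and` asks Python for the truth value of a whole boolean Series, which pandas refuses
bad_and = raises_value_error(lambda: df[(df.A > 1) and (df.B > 5)])

# Without parentheses, & binds tighter than >, so this parses as the chained
# comparison df.A > (1 & df.B) > 5, which again needs the truth value of a Series
bad_precedence = raises_value_error(lambda: df[df.A > 1 & df.B > 5])

print(both.index.tolist(), either.index.tolist(), bad_and, bad_precedence)
```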
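1.2. Notes:
1) eval vs. query
    Similarity: both evaluate an expression string against the DataFrame's columns.
    Difference: for a boolean expression, eval returns the boolean array itself (a bool Series), while query returns the rows of the data selected by that boolean array.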
    
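2) String comparison in Python
    == and the other comparison operators compare the values (contents) of two str objects.
    is compares the identities (id values) of the two string objects, not their contents.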
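A short sketch checking both notes, using the 3x3 frame from Example 1.4: `eval` hands back the boolean mask, `query` hands back the matching rows, and `==` and `is` are compared on two equal strings. Whether `s1 is s2` is True depends on how the interpreter stores strings, so no output is claimed for that line.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list("ABC"))

# eval: a boolean expression gives a boolean Series, one value per row
mask = df.eval("C >= 5")
# query: the same expression gives the rows where it is True
rows = df.query("C >= 5")

print(mask.tolist())        # [False, True, True]
print(rows.index.tolist())  # [1, 2]

# String comparison in plain Python
s1 = "pandas"
s2 = "".join(["pan", "das"])   # built at run time
print(s1 == s2)                # True: == compares the contents
print(s1 is s2)                # identity check, same as id(s1) == id(s2)
```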
Example 1.1: df[condition]
df[(df.A>1) & (df['性别'] == '男')]  # parentheses required; 性别 = gender, 男 = male
df[df.A.isin([3,5,9])]               # rows whose A is 3, 5 or 9

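df[[x.startswith('张') for x in df['姓名']]]  # string query with a list comprehension (works, but the Series-based form below is the more usual pandas style)
df[df['姓名'].map(lambda x: x.startswith('张'))]  # all rows whose 姓名 (name) starts with 张, i.e. everyone surnamed Zhang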
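A sketch of Example 1.1 on made-up data. The rows and values are hypothetical; only the column names 姓名 and 性别 come from the example. It also uses the `.str.startswith` accessor, which builds the same mask as the list comprehension and the `.map` call.

```python
import pandas as pd

# Hypothetical sample data: 姓名 = name, 性别 = gender (男 = male, 女 = female)
df = pd.DataFrame({
    "姓名": ["张三", "李四", "张伟", "王五"],
    "性别": ["男", "女", "男", "男"],
    "A": [2, 5, 1, 9],
})

males_a_gt_1 = df[(df.A > 1) & (df["性别"] == "男")]   # rows 0 and 3
in_list = df[df.A.isin([3, 5, 9])]                       # rows 1 and 3

# Three ways to build the same "surname 张" mask
by_listcomp = df[[x.startswith("张") for x in df["姓名"]]]
by_map = df[df["姓名"].map(lambda x: x.startswith("张"))]
by_str = df[df["姓名"].str.startswith("张")]             # pandas string accessor

print(by_str)
```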
 
Example 1.2: comparison and logical operators
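import numpy as np
import pandas as pd
import numexpr  # optional: if installed, pandas uses numexpr as the default engine for eval/query
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('ABCD'))
a, b, c = df.A, df.B, df.C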

# all of the following are equivalent
df[(df.A < df.B) & (df.B < df.C)]  # plain Python style
df[df.eval('A …')]                 # eval style
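The tail of the eval line is cut off in the source. As a sketch (not the author's exact lines), these are forms that should all select the same rows as the plain-Python filter. One value of C is changed by hand, a hypothetical tweak, so that the filter actually drops a row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("ABCD"))
df.loc[1, "C"] = 0   # hypothetical change: row 1 now fails B < C

python_style = df[(df.A < df.B) & (df.B < df.C)]

# Each of these should select the same rows (0 and 2)
candidates = [
    df[df.eval("A < B < C")],        # eval gives the mask, df[...] applies it
    df.query("(A < B) & (B < C)"),
    df.query("A < B & B < C"),       # inside query, comparisons bind tighter than &
    df.query("A < B and B < C"),
    df.query("A < B < C"),           # chained comparison
]
for frame in candidates:
    print(frame.index.tolist())
```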
 
Example 1.3: in query(), == and != against a list work like in / not in
df.query('B == [1,5]')
df.query('[1, 5] in B')
df[df.B.isin([1,5])]  # pure Python: rows whose B is 1 or 5

df.query('B != [1,5]')
df.query('[1, 5] not in B')
df[~df.B.isin([1,5])]  # pure Python: rows whose B is neither 1 nor 5

df.query('A in B and C < D')  # equivalent to df[df.A.isin(df.B) & (df.C < df.D)]
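A sketch of Example 1.3 on a small hypothetical frame. Its values are invented so that `A in B` matches some rows, and the check is that the three spellings of each test agree.

```python
import pandas as pd

# Hypothetical values, chosen so that "A in B" matches some rows
df = pd.DataFrame({"A": [1, 2, 9], "B": [1, 5, 9], "C": [0, 7, 3], "D": [1, 6, 4]})

in_forms = [df.query("B == [1, 5]"), df.query("[1, 5] in B"), df[df.B.isin([1, 5])]]
not_in_forms = [df.query("B != [1, 5]"), df.query("[1, 5] not in B"), df[~df.B.isin([1, 5])]]

# "A in B": is each A value found anywhere in column B?
mixed = df.query("A in B and C < D")
plain = df[df.A.isin(df.B) & (df.C < df.D)]

print([f.index.tolist() for f in in_forms])      # [[0, 1], [0, 1], [0, 1]]
print([f.index.tolist() for f in not_in_forms])  # [[2], [2], [2]]
print(mixed.index.tolist())                      # [0, 2]
```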
Example 1.4: boolean columns and not
    
df = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('ABC'))
df['Bool'] = df.eval('C >= 5')
df.query('A < B < C and (not Bool) or Bool > 2')                  # short query syntax
df[(df.A < df.B) & (df.B < df.C) & (~df.Bool) | (df.Bool > 2)]  # equivalent in pure Python

rst1=df.query('not Bool')
rst2=(df.query('not Bool') == df[~df.Bool])
    
# df
   A  B  C   Bool
0  0  1  2  False
1  3  4  5   True
2  6  7  8   True

# rst1
   A  B  C   Bool
0  0  1  2  False

# rst2
      A     B     C  Bool
0  True  True  True  True
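Inside `query`, `and` binds tighter than `or`, and `~` is accepted as well as `not`. A sketch, rebuilding the frame from Example 1.4, that writes the grouping of the short query out explicitly:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list("ABC"))
df["Bool"] = df.eval("C >= 5")   # [False, True, True]

short = df.query("A < B < C and (not Bool) or Bool > 2")
# `and` binds tighter than `or`, so this explicit grouping means the same thing
grouped = df.query("((A < B < C) and (not Bool)) or (Bool > 2)")

# `~` works like `not` on a boolean column
with_not = df.query("not Bool")
with_tilde = df.query("~Bool")

print(short.index.tolist(), grouped.index.tolist())   # [0] [0]
```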
Example 2: local variables with @
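u = df['C'].mean()             # 6.0 for the 3x4 frame of Example 1.2 (5.0 for the 3x3 frame of Example 1.4)
df[(df.A < u) & (df.B < u)]    # equivalent to the next line
df.query('A < @u and B < @u')  # the @ symbol marks a local Python variable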
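A runnable sketch of Example 2, assuming the 3x4 frame from Example 1.2 (so u is 6.0):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("ABCD"))
u = df["C"].mean()                       # C is [2, 6, 10], so u is 6.0

plain = df[(df.A < u) & (df.B < u)]
with_at = df.query("A < @u and B < @u")  # @u refers to the Python variable u
print(u, with_at.index.tolist())         # 6.0 [0, 1]
```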
Example 3: referring to the index in query
Example 3.1: column names
df.query('(A < B) & (B < C)')  # numexpr-style syntax; A, B, C are column names
    
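Example 3.2: name of a single index + column names
df.query('a < B and B < C')  # a is the name of the (single) index; B, C are column names
df.query('index < B < C')    # 'index' refers to the index itself (not its name); B, C are column names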
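A sketch of Example 3.2 with a small hypothetical frame whose index is named a:

```python
import pandas as pd

# Hypothetical data; the single (non-Multi) index is named "a"
df = pd.DataFrame({"B": [3, 0, 5, 9, 2, 7], "C": [4, 8, 6, 1, 9, 8]})
df.index.name = "a"

by_name = df.query("a < B and B < C")   # a = the index values 0..5
by_index = df.query("index < B < C")    # "index" means the index, whatever its name
print(by_name.index.tolist())           # [0, 2, 5]
```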
    
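Example 3.3: the index name a is the same as the column name a
df.query('a > 2')      # uses the column 'a': when the index name and a column name are the same, the column takes precedence
df.query('index > 2')  # 'index' refers to the index itself (not its name), so the index stays reachable even though the column a shadows its name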
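A sketch of Example 3.3: a hypothetical frame whose index and column are both called a. The name `a` resolves to the column, while `index` still reaches the index.

```python
import pandas as pd

# Hypothetical frame: the index and a column are both named "a"
df = pd.DataFrame({"a": [5, 1, 4, 0, 3]}, index=pd.Index([0, 1, 2, 3, 4], name="a"))

by_column = df.query("a > 2")     # column a = [5, 1, 4, 0, 3] -> rows 0, 2, 4
by_index = df.query("index > 2")  # the index itself = [0, 1, 2, 3, 4] -> rows 3, 4
print(by_column.index.tolist(), by_index.index.tolist())
```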
    
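Example 3.4: a column named index (consider renaming that column)
df.query('ilevel_0 > 2')  # ilevel_0 refers to index level 0 by position, not by name (available when that level has no name)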
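A sketch of Example 3.4 with a hypothetical column literally named index. The index itself is left unnamed, which is what makes `ilevel_0` available. Renaming the column, as suggested above, removes the ambiguity.

```python
import pandas as pd

# Hypothetical frame with a column called "index"; the index itself has no name
df = pd.DataFrame({"index": [9, 0, 7], "B": [1, 2, 3]}, index=[1, 3, 5])

by_column = df.query("index > 2")     # the column wins: values 9, 0, 7 -> labels 1, 5
by_level = df.query("ilevel_0 > 2")   # index level 0: labels 1, 3, 5 -> labels 3, 5

# The cleaner fix: rename the column so "index" is no longer ambiguous
renamed = df.rename(columns={"index": "idx"})
print(by_column.index.tolist(), by_level.index.tolist())
```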
Example 4: MultiIndex

colors = np.random.choice(['red', 'blue'], size=6)
foods = np.random.choice(['eggs', 'meat'], size=6)
index = pd.MultiIndex.from_arrays([colors, foods], names=['color', 'food'])
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=index)
df  # one possible output; the labels are drawn at random, so they change on every run
               0    1
color  food
blue   meat    0    1
       eggs    2    3
       meat    4    5
red    meat    6    7
blue   meat    8    9
       eggs   10  11
    
 
Example 4.1: index level names
df.query('color == "red"')  # color is the name of index level 0
 
Example 4.2: unnamed index levels
df.index.names = [None, None]
df.query('ilevel_0 == "red"')   # ilevel_0 is index level 0
df.query('ilevel_1 == "meat"')  # ilevel_1 is index level 1
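Because np.random.choice makes the frame above differ on every run, this sketch uses fixed, hypothetical labels so that Examples 4.1 and 4.2 can be checked:

```python
import numpy as np
import pandas as pd

# Fixed labels instead of np.random.choice, so the result is reproducible (hypothetical data)
colors = ["blue", "blue", "red", "red", "blue", "red"]
foods = ["meat", "eggs", "meat", "eggs", "meat", "meat"]
index = pd.MultiIndex.from_arrays([colors, foods], names=["color", "food"])
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=index)

red = df.query('color == "red"')                          # by level name
red_meat = df.query('color == "red" and food == "meat"')

df.index.names = [None, None]                             # drop the level names
red_unnamed = df.query('ilevel_0 == "red"')               # level 0 by position
meat_unnamed = df.query('ilevel_1 == "meat"')             # level 1 by position
print(red[0].tolist(), meat_unnamed[0].tolist())
```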
Example 5: one query for several DataFrames that share column names (or index levels/names)
df1 = pd.DataFrame(np.arange(12).reshape(4, 3), columns=list('abc')) + 10
df2=df1+10
expr = '19 <= a <= c <= 22'
rst=list(map(lambda frame: frame.query(expr), [df1, df2]))
    
# df1
    a   b   c
0  10  11  12
1  13  14  15
2  16  17  18
3  19  20  21

# df2
    a   b   c
0  20  21  22
1  23  24  25
2  26  27  28
3  29  30  31

# rst
[    a   b   c
3  19  20  21,
    a   b   c
0  20  21  22]