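A more general solution: if a group (same `product` and `product_id`) contains at least one `active` value in `account_status`, remove all the other, non-`active` rows of that group. Groups without any `active` value are kept unchanged: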
```
print (df)
  product  product_id account_status
0  prod-A         100         active
1  prod-A         100      cancelled <- should be removed
2  prod-A         300         active
3  prod-A         400      cancelled
4  prod-A         500         active
5  prod-A         500         active
6  prod-A         600      cancelled
7  prod-A         600      cancelled

# True for rows whose status is 'active'
s = df['account_status'].eq('active')
# group the boolean flags by product and product_id
g = df.assign(A=s).groupby(['product','product_id'])['A']
# keep a row if its group has no 'active' value, if the whole group is 'active'
# (already covered by s), or if the row itself is 'active'
mask = ~g.transform('any') | g.transform('all') | s
df = df[mask]

print (df)
  product  product_id account_status
0  prod-A         100         active
2  prod-A         300         active
3  prod-A         400      cancelled
4  prod-A         500         active
5  prod-A         500         active
6  prod-A         600      cancelled
7  prod-A         600      cancelled
```

It also works with multiple categories:
```
print (df)
  product  product_id account_status
0  prod-A         100         active
1  prod-A         100      cancelled <- should be removed
2  prod-A         100        pending <- should be removed
3  prod-A         300         active
4  prod-A         300        pending <- should be removed
5  prod-A         400      cancelled
6  prod-A         500         active
7  prod-A         500         active
8  prod-A         600        pending
9  prod-A         600      cancelled

s = df['account_status'].eq('active')
g = df.assign(A=s).groupby(['product','product_id'])['A']
mask = ~g.transform('any') | g.transform('all') | s
df = df[mask]

print (df)
  product  product_id account_status
0  prod-A         100         active
3  prod-A         300         active
5  prod-A         400      cancelled
6  prod-A         500         active
7  prod-A         500         active
8  prod-A         600        pending
9  prod-A         600      cancelled
```
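To check both examples end to end, here is a minimal self-contained sketch that rebuilds the two sample frames from the printed output above and applies the same rule. The helper name `keep_active_or_untouched` and its parameters are hypothetical, not part of the original answer. It drops the `g.transform('all')` term, on the assumption (easy to verify) that every row of an all-`active` group already satisfies `s`, so the kept rows are the same.

```python
import pandas as pd


def keep_active_or_untouched(df, keys=('product', 'product_id'),
                             col='account_status', value='active'):
    """Hypothetical helper: in groups that contain `value`, drop all other rows."""
    # True where the row has the preferred status
    s = df[col].eq(value)
    # broadcast "does this group contain at least one preferred row?" to every row
    has_value = df.assign(A=s).groupby(list(keys))['A'].transform('any')
    # keep preferred rows, plus every row of groups without a preferred row
    return df[s | ~has_value]


df1 = pd.DataFrame({
    'product': ['prod-A'] * 8,
    'product_id': [100, 100, 300, 400, 500, 500, 600, 600],
    'account_status': ['active', 'cancelled', 'active', 'cancelled',
                       'active', 'active', 'cancelled', 'cancelled'],
})

df2 = pd.DataFrame({
    'product': ['prod-A'] * 10,
    'product_id': [100, 100, 100, 300, 300, 400, 500, 500, 600, 600],
    'account_status': ['active', 'cancelled', 'pending', 'active', 'pending',
                       'cancelled', 'active', 'active', 'pending', 'cancelled'],
})

print(keep_active_or_untouched(df1).index.tolist())  # [0, 2, 3, 4, 5, 6, 7]
print(keep_active_or_untouched(df2).index.tolist())  # [0, 3, 5, 6, 7, 8, 9]
```

The `value` parameter is only an illustrative generalization: passing, say, `'pending'` would apply the same "keep the preferred status if the group has it" rule to a different status.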


