Build a mask from each category's percentage share, i.e.:
```python
import numpy as np
import pandas as pd

series = df['column'].value_counts()
mask = (series / series.sum() * 100).lt(1)

# To replace the values in df['column'], use np.where, i.e.
df['column'] = np.where(df['column'].isin(series[mask].index), 'Other', df['column'])
```
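The snippet above assumes an existing `df`. Here is a self-contained sketch of the same steps, assuming a hypothetical DataFrame rebuilt from the per-category counts shown in this answer's output (one row per observation, in a column literally named `column`):

```python
import numpy as np
import pandas as pd

# Per-category counts taken from the example output in this answer.
counts = {
    "Windows": 26083, "iOS": 19711, "Android": 13077, "Macintosh": 5799,
    "Chrome OS": 347, "Linux": 285, "Windows Phone": 167,
    "(not set)": 22, "BlackBerry": 11,
}
# Hypothetical DataFrame: each category repeated as many times as its count.
df = pd.DataFrame({"column": np.repeat(list(counts), list(counts.values()))})

series = df["column"].value_counts()
mask = (series / series.sum() * 100).lt(1)  # True where the share is below 1%

# Relabel every row whose category is in the "rare" set as 'Other'.
df["column"] = np.where(df["column"].isin(series[mask].index), "Other", df["column"])

print(df["column"].value_counts())
```

After the relabelling, the five rare categories are counted together under `Other`.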
To change the index so that the rare categories are summed into a single `Other` entry:
```python
new = series[~mask].copy()
new['Other'] = series[mask].sum()
```

```
Windows      26083
iOS          19711
Android      13077
Macintosh     5799
Other          832
Name: 1, dtype: int64
```
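If you need this more than once, the two lines can be wrapped in a small helper. This is a sketch only: `lump_rare` and its `threshold`/`label` parameters are hypothetical names, and the helper skips adding `Other` when no category falls under the threshold (the original two lines would add an `Other` entry of 0 in that case):

```python
import pandas as pd

def lump_rare(series: pd.Series, threshold: float = 1.0, label: str = "Other") -> pd.Series:
    """Keep categories whose share is >= threshold percent; sum the rest under `label`."""
    mask = (series / series.sum() * 100).lt(threshold)
    new = series[~mask].copy()
    if mask.any():
        new[label] = series[mask].sum()
    return new

# Counts from the example output in this answer.
series = pd.Series({
    "Windows": 26083, "iOS": 19711, "Android": 13077, "Macintosh": 5799,
    "Chrome OS": 347, "Linux": 285, "Windows Phone": 167,
    "(not set)": 22, "BlackBerry": 11,
})
print(lump_rare(series))
```

The total is preserved, because the rows dropped by `~mask` are exactly the ones summed into `Other`.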
If you want to replace the index labels instead:
```python
series.index = np.where(series.index.isin(series[mask].index), 'Other', series.index)
```

```
Windows      26083
iOS          19711
Android      13077
Macintosh     5799
Other          347
Other          285
Other          167
Other           22
Other           11
Name: 1, dtype: int64
```
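Relabelling the index this way leaves several separate `Other` rows. A minimal sketch of one way to merge them afterwards (grouping on the index labels and summing, with `sort=False` so the original order is kept), again using the counts from this answer:

```python
import numpy as np
import pandas as pd

series = pd.Series({
    "Windows": 26083, "iOS": 19711, "Android": 13077, "Macintosh": 5799,
    "Chrome OS": 347, "Linux": 285, "Windows Phone": 167,
    "(not set)": 22, "BlackBerry": 11,
})
mask = (series / series.sum() * 100).lt(1)

# Relabel the rare categories; this produces duplicate 'Other' labels.
series.index = np.where(series.index.isin(series[mask].index), "Other", series.index)

# Merge rows that share a label; sort=False keeps first-appearance order.
merged = series.groupby(level=0, sort=False).sum()
print(merged)
```

The merged result matches the `new` Series built with `series[mask].sum()`.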
Explanation
```python
(series / series.sum() * 100)  # This gives you the percentage, i.e.
```

```
Windows          39.820158
iOS              30.092211
Android          19.964276
Macintosh         8.853165
Chrome OS         0.529755
Linux             0.435101
Windows Phone     0.254954
(not set)         0.033587
BlackBerry        0.016793
Name: 1, dtype: float64
```
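`.lt(1)` means "less than 1", so it returns a boolean mask that is `True` for every category whose share is below 1%. Indexing with that mask (`series[mask]`) selects those categories, which you can then relabel or sum as shown above.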
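To see the mask itself, here is a short sketch using the counts from this answer; `pct.lt(1)` gives the same result as the comparison `pct < 1`:

```python
import pandas as pd

series = pd.Series({
    "Windows": 26083, "iOS": 19711, "Android": 13077, "Macintosh": 5799,
    "Chrome OS": 347, "Linux": 285, "Windows Phone": 167,
    "(not set)": 22, "BlackBerry": 11,
})
pct = series / series.sum() * 100  # share of each category, in percent
mask = pct.lt(1)                   # element-wise "less than 1"

print(mask)           # boolean Series aligned with `series`
print(series[mask])   # the categories that end up as 'Other'
```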



