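I can see two ways you could do this.
For the entire DataFrame
This method removes values that occur infrequently anywhere in the DataFrame. It relies on built-in pandas methods (stack() and value_counts()) instead of a Python loop, which keeps it fast.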
    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randint(0, high=9, size=(100, 2)), columns=['A', 'B'])

    threshold = 10  # Values occurring this many times or fewer will be removed.

    value_counts = df.stack().value_counts()  # Counts pooled over the entire DataFrame
    to_remove = value_counts[value_counts <= threshold].index
    df.replace(to_remove, np.nan, inplace=True)
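To see the whole-DataFrame approach on data whose result can be checked by hand, here is a minimal sketch using a small fixed frame instead of random numbers. The function name drop_rare_values is hypothetical, and it swaps replace() for DataFrame.mask() combined with isin(); both set the matching cells to NaN.

```python
import numpy as np
import pandas as pd


def drop_rare_values(df: pd.DataFrame, threshold: int) -> pd.DataFrame:
    """Hypothetical helper: blank out values that occur `threshold` times or
    fewer across the whole DataFrame (counts are pooled over all columns)."""
    # stack() turns the frame into one long Series so value_counts() sees every cell
    value_counts = df.stack().value_counts()
    to_remove = value_counts[value_counts <= threshold].index
    # mask() replaces cells where the condition is True with NaN and returns a new frame
    return df.mask(df.isin(to_remove))


df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [1, 2, 2, 4]})
result = drop_rare_values(df, threshold=1)
print(result)
```

Here 3 and 4 each occur once across both columns, so they become NaN, while 1 and 2 (three occurrences each) are kept. Returning a new frame rather than modifying in place leaves the original df available for comparison.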
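Column by column
This method removes entries that occur infrequently within each column, counting every column separately.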
    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randint(0, high=9, size=(100, 2)), columns=['A', 'B'])

    threshold = 10  # Values occurring this many times or fewer will be removed.

    for col in df.columns:
        value_counts = df[col].value_counts()  # Counts within this column only
        to_remove = value_counts[value_counts <= threshold].index
        # Assign the result back; replace(..., inplace=True) on df[col] may not update df
        df[col] = df[col].replace(to_remove, np.nan)
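The column-by-column version can also be written without an explicit Python loop over columns. The sketch below is a swapped-in technique rather than the method shown: it maps every cell to the number of times its value occurs in its own column, then masks cells whose count is at or below the threshold. The function name drop_rare_per_column is hypothetical.

```python
import numpy as np
import pandas as pd


def drop_rare_per_column(df: pd.DataFrame, threshold: int) -> pd.DataFrame:
    """Hypothetical helper: blank out values that occur `threshold` times or
    fewer within their own column (each column is counted separately)."""
    # For every column, replace each cell with how often its value occurs in that column
    counts = df.apply(lambda col: col.map(col.value_counts()))
    # Cells whose value is rare in its own column become NaN
    return df.mask(counts <= threshold)


df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [1, 2, 2, 4]})
result = drop_rare_per_column(df, threshold=1)
print(result)
```

On this frame the per-column rule removes 2 and 3 from A and 1 and 4 from B, because each occurs only once within its own column. The whole-DataFrame method would keep 2 in A and 1 in B, since their counts are pooled across both columns (three occurrences each).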



