Update: the original answer counted the rows that contain the substring.
To count every occurrence of the substring, use .str.count:
In [21]: df = pd.DataFrame(['hello', 'world', 'hehe'], columns=['words'])

In [22]: df.words.str.count("he|wo")
Out[22]:
0    1
1    1
2    2
Name: words, dtype: int64

In [23]: df.words.str.count("he|wo").sum()
Out[23]: 4

The str.contains method accepts a regular expression:
Definition: df.words.str.contains(self, pat, case=True, flags=0, na=nan)
Docstring:
Check whether given pattern is contained in each string in the array

Parameters
----------
pat : string
    Character sequence or regular expression
case : boolean, default True
    If True, case sensitive
flags : int, default 0 (no flags)
    re module flags, e.g. re.IGNORECASE
na : default NaN, fill value for missing values.
For example:
In [11]: df = pd.DataFrame(['hello', 'world'], columns=['words'])

In [12]: df
Out[12]:
   words
0  hello
1  world

In [13]: df.words.str.contains(r'[hw]')
Out[13]:
0    True
1    True
Name: words, dtype: bool

In [14]: df.words.str.contains(r'he|wo')
Out[14]:
0    True
1    True
Name: words, dtype: bool
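A short sketch of the case, flags and na parameters from the docstring, run on made-up data with mixed case and a missing value. The regex=False argument does not appear in the older docstring quoted above; it is assumed here because current pandas versions accept it for str.contains.

```python
import re

import numpy as np
import pandas as pd

# Made-up data: mixed case, a missing value, and a regex metacharacter.
df = pd.DataFrame({"words": ["Hello", "world", np.nan, "a.b"]})

# Two equivalent ways to match case-insensitively.
# na=False turns the missing row into False instead of NaN.
by_case = df.words.str.contains("he|wo", case=False, na=False)
by_flags = df.words.str.contains("he|wo", flags=re.IGNORECASE, na=False)
print(by_case.tolist())  # [True, True, False, False]

# "." is a regex metacharacter; regex=False matches it as a literal dot.
literal = df.words.str.contains(".", regex=False, na=False)
print(literal.tolist())  # [False, False, False, True]
```

Filling missing values with na=False keeps the result usable as a boolean mask, so it can be summed or used to index df directly.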
To count the matching rows, you can sum the boolean Series:
In [15]: df.words.str.contains(r'he|wo').sum()
Out[15]: 2

In [16]: df.words.str.contains(r'he').sum()
Out[16]: 1
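A minimal self-contained sketch contrasting the two approaches on the same three words used for .str.count: summing str.contains counts rows with at least one match, while summing str.count counts every match. As an aside, str.count counts non-overlapping matches, as with re.findall; a lookahead pattern is one way to count overlapping ones.

```python
import pandas as pd

df = pd.DataFrame({"words": ["hello", "world", "hehe"]})
pattern = "he|wo"

# Rows containing at least one match: sum of a boolean Series.
rows_with_match = df.words.str.contains(pattern).sum()

# Total matches across all rows: sum of the per-row counts.
total_matches = df.words.str.count(pattern).sum()

print(rows_with_match)  # 3
print(total_matches)    # 4

# Matches are non-overlapping: "aa" fits twice in "aaaa".
s = pd.Series(["aaaa"])
non_overlapping = s.str.count("aa").iloc[0]
# A zero-width lookahead finds every starting position instead.
overlapping = s.str.count("(?=aa)").iloc[0]
print(non_overlapping, overlapping)  # 2 3
```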



