You can use the apply method of the DataFrame API:
import pandas as pd
import nltk

df = pd.DataFrame({'sentences': ['This is a very good site. I will recommend it to others.', 'Can you please give me a call at 9983938428. have issues with the listings.', 'good work! keep it up']})
df['tokenized_sents'] = df.apply(lambda row: nltk.word_tokenize(row['sentences']), axis=1)

Output:
>>> df
                                           sentences  \
0  This is a very good site. I will recommend it ...
1  Can you please give me a call at 9983938428. h...
2                              good work! keep it up

                                     tokenized_sents
0  [This, is, a, very, good, site, ., I, will, re...
1  [Can, you, please, give, me, a, call, at, 9983...
2                      [good, work, !, keep, it, up]
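Since the lambda only reads one column, a column-wise Series.apply avoids the row-wise axis=1 call and is a bit faster and simpler. A minimal sketch; str.split is used here as a stand-in tokenizer so the example runs without downloading NLTK's punkt data (unlike nltk.word_tokenize, it does not separate punctuation):

```python
import pandas as pd

df = pd.DataFrame({'sentences': [
    'This is a very good site. I will recommend it to others.',
    'good work! keep it up',
]})

# Column-wise apply: each cell of the 'sentences' Series is passed
# directly to the tokenizer; no need to iterate whole rows with axis=1.
# str.split is a stand-in for nltk.word_tokenize in this sketch.
df['tokenized_sents'] = df['sentences'].apply(str.split)

print(df['tokenized_sents'][1])  # ['good', 'work!', 'keep', 'it', 'up']
```

With NLTK installed and punkt downloaded, the same pattern is `df['sentences'].apply(nltk.word_tokenize)`.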
To get the length of each text, use apply with a lambda function again:
df['sents_length'] = df.apply(lambda row: len(row['tokenized_sents']), axis=1)

>>> df
                                           sentences  \
0  This is a very good site. I will recommend it ...
1  Can you please give me a call at 9983938428. h...
2                              good work! keep it up

                                     tokenized_sents  sents_length
0  [This, is, a, very, good, site, ., I, will, re...            14
1  [Can, you, please, give, me, a, call, at, 9983...            15
2                      [good, work, !, keep, it, up]             6
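As an alternative to the per-row lambda, pandas' .str.len() accessor also works on list-valued columns, giving the same lengths without apply. A minimal sketch with hand-built token lists:

```python
import pandas as pd

df = pd.DataFrame({'tokenized_sents': [
    ['This', 'is', 'a', 'very', 'good', 'site', '.'],
    ['good', 'work', '!', 'keep', 'it', 'up'],
]})

# .str.len() returns the length of each element, whether it is a
# string or a list, so it replaces the apply(lambda ...) one-liner.
df['sents_length'] = df['tokenized_sents'].str.len()

print(df['sents_length'].tolist())  # [7, 6]
```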



