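Try this:

    import numpy as np
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer

    # `tokenize` is the tokenizer function defined in the question
    vect = TfidfVectorizer(sublinear_tf=True, analyzer='word', stop_words='english',
                           tokenizer=tokenize, strip_accents='ascii', dtype=np.float16)

    X = vect.fit_transform(df.pop('Phrase'))  # NOTE: `.toarray()` was removed

    # add one sparse column per feature instead of densifying the whole matrix
    for i, col in enumerate(vect.get_feature_names()):
        df[col] = pd.SparseSeries(X[:, i].toarray().reshape(-1,), fill_value=0)

UPDATE: for Pandas 0.20+ we can construct a SparseDataFrame directly from the sparse array: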
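
    import numpy as np
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer

    # `tokenize` is the tokenizer function defined in the question
    vect = TfidfVectorizer(sublinear_tf=True, analyzer='word', stop_words='english',
                           tokenizer=tokenize, strip_accents='ascii', dtype=np.float16)

    # build the sparse frame straight from the scipy sparse matrix
    df = pd.SparseDataFrame(vect.fit_transform(df.pop('Phrase')),
                            columns=vect.get_feature_names(),
                            index=df.index)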


