I found the answer. We have to generate an ARFF file.
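In the .arff file:
The @ATTRIBUTE section will contain every word present across all the documents after preprocessing. Each word is of type real, because tf-idf values are real numbers. (The @RELATION line only gives the dataset a name.)
The @data section will contain the tf-idf values computed during preprocessing. For example, the first data row holds the tf-idf values of all the words for the first document (0.0 for words that do not occur in it), and its last column holds that document's class.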
@RELATION filename
@ATTRIBUTE word1 real
@ATTRIBUTE word2 real
@ATTRIBUTE word3 real
% ... and so on, one @ATTRIBUTE line per word
@ATTRIBUTE class {cacm,cisi,cran,med}
@data
0.5545479562,0.27,0.554544479562,0.4479562,cacm
0.5545479562,0.27,0.554544479562,0.4479562,cacm
0.55454479562,0.1619617,0.579562,0.5542,cisi
0.5545479562,0.27,0.554544479562,0.4479562,cisi
0.0,0.2396113617,0.44479562,0.2,cran
0.5545479562,0.27,0.554544479562,0.4479562,cran
0.5545177444479562,0.26196113617,0.0,0.0,med
0.5545479562,0.27,0.554544479562,0.4479562,med

Once this file is generated, you can give it as input to InfoGainAttributeEval.java. This worked for me.
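If your preprocessing step already produces a tf-idf matrix, the ARFF layout above can be written out with a few lines of plain Java. This is a minimal sketch using only the JDK, assuming the vocabulary is a `List<String>`, the tf-idf values are a `double[][]` with one row per document in the same column order as the vocabulary, and there is one class label per document. `ArffWriter`, `toArff` and `quote` are hypothetical names for illustration, not part of Weka.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: builds ARFF text with one real attribute per word and a nominal class last. */
final class ArffWriter {

    // ARFF names containing spaces or special characters must be quoted;
    // plain identifiers are left untouched.
    static String quote(String name) {
        if (name.matches("[A-Za-z0-9_]+")) {
            return name;
        }
        return "'" + name.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    static String toArff(String relation, List<String> words, double[][] tfidf,
                         List<String> labels, List<String> classes) {
        if (tfidf.length != labels.size()) {
            throw new IllegalArgumentException("one label per document is required");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("@RELATION ").append(quote(relation)).append('\n');
        // One real-valued attribute per vocabulary word, in column order.
        for (String w : words) {
            sb.append("@ATTRIBUTE ").append(quote(w)).append(" REAL\n");
        }
        // Nominal class attribute, declared last (assumes no word is itself named "class").
        sb.append("@ATTRIBUTE class {")
          .append(classes.stream().map(ArffWriter::quote).collect(Collectors.joining(",")))
          .append("}\n");
        sb.append("@DATA\n");
        for (int d = 0; d < tfidf.length; d++) {
            double[] row = tfidf[d];
            if (row.length != words.size()) {
                throw new IllegalArgumentException("row " + d + " has the wrong number of values");
            }
            String label = labels.get(d);
            // Every label must be one of the declared nominal values.
            if (!classes.contains(label)) {
                throw new IllegalArgumentException("unknown class: " + label);
            }
            for (double v : row) {
                sb.append(v).append(',');  // Double.toString formatting, locale-independent
            }
            sb.append(quote(label)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical toy data: 2 documents, 3 words.
        List<String> words = List.of("word1", "word2", "word3");
        double[][] tfidf = { {0.55, 0.27, 0.0}, {0.0, 0.24, 0.44} };
        List<String> labels = List.of("cacm", "cran");
        List<String> classes = List.of("cacm", "cisi", "cran", "med");
        System.out.print(toArff("filename", words, tfidf, labels, classes));
    }
}
```

The labels are checked against the class list because Weka refuses to load a data row whose class value (for example a misspelled `carn`) is not declared in the `@ATTRIBUTE class {...}` line. On Java 11+ the returned string can be saved with `Files.writeString(Path.of("docs.arff"), arff)` and then loaded in Weka.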



