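Goal
The Brown Corpus is a resource for studying systematic differences between genres (a field known as stylistics).
Process the news and romance genres of the Brown Corpus to find out which days of the week are the most newsworthy and which are the most romantic.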
Code
1. Merge the words from the news and romance genres and count the total frequency of each day
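import nltk
from nltk.corpus import brown  # requires the Brown Corpus data: nltk.download('brown')
news_text = brown.words(categories='news')
romance_text = brown.words(categories='romance')
text = news_text + romance_text  # lazy concatenation of the two word sequences
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']  # names of the days of the week, stored in a list
cfd = nltk.ConditionalFreqDist(
    (word, 'total')  # each pair is (condition, sample): the condition becomes a table row, the sample a column
    for word in text
    if word in days  # only day names are counted; one pass over the text
)
# conditions fixes the order of the rows (otherwise they are sorted alphabetically)
cfd.tabulate(conditions=days)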
Output (a table with one row per day and a single "total" column):
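To make the role of the (condition, sample) pairs concrete, here is a minimal pure-Python sketch of the same tallying idea. It is not NLTK's implementation: the helper `tally_pairs` and the toy token list are hypothetical stand-ins chosen so the example runs without the corpus.

```python
from collections import Counter, defaultdict


def tally_pairs(pairs):
    """Group (condition, sample) pairs into one Counter per condition."""
    table = defaultdict(Counter)
    for condition, sample in pairs:
        table[condition][sample] += 1
    return table


# Hypothetical toy tokens standing in for the corpus words.
toy_words = ['On', 'Monday', 'and', 'Sunday', ',', 'then', 'Monday', 'again']
toy_days = ['Monday', 'Sunday']

# Same shape as the generator passed to ConditionalFreqDist:
# the day is the condition, the fixed label 'total' is the sample.
toy_table = tally_pairs(
    (word, 'total')
    for word in toy_words
    if word in toy_days
)

print(toy_table['Monday']['total'])  # 2
print(toy_table['Sunday']['total'])  # 1
```

Because every pair shares the sample 'total', each condition ends up with a single counter entry, which is why the resulting table has only one column.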
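2. Count each day separately for the news and romance genres
import nltk
from nltk.corpus import brown  # requires the Brown Corpus data: nltk.download('brown')
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
cfd = nltk.ConditionalFreqDist(
    (w, cat)  # (condition, sample): rows are days, columns are genres
    for cat in ['news', 'romance']
    for w in brown.words(categories=cat)  # each genre is read only once
    if w in days
)
cfd.tabulate(conditions=days)  # list the rows in weekday order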
Output (a table with one row per day and one column each for news and romance):
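Monday and Sunday are the most newsworthy and the most romantic days overall: Monday is slightly more newsworthy, while Sunday is more romantic.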
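Rather than reading the winner off the table by eye, the top day per genre can also be picked programmatically. The sketch below assumes the counts are available as a mapping of day to per-genre counts, which is how a `ConditionalFreqDist` built from (day, genre) pairs can be indexed (`cfd[day][genre]`). The helper `top_days` is hypothetical, and the numbers are made-up toy values, not the Brown Corpus results.

```python
def top_days(counts, genre):
    """Return every day whose count for `genre` equals the maximum."""
    best = max(by_genre.get(genre, 0) for by_genre in counts.values())
    return [
        day
        for day, by_genre in counts.items()
        if by_genre.get(genre, 0) == best
    ]


# Made-up toy counts for illustration only (NOT the real corpus output).
toy_counts = {
    'Monday': {'news': 3, 'romance': 1},
    'Friday': {'news': 2, 'romance': 0},
    'Sunday': {'news': 3, 'romance': 4},
}

print(top_days(toy_counts, 'news'))     # ['Monday', 'Sunday']
print(top_days(toy_counts, 'romance'))  # ['Sunday']
```

Returning a list instead of a single day keeps ties visible, which matters when two days have the same count.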



