Public Opinion Analysis Lab 2

CNN News Story Dataset: task overview

Summary: given a long text (story), generate a summary (highlight). The data come from the CNN News Story Dataset.

Unzip the data

$ cd data
$ unzip cnn_stories_tokenized.zip
Load the data
from load_data import load_stories  # local helper module; its source is not included in this post

directory = 'data/cnn_stories_tokenized/'
stories = load_stories(directory, 10000)
print('Loaded Stories %d' % len(stories))
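The `load_data` module itself is not shown. The sketch below is a hypothetical stand-in for `load_stories`. It assumes each `.story` file holds the article text followed by `@highlight` markers, with one highlight sentence after each marker (the usual layout of the CNN story files). It also assumes the second argument caps how many files are read. The names `split_story` and `load_stories_sketch` are illustrative only.

```python
import os


def split_story(doc):
    """Split a raw .story document into (story_text, [highlights])."""
    # Everything before the first '@highlight' marker is the article body.
    index = doc.find('@highlight')
    if index == -1:
        return doc.strip(), []
    story = doc[:index].strip()
    # Each '@highlight' marker is followed by one highlight sentence.
    highlights = [h.strip() for h in doc[index:].split('@highlight')]
    return story, [h for h in highlights if h]


def load_stories_sketch(directory, limit=None):
    """Hypothetical stand-in for load_data.load_stories.

    Reads at most `limit` .story files (sorted by name) and returns a list of
    dicts with 'story' and 'highlights' keys.
    """
    names = sorted(n for n in os.listdir(directory) if n.endswith('.story'))
    stories = []
    for name in names[:limit]:  # names[:None] keeps every file
        with open(os.path.join(directory, name), encoding='utf-8') as f:
            story, highlights = split_story(f.read())
        stories.append({'story': story, 'highlights': highlights})
    return stories
```

A call mirroring the original one would be `load_stories_sketch('data/cnn_stories_tokenized/', 10000)`. Keeping the highlights separate from the body makes them directly usable as reference summaries for evaluation.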

Raw text

# Raw text
Atlanta -LRB- CNN -RRB- -- A young girl bravely stood to ask the Dalai Lama 's doctor a question , and he gave her an unusual answer .
Dr. Tsewang Tamdin , a world-renowned expert in Tibetan medicine , visited Emory University in Atlanta on Monday as part of his effort to reach more American medical practitioners . He wants to develop collaborative projects between the Tibetan medicine system , which is more than 2,500 years old , and Western medicine .
The little girl told Tamdin she suffered from asthma . She wanted to know if there was anything in Tibetan medicine that could help her get better .
Tamdin , who spoke through a translator for ...

# Reference summary 1
Tibetan medical experts want more collaborative projects with modern medicine
# Reference summary 2
Tibetan doctors sometimes prescribe kindness and compassion to cure illness
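The tokenized stories escape brackets with Penn Treebank tokens, e.g. `-LRB- CNN -RRB-` for "(CNN)". Readable text may be wanted, for instance when displaying a generated summary. In that case a small hypothetical helper can map the standard PTB bracket escapes back to the original characters:

```python
# Standard Penn Treebank bracket escapes and the characters they stand for.
PTB_BRACKETS = {
    '-LRB-': '(', '-RRB-': ')',
    '-LSB-': '[', '-RSB-': ']',
    '-LCB-': '{', '-RCB-': '}',
}


def unescape_ptb(text):
    """Replace PTB bracket tokens in a whitespace-tokenized string.

    Note: split()/join() also collapses runs of whitespace (including
    newlines) into single spaces.
    """
    return ' '.join(PTB_BRACKETS.get(tok, tok) for tok in text.split())
```

This handles only the bracket escapes. Other tokenization effects, such as the split clitic in "Lama 's", are left unchanged.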

Related summarization tools

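  • summarize (a toolkit function that implements TextRank)
# Requires gensim < 4.0: the gensim.summarization module was removed in gensim 4.0
from gensim.summarization.summarizer import summarize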

Dependencies

  • gensim
  • sumeval
Tasks
  • Extractive algorithm (this part must be implemented by hand)

[Figure: TextRank (assets/text_rank.png)]

  • Abstractive (generative) algorithm (for general understanding only)

[Figure: encoder-decoder architecture (assets/encoder-decoder.png)]
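The extractive task asks for a hand-written implementation. Below is one minimal TextRank sketch; it is not gensim's code. Sentences become graph nodes. Edges are weighted by the word-overlap similarity from the original TextRank paper: the number of shared words divided by log|S_i| + log|S_j|. A weighted PageRank iteration with damping 0.85 then scores each sentence. The sentence splitter assumes sentence-final punctuation is a separate token, as in the tokenized CNN stories. All function names are hypothetical.

```python
import math


def split_sentences(text):
    """Split whitespace-tokenized text into sentences (lists of tokens).

    Assumes '.', '!' and '?' are standalone tokens, as in the tokenized
    CNN stories, so abbreviations such as 'Dr.' do not end a sentence.
    """
    sentences, current = [], []
    for tok in text.split():
        current.append(tok)
        if tok in {'.', '!', '?'}:
            sentences.append(current)
            current = []
    if current:  # trailing sentence without final punctuation
        sentences.append(current)
    return sentences


def content_words(tokens):
    """Lowercased tokens that contain at least one letter or digit."""
    return [t.lower() for t in tokens if any(c.isalnum() for c in t)]


def similarity(words_a, words_b):
    """TextRank overlap: |shared words| / (log|A| + log|B|)."""
    if not words_a or not words_b:
        return 0.0
    denom = math.log(len(words_a)) + math.log(len(words_b))
    if denom <= 0:  # both sentences have a single word; treat as unconnected
        return 0.0
    return len(set(words_a) & set(words_b)) / denom


def textrank_scores(sentences, d=0.85, tol=1e-6, max_iter=100):
    """Weighted PageRank over the sentence-similarity graph."""
    words = [content_words(s) for s in sentences]
    n = len(words)
    w = [[similarity(words[i], words[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_weight = [sum(row) for row in w]  # total edge weight leaving each node
    scores = [1.0] * n
    for _ in range(max_iter):
        new = [(1 - d) + d * sum(w[j][i] / out_weight[j] * scores[j]
                                 for j in range(n) if out_weight[j] > 0)
               for i in range(n)]
        converged = max((abs(a - b) for a, b in zip(new, scores)), default=0.0) < tol
        scores = new
        if converged:
            break
    return scores


def textrank_summarize(text, k=3):
    """Return the k top-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    scores = textrank_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [' '.join(sentences[i]) for i in sorted(top)]
```

The returned sentences can be joined with spaces and scored against a story's reference highlights using sumeval's `RougeCalculator`. Lowercasing and the simple word filter are deliberate simplifications. On real news text, removing stopwords would likely help, since words like "the" otherwise link almost every pair of sentences.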

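Summary evaluation

Summaries are scored with the ROUGE and BLEU metrics. Both can be computed with the Python package sumeval; its usage for each metric is listed below.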
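To make concrete what a ROUGE score counts, here is a simplified, hypothetical re-implementation of the ROUGE-N and ROUGE-L F1 scores. It compares one summary with a single reference over lowercased whitespace tokens. It skips stopword removal, stemming and any other preprocessing a library may apply, so its numbers will generally not match `RougeCalculator` exactly.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision (overlap/cand) and recall (overlap/ref)."""
    if overlap == 0:
        return 0.0
    p, r = overlap / cand_total, overlap / ref_total
    return 2 * p * r / (p + r)


def rouge_n_sketch(summary, reference, n=1):
    """ROUGE-N F1 between one summary and one reference."""
    cand = ngrams(summary.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # '&' keeps the min count per n-gram
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_sketch(summary, reference):
    """ROUGE-L F1: LCS length relative to summary and reference lengths."""
    cand, ref = summary.lower().split(), reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))
```

Take the summary "I went to the Mars from my living town." and the reference "I went to Mars". Four of the nine summary tokens match, giving unigram precision 4/9 and recall 1. The ROUGE-1 F1 of this sketch is therefore 8/13 (about 0.615).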

  • ROUGE metric
from sumeval.metrics.rouge import RougeCalculator


rouge = RougeCalculator(stopwords=True, lang="en")

rouge_1 = rouge.rouge_n(
            summary="I went to the Mars from my living town.",
            references="I went to Mars",
            n=1)

rouge_2 = rouge.rouge_n(
            summary="I went to the Mars from my living town.",
            references=["I went to Mars", "It's my living town"],
            n=2)

rouge_l = rouge.rouge_l(
            summary="I went to the Mars from my living town.",
            references=["I went to Mars", "It's my living town"])

# You need spaCy to calculate ROUGE-BE

rouge_be = rouge.rouge_be(
            summary="I went to the Mars from my living town.",
            references=["I went to Mars", "It's my living town"])

print("ROUGE-1: {}, ROUGE-2: {}, ROUGE-L: {}, ROUGE-BE: {}".format(
    rouge_1, rouge_2, rouge_l, rouge_be
).replace(", ", "\n"))
  • BLEU metric
from sumeval.metrics.bleu import BLEUCalculator


bleu = BLEUCalculator()
score = bleu.bleu("I am waiting on the beach",
                  "He is walking on the beach")
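Similarly, here is a hypothetical single-reference sentence BLEU that shows what the score is built from. It takes clipped n-gram precisions up to 4-grams, combines them with a geometric mean, and applies a brevity penalty, with no smoothing. Without smoothing, any pair that shares no 4-gram scores 0, and that includes the example pair above. sumeval's number for the same pair may differ depending on the tokenization and smoothing the library applies.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_sketch(candidate, reference, max_n=4):
    """Unsmoothed sentence BLEU in [0, 1] against a single reference."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngram_counts(cand, n)
        total = sum(cand_ngrams.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum((cand_ngrams & ngram_counts(ref, n)).values())
        if clipped == 0:
            return 0.0  # the geometric mean is zero if any precision is zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates no longer than the reference
    # (equals 1 when the lengths are equal).
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)
```

Consider the candidate "the cat sat on the mat today" and the reference "the cat sat on the mat". The four precisions are 6/7, 5/6, 4/5 and 3/4, and no brevity penalty applies, so the sketch returns (3/7) ** 0.25.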
Please credit the source when reposting: this article is reposted from www.mshxw.com
Article URL: https://www.mshxw.com/it/529722.html