Python: How do I download NLTK data?

To download a specific dataset or model, use the nltk.download() function. For example, to download the punkt sentence tokenizer:

$ python3
>>> import nltk
>>> nltk.download('punkt')

If you are not sure which data or models you need, you can start with a basic list of data and models:

>>> import nltk
>>> nltk.download('popular')

This downloads the resources in the "popular" collection, which includes:

<collection id="popular" name="Popular packages">
      <item ref="cmudict" />
      <item ref="gazetteers" />
      <item ref="genesis" />
      <item ref="gutenberg" />
      <item ref="inaugural" />
      <item ref="movie_reviews" />
      <item ref="names" />
      <item ref="shakespeare" />
      <item ref="stopwords" />
      <item ref="treebank" />
      <item ref="twitter_samples" />
      <item ref="omw" />
      <item ref="wordnet" />
      <item ref="wordnet_ic" />
      <item ref="words" />
      <item ref="maxent_ne_chunker" />
      <item ref="punkt" />
      <item ref="snowball_data" />
      <item ref="averaged_perceptron_tagger" />
    </collection>
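To see which package ids a collection like the one above covers, the XML can be read with the standard library. This is only a sketch: the POPULAR_XML constant and the collection_refs helper are names made up for this example, and the XML is copied from the listing above rather than fetched from the NLTK index.

```python
import xml.etree.ElementTree as ET

# Copied from the listing above; not fetched from the live NLTK index.
POPULAR_XML = """<collection id="popular" name="Popular packages">
  <item ref="cmudict" />
  <item ref="gazetteers" />
  <item ref="genesis" />
  <item ref="gutenberg" />
  <item ref="inaugural" />
  <item ref="movie_reviews" />
  <item ref="names" />
  <item ref="shakespeare" />
  <item ref="stopwords" />
  <item ref="treebank" />
  <item ref="twitter_samples" />
  <item ref="omw" />
  <item ref="wordnet" />
  <item ref="wordnet_ic" />
  <item ref="words" />
  <item ref="maxent_ne_chunker" />
  <item ref="punkt" />
  <item ref="snowball_data" />
  <item ref="averaged_perceptron_tagger" />
</collection>"""


def collection_refs(xml_text):
    """Return the package ids referenced by a <collection> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Each <item ref="..."/> names one package that belongs to the collection.
    return [item.get("ref") for item in root.iter("item")]
```

Each id returned by collection_refs(POPULAR_XML) can be passed to nltk.download() on its own if you only want part of the collection.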

EDITED

If you want to avoid the errors that can come up when NLTK downloads the larger datasets, here is a workaround from https://stackoverflow.com/a/38135306/610569:

$ rm /Users/<your_username>/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/<your_username>/nltk_data/corpora/panlex_lite
$ python
>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed' # Trick the index into treating panlex_lite as if it's already installed.
>>> dler.download('popular')

UPDATED

As of v3.2.5, NLTK gives a more informative error message when a resource cannot be found in nltk_data, for example:

>>> from nltk import word_tokenize
>>> word_tokenize('x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/l/alvas/git/nltk/nltk/tokenize/__init__.py", line 128, in word_tokenize
    sentences = [text] if preserve_line else sent_tokenize(text, language)
  File "/Users//alvas/git/nltk/nltk/tokenize/__init__.py", line 94, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "/Users/alvas/git/nltk/nltk/data.py", line 820, in load
    opened_resource = _open(resource_url)
  File "/Users/alvas/git/nltk/nltk/data.py", line 938, in _open
    return find(path_, path + ['']).open()
  File "/Users/alvas/git/nltk/nltk/data.py", line 659, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  Searched in:
    - '/Users/alvas/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
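Because a missing resource surfaces as a LookupError, a script can check for the resource first and download it only when the lookup fails. The sketch below is one possible way to do that, not something from the original answer: ensure_resource is a hypothetical helper name, and the find/download parameters exist only so the logic can be exercised without NLTK installed. By default it falls back to nltk.data.find and nltk.download.

```python
def ensure_resource(resource_path, package_id, find=None, download=None):
    """Download package_id only if resource_path cannot be found.

    Hypothetical helper: returns True if a download was triggered,
    False if the resource was already available.
    """
    if find is None or download is None:
        # Imported lazily so the helper can be tested with stand-in callables.
        import nltk
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        # nltk.data.find raises LookupError when the resource is missing.
        find(resource_path)
        return False
    except LookupError:
        download(package_id)
        return True
```

With NLTK installed, a call such as ensure_resource('tokenizers/punkt', 'punkt') before word_tokenize() should avoid the traceback shown above, assuming the machine can reach the download server.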

