
Scraping Data in JavaScript


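The site's AJAX endpoint already returns all the data you need in JSON format.

Scrapy provides the shell command, which is very convenient for tinkering with a website before writing the spider: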

$ scrapy shell https://www.mcdonalds.com.sg/locate-us/
2013-09-27 00:44:14-0400 [scrapy] INFO: Scrapy 0.16.5 started (bot: scrapybot)
...
In [1]: from scrapy.http import FormRequest
In [2]: url = 'https://www.mcdonalds.com.sg/wp-admin/admin-ajax.php'
In [3]: payload = {'action': 'ws_search_store_location', 'store_name':'0', 'store_area':'0', 'store_type':'0'}
In [4]: req = FormRequest(url, formdata=payload)
In [5]: fetch(req)
2013-09-27 00:45:13-0400 [default] DEBUG: Crawled (200) <POST https://www.mcdonalds.com.sg/wp-admin/admin-ajax.php> (referer: None)
...
In [6]: import json
In [7]: data = json.loads(response.body)
In [8]: len(data['stores']['listing'])
Out[8]: 127
In [9]: data['stores']['listing'][0]
Out[9]:
{u'address': u'678A Woodlands Avenue 6<br/>#01-05<br/>Singapore 731678',
 u'city': u'Singapore',
 u'id': 78,
 u'lat': u'1.440409',
 u'lon': u'103.801489',
 u'name': u"McDonald's Admiralty",
 u'op_hours': u'24 hours<br>\r\nDessert Kiosk: 0900-0100',
 u'phone': u'68940513',
 u'region': u'north',
 u'type': [u'24hrs', u'dessert_kiosk'],
 u'zip': u'731678'}
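To see what FormRequest puts on the wire, the request from the session can be rebuilt with only the standard library. This is a minimal sketch, assuming Python 3 (the session above ran Scrapy 0.16 on Python 2); build_store_request is a hypothetical helper name, and the request is only constructed here, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the shell session above.
AJAX_URL = "https://www.mcdonalds.com.sg/wp-admin/admin-ajax.php"


def build_store_request(url=AJAX_URL):
    """Build (but do not send) the form-encoded POST used in the shell session."""
    payload = {
        "action": "ws_search_store_location",
        "store_name": "0",
        "store_area": "0",
        "store_type": "0",
    }
    # Form data travels as a URL-encoded request body.
    body = urlencode(payload).encode("ascii")
    # Passing data= makes urllib treat the request as a POST.
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = build_store_request()
print(req.get_method())
print(req.data.decode("ascii"))
```

Printing the method and body is a quick way to compare this request with the one the browser's network tab shows for the store locator.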

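In short: in your spider, return the FormRequest(...) shown above; then, in the callback, load the JSON object from response.body; finally, for each store in the list data['stores']['listing'], create an item with the values you want.

Something like this: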

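import json

from scrapy.http import FormRequest
from scrapy.item import Item, Field
from scrapy.spider import BaseSpider  # Scrapy 0.16 import path, matching the session above


class McDonaldsItem(Item):
    # Assumed item definition; the original answer does not show it.
    name = Field()
    address = Field()


class McDonaldSpider(BaseSpider):
    name = "mcdonalds"
    allowed_domains = ["mcdonalds.com.sg"]
    start_urls = ["https://www.mcdonalds.com.sg/locate-us/"]

    def parse(self, response):
        # This receives the response from the start url. But we don't do anything with it.
        url = 'https://www.mcdonalds.com.sg/wp-admin/admin-ajax.php'
        payload = {'action': 'ws_search_store_location', 'store_name':'0', 'store_area':'0', 'store_type':'0'}
        return FormRequest(url, formdata=payload, callback=self.parse_stores)

    def parse_stores(self, response):
        # The AJAX endpoint answers with JSON; each entry in the listing is one store.
        data = json.loads(response.body)
        for store in data['stores']['listing']:
            yield McDonaldsItem(name=store['name'], address=store['address'])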
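The extraction step in parse_stores can also be tried without Scrapy or network access. The following is a rough sketch, assuming Python 3; extract_stores is a hypothetical helper, and sample_body is a hand-built stand-in that copies only the name and address of the first record shown in the shell session.

```python
import json


def extract_stores(body):
    """Return one dict per store from the AJAX response body (str or bytes)."""
    data = json.loads(body)
    return [
        {"name": store["name"], "address": store["address"]}
        for store in data["stores"]["listing"]
    ]


# Stand-in response body with the same nesting as the real one: stores -> listing.
sample_body = json.dumps({
    "stores": {
        "listing": [
            {
                "name": "McDonald's Admiralty",
                "address": "678A Woodlands Avenue 6<br/>#01-05<br/>Singapore 731678",
            },
        ]
    }
})

stores = extract_stores(sample_body)
print(stores[0]["name"])
```

Keeping the parsing in a plain function like this makes it easy to check against a saved response before wiring it into the spider's callback, where the same loop would yield McDonaldsItem objects instead of dicts.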

