I. Interacting with MongoDB from Python
Step 1: install the driver
pip install pymongo
(Note: the package name is pymongo, not "pymongodb".)
Step 2: use it
(1) Import the module: import pymongo
(2) Connect to MongoDB:
mongo_client = pymongo.MongoClient()
or: mongo_client = pymongo.MongoClient(host='127.0.0.1', port=27017)
(3) Insert, delete, update, and query logic
Then run show dbs in the mongo shell, which gives:
II. Implementing add/delete/update/query with a class (object-oriented style)
1. Insert one document:
import pymongo

class MongoData():
    def __init__(self):
        self.client = pymongo.MongoClient(host='127.0.0.1', port=27017)
        self.db = self.client['banji']['student']

    # method for inserting one document
    def add_one(self, data):
        result = self.db.insert_one(data)
        print(result)  # an InsertOneResult; result.inserted_id holds the new _id

if __name__ == '__main__':
    md = MongoData()
    md.add_one({'name': 'abc'})

After running it, check in the mongo shell, which gives:
2. Insert many documents:
import pymongo

class MongoData():
    def __init__(self):
        self.client = pymongo.MongoClient(host='127.0.0.1', port=27017)
        self.db = self.client['banji']['student']

    # method for inserting one document
    def add_one(self, data):
        result = self.db.insert_one(data)
        print(result)

    # method for inserting many documents at once
    def add_many(self, data):
        result = self.db.insert_many(data)
        print(result.inserted_ids)

if __name__ == '__main__':
    md = MongoData()
    #md.add_one({'name': 'abc'})
    md.add_many([{'x': i} for i in range(2)])

Terminal output:
3. Find one document:
import pymongo

class MongoData():
    def __init__(self):
        self.client = pymongo.MongoClient(host='127.0.0.1', port=27017)
        self.db = self.client['banji']['student']

    # method for inserting one document
    def add_one(self, data):
        result = self.db.insert_one(data)
        print(result)

    # method for inserting many documents at once
    def add_many(self, data):
        result = self.db.insert_many(data)
        print(result.inserted_ids)

    # query method
    def get_one(self, query=None):
        if query is None:
            return self.db.find_one()  # find_one returns the first matching document
        else:
            return self.db.find_one(query)

if __name__ == '__main__':
    md = MongoData()
    g = md.get_one()
    print(g)
    h = md.get_one({'x': 1})
    print(h)
4. Find many documents:
find() returns a Cursor object, not a list; consume it with a for loop.
A good way to learn: read the source code to trace the implementation logic step by step.
III. A case study (http://www.touxiao8.com/xkai/index.html)
The spider file:
# -*- coding: utf-8 -*-
import scrapy
from xiaohuadaquan.items import XiaohuadaquanItem

class Touxiao8Spider(scrapy.Spider):
    name = 'touxiao8'
    allowed_domains = ['touxiao8.com']
    start_urls = ['http://www.touxiao8.com/xkai/index.html']

    def parse(self, response):
        r = response.xpath('//div[@]/ul/li/div[@]')
        for i in r:
            # create a fresh item per result, so already-yielded items are not mutated
            item = XiaohuadaquanItem()
            item['title'] = i.xpath('.//a/text()').get()
            item['context'] = i.xpath('.//p/text()').get()
            yield item
        # pagination
        next_href = response.xpath('//div[@]/a[@]/@href').get()
        #print(next_href)
        if next_href:
            # urljoin() resolves next_href against the current page if it is relative
            yield scrapy.Request(response.urljoin(next_href))
The settings file:
(1) ROBOTSTXT_OBEY = False
(2) LOG_LEVEL = 'WARNING'
(3) Enable the item pipeline (ITEM_PIPELINES).
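Concretely, those three settings might look like this in settings.py; the pipeline path follows from the project name xiaohuadaquan and the pipeline class in pipelines.py, and the priority 300 is simply the conventional default, not something the post specifies:

```python
# settings.py (fragment)
ROBOTSTXT_OBEY = False
LOG_LEVEL = 'WARNING'
ITEM_PIPELINES = {
    'xiaohuadaquan.pipelines.XiaohuadaquanPipeline': 300,
}
```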
The items file: register the fields to scrape.
The most important part, the pipeline file:
import pymongo

class XiaohuadaquanPipeline:
    def __init__(self):
        self.mongo_client = pymongo.MongoClient()

    def process_item(self, item, spider):
        data = dict(item)
        # insert() is deprecated and removed in PyMongo 4; use insert_one()
        self.mongo_client['xiaohuadaquan']['xihua8'].insert_one(data)
        return item

After running the spider from cmd, we can see in the mongo shell:



