Several Ways to Send HTTP Requests in a Python Web Crawler

1. A simple example that uses the urllib.request module to send a request and read a web page:

# Import the module
import urllib.request
# Open the page to be crawled
response = urllib.request.urlopen('http://www.baidu.com')
# Read the page source (returned as bytes)
html = response.read()
# Print what was read
print(html)

Result:

b'\n\n\n \n \n\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80\xe4\xb8\x8b\xef\xbc\x8c\xe4\xbd\xa0\xe5\xb0\xb1\xe7\x9f\xa5\xe9\x81\x93#form .bdsug{top:39px}.bdsug{display:none;position:absolute;width:535px;background:#fff;border:1px solid 
……………… (the rest is omitted because it is too long)

The example above retrieves the Baidu home page with a GET request.
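Because `read()` returns raw bytes, the Chinese page title in the output above is printed as escaped UTF-8 byte values rather than readable text. A minimal sketch that decodes the body into a string follows; `fetch_text` is a hypothetical helper name, and falling back to UTF-8 when the server declares no charset is an assumption, not a guarantee.

```python
import urllib.request


def fetch_text(url, timeout=10):
    """GET url and return the body as str instead of raw bytes (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Prefer the charset declared in the Content-Type header; fall back
        # to UTF-8 when none is given (an assumption, not a guarantee).
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


# The escaped run at the start of the output above is UTF-8; decoding it offline:
title = b"\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80\xe4\xb8\x8b\xef\xbc\x8c\xe4\xbd\xa0\xe5\xb0\xb1\xe7\x9f\xa5\xe9\x81\x93"
print(title.decode("utf-8"))  # 百度一下，你就知道
```

Calling `fetch_text('http://www.baidu.com')` would then return the page as a `str` instead of `bytes`.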

The following example retrieves page information with a POST request through the urllib.request module:

# Import the modules
import urllib.parse
import urllib.request
# URL-encode the form data, then convert the resulting string to UTF-8 bytes
data = bytes(urllib.parse.urlencode({'word':'hello'}),encoding='utf-8')
# Open the target page; passing data makes this a POST request
response = urllib.request.urlopen('http://httpbin.org/post',data=data)
html = response.read()
# Print what was read
print(html)

Result:

b'{\n "args": {}, \n "data": "", \n "files": {}, \n "form": {\n "word": "hello"\n }, \n "headers": {\n "Accept-Encoding": "identity", \n "Content-Length": "10", \n "Content-Type": "application/x-www-form-urlencoded", \n "Host": "httpbin.org", \n "User-Agent": "Python-urllib/3.7", \n "X-Amzn-Trace-Id": "Root=1-5ec3f607-00f717e823a5c268fe0e0be8"\n }, \n "json": null, \n "origin": "123.139.39.71", \n "url": "http://httpbin.org/post"\n}\n'
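To see what the POST example actually sends: `urlencode()` turns the dict into the string `word=hello`, whose 10 bytes match the `Content-Length` that httpbin echoes back, and supplying `data` is what turns the request into a POST. Since httpbin replies with JSON, the body can be parsed with `json.loads`. This is a sketch: `post_form` is a hypothetical helper, and `sample` is a trimmed copy of the response above, not a live result.

```python
import json
import urllib.parse
import urllib.request

# Same form data as the example: urlencode() gives "word=hello",
# which encodes to the 10-byte body reported in Content-Length.
data = urllib.parse.urlencode({"word": "hello"}).encode("utf-8")

# Supplying data makes the Request a POST; without data it would be a GET.
req = urllib.request.Request("http://httpbin.org/post", data=data)


def post_form(request, timeout=10):
    """Send the request and parse httpbin's JSON echo (hypothetical helper)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # json.loads accepts bytes directly, so no manual decode is needed.
        return json.loads(response.read())


# Offline check against a trimmed copy of the response shown above.
sample = b'{"form": {"word": "hello"}, "url": "http://httpbin.org/post"}'
print(json.loads(sample)["form"]["word"])  # hello
```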

2. The urllib3 module

Example code that sends a network request with the urllib3 module:

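# Import the module
import urllib3
# Create a PoolManager object, which handles connection pooling and thread safety for you
http = urllib3.PoolManager()
# Send a GET request to the page to be crawled
response = http.request('GET','https://www.baidu.com/')
# Print the response body (bytes)
print(response.data)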

Result:

b'\r\n\r\n\r\n\t\r\n\t\r\n\t\r\n\t\r\n\t\r\n\t\r\n\t\r\n\t\r\n\t\r\n\t\r\n\t\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80\xe4\xb8\x8b\xef\xbc\x8c\xe4\xbd\xa0\xe5\xb0\xb1\xe7\x9f\xa5\xe9\x81\x93\r\n\t\r\n\t\r\n\t\r\n\t
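A PoolManager can also carry pool-wide timeout and retry settings, and `request()` accepts a `fields` dict that urllib3 URL-encodes into the query string for GET requests. A minimal sketch follows; the timeout and retry values and the `get_text` helper are illustrative assumptions rather than recommended settings, and urllib3 must be installed (`pip install urllib3`).

```python
import urllib3

# Illustrative settings (assumptions): 3 s to connect, 10 s to read,
# and up to 3 retries with a short backoff between attempts.
http = urllib3.PoolManager(
    timeout=urllib3.Timeout(connect=3.0, read=10.0),
    retries=urllib3.Retry(total=3, backoff_factor=0.5),
)


def get_text(url, params=None):
    """GET url with optional query parameters and return decoded text (hypothetical helper)."""
    # For GET, urllib3 URL-encodes `fields` into the query string.
    response = http.request("GET", url, fields=params)
    # response.data holds the raw body bytes, as printed in the example above.
    return response.data.decode("utf-8", errors="replace")
```

For example, `get_text('https://www.baidu.com/')` would return the same page as the example above, but as a `str`.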