Sending requests and setting proxies with the Python requests library

1. Some websites block bots, so set the User-Agent header to a browser string; otherwise the request may be refused.

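2. To reach overseas sites that are blocked in your region, route the request through a proxy. The proxies dict has one entry for http and one for https. Also keep urllib3 below version 1.26.0: if a request fails with the error check_hostname requires server_hostname, or hangs without ever responding, check whether your urllib3 version is the cause. Starting with 1.26.0, urllib3 connects over TLS to a proxy whose URL is written as https://..., so if your local proxy only accepts plain HTTP, writing the https entry as http://127.0.0.1:1080 also avoids the error.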

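3. Passing parameters:

For a GET request, pass query parameters with the params keyword: requests.get(url, headers=headers, params=params)

For a POST request, pass form data with the data keyword: requests.post(url, headers=headers, data=params)

Pass headers by keyword as well: the second positional argument of requests.get is params, and that of requests.post is data.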

Response data:

response.status_code  # status code of the response
response.headers      # response headers
response.text         # response body as a string
response.json()       # response body parsed as JSON
response.content      # response body as bytes (binary data)
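Points 1 to 3 can be checked offline by preparing requests instead of sending them. This is a minimal sketch: the example.com URLs, the parameter values and the 127.0.0.1:1080 proxy address are placeholders, and both proxy entries use an http:// URL on the assumption that the local proxy accepts plain HTTP.

```python
import random

import requests

# Example User-Agent pool; any real browser strings work the same way
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:10.0) Gecko/20100101 Firefox/10.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/94.0.4606.61 Safari/537.36",
]

# Placeholder local proxy; both schemes go through an http:// proxy URL
PROXIES = {
    "http": "http://127.0.0.1:1080",
    "https": "http://127.0.0.1:1080",
}


def build_get(url, params):
    """Prepare (but do not send) a GET request with a random User-Agent."""
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    return requests.Request("GET", url, headers=headers, params=params).prepare()


def build_post(url, data):
    """Prepare (but do not send) a form-encoded POST request."""
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    return requests.Request("POST", url, headers=headers, data=data).prepare()


def make_proxy_session():
    """Session that sends every request through PROXIES."""
    ses = requests.Session()
    ses.proxies.update(PROXIES)
    return ses


if __name__ == "__main__":
    get_req = build_get("https://example.com/search", {"q": "python", "page": 2})
    print(get_req.url)                       # params are encoded into the query string
    post_req = build_post("https://example.com/form", {"name": "alice", "lang": "python"})
    print(post_req.headers["Content-Type"])  # set automatically for dict data
    print(post_req.body)                     # data is encoded into the request body
    print(make_proxy_session().proxies)
```

Calling get() or post() on the session returned by make_proxy_session() would send the request through the proxy; requests.Request(...).prepare() is used here only so the headers, query string and body can be inspected without a network connection.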

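import random

import requests
from requests.adapters import HTTPAdapter


def request_url(url):
    user_agent = [
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:10.0) Gecko/20100101 Firefox/10.0',
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
        'Chrome/94.0.4606.61 Safari/537.36',
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
        'Chrome/94.0.4606.71 Safari/537.36 Edg/94.0.992.38',
    ]
    # Frequent requests to the same site occasionally time out. To lower the failure rate,
    # retry up to 5 times and ask the server not to keep the connection alive.
    # Assigning requests.adapters.DEFAULT_RETRIES or ses.keep_alive has no effect,
    # so mount an adapter with max_retries and send "Connection: close" instead.
    ses = requests.Session()
    adapter = HTTPAdapter(max_retries=5)
    ses.mount('http://', adapter)
    ses.mount('https://', adapter)
    # Pick a random User-Agent
    headers = {
        'User-Agent': random.choice(user_agent),
        'Connection': 'close',
    }
    # Proxy settings, one entry for http and one for https.
    # If the local proxy only accepts plain HTTP, use the http:// scheme for both.
    proxies = {
        "http": "http://127.0.0.1:1080",
        "https": "http://127.0.0.1:1080"
    }
    try:
        # timeout (in seconds) keeps the request from hanging forever
        response = ses.get(url=url, headers=headers, proxies=proxies, timeout=10)
    except requests.exceptions.RequestException as e:
        print(f'Request to {url} failed')
        print(e)
        return None
    print(response.status_code, response.headers)
    return response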

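# Get the size of a remote file (in bytes) from its Content-Length header
def get_file_size(file, headers=None):
    # requests.head() does not follow redirects by default, so enable it explicitly
    response = requests.head(file, headers=headers, allow_redirects=True, timeout=10)
    # Raises KeyError if the server does not send Content-Length
    file_size = int(response.headers['Content-Length'])
    return file_size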

if __name__ == '__main__':
    request_url(url='http://www.baidu.com')
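The retry setting in request_url can also be expressed as an explicit urllib3 Retry policy, and the response attributes from point 3 can be tried without any external site. The sketch below assumes a hypothetical local JSONHandler that serves a fixed JSON body on 127.0.0.1; the retry values (5 attempts, a 0.5 backoff factor, retrying on 500/502/503/504) are illustrative choices.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


class JSONHandler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint that always answers with a small JSON body."""

    def do_GET(self):
        body = json.dumps({"message": "hello"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # Silence the default per-request log line on stderr
        pass


def make_retry_session(total=5, backoff_factor=0.5):
    """Session whose adapters retry failed requests with exponential backoff."""
    retry = Retry(
        total=total,
        backoff_factor=backoff_factor,
        # For idempotent methods such as GET, also retry on these status codes
        status_forcelist=(500, 502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)
    ses = requests.Session()
    ses.mount("http://", adapter)
    ses.mount("https://", adapter)
    # Ignore proxy environment variables so the demo request stays on localhost
    ses.trust_env = False
    return ses


def fetch_local_json():
    # Port 0 lets the OS choose a free port
    server = HTTPServer(("127.0.0.1", 0), JSONHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        with make_retry_session() as ses:
            url = f"http://127.0.0.1:{server.server_port}/"
            response = ses.get(url, timeout=5)
        return {
            "status_code": response.status_code,
            "content_type": response.headers["Content-Type"],
            "text": response.text,        # body as str
            "json": response.json(),      # body parsed as JSON
            "content": response.content,  # body as bytes
        }
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_local_json())
```

Passing a Retry object instead of a plain integer to HTTPAdapter adds a growing delay between attempts and lets retries trigger on server errors, not only on failed connections.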
