Techniques covered:
1. Setting a User-Agent in the request headers to get past the site's anti-scraping check
2. Capturing packets in the browser's Network tab to analyze the AJAX request and its parameters
3. Requesting data for each year/month combination in a for loop
4. Merging the monthly tables and saving them to Excel with pandas
# Python: scrape 10 years of Beijing weather data
import requests
import pandas as pd
import os

url = 'https://tianqi.2345.com/Pc/GetHistory'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
}

def craw_table(year, month):
    """Fetch the weather table for the given year and month."""
    params = {
        'areaInfo[areaId]': 54511,  # area id for Beijing
        'areaInfo[areaType]': 2,
        'date[year]': year,
        'date[month]': month
    }
    resp = requests.get(url, headers=headers, params=params)
    # print(resp.status_code)
    # print(resp.text)
    data = resp.json()["data"]  # the JSON payload carries an HTML <table> string
    # print(data)
    df = pd.read_html(data)[0]
    return df

# df = craw_table(2015, 10)

if __name__ == '__main__':
    df_list = []
    for year in range(2011, 2022):
        for month in range(1, 13):
            print("crawling:", year, month)
            df = craw_table(year, month)
            # collect each month's DataFrame so they can all be merged below
            df_list.append(df)
    os.chdir(r'C:\Users\DELL\Desktop')
    data = pd.concat(df_list)
    data.to_excel('北京10年天气数据.xlsx', index=False)  # "10 years of Beijing weather data"
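The `data` field of the JSON response is an HTML table serialized as a string, which is why `pd.read_html` can parse it directly. A minimal sketch with a made-up table standing in for the API response (the column names here are illustrative, not the site's):

```python
from io import StringIO

import pandas as pd

# Stand-in for resp.json()["data"]: the endpoint returns an HTML <table> string
html = """
<table>
  <tr><th>Date</th><th>High</th><th>Low</th></tr>
  <tr><td>2015-10-01</td><td>20</td><td>10</td></tr>
  <tr><td>2015-10-02</td><td>18</td><td>9</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one per table found in the markup
df = pd.read_html(StringIO(html))[0]
print(df.shape)  # (2, 3): two data rows, three columns
```

The `<th>` row is picked up as the header automatically, so the resulting DataFrame is ready for `pd.concat` without further cleanup.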
Install the packages the script depends on into the virtual environment from the PyCharm terminal: openpyxl, lxml
How to merge multiple DataFrames: append each one to a list, then combine the whole list in a single call:
df_list.append(df)
data = pd.concat(df_list)
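The append-then-concat pattern can be sketched with two hypothetical monthly frames standing in for the `craw_table(year, month)` results:

```python
import pandas as pd

df_list = []
for month in (1, 2):
    # hypothetical monthly table, standing in for craw_table(year, month)
    df_list.append(pd.DataFrame({'day': [1, 2], 'month': month}))

# one concat call stitches all the monthly tables into a single DataFrame
data = pd.concat(df_list, ignore_index=True)
print(len(data))  # 4 rows: 2 months x 2 days
```

`ignore_index=True` renumbers the rows 0..n-1; without it (as in the script above) each month keeps its own original index.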
Network packet capture: in the browser's developer tools, the Network tab shows the AJAX request to https://tianqi.2345.com/Pc/GetHistory along with the areaInfo and date query parameters used above.