1. On the login page, open the browser dev tools (Inspect) and watch the Network tab; capture the login request to get its login_url.
   The login credentials are the request's form data.
2. Put those form fields into a Python dict, data.
3. Create a session: session = requests.session()
4. Call session.post(login_url, data=data)
   Whether it is POST or GET depends on the request method shown in dev tools.
   The session object keeps the cookies returned by the login.
5. On the page shown after logging in, locate the content to scrape.
   Refresh the page, find the response containing the text under Network > Preview, then read its URL from the Headers tab next to Preview.
   Fetch it with resp = session.get(url); the response text can be kept in a string.
6. Extract the data from that string with XPath, re, or bs4.
import requests
import re

# login_url and the form field names/values must be copied from the
# login request captured in the browser dev tools
login_url = "xxxxxxxxxxxx"
data = {
    # "field_name": "value",  # fill in the captured form fields here
}

session = requests.session()
session.post(login_url, data=data)  # the session now holds the login cookies

url = 'xxxxxxxxxxxxxxxxxxx'  # URL of the page to scrape, from Network > Headers
resp = session.get(url)
# print(resp.text)

# (?P<name>...) names each capture group so it can be read by name later
com1 = re.compile('"authorPenName":"(?P<a_name>.*?)",', re.S)
com2 = re.compile('"bookName":"(?P<b_name>.*?)",', re.S)

a_list = []
b_list = []
for a in com1.finditer(resp.text):
    a_list.append(a.group("a_name"))
for b in com2.finditer(resp.text):
    b_list.append(b.group("b_name"))

for i in range(len(a_list)):
    print(a_list[i], " ", b_list[i])
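The named-group extraction above can be checked offline against a small sample string; the fragment below is made up for illustration, shaped like the JSON the scraped page is assumed to return:

```python
import re

# made-up fragment in the same shape as the page's JSON (illustration only)
sample = ('"authorPenName":"AuthorA","bookName":"BookOne",'
          '"authorPenName":"AuthorB","bookName":"BookTwo",')

# (?P<a_name>...) names the capture group so it can be read with .group("a_name")
com1 = re.compile('"authorPenName":"(?P<a_name>.*?)",', re.S)
com2 = re.compile('"bookName":"(?P<b_name>.*?)",', re.S)

authors = [m.group("a_name") for m in com1.finditer(sample)]
books = [m.group("b_name") for m in com2.finditer(sample)]
print(authors)  # ['AuthorA', 'AuthorB']
print(books)    # ['BookOne', 'BookTwo']
```

The lazy `.*?` stops each match at the first following `",`, which is what keeps one match from swallowing several fields at once.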


