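Workflow: take a screenshot of the whole page, get the coordinates of the captcha element, crop the captcha out of the screenshot using those coordinates, then recognize it with the pytesseract module.
Problem: when the captcha region is cropped with save_screenshot and the crop method of the PIL module, the resulting image is not the correct captcha.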
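**Cause:** The coordinates are off because the computer's display scaling defaults to 125%, so the screenshot's pixels do not line up with the element coordinates reported by the browser. Setting the scaling to 100% fixes the positioning. Alternatively, multiply the top-left coordinates x and y and the image width and height by the scaling ratio, 1.25, and the captcha is again cropped accurately.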
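The scaling correction can be sketched as a small helper. This is only an illustration, not the author's code: `scaled_crop_box` is a hypothetical name, and the default ratio of 1.25 corresponds to the 125% display scaling described above. The returned tuple follows the (left, upper, right, lower) order that PIL's `Image.crop` expects.

```python
def scaled_crop_box(x, y, width, height, ratio=1.25):
    """Convert an element rectangle into screenshot pixel coordinates.

    x, y, width, height are the values reported for the element;
    ratio is the display scaling factor (1.25 for 125% scaling).
    """
    left = round(x * ratio)
    upper = round(y * ratio)
    right = round((x + width) * ratio)
    lower = round((y + height) * ratio)
    return (left, upper, right, lower)


# Example: an element at (100, 200) that is 80 x 32, at 125% scaling.
box = scaled_crop_box(100, 200, 80, 32)
print(box)  # (125, 250, 225, 290)
```

The tuple can be passed directly to `Image.crop`. With the scaling set to 100%, a ratio of 1.0 leaves the coordinates unchanged.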
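Image captcha recognition options:
- Use the pytesseract library: this works for captchas with little distortion, but the recognition rate drops sharply for warped letters and digits.
- Call the image captcha recognition API provided by showapi.
- Train your own machine learning model to recognize captchas.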
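The pytesseract option might look roughly like the sketch below. It assumes the Tesseract binary and the `pytesseract` and Pillow packages are installed. The names `recognize_captcha` and `clean_ocr_text` and the threshold of 140 are illustrative choices, not values from the original post, and the threshold in particular would need tuning for each captcha style.

```python
import re


def clean_ocr_text(raw):
    """Keep only ASCII letters and digits from raw OCR output."""
    return re.sub(r'[^0-9A-Za-z]', '', raw)


def recognize_captcha(image_path, threshold=140):
    """Read a lightly distorted captcha with Tesseract (sketch only)."""
    # Imported lazily so clean_ocr_text stays usable without OCR installed.
    import pytesseract
    from PIL import Image

    image = Image.open(image_path).convert('L')  # convert to grayscale
    # Binarize: pixels brighter than the threshold become white, the rest black.
    image = image.point(lambda p: 255 if p > threshold else 0)
    # --psm 7 asks Tesseract to treat the image as a single line of text.
    raw = pytesseract.image_to_string(image, config='--psm 7')
    return clean_ocr_text(raw)
```

Binarizing before OCR tends to help with noisy backgrounds, but as noted above, heavily warped characters are still likely to be misread.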
import base64
import os
import pickle
import random
import string
import time

from PIL import Image
from selenium.webdriver.common.by import By

from lib.ShowapiRequest import ShowapiRequest


def get_code(driver, id):
    """Screenshot the page, crop the captcha element, and return the text recognized by showapi."""
    # Save a full-page screenshot into the project's screenshots directory
    path = os.path.join(os.path.dirname(os.path.dirname(__file__)), 'screenshots')
    picture_name1 = os.path.join(path, str(time.time()) + '.png')
    driver.save_screenshot(picture_name1)

    # Locate the captcha element (find_element_by_id was removed in Selenium 4)
    ce = driver.find_element(By.ID, id)
    left = ce.location['x']
    top = ce.location['y']
    right = ce.size['width'] + left
    down = ce.size['height'] + top

    # Scale the coordinates by the device pixel ratio (e.g. 1.25 at 125% display scaling)
    dpr = driver.execute_script('return window.devicePixelRatio')
    im = Image.open(picture_name1)
    img = im.crop((left * dpr, top * dpr, right * dpr, down * dpr))
    picture_name2 = os.path.join(path, str(time.time()) + '.png')
    img.save(picture_name2)

    # b64encode encodes and b64decode decodes; the API expects the base64-encoded image
    with open(picture_name2, "rb") as fs:
        base64_data = base64.b64encode(fs.read())

    # Send the image to the showapi captcha recognition endpoint (arguments: URL, app ID, secret)
    r = ShowapiRequest("http://route.showapi.com/2360-2", "927638", "1a0681c845114fbf9beaa7a666c6eb82")
    r.addBodyPara("file_base64", base64_data)
    res = r.post()
    text = res.json()['showapi_res_body']
    code = text['pic_str']
    return code
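

def gen_random_str():
    """Generate a random 8-character string of letters and digits."""
    # random.sample draws without replacement, so no character repeats
    rand_str = ''.join(random.sample(string.ascii_letters + string.digits, 8))
    return rand_str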


def save_cookie(driver, path):
    """Pickle the current browser cookies to the given file."""
    with open(path, 'wb') as filehandler:
        cookies = driver.get_cookies()
        print(cookies)
        pickle.dump(cookies, filehandler)
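

def load_cookie(driver, path):
    """Load pickled cookies from the given file into the browser session."""
    # The driver must already be on a page of the cookies' domain
    with open(path, 'rb') as cookiesfile:
        cookies = pickle.load(cookiesfile)
        for cookie in cookies:
            driver.add_cookie(cookie)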
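
The cookie helpers are typically used to skip the login step, and with it the captcha, on later runs. The sketch below shows one possible way to do that. `restore_session` is a hypothetical helper, not part of the original code, and the URL and file path in the usage note are placeholders. It relies on the fact that Selenium only accepts cookies for the domain of the page that is currently open.

```python
import pickle


def restore_session(driver, url, cookie_path):
    """Reuse cookies saved earlier so a later run can skip the login step."""
    # Open the site first: cookies can only be added for the current domain.
    driver.get(url)
    with open(cookie_path, 'rb') as cookie_file:
        cookies = pickle.load(cookie_file)
    for cookie in cookies:
        driver.add_cookie(cookie)
    # Reload so the page is requested again with the restored cookies.
    driver.refresh()
    return len(cookies)
```

A call such as `restore_session(driver, 'https://example.com', 'cookies.pkl')` would follow an earlier run that logged in and wrote the cookie file. If the saved cookies have expired, the site will still show the login page and the captcha flow has to run again.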



