Code Exercise ①: Scrape Baidu News results for one company (Alibaba) and store them in a database
import requests  # import the requests library
import re  # import the re (regular expression) library
import pymysql  # import the PyMySQL library
# Pretend to be a browser making the request; you can get this string by entering about:version in Google Chrome
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36'}
# Custom function: extract and clean the data
def baidu(company):
    url = 'https://www.baidu.com/s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=' + company
    res = requests.get(url, headers=headers).text
    p_source = '新闻来源:(.*?)"'
    source = re.findall(p_source, res, re.S)
    p_date = '发布于:(.*?)"'
    date = re.findall(p_date, res, re.S)
    p_href = '
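The listing breaks off at `p_href`, so the rest of `baidu()` is not shown. As a rough sketch of the "extract and clean" step its comment describes, the block below runs on a small hard-coded HTML fragment instead of a live page, so it can be tried offline. The `新闻来源:` and `发布于:` patterns are taken from the code above; the sample HTML, the `<h3>`/`<a>` link-and-title pattern and the function name `extract_news` are hypothetical, and Baidu's real markup will differ.

```python
import re

# Hypothetical, simplified fragment of a Baidu News results page.
# Real markup (class names, attributes) differs and changes over time.
SAMPLE_HTML = '''
<h3 class="news-title"><a href="https://example.com/n1" target="_blank"><em>阿里巴巴</em>发布季度财报</a></h3>
<span aria-label="新闻来源:示例财经"></span><span aria-label="发布于:2小时前"></span>
<h3 class="news-title"><a href="https://example.com/n2" target="_blank">  阿里云 新品发布  </a></h3>
<span aria-label="新闻来源:示例科技"></span><span aria-label="发布于:2021年11月10日"></span>
'''


def extract_news(res):
    """Extract and clean (title, href, date, source) records from result HTML."""
    # Source and date patterns are the ones used in the exercise code.
    source = re.findall('新闻来源:(.*?)"', res, re.S)
    date = re.findall('发布于:(.*?)"', res, re.S)
    # Hypothetical link/title pattern: the original p_href is truncated.
    pairs = re.findall(r'<h3[^>]*>\s*<a href="(.*?)"[^>]*>(.*?)</a>', res, re.S)
    records = []
    # zip() stops at the shortest list, so a missing field never raises IndexError.
    for (href, title), d, s in zip(pairs, date, source):
        title = re.sub(r'<.*?>', '', title).strip()  # drop <em> highlight tags and spaces
        records.append((title, href, d.strip(), s.strip()))
    return records


if __name__ == '__main__':
    for record in extract_news(SAMPLE_HTML):
        print(record)
```

Pairing the fields with `zip()` instead of indexing by position means a page where one pattern matches fewer times yields fewer records rather than an `IndexError`; the trade-off is that misaligned fields can pair up silently, so it is worth checking that the list lengths agree.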
Output:
Code Exercise ②: Scrape Baidu News results for five companies and store them in a database
import requests  # import the requests library
import re  # import the re (regular expression) library
import pymysql  # import the PyMySQL library
# Pretend to be a browser making the request; you can get this string by entering about:version in Google Chrome
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36'}
# Custom function: extract and clean the data
def baidu(company):
    url = 'https://www.baidu.com/s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=' + company
    res = requests.get(url, headers=headers).text
    p_source = '新闻来源:(.*?)"'
    source = re.findall(p_source, res, re.S)
    p_date = '发布于:(.*?)"'
    date = re.findall(p_date, res, re.S)
    p_href = '
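For the five-company version, the per-company work has to run in a loop and each result has to be written to the database. The sketch below assumes a `fetch(company)` callable that returns cleaned `(title, href, date, source)` tuples (for instance, a `baidu()` rewritten to return its records), and uses Python's built-in `sqlite3` in-memory database as a stand-in for MySQL so it runs without a server. The company list, the `news` table layout and the names `save_records`/`crawl_all` are hypothetical. With PyMySQL, `conn` would instead come from `pymysql.connect(host=..., user=..., password=..., database=..., charset='utf8mb4')`, and the SQL placeholders would be `%s` rather than `?`.

```python
import sqlite3

# Hypothetical list: the exercise says five companies but only names Alibaba.
COMPANIES = ['阿里巴巴', '腾讯', '百度', '京东', '华为']


def save_records(conn, company, records):
    """Insert (title, href, date, source) tuples; sqlite3 stands in for MySQL here."""
    # sqlite3 uses '?' placeholders; with PyMySQL the same statement would use '%s'.
    sql = 'INSERT INTO news (company, title, href, date, source) VALUES (?, ?, ?, ?, ?)'
    cur = conn.cursor()
    cur.executemany(sql, [(company, *r) for r in records])
    conn.commit()
    cur.close()


def crawl_all(companies, fetch, conn):
    """Call fetch(company) for each company; one failure does not stop the rest."""
    failed = []
    for company in companies:
        try:
            save_records(conn, company, fetch(company))
        except Exception as exc:  # network errors, changed page markup, etc.
            print(f'{company} failed: {exc}')
            failed.append(company)
    return failed


if __name__ == '__main__':
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE news (company TEXT, title TEXT, href TEXT, date TEXT, source TEXT)')

    def fake_fetch(company):
        # Stand-in for a real scrape such as baidu(company); returns canned data.
        if company == '京东':
            raise RuntimeError('simulated network error')
        return [(f'{company}新闻', 'https://example.com', '今天', '示例来源')]

    print(crawl_all(COMPANIES, fake_fetch, conn))
    print(conn.execute('SELECT COUNT(*) FROM news').fetchone()[0])
```

Catching the exception per company keeps one failed request or changed page from aborting the whole run, and the returned list of failed names can be retried later.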
Output:



