
Web scraping with urllib in Python


# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
from urllib.request import Request, build_opener, HTTPCookieProcessor
from http.cookiejar import CookieJar
import csv

if __name__ == '__main__':
    url = "https://cs.5i5j.com/ershoufang/"
    # Send browser-like headers so the request resembles an ordinary page visit
    req = Request(url, None, {'Connection': 'Keep-Alive',
                              'Accept': 'text/html, application/xhtml+xml, */*',
                              'Accept-Language': 'en-US,en;q=0.8,zh-Hans-CN;q=0.5,zh-Hans;q=0.3',
                              'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko'})
    # Scraping this site requires cookies: the CookieJar stores whatever cookies
    # the server sets, and the opener sends them back on follow-up requests
    cj = CookieJar()
    opener = build_opener(HTTPCookieProcessor(cj))
    response = opener.open(req)
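    # Save the page locally so it can be parsed at leisure without re-requesting it
    # (the download/ directory must already exist)
    with open('download/textInfo.html', 'wb') as f:
        f.write(response.read())
    # Read the saved page back
    with open('download/textInfo.html', 'rb') as f:
        data = f.read()
    # print(data)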

    
    # HouseInfo.csv stores the scraped listing data
    with open('HouseInfo.csv', 'wt', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        # Write the header row
        writer.writerow(('synopsis', 'totalPrice', 'priceSquare'))

        soup = BeautifulSoup(data, "html.parser")
        ul = soup.find("ul", class_="pList")
        for li in ul.find_all("li"):
            h3 = li.find("h3", class_="listTit")
            # Short description of the listing
            synopsis = h3.find("a").get_text()
            # print(synopsis)
            jia = li.find("div", class_="jia")
            price = jia.find_all("p")
            # Total price of the property
            totalPrice = price[0].get_text()
            # Price per square metre
            priceSquare = price[1].get_text()
            # print(totalPrice)
            # print(priceSquare)
            writer.writerow((synopsis, totalPrice, priceSquare))
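
The parsing loop above assumes every li inside ul.pList contains an h3.listTit link and two price paragraphs; if the page mixes in other items (ads, for example) or the markup changes, find() returns None and the script stops with an AttributeError. The following is a minimal sketch of a more defensive version, not the original author's code. The function name parse_listings and the SAMPLE_HTML markup are hypothetical, modelled only on the selectors used above; the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption), modelled on the selectors
# used in the script above; the real page may be structured differently.
SAMPLE_HTML = """
<ul class="pList">
  <li>
    <h3 class="listTit"><a href="#">Listing A</a></h3>
    <div class="jia"><p>120</p><p>9800</p></div>
  </li>
  <li><div class="ad">An item without listing fields</div></li>
</ul>
"""


def parse_listings(html):
    """Return (synopsis, total_price, price_per_square) tuples.

    Items missing any of the expected elements are skipped instead of
    raising AttributeError or IndexError.
    """
    soup = BeautifulSoup(html, "html.parser")
    ul = soup.find("ul", class_="pList")
    if ul is None:
        # The listing container is absent: nothing to extract
        return []
    rows = []
    for li in ul.find_all("li"):
        h3 = li.find("h3", class_="listTit")
        link = h3.find("a") if h3 is not None else None
        jia = li.find("div", class_="jia")
        prices = jia.find_all("p") if jia is not None else []
        if link is None or len(prices) < 2:
            continue  # incomplete item, skip it
        rows.append((link.get_text(strip=True),
                     prices[0].get_text(strip=True),
                     prices[1].get_text(strip=True)))
    return rows


if __name__ == "__main__":
    for row in parse_listings(SAMPLE_HTML):
        print(row)
```

The returned tuples can be passed straight to writer.writerow() in place of the inline extraction, which keeps the parsing logic testable against a saved page without touching the network.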

Please credit the source when reposting: this article is reposted from www.mshxw.com
Article URL: https://www.mshxw.com/it/321579.html