
Scraping Xi'an Second-Hand Housing Listings with Python


import requests
from pyquery import PyQuery as pq
import json
import pandas as pd
from multiprocessing.pool import Pool

# Column order for the CSV; 'place' must be listed or pandas drops that field.
columns = ['title', 'place', 'msg', 'price', 'per_meter']


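def get_a_page(url):
    # Fetch and parse one listing page; the timeout stops a stalled request from hanging a worker.
    req = requests.get(url, timeout=10)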
    doc = pq(req.text)
    ul = doc('.sellListContent')
    divs = ul.children('.clear .info.clear').items()
    count = 0
    titles = []
    places = []
    msgs = []
    prices = []
    per_meters = []
    for div in divs:
        count += 1
        title = div.children('.title a').text()
        place = div.children('.address .flood .positionInfo a').text()
        msg = div.children('.address .houseInfo').text()
        price = div.children('.address .priceInfo .totalPrice span').text()
        per_meter = div.children('.address .priceInfo .unitPrice').attr('data-price')
        # Named 'record' so the built-in dict type is not shadowed.
        record = {
            'title': title,
            'place': place,
            'msg': msg,
            'price': price,
            'per_meter': per_meter
        }
        titles.append(title)
        places.append(place)
        msgs.append(msg)
        prices.append(price)
        per_meters.append(per_meter)
        print(str(count) + ':' + json.dumps(record, ensure_ascii=False))
    datas = {
        'title': titles,
        'place': places,
        'msg': msgs,
        'price': prices,
        'per_meter': per_meters
    }
    df = pd.DataFrame(data=datas, columns=columns)
    df.to_csv('xaesf.csv', mode='a', index=False, header=False)


if __name__ == '__main__':
    pool = Pool(10)
    group = [f'https://xa.ke.com/ershoufang/pg{x}' for x in range(1, 101)]
    pool.map(get_a_page, group)
    pool.close()
    pool.join()
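
In the script above, ten worker processes append to xaesf.csv at the same time and no header row is written, so rows from different pages can land in an unpredictable order. One possible alternative, sketched here under assumptions, is to have each worker return its rows and let a single writer produce the file. The names `scrape_all` and `fetch_page` are hypothetical and not part of the original script: `fetch_page` stands for a variant of `get_a_page` that returns a list of dicts instead of writing to disk. The sketch uses the standard-library `csv` module and a thread pool rather than pandas and a process pool.

```python
import csv
from concurrent.futures import ThreadPoolExecutor

COLUMNS = ['title', 'place', 'msg', 'price', 'per_meter']


def scrape_all(urls, fetch_page, out_path, workers=10):
    """Run fetch_page over urls concurrently and write every row to one CSV.

    fetch_page is a hypothetical callable that takes a URL and returns a
    list of dicts keyed by COLUMNS. Returns the number of rows written.
    """
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # executor.map yields results in the same order as urls.
        pages = list(executor.map(fetch_page, urls))

    # Only this thread touches the file, so rows cannot interleave.
    with open(out_path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        for rows in pages:
            writer.writerows(rows)
    return sum(len(rows) for rows in pages)
```

Threads are a reasonable fit here because the work is dominated by waiting on HTTP responses; a call might look like `scrape_all(group, fetch_page, 'xaesf.csv')`, where `fetch_page` is the assumed row-returning variant.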
