
[Python Crawler] Dongchedi (懂车帝)

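00. Preface

I just wanted to scrape the model library page of Dongchedi. It took a whole evening, and there were a lot of pitfalls along the way.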

01. Code (complete)
import requests
import sqlite3


# Model library: fetch one page of the series list
def Dongchedi(offset):
    url = 'https://www.dongchedi.com/motor/brand/m/v6/select/series/?city_name=%E6%AD%A6%E6%B1%89'
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36'}
    data = {'offset': '{}'.format(offset),'limit': 20,'is_refresh': 1,'city_name': '武汉'}
    response = requests.post(url, headers=headers, data=data, timeout=10).json()

    all_cak = response['data']['series']

    # Collect every series ID on the page, not just the last one
    caks_ids = []
    for caks in all_cak:
        caks_id = caks['concern_id']
        caks_url = 'https://www.dongchedi.com/auto/series/' + '{}'.format(caks_id)
        cak_name = caks['outter_name']
        print(cak_name, caks_url)
        caks_ids.append(caks_id)

    return caks_ids


# Detail page: fetch every model of every series and save it
def Detail():
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36'}
    for offset in range(0, 2000):
        caks_ids = Dongchedi(offset)
        if not caks_ids:  # an empty page means there is nothing left to fetch
            break
        for caks_id in caks_ids:
            datail_url = 'https://www.dongchedi.com/motor/car_page/m/v1/series_all_json/?series_id=' + str(caks_id) + '&city_name=武汉&show_city_price=1&m_station_dealer_price_v=1'
            response = requests.get(datail_url, headers=headers, timeout=10).json()
            series_all = response['data']
            online_name = series_all['online']

            for data_name in online_name:
                name_info = data_name['info']  # all data of the "all models" module
                try:
                    if name_info['brand_name']:  # heading rows are skipped; only real models carry a brand name
                        series_name = name_info['name']  # detailed model name
                        car_name = name_info['series_name']
                        name = str(car_name) + '-' + str(series_name)
                        # Guide price and dealer quote
                        price_info = name_info['price_info']
                        guide_price = str(price_info['official_price'])  # guide price
                        dealer_price = str(name_info['dealer_price']).replace('万', '')  # dealer quote, with the '万' unit removed
                        # Owner reference price (often missing on the page; fall back to '-')
                        try:
                            owner_price_summary = name_info['owner_price_summary']
                            naked_price_avg = str(owner_price_summary['naked_price_avg'])  # owner reference price
                        except (KeyError, TypeError):
                            naked_price_avg = '-'
                        # Save to the database
                        data_info = [name, guide_price, dealer_price, naked_price_avg]
                        SaveMysql(data_info)

                except Exception as error:
                    print(error)


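# Save to the database (SQLite, despite the function name)
def SaveMysql(data_info):
    conn = sqlite3.connect('movie.db')
    cursor = conn.cursor()  # create a cursor

    # Assumption: all four columns are text; this only takes effect if the table does not exist yet
    cursor.execute('create table if not exists dongchedi(car_name text,guide_price text,dealer_price text,owner_price text);')

    # Use ? placeholders and let sqlite3 quote the values instead of building the SQL by hand
    insert_sql = 'insert into dongchedi(car_name,guide_price,dealer_price,owner_price) values(?,?,?,?);'
    values = tuple(str(value) for value in data_info)
    print(insert_sql, values)

    cursor.execute(insert_sql, values)  # write
    print('saved to sqlite')
    conn.commit()  # commit
    cursor.close()  # close the cursor
    conn.close()  # close the database connection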

if __name__ == '__main__':
    try:
        Detail()
    except Exception as error:
        # Print the error instead of a traceback so a failed run ends quietly
        print(error)
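
The field extraction inside Detail can be tried without touching the network. The sketch below is only an illustration, assuming each entry's info dict has the keys the code above reads (brand_name, name, series_name, price_info, dealer_price, owner_price_summary). The helper name extract_row and the sample values are hypothetical and were not taken from the site.

```python
def extract_row(info):
    """Build one database row from an info dict, or return None for heading rows.

    extract_row is a hypothetical helper; the key names mirror the ones
    read in Detail above.
    """
    if not info.get('brand_name'):
        return None  # no brand name: treated as a heading row and skipped

    name = '{}-{}'.format(info['series_name'], info['name'])
    guide_price = str(info['price_info']['official_price'])
    dealer_price = str(info['dealer_price']).replace('万', '')

    # The owner reference price is often missing, so fall back to '-'.
    try:
        owner_price = str(info['owner_price_summary']['naked_price_avg'])
    except (KeyError, TypeError):
        owner_price = '-'

    return [name, guide_price, dealer_price, owner_price]


# Made-up sample data, only for illustration.
sample = {
    'brand_name': 'BrandX',
    'series_name': 'SeriesY',
    'name': '2021 1.5T',
    'price_info': {'official_price': '12.98'},
    'dealer_price': '11.50万',
}
row = extract_row(sample)
print(row)
```

Catching only KeyError and TypeError keeps the '-' fallback for a missing or empty owner_price_summary while letting unrelated errors surface.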



02. Input page

03. Desired output page

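04. Summary

4.1 Pitfalls
  1. Always send request headers (here, a browser user-agent). Without them the site returns no data.
  2. There are two ways to build the insert statement. One is to wrap each list item in quotes with a loop, join the items with commas, and format the result into the insert string. The other is to keep placeholders in the insert statement and pass the values as a tuple to execute(), where the two are combined.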
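
The two ways of building the insert statement can be compared side by side. This is a minimal sketch against a throwaway in-memory SQLite database; the table layout (four text columns) and the sample row are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # throwaway in-memory database
conn.execute(
    'create table dongchedi('
    'car_name text, guide_price text, dealer_price text, owner_price text)'
)

row = ['SeriesY-2021 1.5T', '12.98', '11.50', '-']  # made-up sample values

# Method 1: quote each value in a loop, join with commas, format into the SQL.
# Shown for comparison only: it breaks (or can be exploited) as soon as a
# value contains a quote character.
quoted = ["'" + value + "'" for value in row]
sql_1 = ('insert into dongchedi(car_name,guide_price,dealer_price,owner_price) '
         'values(%s);' % ','.join(quoted))
conn.execute(sql_1)

# Method 2: keep ? placeholders in the SQL and pass the values as a tuple;
# sqlite3 does the quoting.
sql_2 = ('insert into dongchedi(car_name,guide_price,dealer_price,owner_price) '
         'values(?,?,?,?);')
conn.execute(sql_2, tuple(row))
conn.commit()

rows = conn.execute('select * from dongchedi').fetchall()
print(rows)
conn.close()
```

Both statements store the same row here, but the second form is usually the safer choice. Note that sqlite3 uses ? as its placeholder, while common MySQL drivers such as PyMySQL use %s.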

 

4.2 Key takeaways
  1. try...except
  2. sqlite3 and MySQL
  3. The three steps of a crawler: url --> request --> json()
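
The three steps, together with try...except, can be sketched as separate functions. The code in section 01 uses requests; this sketch swaps in the standard library's urllib so that it runs without extra packages. The function names build_request, parse_body, and fetch are hypothetical, and 123 is a placeholder ID rather than a real series.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

USER_AGENT = ('Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36')


def build_request(series_id, city_name='武汉'):
    """Steps 1 and 2: assemble the URL, then a request that carries the user-agent."""
    query = urlencode({
        'series_id': series_id,
        'city_name': city_name,
        'show_city_price': 1,
        'm_station_dealer_price_v': 1,
    })
    url = 'https://www.dongchedi.com/motor/car_page/m/v1/series_all_json/?' + query
    return Request(url, headers={'User-Agent': USER_AGENT})


def parse_body(body):
    """Step 3: decode the response bytes as JSON; return None if they are not JSON."""
    try:
        return json.loads(body)
    except ValueError:
        return None


def fetch(series_id):
    """Run all three steps. Needs network access; not called in this sketch."""
    with urlopen(build_request(series_id), timeout=10) as response:
        return parse_body(response.read())


request = build_request(123)  # placeholder ID
print(request.full_url)
```

Keeping the URL building and the JSON parsing apart from the network call makes each step easy to check on its own.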

 

