
Scraping Website Videos with Python


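Approach:

1. Identify the address of the video to scrape. You can find it by inspecting the page's network requests in the browser developer tools (F12).

2. Determine which parameters the request needs, which request method to use, and which header information has to be sent.

3. It is best to try the request in Postman first, before writing any code.

4. Once this debugging confirms that the endpoint exists and is reachable, write the code that parses the video information out of the response and saves each video as a local file.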
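Before opening Postman, it can help to assemble the full request URL so that it can be pasted straight into Postman or a browser. The following is a minimal sketch that uses only the standard library. The helper name `build_probe_url` is made up for illustration; the endpoint and parameter values are the ones used in the script in this article.

```python
from urllib.parse import urlencode


def build_probe_url(base_url, params):
    """Return base_url with params appended as a percent-encoded query string."""
    return base_url + "?" + urlencode(params)


# Endpoint and parameters taken from the script in this article
probe_url = build_probe_url(
    "http://haokan.baidu.com/videoui/api/videorec",
    {
        "vid": 6809766516780878997,
        "title": "一首《纤夫的爱》送给你们",
        "pd": "h5",
        "act": "h5Rec",
        "referPd": "",
    },
)
print(probe_url)
```

The printed URL carries the title as UTF-8 percent-escapes, which is the same encoding `requests` applies to the `params` argument. Headers such as `cookie` and `user-agent` still have to be added separately in Postman.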

import requests
import json
import urllib.request  # urlretrieve lives in the urllib.request submodule
import time
class getVideo(object):

        def getVideo(self):

            url = "http://haokan.baidu.com/videoui/api/videorec"
            header = {
                'cookie': 'BIDUPSID=33B8EE183D9B9E323939352A0E85B7A5; PSTM=1631859469; BAIDUID=33B8EE183D9B9E326F4F62C42A2E56F1:FG=1; __yjs_duid=1_9ee50ae1e2369325576fd487374d2be61632449615247; BDUSS_BFESS=xITzAwWXNCMFVOZFVVV3diRXV-MVhyfnFsdVVvdVVYQ0g2YktXNG0zemt5WFJoSUFBQUFBJCQAAAAAAAAAAAEAAACy~~joAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAOQ8TWHkPE1hMW; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; BDSFRCVID_BFESS=VdtOJeC627T5vvjHHeiNhbGZJkEmotOTH6aoyitjU2C5TJGLPT8XEG0P8f8g0K4-Nb29ogKK3gOTH4DF_2uxOjjg8UtVJeC6EG0Ptf8g0M5; H_BDCLCKID_SF_BFESS=tJCDoC8XtKL3fP36qRQEh-LDhUO-2I62aKDs_JoYBhcqEnTkLJob36J0y-7vtho3KevzaJ3cyIn8VxbSj4oTbTkXya7KLl_DaJvuBf5P-h5nhMJHb67JDMP0-45nQJby523iob3vQpPMshQ3DRoWXPIqbN7P-p5Z5mAqKl0MLPbtbb0xb6_0Djb-DaDJqTna--oa3RTeb6rjDnCrMxRTXUI82h5y05OqaJcqapn_ttJsHl3Ly4JvyT8sXnORXx7JB5vvbPOMthRnOlRKbpo6qfL1Db3JWhQGMgcTsR7yLbnoepvoDPJc3Mv3Q-jdJJQOBKQB0KnGbUQkeq8CQft20b0EeMtjW6LEtR4t_K0-fC03fP36q45H24k0-qrtetJyaR3p0PbvWJ5TMC_635o-54InyMo0Wtva3aQZBh5Ea-bcShPC-tnBMJ0q3H5mtl3Z32JT-In93l02VKnIe-t2yT3DXxKHq4RMW20e0h7mWIbUsxA45J7cM4IseboJLfT-0bc4KKJxbnLWeIJEjj6jK4JKDGttJ5bP; IMG_WH=2000_413; H_WISE_SIDS=110085_127969_131862_164869_177370_178384_178641_179346_179451_179620_181133_181489_181588_182243_182273_182531_183327_183626_184011_184267_184319_184360_184560_184794_184891_184891_185519_185880_186319_186587_186596_186636_186682_186743_186833_186841_187023_187042_187067_187086_187192_187292_187356_187433_187447_187542_187567_187669_187726_187816_187819_187929_187957_187992_188031_188182_188226_188353_188426_188467_188665_188669_188722_188733_188734_188748_188832_188842_189058_189071_189346_189391_189398_189414_189504_189680_189755; BDRCVFR[X_XKQks0S63]=mk3SLVN4HKm; BDRCVFR[dG2JNJb_ajR]=mk3SLVN4HKm; BDRCVFR[-pGxjrCMryR]=mk3SLVN4HKm; BAIDUID_BFESS=46B2ADF6599C4AC9A65B4AA239316277:FG=1; delPer=0; PSINO=6; BA_HECTOR=05agalal040k0080bd1gmsdcj0q; H_PS_PSSID=34068_31254_34712_34599_34584_34504_34832_34813_26350_34691_34675; '
                          'Hm_lvt_4aadd610dfd2f5972f1efee2653a2bc5=1633663971,1634612649; HK_CH_EXPIRED_TIME=1634659199000; HK_CH_IS_CLICKED=0; HK_CH_REFRESH_TIMES=1; HK_SID=11796_2; COMMON_LID=496b0aa3b92a4dcfe85dee59d76b441f; Hm_lvt_77ca61e523cd51ec7ac7a23bc4d24edf=1634612656; Hm_lpvt_77ca61e523cd51ec7ac7a23bc4d24edf=1634612796; HK_CH_MAT_INDEX=0; PC_TAB_LOG=video_details_page; ab_sr=1.0.1_MDc5MmI3NzliM2ZjODcwNWY4ZjA3ZWRhNzc5NTQxMDBmMzhjYTM2YzM3ZTRmNWY4N2RmZjI0ODg3YzcxOTE4YmZhZWNlYzE1MGFhYzNlZmI4OGM3NWViMjRjNjRiMjFiMzYxYjc0YmUyY2I5ZmQyYjY0ODllNDc1ZDhiYjg4MmYxMjZjZmViNzViMWQ0NmYwNWJlNzgyMTkwY2QzZTNiOQ==; reptileData={"data":"03bd57de5d85cc095ec2dc325bf0bffa94ba3d7eb3b777760790a66a355a3a003b9c482e43ddf318dfbcd20d71f06f85bfe5301c24b6789788024518bfb4bf0174f252dd7b6b93e3d0fe276942b7c9d164246c4456e74438df7e078cb4216f84f4b592a5b8b4019767d59376f7e2052ebf5bf7776defa8fb84ff8ad015684a51","key_id":"30","sign":"7cec8433"}; Hm_lpvt_4aadd610dfd2f5972f1efee2653a2bc5=1634612832',

                'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Mobile Safari/537.36'
            }
            param = {
                'vid': 6809766516780878997,
                'title': '一首《纤夫的爱》送给你们',
                'pd': 'h5',
                'act': 'h5Rec',
                'referPd': ''
            }
            resInfo = requests.get(url=url, headers=header, params=param)
            resInfo.encoding = 'utf-8'
            data = resInfo.text
            print(data)
            resInfoText = json.loads(data)
            # Collect one (title, play_url) pair per recommended video, e.g.
            # ("《纤夫的爱》好听吗", 'https://haokan.baidu.com/v?pd=h5&vid=13772777109240759654')
            video = []
            for item in resInfoText['data']['response']['videos']:
                # Skip entries that lack either field so titles and URLs never get mismatched
                if 'title' in item and 'play_url' in item:
                    video.append((item['title'], item['play_url']))
            for movieName, playUrl in video:
                time.sleep(3)  # pause between downloads to avoid hammering the server
                print("------------------" + "downloading" + "------------------")
                # The target directory must already exist
                urllib.request.urlretrieve(playUrl, filename="D:/b/视频爬取/" + movieName + '.mp4')
if __name__ == "__main__":
        allVideo = getVideo()
        allVideo.getVideo()
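The parsing and file-naming steps can also be factored into small functions that are testable without network access. This is a sketch rather than the author's code: `extract_videos` and `safe_filename` are hypothetical helper names, the response layout (`data` > `response` > `videos`) is assumed from the script above, the sample data is made up, and the characters replaced are the ones Windows forbids in file names.

```python
import re


def extract_videos(payload):
    """Return (title, play_url) pairs from a decoded API response."""
    videos = payload.get("data", {}).get("response", {}).get("videos", [])
    return [
        (v["title"], v["play_url"])
        for v in videos
        if "title" in v and "play_url" in v
    ]


def safe_filename(title):
    """Replace characters that Windows does not allow in file names."""
    return re.sub(r'[\\/:*?"<>|]', "_", title).strip()


# Made-up sample shaped like the response the script expects
sample = {
    "data": {"response": {"videos": [
        {"title": "demo: part 1/2", "play_url": "https://example.com/a.mp4"},
        {"title": "no url here"},
    ]}}
}

for title, play_url in extract_videos(sample):
    print(safe_filename(title) + ".mp4", play_url)
```

Passing each title through a function like `safe_filename` before building the download path would keep `urlretrieve` from failing on titles that contain characters such as `?` or `:`.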





5. The result is as follows: [screenshot]

When reposting, please credit the source: this article is from www.mshxw.com
Article URL: https://www.mshxw.com/it/339873.html