Python: Scraping WeChat Official Account Articles, a Quick Note

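I. After going through various resources online, I found that collecting articles through the WeChat Official Accounts Platform is the most convenient option, since I have an Official Account of my own.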

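II. I borrowed heavily from the experience of people who have done this before. The WeChat Official Accounts Platform has not changed much over the past few years, and its interfaces have stayed the same.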

III. Scraping method

1. WeChat Official Account login page: WeChat Official Accounts Platform (https://mp.weixin.qq.com)

Logging in only requires setting the headers, plus your account and password. Once you are logged in, grab the cookies.

header = {
    "HOST": "mp.weixin.qq.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36",
}
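Below is a minimal sketch of what step 1 has to produce. It assumes that you log in manually in a browser and copy the value of the `Cookie` request header from the developer tools; the original does not say how the cookies are captured. After login the browser lands on a page whose URL, as far as I know, carries the `token` as a query parameter. The helper names `parse_cookie_string` and `extract_token` are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def parse_cookie_string(raw: str) -> dict:
    """Turn a browser 'Cookie' header value ("a=1; b=2") into a dict."""
    cookies = {}
    for part in raw.split(";"):
        # split on the first '=' only, so values containing '=' survive
        name, sep, value = part.strip().partition("=")
        if sep and name:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


def extract_token(url: str) -> str:
    """Read the 'token' query parameter from the post-login page URL.

    Assumption: the logged-in page URL looks like
    https://mp.weixin.qq.com/cgi-bin/home?t=home/index&lang=zh_CN&token=...
    """
    values = parse_qs(urlparse(url).query).get("token")
    if not values:
        raise ValueError("no token parameter in URL")
    return values[0]


# Usage with made-up values copied from a browser session
cookies = parse_cookie_string("a=1; b=xyz==")
token = extract_token(
    "https://mp.weixin.qq.com/cgi-bin/home?t=home/index&lang=zh_CN&token=123456"
)
```

Only the standard library is used here. The resulting `cookies` dict and `token` string are the two values the later steps consume.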

2. Endpoint for searching Official Accounts: https://mp.weixin.qq.com/cgi-bin/searchbiz?

Requests to the search endpoint must carry the relevant information: cookies, params, and headers.

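The cookies come from step 1, the headers are the same as in step 1, and the params are set as follows.

Three of the params are variables: token, random, and query.

random is a random number, and Python's built-in random module is enough for it.

query is the name of the Official Account you want to scrape.

token is available once you are logged in.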

import random

# token comes from the login step; query is the Official Account name to search for
params = {
    'action': 'search_biz',
    'token': token,
    'lang': 'zh_CN',
    'f': 'json',
    'ajax': '1',
    'random': random.random(),
    'query': query,
    'begin': '0',
    'count': '5',
}
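The sketch below assembles the search request from those pieces using only the standard library. `build_search_request` is a hypothetical helper. Because urllib has no separate cookie argument, the cookies are sent as a `Cookie` header. The function only builds the request and does not send it.

```python
import random
from urllib.parse import urlencode
from urllib.request import Request

SEARCH_URL = "https://mp.weixin.qq.com/cgi-bin/searchbiz"
HEADERS = {
    "Host": "mp.weixin.qq.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36",
}


def build_search_request(token: str, query: str, cookies: dict) -> Request:
    """Build (but do not send) a GET request to the searchbiz endpoint."""
    params = {
        "action": "search_biz",
        "token": token,
        "lang": "zh_CN",
        "f": "json",
        "ajax": "1",
        "random": random.random(),
        "query": query,
        "begin": "0",
        "count": "5",
    }
    headers = dict(HEADERS)
    # urllib has no cookies= parameter, so serialize them into a Cookie header
    headers["Cookie"] = "; ".join(f"{k}={v}" for k, v in cookies.items())
    return Request(SEARCH_URL + "?" + urlencode(params), headers=headers)
```

To actually send the request, something like `json.load(urllib.request.urlopen(req))` would return the decoded JSON response. With the third-party `requests` library, you would instead pass `params=`, `headers=` and `cookies=` directly to `requests.get`.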

This step returns the Official Account's fakeid.
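A sketch of pulling the fakeid out of the search response follows. It assumes that the JSON contains a `list` of accounts and that each account has `fakeid` and `nickname` fields. That is my understanding of the response shape, not something the original shows, so check it against a real response. `pick_fakeid` is a hypothetical helper, and the sample data is made up.

```python
import json


def pick_fakeid(search_response: dict, name: str) -> str:
    """Return the fakeid of the account whose nickname equals `name`.

    Assumed response shape:
    {"list": [{"fakeid": "...", "nickname": "...", ...}, ...], ...}
    Falls back to the first result when no nickname matches exactly.
    """
    accounts = search_response.get("list") or []
    if not accounts:
        raise LookupError("search returned no accounts")
    for account in accounts:
        if account.get("nickname") == name:
            return account["fakeid"]
    return accounts[0]["fakeid"]


# Made-up sample response for illustration
sample = json.loads(
    '{"list": [{"fakeid": "MzA1", "nickname": "Other"},'
    ' {"fakeid": "MzI2", "nickname": "Target"}]}'
)
fakeid = pick_fakeid(sample, "Target")
```

Matching on the nickname, rather than always taking the first hit, is a design choice. A search query can return several similarly named accounts.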

3. Endpoint for an Official Account's article list: https://mp.weixin.qq.com/cgi-bin/appmsg?

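This URL also needs cookies, headers, and params.

The cookies and headers are the same as above, and the params are as follows.

token is the same as above, random is a random number, and fakeid is the fakeid obtained in the previous step.

begin is the starting offset. To crawl more than one page, write a loop that advances begin by count (5 here) on each page.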

import random

# token and fakeid come from the previous steps; begin is the start offset
params = {
    'token': token,
    'lang': 'zh_CN',
    'f': 'json',
    'ajax': '1',
    'random': random.random(),
    'action': 'list_ex',
    'begin': str(begin),
    'count': '5',
    'query': '',
    'fakeid': fakeid,
    'type': '9',
}
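The paging loop can be sketched by generating one URL per page. This assumes that `begin` is an item offset, so page n starts at `n * count`. `appmsg_params` and `page_urls` are hypothetical helper names, and the parameter values are copied from the params above.

```python
import random
from urllib.parse import urlencode

APPMSG_URL = "https://mp.weixin.qq.com/cgi-bin/appmsg"


def appmsg_params(token: str, fakeid: str, begin: int, count: int = 5) -> dict:
    """Params for one page of the article list, starting at offset `begin`."""
    return {
        "token": token,
        "lang": "zh_CN",
        "f": "json",
        "ajax": "1",
        "random": random.random(),
        "action": "list_ex",
        "begin": str(begin),
        "count": str(count),
        "query": "",
        "fakeid": fakeid,
        "type": "9",
    }


def page_urls(token: str, fakeid: str, pages: int, count: int = 5) -> list:
    """One URL per page; page n starts at offset n * count (assumed offset semantics)."""
    return [
        APPMSG_URL + "?" + urlencode(appmsg_params(token, fakeid, n * count, count))
        for n in range(pages)
    ]
```

Each URL still has to be requested with the same cookies and headers as in step 2.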

Once these parameters are passed to this URL, the response contains the articles.
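The sketch below is a crawl loop that keeps requesting pages until one comes back empty. It assumes that the response lists articles under `app_msg_list` and that each article has `title` and `link` fields. That is my understanding of the response, not something the original shows. `fetch_page` is a hypothetical function you supply, which sends the appmsg request for a given `begin` and returns the decoded JSON. Passing it in keeps the loop testable without network access.

```python
import time
from typing import Callable, Iterator


def iter_articles(
    fetch_page: Callable[[int], dict],
    count: int = 5,
    max_pages: int = 10,
    delay: float = 0.0,
) -> Iterator[dict]:
    """Yield {'title', 'link'} dicts page by page until a page comes back empty.

    Assumption: articles live under "app_msg_list", each with "title" and "link".
    """
    for page in range(max_pages):
        data = fetch_page(page * count)  # begin offset for this page
        items = data.get("app_msg_list") or []
        if not items:
            break  # no more articles, or a response without the list
        for item in items:
            yield {"title": item.get("title"), "link": item.get("link")}
        if delay:
            time.sleep(delay)  # optional pause between page requests


# Usage with a fake fetcher standing in for the real HTTP call
fake_pages = {
    0: {"app_msg_list": [{"title": "A", "link": "u1"}]},
    5: {"app_msg_list": [{"title": "B", "link": "u2"}]},
}
articles = list(iter_articles(lambda begin: fake_pages.get(begin, {})))
```

The `max_pages` cap and the optional `delay` are precautions I added. They are not part of the original method.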

IV. Source code

Python WeChat Official Account article scraping (Python documentation resource, CSDN Download)
