【Shufang (数坊) Audience Calculation 2.0】

0. Preface

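A while ago I wrote a simple audience-calculation program, Shufang Automatic Audience Calculation 1.0. I have had more spare time recently, so I decided to improve it: it now supports audiences built from several labels and adds commonly used labels such as ads, existing audiences and attributes. It is still full of bugs, but in my own tests it more or less works.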

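The calculation itself is simple: upload the audience logic to the API and it returns the audience size. The complicated parts are building the audience logic table and converting that table into JSON that can be uploaded. Automatically creating audience packages works the same way; only the endpoint is different.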

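The source code download link is at the end.
    • 0. Preface
    • 1. Building the logic table
    • 2. Building the card DATA
      • 2.0 Getting the required IDs
        • 2.0.1 Getting brand & category IDs
        • 2.0.2 Getting ad IDs
      • 2.1 Purchase behavior
      • 2.2 Add-to-cart behavior
      • 2.3 Browsing behavior
      • 2.4 Ad behavior
      • 2.5 4A distribution
      • 2.6 Existing audiences
      • 2.7 Ten target groups
      • 2.8 Gender
      • 2.9 Age
      • 2.10 Marital status
      • 2.11 Education
      • 2.12 Purchasing power
    • 3. Building the audience DATA
    • 4. Getting the audience size (Tkinter)
    • 5. How to use it
      • 5.1 Filling in the logic table
      • 5.2 Running the program and entering the cookie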

1. Building the logic table

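Each row is one label (card) and each column is one parameter; leave a cell empty if it is not needed or has no limit.
Rows with the same 人群名称 (audience name) form one audience package. The 运算 (operation) column controls how several labels are combined: 交集 (intersection), 差集 (difference) or 并集 (union). Rows are combined from top to bottom, so the first row of each audience leaves 运算 empty.
If 品牌 (brand) is empty, the card defaults to the level-3 category; if it is not empty (any value will do, as long as the cell is not empty), the card defaults to brand × level-3 category.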

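| 人群名称 (audience name) | 运算 (operation) | 卡片名称 (card) | 品牌 (brand) | 类目 (category) | 开始时间 (start date) | 结束时间 (end date) | 频次 (frequency) | 价格 (price) | 渠道 (channel) | 行为 (behavior) | 身份 (identity) | 已有人群 (existing audience) | 属性 (attribute) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 人群1 | | 浏览行为 | XX | 家用电器-大家电-平板电视 | 2022-04-01 | 2022-04-15 | | | | | | | |
| 人群1 | 交集 | 购买行为 | | 家用电器-大家电-平板电视 | 2022-04-01 | 2022-04-15 | | | | | | | |
| 人群1 | 差集 | 购买行为 | XX | 家用电器-大家电-平板电视 | 2022-04-01 | 2022-04-15 | | | | | | | |
| 人群2 | | 浏览行为 | XX | 家用电器-大家电-平板电视 | 2022-04-01 | 2022-04-15 | | | | | | | |
| 人群3 | | 购买行为 | | 家用电器-大家电-平板电视 | 2022-04-01 | 2022-04-15 | | | | | | | |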
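Rows of one audience are combined from top to bottom, each 运算 value joining its card to everything built so far. 人群1 above therefore means (browsed brand XX in 平板电视 ∩ bought in 平板电视) minus (bought brand XX in 平板电视), i.e. people who looked at XX TVs and bought a TV, but not an XX one. A minimal sketch of that left fold, with placeholder strings instead of real card dicts; the operation names follow get_card in section 3, and fold_rows is a hypothetical name.

```python
# Operation names used in the 运算 column, mapped to the API's node types
OPS = {"交集": "intersection", "差集": "diff", "并集": "union"}


def fold_rows(rows):
    """Combine the rows of one audience left to right into a nested definition.

    rows: list of (operation, card) pairs; the first row's operation is ignored.
    """
    _, tree = rows[0]
    for op, card in rows[1:]:
        # Each new card is combined with everything built so far
        tree = {"type": OPS[op], "children": [tree, card]}
    if len(rows) == 1:
        # A single card is still wrapped in an intersection node
        tree = {"type": "intersection", "children": [tree]}
    return {"audienceDefinition": tree}


# 人群1 from the table above, with strings standing in for the card dicts
audience1 = [
    (None, "view: brand XX, 平板电视"),
    ("交集", "order: 平板电视"),
    ("差集", "order: brand XX, 平板电视"),
]
print(fold_rows(audience1))
```

The outermost node is the last operation in the table, so reordering rows changes the meaning of an audience.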
2. Building the card DATA
2.0 Getting the required IDs

The IDs needed are the brand ID and the category ID for the user-behavior cards, and the ad ID for the ad-behavior card.

2.0.1 Getting brand & category IDs

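The brand ID is fetched automatically with the account cookie, so the brand column of the logic table can contain anything, but the category must follow the format level1-level2-level3 (e.g. 家用电器-大家电-平板电视).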

import json

import pandas as pd
import requests


def find_child(nodes, name):
    """Return the category node called `name`; raise if it does not exist."""
    for node in nodes:
        if node["name"] == name:
            return node
    raise ValueError(f"category not found: {name}")


def get_core_id(cookie, cate_name):  # get the brand ID and the level-3 category ID path
    # cate_name must look like "level1-level2-level3"
    cate1, cate2, cate3 = cate_name.split('-')

    url = 'https://4a.jd.com/datamill/api/accountManagement/mainAccountInfoOuter/brandInfo?pageNum=0&pageSize=10'
    headers = {
        'user-agent': 'PostmanRuntime/7.28.4',
        "accept": "*/*",
        'cookie': cookie
    }
    txt = requests.get(url, headers=headers).text
    data = json.loads(txt)["result"]["data"][0]  # first brand bound to the account
    brand_id = data['brandCode']
    cate_list = data["category"]

    # Walk the category tree level by level; looping over range(30) fails when a
    # level has fewer than 30 entries and misses names beyond the 30th entry
    level1 = find_child(cate_list, cate1)
    level2 = find_child(level1["children"], cate2)
    level3 = find_child(level2["children"], cate3)

    cate_id = f'{level1["categoryCode"]}_{level2["categoryCode"]}_{level3["categoryCode"]}'
    id_data = {
        "brand_id": brand_id,
        "cate_id": cate_id
    }
    return id_data
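get_core_id turns a level1-level2-level3 name into three category codes joined by underscores, and the purchase, add-to-cart and browsing cards then send either that whole path (no brand, dimension "3") or only its last code (brand given, dimension "2"). The sketch below replays that offline on a made-up tree that only mimics the name / categoryCode / children fields read above; the codes are invented, and cate_path_id and card_cate_list are hypothetical helper names, not part of the original program.

```python
def cate_path_id(tree, cate_name):
    """Resolve "level1-level2-level3" to "code1_code2_code3" by walking the tree."""
    codes = []
    nodes = tree
    for name in cate_name.split("-"):
        # Look for the name among the children of the previous level
        match = next((n for n in nodes if n["name"] == name), None)
        if match is None:
            raise ValueError(f"category not found: {name}")
        codes.append(str(match["categoryCode"]))
        nodes = match.get("children") or []
    return "_".join(codes)


def card_cate_list(cate_id, has_brand):
    # Category-only cards (dimension "3") send the full path,
    # brand x category cards (dimension "2") send only the level-3 code
    return cate_id.split("_")[-1] if has_brand else cate_id


# Made-up tree with invented codes; only the field names follow the code above
tree = [{"name": "家用电器", "categoryCode": 1, "children": [
    {"name": "大家电", "categoryCode": 11, "children": [
        {"name": "平板电视", "categoryCode": 111, "children": []}]}]}]

cate_id = cate_path_id(tree, "家用电器-大家电-平板电视")
print(cate_id)                        # 1_11_111
print(card_cate_list(cate_id, True))  # 111
```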
2.0.2 Getting ad IDs

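When you open audience selection and click the ad-behavior card, the page sends a lineList request whose response contains every ad touchpoint within the account's Shufang permissions, together with its ID. The program matches each name in the Excel sheet to its ID and joins the IDs into one string, which is how multiple channels are selected.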

def get_ad_id(cookie, ad_name):  # get the ad touchpoint IDs
    # Names from the 渠道 cell, separated by English commas
    name_list = [name.strip() for name in ad_name.split(",")]
    url = 'https://4a.jd.com/datamill/api/audienceManagement/newCustomAudienceEditInner/lineList'
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36',
        'cookie': cookie
    }
    txt = requests.get(url, headers=headers).text
    data_list = json.loads(txt)["result"]["data"]  # every touchpoint the account can use
    # Join the IDs of all matching touchpoints into one comma-separated string (multi-select)
    ad_id = ",".join(str(data["id"]) for data in data_list if data["name"] in name_list)
    return ad_id
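get_ad_id only splits the 渠道 cell on English commas, and names that match nothing are dropped without a warning. A sketch that also accepts full-width commas (，) and fails on unknown names; it works on the already-parsed list (the result.data array read above), and both the sample entries and match_ad_ids are made up. Unlike get_ad_id, it returns the IDs in the order the names were written rather than the order of the API list.

```python
import re


def match_ad_ids(line_list, ad_name):
    """Map comma-separated touchpoint names to a comma-separated ID string."""
    # Accept both English and full-width commas, ignore surrounding spaces
    wanted = [n.strip() for n in re.split(r"[,，]", ad_name) if n.strip()]
    name_to_id = {item["name"]: str(item["id"]) for item in line_list}
    missing = [n for n in wanted if n not in name_to_id]
    if missing:
        raise ValueError(f"unknown ad touchpoints: {missing}")
    # Keep the order in which the names were written in the table
    return ",".join(name_to_id[n] for n in wanted)


# Made-up entries shaped like json.loads(txt)["result"]["data"]
line_list = [{"name": "渠道A", "id": 101}, {"name": "渠道B", "id": 102}]
print(match_ad_ids(line_list, "渠道A， 渠道B"))  # 101,102
```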
2.1 Purchase behavior

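The purchase-behavior card takes the brand ID, the category ID, a time range (start and end date), frequency and price.
When frequency or price is unlimited, leave the cell empty.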

def get_order_data(cookie, brand_name, cate_name, start_time, end_time, frequency, price):  # purchase behavior
    ids = get_core_id(cookie, cate_name)  # one lookup instead of two identical requests
    brand_id, cate_id = ids["brand_id"], ids["cate_id"]
    data = {
        "cardType": "order",
        "cardCode": "300662",
        "type": "behaviorV2",
        "key": "order",
        "screen": "all",
        "dimension": "3" if pd.isnull(brand_name) else "2",  # 3: category only, 2: brand x category
        "brandCode": '' if pd.isnull(brand_name) else str(brand_id),
        "cateList": str(cate_id) if pd.isnull(brand_name) else str(cate_id.split("_")[-1]),
        "isRelativeTime": 'false',  # must be absolute dates
        # strftime instead of rstrip("00:00:00"), which strips characters, not a suffix
        "startDate": pd.to_datetime(start_time).strftime("%Y-%m-%d"),
        "endDate": pd.to_datetime(end_time).strftime("%Y-%m-%d"),
        "frequency": {"operator": "nolimit"} if pd.isnull(frequency) else {"operator": "between", "value": frequency},
        "price": {"operator": "nolimit"} if pd.isnull(price) else {"operator": "between", "value": price},
        "displayDef": "1"  # unused, but the field is required
    }
    return data
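Two details of the behavior cards are easy to get wrong. Stripping the time with str.rstrip("00:00:00") is fragile: rstrip removes any trailing run of the characters 0 and :, not a suffix, so it only works when str() yields a full timestamp such as 2022-04-10 00:00:00, while a plain string 2022-04-10 loses its last digit. And empty frequency or price cells arrive from pandas as NaN, which should become {"operator": "nolimit"}. A stdlib-only sketch of both; to_date, is_empty and range_filter are hypothetical names, and the between value is passed through unchanged because the article does not say what format the API expects.

```python
import math
from datetime import date, datetime


def to_date(value):
    """Format a date-like cell as YYYY-MM-DD."""
    if isinstance(value, (datetime, date)):  # pandas.Timestamp is a datetime subclass
        return value.strftime("%Y-%m-%d")
    # Strings such as "2022-04-10" or "2022-04-10 00:00:00"
    return datetime.strptime(str(value).strip()[:10], "%Y-%m-%d").strftime("%Y-%m-%d")


def is_empty(cell):
    # Empty Excel cells arrive as NaN (a float) or None
    return cell is None or (isinstance(cell, float) and math.isnan(cell))


def range_filter(cell):
    """Build the frequency/price filter: empty cell means no limit."""
    if is_empty(cell):
        return {"operator": "nolimit"}
    return {"operator": "between", "value": cell}


# The pitfall: rstrip takes a set of characters, not a suffix
print("2022-04-10 00:00:00".rstrip("00:00:00").rstrip())  # 2022-04-10
print("2022-04-10".rstrip("00:00:00"))                    # 2022-04-1
print(to_date("2022-04-10"), to_date(datetime(2022, 4, 10)))
print(range_filter(float("nan")))
```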
2.2 Add-to-cart behavior
def get_addCart_data(cookie, brand_name, cate_name, start_time, end_time, frequency, price):  # add-to-cart behavior
    ids = get_core_id(cookie, cate_name)  # one lookup instead of two identical requests
    brand_id, cate_id = ids["brand_id"], ids["cate_id"]
    data = {
        "cardType": "addCart",
        "cardCode": "300666",
        "type": "behaviorV2",
        "key": "addCart",
        "screen": "all",
        "dimension": "3" if pd.isnull(brand_name) else "2",  # 3: category only, 2: brand x category
        "brandCode": '' if pd.isnull(brand_name) else str(brand_id),
        "cateList": str(cate_id) if pd.isnull(brand_name) else str(cate_id.split("_")[-1]),
        "isRelativeTime": 'false',  # must be absolute dates
        # Format the dates like the other cards; a raw Timestamp cannot be serialized to JSON
        "startDate": pd.to_datetime(start_time).strftime("%Y-%m-%d"),
        "endDate": pd.to_datetime(end_time).strftime("%Y-%m-%d"),
        "frequency": {"operator": "nolimit"} if pd.isnull(frequency) else {"operator": "between", "value": frequency},
        "price": {"operator": "nolimit"} if pd.isnull(price) else {"operator": "between", "value": price},
        "displayDef": "1"  # unused, but the field is required
    }
    return data
2.3 Browsing behavior
def get_view_data(cookie, brand_name, cate_name, start_time, end_time, frequency, price):  # browsing behavior
    ids = get_core_id(cookie, cate_name)  # one lookup instead of two identical requests
    brand_id, cate_id = ids["brand_id"], ids["cate_id"]
    data = {
        "cardType": "view",
        "cardCode": "300658",
        "type": "behaviorV2",
        "key": "view",
        "screen": "all",
        "dimension": "3" if pd.isnull(brand_name) else "2",  # 3: category only, 2: brand x category
        "brandCode": '' if pd.isnull(brand_name) else str(brand_id),
        "cateList": str(cate_id) if pd.isnull(brand_name) else str(cate_id.split("_")[-1]),
        "isRelativeTime": 'false',  # must be absolute dates
        # strftime instead of rstrip("00:00:00"), which strips characters, not a suffix
        "startDate": pd.to_datetime(start_time).strftime("%Y-%m-%d"),
        "endDate": pd.to_datetime(end_time).strftime("%Y-%m-%d"),
        "frequency": {"operator": "nolimit"} if pd.isnull(frequency) else {"operator": "between", "value": frequency},
        "price": {"operator": "nolimit"} if pd.isnull(price) else {"operator": "between", "value": price},
        "displayDef": "1"  # unused, but the field is required
    }
    return data
2.4 Ad behavior
def get_ad_data(cookie, brand_name, cate_name, ad_name, behavior, start_time, end_time, frequency):  # ad behavior
    ids = get_core_id(cookie, cate_name)  # one lookup instead of two identical requests
    brand_id, cate_id = ids["brand_id"], ids["cate_id"]
    ad_id = get_ad_id(cookie, ad_name)
    data = {
        "cardType": "advertisement",
        "cardTitle": "广告行为",
        "cardCode": "300270",
        "key": "impression",
        "type": "behaviorV2",
        "line": str(ad_id),
        "behaviorType": "impression" if behavior == '曝光' else "click",  # 曝光 = impression, anything else = click
        "isRelativeTime": 'false',
        "frequency": {"operator": "nolimit"} if pd.isnull(frequency) else {"operator": "between", "value": frequency},
        "dimension": "10" if brand_id == '' else "2",
        "startDate": pd.to_datetime(start_time).strftime("%Y-%m-%d"),
        "endDate": pd.to_datetime(end_time).strftime("%Y-%m-%d"),
        "brandCode": brand_id,
        "cateList": cate_id,
        "displayDef": "1"  # unused, but the field is required
    }
    return data

2.5 4A distribution
def get_4a_data(cookie, brand_name, cate_name, start_time, end_time, status):  # 4A distribution
    ids = get_core_id(cookie, cate_name)  # one lookup instead of two identical requests
    brand_id, cate_id = ids["brand_id"], ids["cate_id"]
    data = {
        "audienceDefinition": {
            "type": "intersection",
            "children": [
                {
                    "cardType": "layout",
                    "cardTitle": "4A分布",
                    "cardCode": "300214",
                    "type": "4alayoutV2",
                    "modelType": "1",
                    "brandCode": str(brand_id),
                    "cateList": str(cate_id) if pd.isnull(brand_name) else str(cate_id.split("_")[-1]),
                    "status": status,  # passed through exactly as written in the table
                    "isRelativeTime": "false",
                    "startDate": pd.to_datetime(start_time).strftime("%Y-%m-%d"),
                    "endDate": pd.to_datetime(end_time).strftime("%Y-%m-%d"),
                    "displayDef": "1"  # unused, but the field is required
                }
            ]
        }
    }
    return data
2.6 Existing audiences

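An existing-audience card has to upload the audience ID. When you refresh the audience list in Shufang, the page sends an audienceList request that returns every audience name and ID. Changing its parameters to startDate=1999-04-03&endDate=3000-04-03&pageSize=100000 makes sure the response contains every audience under the account.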

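def get_old_data(cookie, name):  # existing audience
    # A very wide date range and page size so the list covers every audience of the account
    url = "https://4a.jd.com/datamill/api/growthStrategy/audienceManagement/audienceList?name=&startDate=1999-04-03&endDate=3000-04-03&status=-1&audienceType=all&pageNum=0&pageSize=100000"
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36',
        'cookie': cookie
    }
    txt = requests.get(url, headers=headers).text
    # Assumption: the list sits under result.data like the other endpoints (this step is missing from the original snippet)
    info_list = json.loads(txt)["result"]["data"]
    audienceId = None
    for info in info_list:
        if info["name"] == name:
            audienceId = info["id"]
    if audienceId is None:
        raise ValueError(f"existing audience not found: {name}")
    data = {
        "audienceDefinition": {
            "type": "intersection",
            "children": [
                {
                    "cardType": "custom",
                    "cardCode": "102180",
                    "categoryPath": "已有人群",
                    "type": "package",
                    "audienceId": audienceId
                }
            ]
        }
    }
    return data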
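The lookup loop in get_old_data keeps the last audience whose name matches, so two audiences with the same name would silently resolve to one of them. A sketch of a lookup that fails loudly on zero or multiple matches; it works on the already-parsed list, the name and id fields are the ones read above, the sample entries are made up, and find_audience_id is a hypothetical name.

```python
def find_audience_id(info_list, name):
    """Return the ID of the audience called `name`; fail on zero or multiple matches."""
    ids = [info["id"] for info in info_list if info["name"] == name]
    if not ids:
        raise ValueError(f"no existing audience named {name!r}")
    if len(ids) > 1:
        raise ValueError(f"{len(ids)} audiences named {name!r}: {ids}")
    return ids[0]


# Made-up entries shaped like the audienceList items read above
info_list = [{"name": "618加购", "id": 5001}, {"name": "618购买", "id": 5002}]
print(find_audience_id(info_list, "618购买"))  # 5002
```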
2.7 Ten target groups

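The key parameter ids of the ten-target-groups (十大靶群) card takes the Chinese group names directly; separate multiple selections with English commas.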

def get_ten_group(name):  # ten target groups
    data = {
        "cardType": "normal",
        "cardCode": "300226",
        "type": "normal_label",
        "labelCode": "400227",
        "labelNameEn": "dc_group_type",
        "labelNameCn": "十大靶群",
        "tagType": "checkbox",
        "params": {
            "dc_group_type": {
                "ids": name
                }
            },
        "displayDef": "十大靶群 是 学生一族"
    }
    return data
2.8 Gender

Female maps to ids 0 and male to ids 1; separate multiple selections with English commas.

def get_normal(name):  # gender
    info = name.replace("女", "0").replace("男", "1")
    data = {
        "cardType": "normal",
        "cardCode": "302154",
        "type": "normal_label",
        "labelCode": "402155",
        "labelNameEn": "ulp_base_sex",
        "labelNameCn": "性别",
        "tagType": "checkbox",
        "params": {
            "ulp_base_sex": {
                "ids": info
                }
            },
        "displayDef": "性别 是 女"
    }
    return data
2.9 Age

Unlike gender, the age codes start at 1: 15 and under is 1, 16-25 is 2, and so on; separate multiple selections with English commas.

def get_age(name):  # age
    info = name.replace("15岁以下", "1").replace("16-25岁", "2").replace("26-35岁", "3").replace("36-45岁", "4").replace("46-55岁", "5").replace("56岁以上", "6")
    data = {
        "cardType": "normal",
        "cardCode": "302162",
        "type": "normal_label",
        "labelCode": "402163",
        "labelNameEn": "ulp_base_age",
        "labelNameCn": "预测年龄",
        "tagType": "checkbox",
        "params": {
            "ulp_base_age": {
                "ids": info
            }
        },
        "displayDef": "预测年龄 是 16-25岁"
    }
    return data
2.10 Marital status

Similar to gender: unmarried is ids 0 and married is ids 1; separate multiple selections with English commas.

def get_marriage(name):  # marital status
    info = name.replace("未婚", "0").replace("已婚", "1")
    data = {
        "cardType": "normal",
        "cardCode": "301322",
        "type": "normal_label",
        "labelCode": "401323",
        "labelNameEn": "ulp_base_marriage",
        "labelNameCn": "婚姻状况",
        "tagType": "checkbox",
        "params": {
            "ulp_base_marriage": {
                "ids": info
            }
        },
        "displayDef": "婚姻状况 是 未婚"
    }
    return data
2.11 Education

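Similar to age: junior high school or below is ids 1, senior high school is 2, and so on; separate multiple selections with English commas.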

def get_edu(name):  # education
    info = name.replace("初中及以下", "1").replace("高中(中专)", "2").replace("大学(本科及专科)", "3").replace("研究生(硕士及以上)", "4")
    data = {
        "cardType": "normal",
        "cardCode": "301326",
        "type": "normal_label",
        "labelCode": "402186",
        "labelNameEn": "ulp_base_education",
        "labelNameCn": "学历",
        "tagType": "checkbox",
        "params": {
            "ulp_base_education": {
                "ids": info
            }
        },
        "displayDef": "学历 是 初中及以下"
    }
    return data
2.12 Purchasing power

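Similar to age: 土豪 (big spenders) is ids 1, 高级白领 (senior white-collar) is 2, and so on; separate multiple selections with English commas.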

def get_power(name):  # purchasing power (money is power)
    info = name.replace("土豪", "1").replace("高级白领", "2").replace("小白领", "3").replace("蓝领", "4").replace("收入很少", "5")
    data = {
        "cardType": "normal",
        "cardCode": "300622",
        "type": "normal_label",
        "labelCode": "400655",
        "labelNameEn": "cust_purchpower",
        "labelNameCn": "购买力分段",
        "tagType": "checkbox",
        "params": {
            "cust_purchpower": {
                "ids": info
            }
        },
        "displayDef": "购买力分段 是 土豪"
    }
    return data
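Sections 2.7 to 2.12 build the same normal_label card and differ only in the codes and the option-to-ID mapping. A table-driven sketch with the codes copied from the functions above: each option is looked up as a whole after splitting on commas, so a typo such as 男性 raises an error instead of being sent as 1性, which the chained str.replace calls would produce. LABELS and build_label_card are hypothetical names; the dictionary keys are only lookup names, not necessarily the values the logic table uses, and filling displayDef from the input assumes, as the fixed sample strings above suggest, that the server does not check that text.

```python
# Codes copied from get_ten_group, get_normal, get_age, get_marriage, get_edu and get_power
LABELS = {
    "十大靶群": ("300226", "400227", "dc_group_type", "十大靶群", None),  # ids are the names themselves
    "性别": ("302154", "402155", "ulp_base_sex", "性别", {"女": "0", "男": "1"}),
    "年龄": ("302162", "402163", "ulp_base_age", "预测年龄",
             {"15岁以下": "1", "16-25岁": "2", "26-35岁": "3",
              "36-45岁": "4", "46-55岁": "5", "56岁以上": "6"}),
    "婚姻状况": ("301322", "401323", "ulp_base_marriage", "婚姻状况", {"未婚": "0", "已婚": "1"}),
    "学历": ("301326", "402186", "ulp_base_education", "学历",
             {"初中及以下": "1", "高中(中专)": "2", "大学(本科及专科)": "3", "研究生(硕士及以上)": "4"}),
    "购买力": ("300622", "400655", "cust_purchpower", "购买力分段",
              {"土豪": "1", "高级白领": "2", "小白领": "3", "蓝领": "4", "收入很少": "5"}),
}


def build_label_card(label, text):
    """Build a normal_label card from a comma-separated list of option names."""
    card_code, label_code, name_en, name_cn, mapping = LABELS[label]
    options = [o.strip() for o in text.split(",") if o.strip()]
    if mapping is not None:
        unknown = [o for o in options if o not in mapping]
        if unknown:
            raise ValueError(f"unknown {label} options: {unknown}")
        options = [mapping[o] for o in options]  # whole-option lookup, no substring replacement
    return {
        "cardType": "normal",
        "cardCode": card_code,
        "type": "normal_label",
        "labelCode": label_code,
        "labelNameEn": name_en,
        "labelNameCn": name_cn,
        "tagType": "checkbox",
        "params": {name_en: {"ids": ",".join(options)}},
        "displayDef": f"{name_cn} 是 {text}",  # assumption: the server does not check this text
    }


print(build_label_card("年龄", "16-25岁,26-35岁")["params"])
```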
3. Building the audience DATA

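Read the audience logic file, check whether each audience has one label or several, and assemble the data that has to be uploaded to get the audience size.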

def get_card(cookie, path):  # read the logic table, return audience names and payloads
    operations = {"交集": "intersection", "差集": "diff", "并集": "union"}
    card_list = []
    df = pd.read_excel(path)
    people_list = df['人群名称'].drop_duplicates()  # every audience name, once each, in table order
    for people in people_list:
        # Exact match: str.contains("人群1") would also select 人群10, 人群11, ...
        df1 = df[df["人群名称"] == people]
        # iloc (position), not loc (label): df1 keeps the row labels of the whole table,
        # so label 0 only exists for the first audience.
        # get_data (not shown here) is assumed to turn a one-row frame into a card dict.
        definition = get_data(cookie, df1.iloc[[0]])
        if len(df1) == 1:  # a single card is wrapped in an intersection node
            definition = {"type": "intersection", "children": [definition]}
        for i in range(1, len(df1)):  # several cards: combine them top to bottom
            op_name = df1["运算"].iloc[i]
            if op_name not in operations:
                raise ValueError(f"{people}: unsupported 运算 value {op_name!r}")
            definition = {
                "type": operations[op_name],
                "children": [definition, get_data(cookie, df1.iloc[[i]])]
            }
        # Build the payload directly instead of concatenating strings and eval-ing them
        card_data = {
            "name": people,
            "data": {"audienceDefinition": definition}
        }
        card_list.append(card_data)
    return card_list
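get_card sends whatever the table contains, so a typo in 运算 or swapped dates only shows up as a failed or wrong request. A small pre-flight check, sketched on rows as plain dicts (for example df.to_dict("records")): every row after the first of an audience needs 交集, 差集 or 并集, and the start date must not be after the end date. check_rows is a hypothetical helper; it compares dates as text, which works for YYYY-MM-DD strings and pandas timestamps.

```python
import math

OPS = {"交集", "差集", "并集"}


def is_empty(cell):
    # Empty Excel cells arrive as NaN (a float) or None
    return cell is None or (isinstance(cell, float) and math.isnan(cell))


def check_rows(rows):
    """Return a list of problems found in the logic-table rows (a list of dicts)."""
    problems = []
    seen = set()
    for i, row in enumerate(rows, start=2):  # start=2: Excel row number below the header
        name = row.get("人群名称")
        if is_empty(name):
            problems.append(f"row {i}: missing 人群名称")
            continue
        if name in seen and row.get("运算") not in OPS:
            # Every card after the first one of an audience needs an operation
            problems.append(f"row {i}: {name} needs 交集/差集/并集 in 运算")
        seen.add(name)
        start, end = row.get("开始时间"), row.get("结束时间")
        if not is_empty(start) and not is_empty(end) and str(start) > str(end):
            problems.append(f"row {i}: 开始时间 is after 结束时间")
    return problems


rows = [
    {"人群名称": "人群1", "运算": float("nan"), "开始时间": "2022-04-01", "结束时间": "2022-04-15"},
    {"人群名称": "人群1", "运算": "交集", "开始时间": "2022-04-01", "结束时间": "2022-04-15"},
    {"人群名称": "人群1", "运算": "合集", "开始时间": "2022-04-20", "结束时间": "2022-04-15"},
]
print(check_rows(rows))
```

Running it before get_card and stopping when the list is non-empty avoids spending requests on a broken table.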
4. Getting the audience size (Tkinter)

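I used Tkinter to replace the ugly black console window, but I have not prettied up the Tkinter window either; it works, and that is enough.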

def people_count(cookie, info):
    url = 'https://4a.jd.com/datamill/api/audienceManagement/predictAudienceSize'
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36',
        "content-type": 'application/json',
        'cookie': cookie
    }
    r = requests.post(url, headers=headers, data=json.dumps(info))
    return r.text


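import json
import tkinter as tk

import pandas as pd


def run():
    result_list = []
    cookie = E1.get()  # cookie copied from a logged-in 4a.jd.com session
    path = E2.get()    # folder containing 人群逻辑表.xlsx, ending with a path separator
    people_list = get_card(cookie=cookie, path=path + r"人群逻辑表.xlsx")
    for people in people_list:
        # The response is JSON: json.loads understands true/false/null, eval does not
        people_size = json.loads(people_count(cookie, people["data"]))
        result_list.append(
            {
                "人群名称": people["name"],
                "人群大小": people_size["result"]["audienceSize"]
            }
        )
    result_df = pd.DataFrame(result_list)
    result_df.to_excel(path + r"人群大小.xlsx")


# The window setup is not shown in the original; this is a minimal stand-in
root = tk.Tk()
tk.Label(root, text="cookie").pack()
E1 = tk.Entry(root, width=80)
E1.pack()
tk.Label(root, text="folder path").pack()
E2 = tk.Entry(root, width=80)
E2.pack()

B1 = tk.Button(root, text="提交", command=run)
B1.pack()

root.mainloop()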
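The predictAudienceSize response is JSON, so it should be parsed with json.loads: eval raises NameError as soon as the body contains true, false or null, which are not Python names. A sketch of a parser that also reports unexpected responses (for example after the cookie expires) instead of failing with a bare KeyError; only result.audienceSize is confirmed by the code above, the other field in the sample response is made up, and parse_audience_size is a hypothetical name.

```python
import json


def parse_audience_size(text):
    """Extract result.audienceSize from the predictAudienceSize response text."""
    body = json.loads(text)
    result = body.get("result") or {}
    if "audienceSize" not in result:
        # Show the start of the response so the reason (e.g. an expired cookie) is visible
        raise RuntimeError(f"unexpected response: {text[:200]}")
    return result["audienceSize"]


sample = '{"success": true, "result": {"audienceSize": 12345}}'  # made-up response
print(parse_audience_size(sample))  # 12345
```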
5. How to use it
5.1 Filling in the logic table

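The logic used for this test: users who bought in the air-conditioner category at 3k+ during the 2021 618 shopping festival, split by browsing frequency from 1 to 100, i.e. one hundred audiences in total.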

5.2 Running the program and entering the cookie

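Paste in the cookie, click 提交 (Submit) and wait until the result file appears. I was lazy (and short on time), so there is no progress output; anyway, it works.
Originally you also had to enter the path of the folder containing the logic table; when using it myself I found that tedious, so I hard-coded it.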

Source code download: https://download.csdn.net/download/qq_43210367/85339129
Feel free to modify it if you are interested. The code is for learning only; commercial use is prohibited.

Reposting notice: this article is reposted from www.mshxw.com
Article URL: https://www.mshxw.com/it/874872.html