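A while ago I wrote a simple audience-size calculator, 数坊自动人群计算1.0 (Shufang automatic audience calculation 1.0). With more free time recently, I decided to flesh it out: it now supports multi-label audiences and adds commonly used labels such as ad behavior, existing audiences, and attributes. It is still full of bugs, but in my own testing it is just about usable.
The calculation itself is simple: upload the audience logic to the API and it returns the audience size. The harder parts are laying out the audience logic table and converting that table into JSON that can be uploaded. Automatic audience packaging works the same way, only against a different endpoint.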
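- 0. Preface
- 1. Building the logic table
- 2. Building the card DATA
- 2.0 Getting the required IDs
- 2.0.1 Getting the brand & category IDs
- 2.0.2 Getting the ad IDs
- 2.1 Purchase behavior
- 2.2 Add-to-cart behavior
- 2.3 Browsing behavior
- 2.4 Ad behavior
- 2.5 4A distribution
- 2.6 Existing audiences
- 2.7 Top ten target groups
- 2.8 Gender
- 2.9 Age
- 2.10 Marital status
- 2.11 Education
- 2.12 Purchasing power
- 3. Building the audience DATA
- 4. Getting the audience size (Tkinter)
- 5. How to use it
- 5.1 Fill in the logic table
- 5.2 Run the program and enter the cookie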
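1. Building the logic table
Each row is one label (card) and each column is one parameter; leave a cell blank when the parameter is not needed or is unrestricted.
Rows with the same 人群名称 (audience name) are treated as one audience package, and the 运算 (operation) column controls how multiple labels are combined: 交集 (intersection), 差集 (difference), or 并集 (union).
If 品牌 (brand) is left blank, the card defaults to the three-level category; if it is not blank (any text will do, as long as the cell is not empty), the card defaults to brand × three-level category.
The column headers and cell values stay in Chinese because the program matches on them.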
| 人群名称 | 运算 | 卡片名称 | 品牌 | 类目 | 开始时间 | 结束时间 | 频次 | 价格 | 渠道 | 行为 | 身份 | 已有人群 | 属性 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 人群1 | 空 | 浏览行为 | XX | 家用电器-大家电-平板电视 | 2022-04-01 | 2022-04-15 | | | | | | | |
| 人群1 | 交集 | 购买行为 | | 家用电器-大家电-平板电视 | 2022-04-01 | 2022-04-15 | | | | | | | |
| 人群1 | 差集 | 购买行为 | XX | 家用电器-大家电-平板电视 | 2022-04-01 | 2022-04-15 | | | | | | | |
| 人群2 | 空 | 浏览行为 | XX | 家用电器-大家电-平板电视 | 2022-04-01 | 2022-04-15 | | | | | | | |
| 人群3 | 空 | 购买行为 | | 家用电器-大家电-平板电视 | 2022-04-01 | 2022-04-15 | | | | | | | |
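
To make the grouping rule concrete, here is a minimal sketch that groups rows of the table above by 人群名称 using plain dictionaries. The sample rows and the helper name `group_rows` are illustrative assumptions; the actual program reads the sheet with pandas.

```python
from collections import OrderedDict

# Hypothetical rows mirroring the first three columns of the logic table.
ROWS = [
    {"人群名称": "人群1", "运算": "空", "卡片名称": "浏览行为"},
    {"人群名称": "人群1", "运算": "交集", "卡片名称": "购买行为"},
    {"人群名称": "人群1", "运算": "差集", "卡片名称": "购买行为"},
    {"人群名称": "人群2", "运算": "空", "卡片名称": "浏览行为"},
]


def group_rows(rows):
    """Group rows by audience name, preserving the order of first appearance."""
    groups = OrderedDict()
    for row in rows:
        groups.setdefault(row["人群名称"], []).append(row)
    return groups


groups = group_rows(ROWS)
for name, cards in groups.items():
    print(name, [(c["运算"], c["卡片名称"]) for c in cards])
```

Each group then becomes one audience package, with its rows combined from top to bottom.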
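2. Building the card DATA
2.0 Getting the required IDs
The IDs needed are the brand ID and category ID used by the user-behavior cards, and the ad ID used by the ad-behavior card.
2.0.1 Getting the brand & category IDs
The brand ID is fetched automatically from the account cookie, so the brand cell in the logic table can contain anything. The category, however, must follow the hyphen-separated level1-level2-level3 format, for example 家用电器-大家电-平板电视.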
import json

import requests


def get_core_id(cookie, cate_name):  # get the brand ID and the three-level category ID
    cate1, cate2, cate3 = cate_name.split('-')[:3]
    url = 'https://4a.jd.com/datamill/api/accountManagement/mainAccountInfoOuter/brandInfo?pageNum=0&pageSize=10'
    headers = {
        'user-agent': 'PostmanRuntime/7.28.4',
        "accept": "*/*",
        'cookie': cookie
    }
    txt = requests.get(url, headers=headers).text
    data = json.loads(txt)["result"]["data"][0]
    brand_id = data['brandCode']

    def pick(nodes, name):
        # Iterate over the list itself; the original range(30) loop raised
        # IndexError whenever a level held fewer than 30 categories.
        for node in nodes:
            if node["name"] == name:
                return node
        raise ValueError("category not found: " + name)

    node1 = pick(data["category"], cate1)
    node2 = pick(node1["children"], cate2)
    node3 = pick(node2["children"], cate3)
    cate_id = "_".join(str(n["categoryCode"]) for n in (node1, node2, node3))
    return {"brand_id": brand_id, "cate_id": cate_id}
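
The category lookup can be tried offline. The sketch below walks a small hand-made tree in the same name/categoryCode/children shape that `get_core_id` reads; the tree contents, the codes, and the helper name `resolve_cate_id` are made up for illustration.

```python
# Hypothetical category tree; real names and codes come from the brandInfo response.
SAMPLE_TREE = [
    {"name": "家用电器", "categoryCode": 1, "children": [
        {"name": "大家电", "categoryCode": 2, "children": [
            {"name": "平板电视", "categoryCode": 3, "children": []},
            {"name": "空调", "categoryCode": 4, "children": []},
        ]},
    ]},
]


def resolve_cate_id(tree, cate_name):
    """Walk the tree one level per hyphen-separated name and join the codes with '_'."""
    codes = []
    nodes = tree
    for name in cate_name.split("-"):
        match = next((n for n in nodes if n["name"] == name), None)
        if match is None:
            raise ValueError("category not found: " + name)
        codes.append(str(match["categoryCode"]))
        nodes = match["children"]
    return "_".join(codes)


print(resolve_cate_id(SAMPLE_TREE, "家用电器-大家电-平板电视"))  # 1_2_3
```

Raising on a missing name makes a typo in the logic table fail early with a readable message.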
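2.0.2 Getting the ad IDs
When you open audience selection and click the ad-behavior card, the page sends a lineList request in the background. Its response contains every ad touchpoint available under the account's Shufang permissions, along with the matching IDs. The names in the Excel sheet are matched to these IDs, which are then joined into a single comma-separated string so that several channels can be selected at once.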
def get_ad_id(cookie, ad_name):  # get the ad IDs
    name_list = ad_name.split(",")
    url = 'https://4a.jd.com/datamill/api/audienceManagement/newCustomAudienceEditInner/lineList'
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36',
        'cookie': cookie
    }
    txt = requests.get(url, headers=headers).text
    data_list = json.loads(txt)["result"]["data"]
    # Join the IDs of every touchpoint whose name appears in the sheet.
    return ",".join(str(data["id"]) for data in data_list if data["name"] in name_list)
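
The name-to-ID matching can be sketched without the network call. The entries below are hypothetical stand-ins for the lineList response, and `match_ad_ids` is an illustrative helper that additionally tolerates full-width commas and stray spaces, which the function above does not.

```python
# Hypothetical lineList entries; real names and IDs come from the API response.
SAMPLE_LINES = [
    {"id": 101, "name": "渠道A"},
    {"id": 102, "name": "渠道B"},
    {"id": 103, "name": "渠道C"},
]


def match_ad_ids(lines, ad_name):
    """Return the IDs of the named channels as one comma-separated string."""
    # Normalize full-width commas so both separators are accepted.
    wanted = {n.strip() for n in ad_name.replace(",", ",").split(",") if n.strip()}
    return ",".join(str(line["id"]) for line in lines if line["name"] in wanted)


print(match_ad_ids(SAMPLE_LINES, "渠道A,渠道C"))   # 101,103
print(match_ad_ids(SAMPLE_LINES, "渠道B, 渠道C"))  # 102,103
```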
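2.1 Purchase behavior
The purchase-behavior card takes the brand ID, category ID, time range (start and end date), frequency, and price.
When frequency or price is unrestricted, leave the cell empty.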
import pandas as pd


def get_order_data(cookie, brand_name, cate_name, start_time, end_time, frequency, price):  # purchase behavior
    ids = get_core_id(cookie, cate_name)  # one request instead of two
    brand_id, cate_id = ids["brand_id"], ids["cate_id"]
    data = {
        "cardType": "order",
        "cardCode": "300662",
        "type": "behaviorV2",
        "key": "order",
        "screen": "all",
        "dimension": "3" if pd.isnull(brand_name) else "2",
        "brandCode": '' if pd.isnull(brand_name) else str(brand_id),
        "cateList": str(cate_id) if pd.isnull(brand_name) else str(cate_id.split("_")[-1]),
        "isRelativeTime": 'false',  # must be an absolute time range
        # Keep only the date part. rstrip("00:00:00") strips characters, not a
        # suffix, so it also removed the trailing 0 of a plain "2022-04-10".
        "startDate": str(start_time).split(" ")[0],
        "endDate": str(end_time).split(" ")[0],
        "frequency": {"operator": "nolimit"} if pd.isnull(frequency) else {"operator": "between", "value": frequency},
        "price": {"operator": "nolimit"} if pd.isnull(price) else {"operator": "between", "value": price},
        "displayDef": "1"  # unused but required
    }
    return data
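
The date fields depend on trimming a timestamp such as 2022-04-10 00:00:00 down to its date. The sketch below compares stripping with `rstrip("00:00:00")` against splitting on the space; `date_only` is a hypothetical helper name used only for this illustration.

```python
def date_only(value):
    """Return the YYYY-MM-DD part of a date or timestamp string."""
    return str(value).split(" ")[0]


# rstrip treats its argument as a set of characters ("0" and ":"), not as a suffix.
print("2022-04-10 00:00:00".rstrip("00:00:00").rstrip())  # 2022-04-10
print("2022-04-10".rstrip("00:00:00").rstrip())           # 2022-04-1

print(date_only("2022-04-10 00:00:00"))  # 2022-04-10
print(date_only("2022-04-10"))           # 2022-04-10
```

The `rstrip` version only works while the value still carries its time part, because the space stops the stripping; a plain date ending in 0 loses a digit.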
2.2 Add-to-cart behavior
def get_addCart_data(cookie, brand_name, cate_name, start_time, end_time, frequency, price):  # add-to-cart behavior
    ids = get_core_id(cookie, cate_name)  # one request instead of two
    brand_id, cate_id = ids["brand_id"], ids["cate_id"]
    data = {
        "cardType": "addCart",
        "cardCode": "300666",
        "type": "behaviorV2",
        "key": "addCart",
        "screen": "all",
        "dimension": "3" if pd.isnull(brand_name) else "2",
        "brandCode": '' if pd.isnull(brand_name) else str(brand_id),
        "cateList": str(cate_id) if pd.isnull(brand_name) else str(cate_id.split("_")[-1]),
        "isRelativeTime": 'false',  # must be an absolute time range
        # Keep only the date part, as the purchase and browsing cards do;
        # a plain "2022-04-01" string passes through unchanged.
        "startDate": str(start_time).split(" ")[0],
        "endDate": str(end_time).split(" ")[0],
        "frequency": {"operator": "nolimit"} if pd.isnull(frequency) else {"operator": "between", "value": frequency},
        "price": {"operator": "nolimit"} if pd.isnull(price) else {"operator": "between", "value": price},
        "displayDef": "1"  # unused but required
    }
    return data
2.3 Browsing behavior
def get_view_data(cookie, brand_name, cate_name, start_time, end_time, frequency, price):  # browsing behavior
    ids = get_core_id(cookie, cate_name)  # one request instead of two
    brand_id, cate_id = ids["brand_id"], ids["cate_id"]
    data = {
        "cardType": "view",
        "cardCode": "300658",
        "type": "behaviorV2",
        "key": "view",
        "screen": "all",
        "dimension": "3" if pd.isnull(brand_name) else "2",
        "brandCode": '' if pd.isnull(brand_name) else str(brand_id),
        "cateList": str(cate_id) if pd.isnull(brand_name) else str(cate_id.split("_")[-1]),
        "isRelativeTime": 'false',  # must be an absolute time range
        # Keep only the date part. rstrip("00:00:00") strips characters, not a
        # suffix, so it also removed the trailing 0 of a plain "2022-04-10".
        "startDate": str(start_time).split(" ")[0],
        "endDate": str(end_time).split(" ")[0],
        "frequency": {"operator": "nolimit"} if pd.isnull(frequency) else {"operator": "between", "value": frequency},
        "price": {"operator": "nolimit"} if pd.isnull(price) else {"operator": "between", "value": price},
        "displayDef": "1"  # unused but required
    }
    return data
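
The three behavior functions above differ only in cardType, key, and cardCode, so they could share one builder. The sketch below is one possible refactoring, not the author's code: `build_behavior_card` and its `brand_given` flag are hypothetical, it takes already-resolved IDs instead of calling the API, and the IDs in the usage lines are made up.

```python
def build_behavior_card(card_type, card_code, brand_id, cate_id, brand_given,
                        start_date, end_date, frequency=None, price=None):
    """Build one behaviorV2 card; brand_given says whether the brand cell was filled."""
    def limit(value):
        # An empty cell means "no limit"; otherwise pass the range through.
        if value is None:
            return {"operator": "nolimit"}
        return {"operator": "between", "value": value}

    return {
        "cardType": card_type,
        "cardCode": card_code,
        "type": "behaviorV2",
        "key": card_type,
        "screen": "all",
        "dimension": "2" if brand_given else "3",
        "brandCode": str(brand_id) if brand_given else "",
        "cateList": cate_id.split("_")[-1] if brand_given else cate_id,
        "isRelativeTime": "false",
        "startDate": start_date,
        "endDate": end_date,
        "frequency": limit(frequency),
        "price": limit(price),
        "displayDef": "1",
    }


# Card codes taken from the three functions above.
CARD_CODES = {"order": "300662", "addCart": "300666", "view": "300658"}

# Made-up IDs, for illustration only.
card = build_behavior_card("view", CARD_CODES["view"], brand_id=123, cate_id="1_2_3",
                           brand_given=False, start_date="2022-04-01", end_date="2022-04-15")
print(card["dimension"], card["cateList"], card["frequency"])
```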
2.4 Ad behavior
def get_ad_data(cookie, brand_name, cate_name, ad_name, behavior, start_time, end_time, frequency):  # ad behavior
    ids = get_core_id(cookie, cate_name)  # one request instead of two
    brand_id, cate_id = ids["brand_id"], ids["cate_id"]
    ad_id = get_ad_id(cookie, ad_name)
    data = {
        "cardType": "advertisement",
        "cardTitle": "广告行为",
        "cardCode": "300270",
        "key": "impression",
        "type": "behaviorV2",
        "line": str(ad_id),
        "behaviorType": "impression" if behavior == '曝光' else "click",  # 曝光 = impression
        "isRelativeTime": 'false',
        "frequency": {"operator": "nolimit"} if pd.isnull(frequency) else {"operator": "between", "value": frequency},
        "dimension": "10" if brand_id == '' else "2",
        "startDate": start_time,
        "endDate": end_time,
        "brandCode": brand_id,
        "cateList": cate_id,
        "displayDef": "1"  # unused but required
    }
    return data
2.5 4A distribution
def get_4a_data(cookie, brand_name, cate_name, start_time, end_time, status):  # 4A distribution
    ids = get_core_id(cookie, cate_name)  # one request instead of two
    brand_id, cate_id = ids["brand_id"], ids["cate_id"]
    # status is passed through unchanged; the original had an empty
    # placeholder check on "认知" here that did nothing.
    data = {
        "audienceDefinition": {
            "type": "intersection",
            "children": [
                {
                    "cardType": "layout",
                    "cardTitle": "4A分布",
                    "cardCode": "300214",
                    "type": "4alayoutV2",
                    "modelType": "1",
                    "brandCode": str(brand_id),
                    "cateList": str(cate_id) if pd.isnull(brand_name) else str(cate_id.split("_")[-1]),
                    "status": status,
                    "isRelativeTime": "false",
                    "startDate": start_time,
                    "endDate": end_time,
                    "displayDef": "1"  # unused but required
                }
            ]
        }
    }
    return data
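2.6 Existing audiences
An existing-audience card needs the audience ID. When the audience list is refreshed, Shufang sends an audienceList request that returns every audience name and ID. Change the query parameters to startDate=1999-04-03&endDate=3000-04-03&pageSize=100000 so that all audiences under the account are returned.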
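def get_old_data(cookie, name):  # existing audience
    url = "https://4a.jd.com/datamill/api/growthStrategy/audienceManagement/audienceList?name=&startDate=1999-04-03&endDate=3000-04-03&status=-1&audienceType=all&pageNum=0&pageSize=100000"
    # The original snippet never sent the request or defined info_list.
    # The next two lines are an assumption: same headers and the same
    # result/data layout as the other endpoints in this post.
    headers = {'user-agent': 'PostmanRuntime/7.28.4', 'cookie': cookie}
    info_list = json.loads(requests.get(url, headers=headers).text)["result"]["data"]
    audienceId = None
    for info in info_list:
        if info["name"] == name:
            audienceId = info["id"]
    if audienceId is None:
        raise ValueError("audience not found: " + name)
    data = {
        "audienceDefinition": {
            "type": "intersection",
            "children": [
                {
                    "cardType": "custom",
                    "cardCode": "102180",
                    "categoryPath": "已有人群",
                    "type": "package",
                    "audienceId": audienceId
                }
            ]
        }
    }
    return data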
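
The query string described in this section can be assembled with the standard library rather than typed by hand, which avoids slips in the date values. In this sketch `audience_list_url` and `find_audience_id` are hypothetical helpers, and the sample entry stands in for the parsed response, whose exact layout is not shown in the post.

```python
from urllib.parse import urlencode

BASE = "https://4a.jd.com/datamill/api/growthStrategy/audienceManagement/audienceList"


def audience_list_url(page_size=100000):
    """Build the audienceList URL with a date window wide enough to cover every audience."""
    params = {
        "name": "",
        "startDate": "1999-04-03",
        "endDate": "3000-04-03",
        "status": -1,
        "audienceType": "all",
        "pageNum": 0,
        "pageSize": page_size,
    }
    return BASE + "?" + urlencode(params)


def find_audience_id(info_list, name):
    """Return the id of the first audience called name, or None if it is absent."""
    return next((info["id"] for info in info_list if info["name"] == name), None)


print(audience_list_url())
# Hypothetical entry standing in for the parsed response.
print(find_audience_id([{"id": 7, "name": "人群1"}], "人群1"))  # 7
```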
2.7 Top ten target groups
For the top ten target groups, the key upload parameter ids takes the Chinese group names themselves; for multiple selections, separate them with half-width (English) commas.
def get_ten_group(name):  # top ten target groups
    data = {
        "cardType": "normal",
        "cardCode": "300226",
        "type": "normal_label",
        "labelCode": "400227",
        "labelNameEn": "dc_group_type",
        "labelNameCn": "十大靶群",
        "tagType": "checkbox",
        "params": {
            "dc_group_type": {
                "ids": name
            }
        },
        "displayDef": "十大靶群 是 学生一族"
    }
    return data
2.8 Gender
Female (女) maps to ids 0 and male (男) to ids 1; for multiple selections, separate the values with half-width (English) commas.
def get_normal(name):  # gender
    info = name.replace("女", "0").replace("男", "1")
    data = {
        "cardType": "normal",
        "cardCode": "302154",
        "type": "normal_label",
        "labelCode": "402155",
        "labelNameEn": "ulp_base_sex",
        "labelNameCn": "性别",
        "tagType": "checkbox",
        "params": {
            "ulp_base_sex": {
                "ids": info
            }
        },
        "displayDef": "性别 是 女"
    }
    return data
2.9 Age
Unlike gender, the age buckets are numbered from 1: under 15 (15岁以下) is ids 1, 16-25 (16-25岁) is 2, and so on. For multiple selections, separate the values with half-width (English) commas.
def get_age(name):  # age
    info = name.replace("15岁以下", "1").replace("16-25岁", "2").replace("26-35岁", "3").replace("36-45岁", "4").replace("46-55岁", "5").replace("56岁以上", "6")
    data = {
        "cardType": "normal",
        "cardCode": "302162",
        "type": "normal_label",
        "labelCode": "402163",
        "labelNameEn": "ulp_base_age",
        "labelNameCn": "预测年龄",
        "tagType": "checkbox",
        "params": {
            "ulp_base_age": {
                "ids": info
            }
        },
        "displayDef": "预测年龄 是 16-25岁"
    }
    return data
2.10 Marital status
Like gender: unmarried (未婚) is ids 0 and married (已婚) is ids 1. For multiple selections, separate the values with half-width (English) commas.
def get_marriage(name):  # marital status
    info = name.replace("未婚", "0").replace("已婚", "1")
    data = {
        "cardType": "normal",
        "cardCode": "301322",
        "type": "normal_label",
        "labelCode": "401323",
        "labelNameEn": "ulp_base_marriage",
        "labelNameCn": "婚姻状况",
        "tagType": "checkbox",
        "params": {
            "ulp_base_marriage": {
                "ids": info
            }
        },
        "displayDef": "婚姻状况 是 未婚"
    }
    return data
2.11 Education
Like age: junior high or below (初中及以下) is ids 1, senior high (高中) is 2, and so on. For multiple selections, separate the values with half-width (English) commas.
def get_edu(name):  # education
    info = name.replace("初中及以下", "1").replace("高中(中专)", "2").replace("大学(本科及专科)", "3").replace("研究生(硕士及以上)", "4")
    data = {
        "cardType": "normal",
        "cardCode": "301326",
        "type": "normal_label",
        "labelCode": "402186",
        "labelNameEn": "ulp_base_education",
        "labelNameCn": "学历",
        "tagType": "checkbox",
        "params": {
            "ulp_base_education": {
                "ids": info
            }
        },
        "displayDef": "学历 是 初中及以下"
    }
    return data
2.12 Purchasing power
Like age: wealthy (土豪) is ids 1, senior white-collar (高级白领) is 2, and so on. For multiple selections, separate the values with half-width (English) commas.
def get_power(name):  # purchasing power: money is power
    info = name.replace("土豪", "1").replace("高级白领", "2").replace("小白领", "3").replace("蓝领", "4").replace("收入很少", "5")
    data = {
        "cardType": "normal",
        "cardCode": "300622",
        "type": "normal_label",
        "labelCode": "400655",
        "labelNameEn": "cust_purchpower",
        "labelNameCn": "购买力分段",
        "tagType": "checkbox",
        "params": {
            "cust_purchpower": {
                "ids": info
            }
        },
        "displayDef": "购买力分段 是 土豪"
    }
    return data
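
The attribute cards in sections 2.8 to 2.12 all translate option names into ids with chained `replace()` calls, which silently pass an unknown or misspelled name through to the API. A table-driven lookup is one alternative; the sketch below copies the id tables from those sections, while `LABEL_IDS` and `to_ids` are hypothetical names.

```python
# ID tables copied from the replace() chains in sections 2.8 to 2.12.
LABEL_IDS = {
    "性别": {"女": "0", "男": "1"},
    "婚姻状况": {"未婚": "0", "已婚": "1"},
    "年龄": {"15岁以下": "1", "16-25岁": "2", "26-35岁": "3",
             "36-45岁": "4", "46-55岁": "5", "56岁以上": "6"},
    "购买力": {"土豪": "1", "高级白领": "2", "小白领": "3", "蓝领": "4", "收入很少": "5"},
}


def to_ids(label, names):
    """Translate comma-separated option names into comma-separated ids."""
    table = LABEL_IDS[label]
    ids = []
    for name in names.split(","):
        name = name.strip()
        if name not in table:
            raise ValueError("unknown option for " + label + ": " + name)
        ids.append(table[name])
    return ",".join(ids)


print(to_ids("年龄", "16-25岁,26-35岁"))  # 2,3
print(to_ids("性别", "女,男"))            # 0,1
```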
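3. Building the audience DATA
Read the audience logic file, determine whether each audience has one label or several, and assemble the data that must be uploaded to get the audience size.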
def get_card(cookie, path):  # read the logic table, return audience names and data
    operations = {"交集": "intersection", "差集": "diff", "并集": "union"}
    card_list = []
    df = pd.read_excel(path)
    people_list = df['人群名称'].drop_duplicates()  # unique audience names
    for people in people_list:
        # Exact match: str.contains(people) also pulled 人群10 into 人群1.
        df1 = df[df["人群名称"] == people]
        # Positional iloc: label-based loc[[0]] only worked for the first audience.
        # get_data is not shown in this post; it presumably turns one row into a card dict.
        node = get_data(cookie, df1.iloc[[0]])
        if len(df1) == 1:  # a single card is wrapped in an intersection
            node = {"type": "intersection", "children": [node]}
        for i in range(1, len(df1)):  # fold the remaining cards in row order
            node = {"type": operations[df1.iloc[i, 1]],
                    "children": [node, get_data(cookie, df1.iloc[[i]])]}
        # Build dicts directly instead of concatenating strings and calling eval().
        card_list.append({"name": people, "data": {"audienceDefinition": node}})
    return card_list
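
The way several cards nest into one definition is easier to see on toy data. The sketch below folds (operation, card) pairs from top to bottom, so three rows A, 交集 B, 差集 C become (A ∩ B) − C. The function name `combine_cards` and the placeholder cards are assumptions for illustration.

```python
OPERATIONS = {"交集": "intersection", "差集": "diff", "并集": "union"}


def combine_cards(steps):
    """Fold (operation, card) pairs into one nested audienceDefinition.

    The operation of the first pair is ignored, as in the logic table.
    """
    node = steps[0][1]
    if len(steps) == 1:
        node = {"type": "intersection", "children": [node]}
    for operation, card in steps[1:]:
        node = {"type": OPERATIONS[operation], "children": [node, card]}
    return {"audienceDefinition": node}


# Placeholder cards; real ones come from the card builders in section 2.
steps = [("空", {"key": "view"}), ("交集", {"key": "order"}), ("差集", {"key": "order2"})]
print(combine_cards(steps))
```

Because the fold is strictly left to right, the row order in the logic table determines the grouping; there is no way to express A ∩ (B − C) without reordering or a separate audience.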
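4. Getting the audience size (Tkinter)
I used Tkinter to replace the ugly black console window. I have not bothered to style the Tkinter window yet; it works, and that is enough.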
import tkinter as tk


def people_count(cookie, info):
    url = 'https://4a.jd.com/datamill/api/audienceManagement/predictAudienceSize'
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36',
        "content-type": 'application/json',
        'cookie': cookie
    }
    r = requests.post(url, headers=headers, data=json.dumps(info))
    return r.text


def run():
    result_list = []
    cookie = E1.get()
    path = E2.get()
    people_list = get_card(cookie=cookie, path=path + r"人群逻辑表.xlsx")
    for people in people_list:
        # Parse the response as JSON; eval() breaks on true/false/null
        # and would execute whatever text the server returns.
        people_size = json.loads(people_count(cookie, people["data"]))
        result_list.append(
            {
                "人群名称": people["name"],
                "人群大小": people_size["result"]["audienceSize"]
            }
        )
    result_df = pd.DataFrame(result_list)
    result_df.to_excel(path + r"人群大小.xlsx")


# The widget setup was not shown in the original; this is a minimal assumed version.
root = tk.Tk()
E1 = tk.Entry(root)  # cookie
E1.pack()
E2 = tk.Entry(root)  # folder containing 人群逻辑表.xlsx, with a trailing path separator
E2.pack()
B1 = tk.Button(root, text="提交", command=run)
B1.pack()
root.mainloop()
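
The response body is JSON, so `json.loads` is a safer way to read audienceSize than `eval`. The sketch below shows the difference on a made-up response: only the result.audienceSize path comes from the code above, while the "success" field and the number are assumptions added to include a JSON literal.

```python
import json

# Hypothetical response text; only result.audienceSize is taken from the post.
SAMPLE_RESPONSE = '{"success": true, "result": {"audienceSize": 12345}}'


def parse_size(text):
    """Extract the audience size from the response body."""
    return json.loads(text)["result"]["audienceSize"]


print(parse_size(SAMPLE_RESPONSE))  # 12345

try:
    eval(SAMPLE_RESPONSE)
except NameError as exc:
    # JSON's lowercase true is not a Python name, so eval cannot read it.
    print("eval failed:", exc)
```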
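5. How to use it
5.1 Fill in the logic table
The logic used for this test is the view-frequency distribution of users who bought in the air-conditioner category at 3k+ during the 2021 618 sale, with frequency running from 1 to 100 for a total of one hundred audiences.
5.2 Run the program and enter the cookie
Paste the cookie in, click 提交 (Submit), and wait for the result file to appear. I was lazy and short on time, so there is no progress or feedback output; it works, and that is enough.
Originally you also had to enter the path of the folder containing the logic table, but I found that tedious in my own use and hard-coded it.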
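Source code download: https://download.csdn.net/download/qq_43210367/85339129
Anyone interested is welcome to modify it. The code is for learning purposes only; commercial use is prohibited.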



