While recently researching user-behavior analysis such as retention and funnel analysis, I found that the retention function is already downloadable with the official 2.2 RC release, while the funnel function window_funnel is code-complete and merged into main, expected to ship in 2.3. To evaluate the funnel function ahead of that release, I built it from source. The steps are as follows:
1. Set up the Docker environment
On macOS, download docker.dmg, install it, and then raise Docker's resource limits:
You can dial the Docker resources back down once the build succeeds. Before raising them, the build kept failing because Docker did not have enough resources: processes were repeatedly killed, with errors like:
The BE build stopped halfway through; re-running the build then failed with:
When that happens, run sh build.sh --be --clean
Note: sh build.sh --help lists the supported build options; the FE and BE modules can be built separately.
2. docker pull the StarRocks main-branch build environment
See docs/Build_in_docker.md at main · StarRocks/docs · GitHub
docker pull starrocks/dev-env:main
docker run -it --name starrocks2.3 -d starrocks/dev-env:main
Enter the build container (by its container ID, or the name starrocks2.3 given above):
docker exec -it starrocks-compile-env-container-id /bin/bash
3. Build
See docs/Build_in_docker.md at main · StarRocks/docs · GitHub
git clone https://github.com/StarRocks/starrocks.git
cd starrocks
sh build.sh
The whole build takes roughly 40-60 minutes.
[100%] Built target starrocks_be
real 39m46.800s
user 168m35.045s
sys 10m22.659s
Once the build finishes, the generated StarRocks distribution is in the output directory. Edit the FE and BE configs, start both services, and add the backend node.
Check the built-in functions:
MySQL [(none)]> select current_version();
+-------------------+
| current_version() |
+-------------------+
| UNKNOWN ec27d13   |
+-------------------+
1 row in set (2.05 sec)

MySQL [(none)]> create database gong;
Query OK, 0 rows affected (0.00 sec)

MySQL [(none)]> use gong;
Database changed

MySQL [gong]> show builtin functions from gong like '%retention%';
+---------------+
| Function Name |
+---------------+
| retention     |
+---------------+
1 row in set (0.02 sec)

MySQL [gong]> show builtin functions from gong like '%window%';
+---------------+
| Function Name |
+---------------+
| window_funnel |
+---------------+
1 row in set (0.00 sec)
4. Testing
4.1 The retention function (tested on the released 2.2.0-RC01 build)
mysql> select current_version();
+--------------------+
| current_version()  |
+--------------------+
| 2.2.0-RC01 f233d21 |
+--------------------+
1 row in set (0.00 sec)

mysql> show builtin functions from gong like '%retention%';
+---------------+
| Function Name |
+---------------+
| retention     |
+---------------+
1 row in set (0.01 sec)
Prepare the data:
mysql> select * from retention_test;
+------+---------------------+
| uid  | date                |
+------+---------------------+
|    4 | 2020-01-01 00:00:00 |
|    0 | 2020-01-01 00:00:00 |
|    1 | 2020-01-01 00:00:00 |
|    2 | 2020-01-01 00:00:00 |
|    3 | 2020-01-01 00:00:00 |
|    0 | 2020-01-02 00:00:00 |
|    1 | 2020-01-02 00:00:00 |
|    2 | 2020-01-02 00:00:00 |
|    3 | 2020-01-02 00:00:00 |
|    4 | 2020-01-02 00:00:00 |
|    5 | 2020-01-02 00:00:00 |
|    6 | 2020-01-02 00:00:00 |
|    7 | 2020-01-02 00:00:00 |
|    8 | 2020-01-02 00:00:00 |
|    9 | 2020-01-02 00:00:00 |
|    0 | 2020-01-03 00:00:00 |
|    1 | 2020-01-03 00:00:00 |
|    2 | 2020-01-03 00:00:00 |
|    3 | 2020-01-03 00:00:00 |
|    4 | 2020-01-03 00:00:00 |
|    5 | 2020-01-03 00:00:00 |
|    6 | 2020-01-03 00:00:00 |
|    7 | 2020-01-03 00:00:00 |
|    8 | 2020-01-03 00:00:00 |
|    9 | 2020-01-03 00:00:00 |
|   10 | 2020-01-03 00:00:00 |
|   11 | 2020-01-03 00:00:00 |
|   12 | 2020-01-03 00:00:00 |
|   13 | 2020-01-03 00:00:00 |
|   14 | 2020-01-03 00:00:00 |
+------+---------------------+
30 rows in set (0.01 sec)
Let's walk through an example of using retention to measure site traffic.
Group users by the unique id uid and apply the retention function.
mysql> select uid,
retention([date='2020-01-01',date='2020-01-02',date='2020-01-03'])
from retention_test where date in('2020-01-01','2020-01-02','2020-01-03')
group by uid order by uid;
+------+----------------------------------------------------------------------------+
| uid | retention([date = '2020-01-01', date = '2020-01-02', date = '2020-01-03']) |
+------+----------------------------------------------------------------------------+
| 0 | [1,1,1] |
| 1 | [1,1,1] |
| 2 | [1,1,1] |
| 3 | [1,1,1] |
| 4 | [1,1,1] |
| 5 | [0,0,0] |
| 6 | [0,0,0] |
| 7 | [0,0,0] |
| 8 | [0,0,0] |
| 9 | [0,0,0] |
| 10 | [0,0,0] |
| 11 | [0,0,0] |
| 12 | [0,0,0] |
| 13 | [0,0,0] |
| 14 | [0,0,0] |
+------+----------------------------------------------------------------------------+
15 rows in set (0.03 sec)
Compute the total number of site visits per day:
mysql> SELECT sum(r[1]) AS r1, sum(r[2]) AS r2, sum(r[3]) AS r3
FROM (
    SELECT uid,
        retention([date = '2020-01-01', date = '2020-01-02', date = '2020-01-03']) AS r
    FROM retention_test WHERE date IN ('2020-01-01', '2020-01-02', '2020-01-03')
    GROUP BY uid
) AS rr;
+------+------+------+
| r1 | r2 | r3 |
+------+------+------+
| 5 | 5 | 5 |
+------+------+------+
1 row in set (0.39 sec)
Where:
- r1: the number of unique visitors who visited the site on 2020-01-01 (the cond1 condition).
- r2: the number of unique visitors who visited the site on 2020-01-01 and came back on 2020-01-02 (the cond1 and cond2 conditions).
- r3: the number of unique visitors who visited the site on 2020-01-01 and came back on 2020-01-03 (the cond1 and cond3 conditions).
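To double-check those semantics, here is a minimal Python sketch of what retention computes per user, rebuilt from the sample table above. It is a simulation of the behavior described in this section, not StarRocks code: the first flag records whether cond1 ever held, and every later flag is set only when cond1 and that later condition both held.

```python
from collections import defaultdict

def retention(values, conds):
    """Simulate retention([cond1, ..., condN]) over one user's rows.

    values: the per-user column values (here, visit dates)
    conds:  predicates; r[0] = cond1 matched at least once,
            r[i] = cond1 AND cond_{i+1} both matched (i > 0)
    """
    hit = [any(c(v) for v in values) for c in conds]
    return [int(hit[0])] + [int(hit[0] and h) for h in hit[1:]]

# Rebuild the sample data: uids 0-4 visit on all three days,
# uids 5-9 first appear on 01-02, uids 10-14 only appear on 01-03.
rows = [(u, "2020-01-01") for u in range(5)] \
     + [(u, "2020-01-02") for u in range(10)] \
     + [(u, "2020-01-03") for u in range(15)]

conds = [lambda d: d == "2020-01-01",
         lambda d: d == "2020-01-02",
         lambda d: d == "2020-01-03"]

by_uid = defaultdict(list)
for uid, d in rows:
    by_uid[uid].append(d)

r = {uid: retention(dates, conds) for uid, dates in by_uid.items()}
# Per-day totals match the SQL result above: r1 = r2 = r3 = 5
totals = [sum(v[i] for v in r.values()) for i in range(3)]
print(totals)  # [5, 5, 5]
```

Note how uids 5-9 get [0,0,0] even though they visited on 01-02 and 01-03: once cond1 fails, all later flags are forced to 0.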
4.2 Testing the window_funnel function
window_funnel has been merged into the main branch but is not yet officially released; see:
https://github.com/StarRocks/starrocks/pull/5542
4.2.1 Use case (datetime)
The event types in the sample data are 浏览 (view), 点击 (click), 下单 (place order), and 支付 (pay).
mysql> select * from action;
+------+------------+---------------------+
| uid  | event_type | time                |
+------+------------+---------------------+
|    1 | 浏览       | 2020-01-02 11:00:00 |
|    1 | 点击       | 2020-01-02 11:10:00 |
|    1 | 下单       | 2020-01-02 11:20:00 |
|    1 | 支付       | 2020-01-02 11:30:00 |
|    1 | 浏览       | 2020-01-02 11:00:00 |
|    2 | 下单       | 2020-01-02 11:00:00 |
|    2 | 支付       | 2020-01-02 11:10:00 |
|    3 | 浏览       | 2020-01-02 11:20:00 |
|    3 | 点击       | 2020-01-02 12:00:00 |
|    4 | 浏览       | 2020-01-02 11:50:00 |
|    4 | 点击       | 2020-01-02 12:00:00 |
|    5 | 浏览       | 2020-01-02 11:50:00 |
|    5 | 点击       | 2020-01-02 12:00:00 |
|    5 | 下单       | 2020-01-02 11:10:00 |
|    6 | 浏览       | 2020-01-02 11:50:00 |
|    6 | 点击       | 2020-01-02 12:00:00 |
|    6 | 下单       | 2020-01-02 12:10:00 |
+------+------------+---------------------+
17 rows in set (0.01 sec)

mysql> select uid,
    window_funnel(600, time, 0,
        [event_type="浏览", event_type="点击", event_type="下单", event_type="支付"]) as level
    from action group by uid order by uid;
+------+-------+
| uid  | level |
+------+-------+
|    1 |     2 |
|    2 |     0 |
|    3 |     1 |
|    4 |     2 |
|    5 |     2 |
|    6 |     2 |
+------+-------+
6 rows in set (0.04 sec)
4.2.2 Use case(date)
mysql> select * from action9;
+------+------------+------------+
| uid  | event_type | time       |
+------+------------+------------+
|    1 | 浏览       | 2020-01-02 |
|    1 | 点击       | 2020-01-03 |
|    1 | 下单       | 2020-01-04 |
|    1 | 支付       | 2020-01-05 |
|    2 | 浏览       | 2020-01-02 |
|    2 | 点击       | 2020-01-06 |
+------+------------+------------+
6 rows in set (0.01 sec)

mysql> select uid,
    window_funnel(3, time, 0,
        [event_type="浏览", event_type="点击", event_type="下单", event_type="支付"]) as level
    from action9 group by uid order by uid;
+------+-------+
| uid  | level |
+------+-------+
|    1 |     4 |
|    2 |     1 |
+------+-------+
2 rows in set (0.02 sec)
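To make the level numbers easier to interpret, here is a minimal Python sketch of the sliding-window semantics. It assumes ClickHouse-style behavior for mode 0 (non-strict), which is my reading of the PR linked above, and it is a simulation, not the actual BE implementation: a chain advances to level i+1 only when a cond_{i+1} event falls within `window` of that chain's first event, and the window unit follows the column type (seconds for datetime, days for date).

```python
from datetime import datetime

CHAIN = ["浏览", "点击", "下单", "支付"]  # view, click, place order, pay

def window_funnel(window, events, conds):
    """events: list of (timestamp, event_type) with numeric timestamps
    (seconds for a datetime column, days for a date column); conds is
    the ordered event chain. Returns the deepest level reached."""
    # chain_start[i] = start timestamp of some chain that reached level i + 1
    chain_start = [None] * len(conds)
    for t, ev in sorted(events):
        # scan deepest level first so one event cannot extend a chain through itself
        for i in reversed(range(len(conds))):
            if ev != conds[i]:
                continue
            if i == 0:
                chain_start[0] = t                   # (re)start a chain at this event
            elif chain_start[i - 1] is not None and t - chain_start[i - 1] <= window:
                chain_start[i] = chain_start[i - 1]  # chain extends; keep its start
    return max((i + 1 for i, s in enumerate(chain_start) if s is not None), default=0)

def secs(s):
    return datetime.strptime(s, "%Y-%m-%d %H:%M:%S").timestamp()

def days(s):
    return datetime.strptime(s, "%Y-%m-%d").toordinal()

# The action table (datetime case), grouped by uid
action = {
    1: [("2020-01-02 11:00:00", "浏览"), ("2020-01-02 11:10:00", "点击"),
        ("2020-01-02 11:20:00", "下单"), ("2020-01-02 11:30:00", "支付"),
        ("2020-01-02 11:00:00", "浏览")],
    2: [("2020-01-02 11:00:00", "下单"), ("2020-01-02 11:10:00", "支付")],
    3: [("2020-01-02 11:20:00", "浏览"), ("2020-01-02 12:00:00", "点击")],
    4: [("2020-01-02 11:50:00", "浏览"), ("2020-01-02 12:00:00", "点击")],
    5: [("2020-01-02 11:50:00", "浏览"), ("2020-01-02 12:00:00", "点击"),
        ("2020-01-02 11:10:00", "下单")],
    6: [("2020-01-02 11:50:00", "浏览"), ("2020-01-02 12:00:00", "点击"),
        ("2020-01-02 12:10:00", "下单")],
}
levels = {u: window_funnel(600, [(secs(t), e) for t, e in rows], CHAIN)
          for u, rows in action.items()}
print(levels)   # {1: 2, 2: 0, 3: 1, 4: 2, 5: 2, 6: 2}

# The action9 table (date case): the window of 3 is measured in days
action9 = {
    1: [("2020-01-02", "浏览"), ("2020-01-03", "点击"),
        ("2020-01-04", "下单"), ("2020-01-05", "支付")],
    2: [("2020-01-02", "浏览"), ("2020-01-06", "点击")],
}
levels9 = {u: window_funnel(3, [(days(t), e) for t, e in rows], CHAIN)
           for u, rows in action9.items()}
print(levels9)  # {1: 4, 2: 1}
```

This reproduces both result tables: for example uid 1 in the datetime case reaches only level 2 because 下单 at 11:20 is 1200 s after the chain start at 11:00, outside the 600 s window, while uid 1 in the date case reaches level 4 because every step lands within 3 days of 2020-01-02.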



