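Kylin 4.0 has been out for a while now, and it differs substantially from earlier versions. Because it stores cube data as Parquet files and uses Spark as the default engine, it performs considerably better on large data volumes. The Kylin website has published benchmark results for anyone who is interested.
Kylin metadata stores information about models, cubes, segments and so on, and it plays an important role when customizing Kylin or recovering data. My company may upgrade to Kylin 4 later, but I do not yet have a thorough understanding of the Kylin 3 metadata. So, based on my work and my own understanding, I decided to write this article to get reacquainted with it. The notes below use Kylin 3.1.0 as the example and are organized as a series of questions.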
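1. Where is Kylin metadata stored, and what is it called?
The metadata settings can be found in Kylin's configuration file, kylin.properties, under the kylin.metadata.url property.
According to that setting, Kylin metadata is stored in HBase in a table named kylin_metadata. In other words, once Kylin has been installed, it creates a kylin_metadata table in HBase to hold its metadata.
Scanning kylin_metadata in HBase right after installation shows that, in its initial state, the metadata contains only some user information.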
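As a small illustration of how that setting reads, here is a sketch that splits a metadata URL into its table name and storage type. The value kylin_metadata@hbase is what I would expect from the description above, but it is hard-coded here as an assumption; check the actual line in your own conf/kylin.properties.

```shell
#!/usr/bin/env bash
# Assumed value of kylin.metadata.url; verify it against your kylin.properties.
metadata_url="kylin_metadata@hbase"

table_name="${metadata_url%%@*}"    # text before the first "@"
storage_type="${metadata_url#*@}"   # text after the first "@"

echo "table=$table_name storage=$storage_type"
```

On a real installation the value can be looked up with something like grep '^kylin.metadata.url' conf/kylin.properties, run from the Kylin home directory.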
2. In what form is Kylin metadata stored, and how can it be viewed?
Kylin uses resource root path + resource name + resource suffix as the rowkey in HBase, and stores the metadata content itself in HBase as binary bytes.
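To make the rowkey rule concrete, here is a minimal sketch that assembles a rowkey from the three parts. The function name resource_rowkey is my own, and the "/" between root path and name is an assumption inferred from the resource path /model_desc/kylin_sales_model.json used later in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join root path, resource name and suffix into a rowkey.
resource_rowkey() {
  local root="$1" name="$2" suffix="$3"
  printf '%s/%s%s\n' "$root" "$name" "$suffix"
}

rowkey=$(resource_rowkey "/model_desc" "kylin_sales_model" ".json")
echo "$rowkey"
```

With a rowkey in hand, a single entry could be fetched in the HBase shell with get 'kylin_metadata', '/model_desc/kylin_sales_model.json', although the value comes back as raw bytes.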
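Next, run Kylin's bundled sample.sh script to generate some sample models and cubes. Once it finishes, restart Kylin or reload the metadata, and the sample model and cube appear in the web UI.
Running scan again at this point shows many additional entries in the metadata (too many to show in full).
Reading the raw scan output is inconvenient. Instead, exit the HBase shell and use Kylin's own command to view an entry: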
./bin/metastore.sh cat /model_desc/kylin_sales_model.json
The result is as follows (the command also prints a lot of log output; only the core part is shown here):
{
"uuid" : "0928468a-9fab-4185-9a14-6f2e7c74823f",
"name" : "kylin_sales_model",
"is_draft" : false,
"description" : "",
"fact_table" : "DEFAULT.KYLIN_SALES",
"lookups" : [ {
"table" : "DEFAULT.KYLIN_CAL_DT",
"kind" : "LOOKUP",
"alias" : "KYLIN_CAL_DT",
"join" : {
"type" : "inner",
"primary_key" : [ "KYLIN_CAL_DT.CAL_DT" ],
"foreign_key" : [ "KYLIN_SALES.PART_DT" ]
}
}, {
"table" : "DEFAULT.KYLIN_CATEGORY_GROUPINGS",
"kind" : "LOOKUP",
"alias" : "KYLIN_CATEGORY_GROUPINGS",
"join" : {
"type" : "inner",
"primary_key" : [ "KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID", "KYLIN_CATEGORY_GROUPINGS.SITE_ID" ],
"foreign_key" : [ "KYLIN_SALES.LEAF_CATEG_ID", "KYLIN_SALES.LSTG_SITE_ID" ]
}
}, {
"table" : "DEFAULT.KYLIN_ACCOUNT",
"alias" : "BUYER_ACCOUNT",
"kind" : "LOOKUP",
"join" : {
"type" : "inner",
"primary_key" : [ "BUYER_ACCOUNT.ACCOUNT_ID" ],
"foreign_key" : [ "KYLIN_SALES.BUYER_ID" ]
}
}, {
"table" : "DEFAULT.KYLIN_ACCOUNT",
"alias" : "SELLER_ACCOUNT",
"kind" : "LOOKUP",
"join" : {
"type" : "inner",
"primary_key" : [ "SELLER_ACCOUNT.ACCOUNT_ID" ],
"foreign_key" : [ "KYLIN_SALES.SELLER_ID" ]
}
}, {
"table" : "DEFAULT.KYLIN_COUNTRY",
"alias" : "BUYER_COUNTRY",
"kind" : "LOOKUP",
"join" : {
"type" : "inner",
"primary_key" : [ "BUYER_COUNTRY.COUNTRY" ],
"foreign_key" : [ "BUYER_ACCOUNT.ACCOUNT_COUNTRY" ]
}
}, {
"table" : "DEFAULT.KYLIN_COUNTRY",
"alias" : "SELLER_COUNTRY",
"kind" : "LOOKUP",
"join" : {
"type" : "inner",
"primary_key" : [ "SELLER_COUNTRY.COUNTRY" ],
"foreign_key" : [ "SELLER_ACCOUNT.ACCOUNT_COUNTRY" ]
}
}],
"dimensions" : [ {
"table" : "KYLIN_SALES",
"columns" : [ "TRANS_ID", "SELLER_ID", "BUYER_ID", "PART_DT", "LEAF_CATEG_ID", "LSTG_FORMAT_NAME", "LSTG_SITE_ID", "OPS_USER_ID", "OPS_REGION" ]
}, {
"table" : "KYLIN_CAL_DT",
"columns" : [ "CAL_DT", "WEEK_BEG_DT", "MONTH_BEG_DT", "YEAR_BEG_DT" ]
}, {
"table" : "KYLIN_CATEGORY_GROUPINGS",
"columns" : [ "USER_DEFINED_FIELD1", "USER_DEFINED_FIELD3", "META_CATEG_NAME", "CATEG_LVL2_NAME", "CATEG_LVL3_NAME", "LEAF_CATEG_ID", "SITE_ID" ]
}, {
"table" : "BUYER_ACCOUNT",
"columns" : [ "ACCOUNT_ID", "ACCOUNT_BUYER_LEVEL", "ACCOUNT_SELLER_LEVEL", "ACCOUNT_COUNTRY", "ACCOUNT_CONTACT" ]
}, {
"table" : "SELLER_ACCOUNT",
"columns" : [ "ACCOUNT_ID", "ACCOUNT_BUYER_LEVEL", "ACCOUNT_SELLER_LEVEL", "ACCOUNT_COUNTRY", "ACCOUNT_CONTACT" ]
}, {
"table" : "BUYER_COUNTRY",
"columns" : [ "COUNTRY", "NAME" ]
}, {
"table" : "SELLER_COUNTRY",
"columns" : [ "COUNTRY", "NAME" ]
} ],
"metrics": [
"KYLIN_SALES.PRICE",
"KYLIN_SALES.ITEM_COUNT"
],
"last_modified" : 1422435345362,
"filter_condition" : "",
"partition_desc" : {
"partition_date_column" : "KYLIN_SALES.PART_DT",
"partition_time_column" : null,
"partition_date_start" : 1325376000000,
"partition_date_format" : "yyyy-MM-dd",
"partition_time_format" : "HH:mm:ss",
"partition_type" : "APPEND",
"partition_condition_builder" : "org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
},
"capacity" : "MEDIUM"
}
As you can see, Kylin's bundled metastore.sh command gives a much more readable view of what Kylin stores in HBase.
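Since the command mixes log lines with the JSON, a small filter can help. The sketch below keeps only the lines from the first one starting with "{" through the one starting with "}". It assumes the top-level braces sit at the start of a line while nested ones are indented. The sample input, including its log lines, is invented for the demonstration.

```shell
#!/usr/bin/env bash
# Keep only the JSON document from mixed output: print from the first line
# that starts with "{" through the next line that starts with "}".
extract_json() {
  sed -n '/^{/,/^}/p'
}

# Stand-in for real command output; these log lines are hypothetical.
sample_output='2020-01-01 00:00:00 INFO  hypothetical log line
{
  "name" : "kylin_sales_model",
  "capacity" : "MEDIUM"
}
2020-01-01 00:00:01 INFO  another hypothetical log line'

json=$(printf '%s\n' "$sample_output" | extract_json)
printf '%s\n' "$json"
```

On a real installation the filter would sit at the end of a pipe, for example ./bin/metastore.sh cat /model_desc/kylin_sales_model.json | extract_json. If the logs go to stderr in your setup, redirecting stderr may be all that is needed.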
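Note:
The approach given on the official website is to first use list to see which metadata entries exist:
./bin/metastore.sh list /path
Metadata is stored hierarchically. If you do not know at the outset which entry you want, run ./bin/metastore.sh list / to see what is under the root, then list one level at a time. When you reach the entry you want, open it with the cat command. For example, to view the /model_desc/kylin_sales_model.json entry: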
./bin/metastore.sh cat /model_desc/kylin_sales_model.json
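The level-by-level procedure can be sketched as a dry run that only prints the commands one would type to reach a given resource. The function name drill_down_commands is my own. Nothing here calls Kylin, so the script makes no assumption about what the list output looks like.

```shell
#!/usr/bin/env bash
# Print one "list" command per directory level, then the final "cat".
drill_down_commands() {
  local rest="${1#/}"      # resource path without its leading slash
  local prefix=""
  echo "./bin/metastore.sh list /"
  # While a "/" remains, peel off the next directory and list it.
  while [ "$rest" != "${rest#*/}" ]; do
    prefix="$prefix/${rest%%/*}"
    rest="${rest#*/}"
    echo "./bin/metastore.sh list $prefix"
  done
  echo "./bin/metastore.sh cat $prefix/$rest"
}

drill_down_commands /model_desc/kylin_sales_model.json
```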
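3. Build segments and check the corresponding content in the metadata table
After a build, the web UI shows the key information for the resulting segment.
From this we can draw a simple conclusion: each build produces one segment, and each segment corresponds to one HBase table. Merging segments later therefore amounts to merging HBase tables.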
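The same segment-to-table mapping should also be visible in the cube's own metadata entry. The sketch below pulls the table names out of a cube JSON. It assumes each segment records its HBase table in a storage_location_identifier field, which is how I recall the Kylin 3.x cube instance format, so verify it against your own output. The segment and table names in the sample are made up.

```shell
#!/usr/bin/env bash
# Hypothetical excerpt of a cube instance; values are invented for the demo.
sample_cube_json='{
  "name" : "kylin_sales_cube",
  "segments" : [ {
    "name" : "20120101000000_20130101000000",
    "storage_location_identifier" : "KYLIN_HYPOTHETICAL01",
    "status" : "READY"
  }, {
    "name" : "20130101000000_20140101000000",
    "storage_location_identifier" : "KYLIN_HYPOTHETICAL02",
    "status" : "READY"
  } ]
}'

# Print the quoted value of every storage_location_identifier line.
segment_tables() {
  grep '"storage_location_identifier"' | sed 's/.*: *"\([^"]*\)".*/\1/'
}

tables=$(printf '%s\n' "$sample_cube_json" | segment_tables)
printf '%s\n' "$tables"
```

Two segments yield two table names, matching the one-table-per-segment observation above.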
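4. Backing up and restoring metadata
Backing up metadata really matters, especially in production. On two occasions, Kylin customization work and a platform upgrade led to our Kylin metadata being deleted by mistake, and both times we recovered it from a metadata backup. The two ways to back up are:
./bin/metastore.sh backup    (backs up all metadata; recommended)
./bin/metastore.sh fetch /path    (backs up the metadata under the given path)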
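Because a missed backup is costly, it can help to wrap the full backup in a function that fails loudly when Kylin is not where it is expected. This is only a sketch: the function name backup_kylin_metadata is my own, and the only Kylin command it runs is the backup command shown above.

```shell
#!/usr/bin/env bash
# Run a full metadata backup from the given Kylin home (or $KYLIN_HOME).
backup_kylin_metadata() {
  local kylin_home="${1:-${KYLIN_HOME:-}}"
  # Refuse to continue if metastore.sh is missing or not executable.
  if [ -z "$kylin_home" ] || [ ! -x "$kylin_home/bin/metastore.sh" ]; then
    echo "metastore.sh not found under '$kylin_home'; set KYLIN_HOME" >&2
    return 1
  fi
  # Same command as in the text, run from the Kylin home directory.
  (cd "$kylin_home" && ./bin/metastore.sh backup)
}
```

Calling backup_kylin_metadata with no argument uses KYLIN_HOME. The non-zero return on failure makes the function suitable for a scheduled job that should alert when the backup did not run.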
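Metadata is restored as follows:
./bin/metastore.sh reset    (resets the metadata, i.e. clears all metadata except user information; use with caution)
./bin/metastore.sh restore /*/backup_file    (after a reset, restores the metadata from the backup with the restore command)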
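When restoring, the first question is usually which backup to use. The sketch below picks the newest backup directory and prints the restore command rather than running it. It assumes backups are directories named meta_ followed by a zero-padded timestamp, which is the layout I have seen under meta_backups in the Kylin home. The demo directories are created in a temporary folder and are purely illustrative.

```shell
#!/usr/bin/env bash
# Return the newest meta_* directory under the given backup root.
latest_backup() {
  local dir newest=""
  # Globs expand in sorted order, so the last directory seen is the newest.
  for dir in "$1"/meta_*; do
    [ -d "$dir" ] && newest="$dir"
  done
  printf '%s\n' "$newest"
}

# Demo with throwaway directories standing in for real backups.
demo_root=$(mktemp -d)
mkdir "$demo_root/meta_2021_01_01_00_00_00" "$demo_root/meta_2021_03_15_08_30_00"

newest=$(latest_backup "$demo_root")
echo "./bin/metastore.sh restore $newest"
```

Printing the command instead of executing it leaves a chance to double-check the chosen backup before a reset and restore, which is worth the extra step given how destructive reset is.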
Reference: Apache Kylin official website (Apache Kylin | Analytical Data Warehouse for Big Data)



