栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 前沿技术 > 大数据 > 大数据系统

Hive Explain Plan

Hive Explain Plan

EXPLAIN 1. 从一张表里检索所有记录
hive> explain select * from web_site;
OK
Plan optimized by CBO.

Stage-0
  Fetch Operator
    limit:-1
    Select Operator [SEL_1]
      Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17","_col18","_col19","_col20","_col21","_col22","_col23","_col24","_col25"]
      TableScan [TS_0]
        Output:["web_site_sk","web_site_id","web_rec_start_date","web_rec_end_date","web_name","web_open_date_sk","web_close_date_sk","web_class","web_manager","web_mkt_id","web_mkt_class","web_mkt_desc","web_market_manager","web_company_id","web_company_name","web_street_number","web_street_name","web_street_type","web_suite_number","web_city","web_county","web_state","web_zip","web_country","web_gmt_offset","web_tax_percentage"]
2. select + 过滤条件

可以看到,在 TableScan 后,紧跟着 Filter Operator。

hive> explain select * from web_site where web_site_sk='1';
OK
Plan optimized by CBO.

Stage-0
  Fetch Operator
    limit:-1
    Select Operator [SEL_2]
      Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17","_col18","_col19","_col20","_col21","_col22","_col23","_col24","_col25"]
      Filter Operator [FIL_4]
        predicate:(web_site_sk = 1L)
        TableScan [TS_0]
          Output:["web_site_sk","web_site_id","web_rec_start_date","web_rec_end_date","web_name","web_open_date_sk","web_close_date_sk","web_class","web_manager","web_mkt_id","web_mkt_class","web_mkt_desc","web_market_manager","web_company_id","web_company_name","web_street_number","web_street_name","web_street_type","web_suite_number","web_city","web_county","web_state","web_zip","web_country","web_gmt_offset","web_tax_percentage"]
3. WHERe + GROUP BY
hive> explain select web_site_id,count(1) from web_site where web_site_sk='1' group by web_site_id;
OK
Plan optimized by CBO.

Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Reducer 2 vectorized
      File Output Operator [FS_14]
        Group By Operator [GBY_13] (rows=7 width=2061)
          Output:["_col0","_col1"],aggregations:["count(VALUE._col0)"],keys:KEY._col0
        <-Map 1 [SIMPLE_EDGE] vectorized
          SHUFFLE [RS_12]
            PartitionCols:_col0
            Group By Operator [GBY_11] (rows=15 width=2061)
              Output:["_col0","_col1"],aggregations:["count()"],keys:web_site_id
              Select Operator [SEL_10] (rows=15 width=2061)
                Output:["web_site_id"]
                Filter Operator [FIL_9] (rows=15 width=2061)
                  predicate:(web_site_sk = 1L)
                  TableScan [TS_0] (rows=30 width=2061)
                    tpcds_bin_partitioned_orc_2@web_site,web_site,Tbl:COMPLETE,Col:NONE,Output:["web_site_sk","web_site_id"]

Time taken: 0.283 seconds, Fetched: 25 row(s)
4. mr 执行框架 WHERe + GROUP BY

注意,当执行引擎变化时,生成的执行计划有变化。如以下 例 4 和例 3 的 SQL 一致,但是计划不一样。

hive> SET hive.execution.engine=mr;
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> explain select web_site_id,count(1) cnt from web_site where web_site_sk='1' group by web_site_id having cnt > 1;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: web_site
            Statistics: Num rows: 30 Data size: 61850 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (web_site_sk = 1L) (type: boolean)
              Statistics: Num rows: 15 Data size: 30925 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: web_site_id (type: char(16))
                outputColumnNames: web_site_id
                Statistics: Num rows: 15 Data size: 30925 Basic stats: COMPLETE Column stats: NONE
                Group By Operator
                  aggregations: count(1)
                  keys: web_site_id (type: char(16))
                  mode: hash
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 15 Data size: 30925 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col0 (type: char(16))
                    sort order: +
                    Map-reduce partition columns: _col0 (type: char(16))
                    Statistics: Num rows: 15 Data size: 30925 Basic stats: COMPLETE Column stats: NONE
                    value expressions: _col1 (type: bigint)
      Execution mode: vectorized
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          keys: KEY._col0 (type: char(16))
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 7 Data size: 14431 Basic stats: COMPLETE Column stats: NONE
          Filter Operator
            predicate: (_col1 > 1L) (type: boolean)
            Statistics: Num rows: 2 Data size: 4123 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 2 Data size: 4123 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.079 seconds, Fetched: 55 row(s)
6. 两个 job
hive> explain  SELECT CNT,COUNT(1) WEB_SITE_COUNT FROM (select web_site_id,count(1) cnt from web_site where web_site_sk='1' group by web_site_id) T GROUP BY cnt;
OK
Plan optimized by CBO.

Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Reducer 3 vectorized
      File Output Operator [FS_23]
        Group By Operator [GBY_22] (rows=3 width=2061)
          Output:["_col0","_col1"],aggregations:["count(VALUE._col0)"],keys:KEY._col0
        <-Reducer 2 [SIMPLE_EDGE] vectorized
          SHUFFLE [RS_21]
            PartitionCols:_col0
            Group By Operator [GBY_20] (rows=7 width=2061)
              Output:["_col0","_col1"],aggregations:["count()"],keys:_col1
              Select Operator [SEL_19] (rows=7 width=2061)
                Output:["_col1"]
                Group By Operator [GBY_18] (rows=7 width=2061)
                  Output:["_col0","_col1"],aggregations:["count(VALUE._col0)"],keys:KEY._col0
                <-Map 1 [SIMPLE_EDGE] vectorized
                  SHUFFLE [RS_17]
                    PartitionCols:_col0
                    Group By Operator [GBY_16] (rows=15 width=2061)
                      Output:["_col0","_col1"],aggregations:["count()"],keys:web_site_id
                      Select Operator [SEL_15] (rows=15 width=2061)
                        Output:["web_site_id"]
                        Filter Operator [FIL_14] (rows=15 width=2061)
                          predicate:(web_site_sk = 1L)
                          TableScan [TS_0] (rows=30 width=2061)
                            tpcds_bin_partitioned_orc_2@web_site,web_site,Tbl:COMPLETE,Col:NONE,Output:["web_site_sk","web_site_id"]

Time taken: 0.106 seconds, Fetched: 35 row(s)
EXPLAIN DEPENDENCY 1. TABLE
hive> EXPLAIN DEPENDENCY select web_site_id,count(1) cnt from web_site where web_site_sk='1' group by web_site_id;
OK
{"input_tables":[{"tablename":"tpcds_bin_partitioned_orc_2@web_site","tabletype":"MANAGED_TABLE"}],"input_partitions":[]}
VIEW
hive> CREATE VIEW V1 AS SELECt * FROM  web_site where web_site_sk='1' ;
OK

hive> EXPLAIN DEPENDENCY select web_site_id,count(1) cnt FROM v1 group by web_site_id;
OK
{"input_tables":[
    {"tablename":"tpcds_bin_partitioned_orc_2@v1","tabletype":"VIRTUAL_VIEW"},
    {"tablename":"tpcds_bin_partitioned_orc_2@web_site","tabletype":"MANAGED_TABLE","tableParents":"[tpcds_bin_partitioned_orc_2@v1]"}],"input_partitions":[]
}
EXPLAIN AUTHORIZATION

显示执行 SQL 需要的权限信息。

hive> EXPLAIN AUTHORIZATION select web_site_id,count(1) cnt from web_site where web_site_sk='1' group by web_site_id;
OK
INPUTS: 
  tpcds_bin_partitioned_orc_2@web_site
OUTPUTS: 
  hdfs://localhost:9000/tmp/hive/houzhizhen/be957992-b9df-4b3d-a583-cf57873b77e7/hive_2021-12-03_17-03-43_073_796285877379130350-1/-mr-10001
CURRENT_USER: 
  houzhizhen
OPERATION: 
  QUERY
EXPLAIN VECTORIZATION
hive> EXPLAIN VECTORIZATION select web_site_id,count(1) cnt FROM v1 group by web_site_id;
OK
Plan optimized by CBO.

Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Reducer 2 vectorized
      File Output Operator [FS_14]
        Group By Operator [GBY_13] (rows=7 width=2061)
          Output:["_col0","_col1"],aggregations:["count(VALUE._col0)"],keys:KEY._col0
        <-Map 1 [SIMPLE_EDGE] vectorized
          SHUFFLE [RS_12]
            PartitionCols:_col0
            Group By Operator [GBY_11] (rows=15 width=2061)
              Output:["_col0","_col1"],aggregations:["count()"],keys:web_site_id
              Select Operator [SEL_10] (rows=15 width=2061)
                Output:["web_site_id"]
                Filter Operator [FIL_9] (rows=15 width=2061)
                  predicate:(web_site_sk = 1L)
                  TableScan [TS_0] (rows=30 width=2061)
                    tpcds_bin_partitioned_orc_2@web_site,web_site,Tbl:COMPLETE,Col:NONE,Output:["web_site_sk","web_site_id"],properties:{"insideView":"TRUE"}

Time taken: 0.094 seconds, Fetched: 25 row(s)
hive> EXPLAIN  select web_site_id,count(1) cnt FROM v1 group by web_site_id;
OK
Plan optimized by CBO.

Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Reducer 2 vectorized
      File Output Operator [FS_14]
        Group By Operator [GBY_13] (rows=7 width=2061)
          Output:["_col0","_col1"],aggregations:["count(VALUE._col0)"],keys:KEY._col0
        <-Map 1 [SIMPLE_EDGE] vectorized
          SHUFFLE [RS_12]
            PartitionCols:_col0
            Group By Operator [GBY_11] (rows=15 width=2061)
              Output:["_col0","_col1"],aggregations:["count()"],keys:web_site_id
              Select Operator [SEL_10] (rows=15 width=2061)
                Output:["web_site_id"]
                Filter Operator [FIL_9] (rows=15 width=2061)
                  predicate:(web_site_sk = 1L)
                  TableScan [TS_0] (rows=30 width=2061)
                    tpcds_bin_partitioned_orc_2@web_site,web_site,Tbl:COMPLETE,Col:NONE,Output:["web_site_sk","web_site_id"],properties:{"insideView":"TRUE"}
转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/632708.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号