hive> explain select * from web_site;
OK
Plan optimized by CBO.
Stage-0
Fetch Operator
limit:-1
Select Operator [SEL_1]
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17","_col18","_col19","_col20","_col21","_col22","_col23","_col24","_col25"]
TableScan [TS_0]
Output:["web_site_sk","web_site_id","web_rec_start_date","web_rec_end_date","web_name","web_open_date_sk","web_close_date_sk","web_class","web_manager","web_mkt_id","web_mkt_class","web_mkt_desc","web_market_manager","web_company_id","web_company_name","web_street_number","web_street_name","web_street_type","web_suite_number","web_city","web_county","web_state","web_zip","web_country","web_gmt_offset","web_tax_percentage"]
2. select + 过滤条件
可以看到,在 TableScan 后,紧跟着 Filter Operator。
hive> explain select * from web_site where web_site_sk='1';
OK
Plan optimized by CBO.
Stage-0
Fetch Operator
limit:-1
Select Operator [SEL_2]
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17","_col18","_col19","_col20","_col21","_col22","_col23","_col24","_col25"]
Filter Operator [FIL_4]
predicate:(web_site_sk = 1L)
TableScan [TS_0]
Output:["web_site_sk","web_site_id","web_rec_start_date","web_rec_end_date","web_name","web_open_date_sk","web_close_date_sk","web_class","web_manager","web_mkt_id","web_mkt_class","web_mkt_desc","web_market_manager","web_company_id","web_company_name","web_street_number","web_street_name","web_street_type","web_suite_number","web_city","web_county","web_state","web_zip","web_country","web_gmt_offset","web_tax_percentage"]
3. WHERe + GROUP BY
hive> explain select web_site_id,count(1) from web_site where web_site_sk='1' group by web_site_id;
OK
Plan optimized by CBO.
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Stage-0
Fetch Operator
limit:-1
Stage-1
Reducer 2 vectorized
File Output Operator [FS_14]
Group By Operator [GBY_13] (rows=7 width=2061)
Output:["_col0","_col1"],aggregations:["count(VALUE._col0)"],keys:KEY._col0
<-Map 1 [SIMPLE_EDGE] vectorized
SHUFFLE [RS_12]
PartitionCols:_col0
Group By Operator [GBY_11] (rows=15 width=2061)
Output:["_col0","_col1"],aggregations:["count()"],keys:web_site_id
Select Operator [SEL_10] (rows=15 width=2061)
Output:["web_site_id"]
Filter Operator [FIL_9] (rows=15 width=2061)
predicate:(web_site_sk = 1L)
TableScan [TS_0] (rows=30 width=2061)
tpcds_bin_partitioned_orc_2@web_site,web_site,Tbl:COMPLETE,Col:NONE,Output:["web_site_sk","web_site_id"]
Time taken: 0.283 seconds, Fetched: 25 row(s)
4. mr 执行框架 WHERe + GROUP BY
注意,当执行引擎变化时,生成的执行计划有变化。如以下 例 4 和例 3 的 SQL 一致,但是计划不一样。
hive> SET hive.execution.engine=mr;
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> explain select web_site_id,count(1) cnt from web_site where web_site_sk='1' group by web_site_id having cnt > 1;
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: web_site
Statistics: Num rows: 30 Data size: 61850 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (web_site_sk = 1L) (type: boolean)
Statistics: Num rows: 15 Data size: 30925 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: web_site_id (type: char(16))
outputColumnNames: web_site_id
Statistics: Num rows: 15 Data size: 30925 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: count(1)
keys: web_site_id (type: char(16))
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 15 Data size: 30925 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: char(16))
sort order: +
Map-reduce partition columns: _col0 (type: char(16))
Statistics: Num rows: 15 Data size: 30925 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint)
Execution mode: vectorized
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: char(16))
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 7 Data size: 14431 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (_col1 > 1L) (type: boolean)
Statistics: Num rows: 2 Data size: 4123 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 2 Data size: 4123 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.079 seconds, Fetched: 55 row(s)
6. 两个 job
hive> explain SELECT CNT,COUNT(1) WEB_SITE_COUNT FROM (select web_site_id,count(1) cnt from web_site where web_site_sk='1' group by web_site_id) T GROUP BY cnt;
OK
Plan optimized by CBO.
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
Stage-0
Fetch Operator
limit:-1
Stage-1
Reducer 3 vectorized
File Output Operator [FS_23]
Group By Operator [GBY_22] (rows=3 width=2061)
Output:["_col0","_col1"],aggregations:["count(VALUE._col0)"],keys:KEY._col0
<-Reducer 2 [SIMPLE_EDGE] vectorized
SHUFFLE [RS_21]
PartitionCols:_col0
Group By Operator [GBY_20] (rows=7 width=2061)
Output:["_col0","_col1"],aggregations:["count()"],keys:_col1
Select Operator [SEL_19] (rows=7 width=2061)
Output:["_col1"]
Group By Operator [GBY_18] (rows=7 width=2061)
Output:["_col0","_col1"],aggregations:["count(VALUE._col0)"],keys:KEY._col0
<-Map 1 [SIMPLE_EDGE] vectorized
SHUFFLE [RS_17]
PartitionCols:_col0
Group By Operator [GBY_16] (rows=15 width=2061)
Output:["_col0","_col1"],aggregations:["count()"],keys:web_site_id
Select Operator [SEL_15] (rows=15 width=2061)
Output:["web_site_id"]
Filter Operator [FIL_14] (rows=15 width=2061)
predicate:(web_site_sk = 1L)
TableScan [TS_0] (rows=30 width=2061)
tpcds_bin_partitioned_orc_2@web_site,web_site,Tbl:COMPLETE,Col:NONE,Output:["web_site_sk","web_site_id"]
Time taken: 0.106 seconds, Fetched: 35 row(s)
EXPLAIN DEPENDENCY
1. TABLE
hive> EXPLAIN DEPENDENCY select web_site_id,count(1) cnt from web_site where web_site_sk='1' group by web_site_id;
OK
{"input_tables":[{"tablename":"tpcds_bin_partitioned_orc_2@web_site","tabletype":"MANAGED_TABLE"}],"input_partitions":[]}
VIEW
hive> CREATE VIEW V1 AS SELECt * FROM web_site where web_site_sk='1' ;
OK
hive> EXPLAIN DEPENDENCY select web_site_id,count(1) cnt FROM v1 group by web_site_id;
OK
{"input_tables":[
{"tablename":"tpcds_bin_partitioned_orc_2@v1","tabletype":"VIRTUAL_VIEW"},
{"tablename":"tpcds_bin_partitioned_orc_2@web_site","tabletype":"MANAGED_TABLE","tableParents":"[tpcds_bin_partitioned_orc_2@v1]"}],"input_partitions":[]
}
EXPLAIN AUTHORIZATION
显示执行 SQL 需要的权限信息。
hive> EXPLAIN AUTHORIZATION select web_site_id,count(1) cnt from web_site where web_site_sk='1' group by web_site_id; OK INPUTS: tpcds_bin_partitioned_orc_2@web_site OUTPUTS: hdfs://localhost:9000/tmp/hive/houzhizhen/be957992-b9df-4b3d-a583-cf57873b77e7/hive_2021-12-03_17-03-43_073_796285877379130350-1/-mr-10001 CURRENT_USER: houzhizhen OPERATION: QUERYEXPLAIN VECTORIZATION
hive> EXPLAIN VECTORIZATION select web_site_id,count(1) cnt FROM v1 group by web_site_id;
OK
Plan optimized by CBO.
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Stage-0
Fetch Operator
limit:-1
Stage-1
Reducer 2 vectorized
File Output Operator [FS_14]
Group By Operator [GBY_13] (rows=7 width=2061)
Output:["_col0","_col1"],aggregations:["count(VALUE._col0)"],keys:KEY._col0
<-Map 1 [SIMPLE_EDGE] vectorized
SHUFFLE [RS_12]
PartitionCols:_col0
Group By Operator [GBY_11] (rows=15 width=2061)
Output:["_col0","_col1"],aggregations:["count()"],keys:web_site_id
Select Operator [SEL_10] (rows=15 width=2061)
Output:["web_site_id"]
Filter Operator [FIL_9] (rows=15 width=2061)
predicate:(web_site_sk = 1L)
TableScan [TS_0] (rows=30 width=2061)
tpcds_bin_partitioned_orc_2@web_site,web_site,Tbl:COMPLETE,Col:NONE,Output:["web_site_sk","web_site_id"],properties:{"insideView":"TRUE"}
Time taken: 0.094 seconds, Fetched: 25 row(s)
hive> EXPLAIN select web_site_id,count(1) cnt FROM v1 group by web_site_id;
OK
Plan optimized by CBO.
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Stage-0
Fetch Operator
limit:-1
Stage-1
Reducer 2 vectorized
File Output Operator [FS_14]
Group By Operator [GBY_13] (rows=7 width=2061)
Output:["_col0","_col1"],aggregations:["count(VALUE._col0)"],keys:KEY._col0
<-Map 1 [SIMPLE_EDGE] vectorized
SHUFFLE [RS_12]
PartitionCols:_col0
Group By Operator [GBY_11] (rows=15 width=2061)
Output:["_col0","_col1"],aggregations:["count()"],keys:web_site_id
Select Operator [SEL_10] (rows=15 width=2061)
Output:["web_site_id"]
Filter Operator [FIL_9] (rows=15 width=2061)
predicate:(web_site_sk = 1L)
TableScan [TS_0] (rows=30 width=2061)
tpcds_bin_partitioned_orc_2@web_site,web_site,Tbl:COMPLETE,Col:NONE,Output:["web_site_sk","web_site_id"],properties:{"insideView":"TRUE"}



