Next, let's look at how a join is executed, again by reading the explain plan.
The SQL is as follows:
explain
select t1.prov_id
,t2.deep
from (
select prov_id
,deep
from dim.dim_city
where prov_id = 110000
) t1
join (
select prov_id
,deep
from dim.dim_city
where deep = 1
) t2
on t1.prov_id = t2.prov_id
;
Execution plan:
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  // Corresponds to the first subquery (t1)
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        t1:dim_city
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        t1:dim_city
          TableScan
            alias: dim_city
            filterExpr: (prov_id = 110000) (type: boolean)
            Statistics: Num rows: 3775 Data size: 522191 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (prov_id = 110000) (type: boolean)
              Statistics: Num rows: 1887 Data size: 261026 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                Statistics: Num rows: 1887 Data size: 261026 Basic stats: COMPLETE Column stats: NONE
                HashTable Sink Operator
                  keys:
                    0 110000 (type: int)
                    1 _col0 (type: int)

  // Corresponds to the second subquery (t2)
  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: dim_city
            filterExpr: ((deep = 1) and prov_id is not null) (type: boolean)
            Statistics: Num rows: 3775 Data size: 522191 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              // A detail worth noting: NULL keys can never match in a join, so they are filtered out as soon as the data is read
              predicate: ((deep = 1) and prov_id is not null) (type: boolean)
              Statistics: Num rows: 1887 Data size: 261026 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: prov_id (type: int)
                outputColumnNames: _col0
                Statistics: Num rows: 1887 Data size: 261026 Basic stats: COMPLETE Column stats: NONE
                Map Join Operator
                  condition map:
                       // Declares the join type: inner join
                       Inner Join 0 to 1
                  // Specifies the join condition between the two tables
                  keys:
                    // Data set output by the first table; the constant 110000 is its join key
                    0 110000 (type: int)
                    // Data set output by the second table; _col0 is its join key
                    1 _col0 (type: int)
                  Statistics: Num rows: 2075 Data size: 287128 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: 110000 (type: int), 1 (type: int)
                    outputColumnNames: _col0, _col1
                    Statistics: Num rows: 2075 Data size: 287128 Basic stats: COMPLETE Column stats: NONE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 2075 Data size: 287128 Basic stats: COMPLETE Column stats: NONE
                      table:
                          input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
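
The plan above is a map join: Stage-4 builds a hash table from t1 as local work, and Stage-3 streams t2 through a map-only job that probes that table. To compare it with a reduce-side join, one option is to switch off automatic map-join conversion for the session and run the same explain again. This is only a sketch: it assumes the map join here came from Hive's automatic conversion (the `hive.auto.convert.join` setting), and the plan you get back will depend on your Hive version and table statistics, so no output is shown.

```sql
-- Assumption: the map join in the plan above was chosen by automatic conversion.
-- Disable the conversion for this session only.
set hive.auto.convert.join=false;

-- Same query as before; only the session setting has changed.
explain
select t1.prov_id
      ,t2.deep
from (
    select prov_id
          ,deep
    from dim.dim_city
    where prov_id = 110000
) t1
join (
    select prov_id
          ,deep
    from dim.dim_city
    where deep = 1
) t2
on t1.prov_id = t2.prov_id
;

-- Re-enable the conversion afterwards.
set hive.auto.convert.join=true;
```

If the conversion was indeed what produced the map join, the new plan should no longer contain the Map Reduce Local Work stage or the HashTable Sink Operator, and the join should move into a reduce phase instead.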



