-- 先连接后过滤 explain select count(1) from student t1 inner join student t2 on t1.id = t2.id and t2.name = 'shanghai' where t1.city = 'beijing';
1 STAGE DEPENDENCIES: 2 Stage-5 is a root stage 3 Stage-2 depends on stages: Stage-5 4 Stage-0 depends on stages: Stage-2 5 6 STAGE PLANS: 7 Stage: Stage-5 8 Map Reduce Local Work 9 Alias -> Map Local Tables: 10 t1 11 Fetch Operator 12 limit: -1 13 Alias -> Map Local Operator Tree: 14 t1 15 TableScan 16 alias: t1 17 filterExpr: (id is not null and (city = 'beijing')) (type: boolean) 18 Statistics: Num rows: 3 Data size: 9 Basic stats: COMPLETE Column stats: NONE 19 Filter Operator 20 predicate: (id is not null and (city = 'beijing')) (type: boolean) 21 Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE 22 HashTable Sink Operator 23 keys: 24 0 id (type: int) 25 1 id (type: int) 26 27 Stage: Stage-2 28 Map Reduce 29 Map Operator Tree: 30 TableScan 31 alias: t2 32 filterExpr: ((name = 'shanghai') and id is not null) (type: boolean) 33 Statistics: Num rows: 3 Data size: 9 Basic stats: COMPLETE Column stats: NONE 34 Filter Operator 35 predicate: ((name = 'shanghai') and id is not null) (type: boolean) 36 Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE 37 Map Join Operator 38 condition map: 39 Inner Join 0 to 1 40 keys: 41 0 id (type: int) 42 1 id (type: int) 43 Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE 44 Group By Operator 45 aggregations: count(1) 46 mode: hash 47 outputColumnNames: _col0 48 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE 49 Reduce Output Operator 50 sort order: 51 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE 52 value expressions: _col0 (type: bigint) 53 Local Work: 54 Map Reduce Local Work 55 Reduce Operator Tree: 56 Group By Operator 57 aggregations: count(VALUE._col0) 58 mode: mergepartial 59 outputColumnNames: _col0 60 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE 61 File Output Operator 62 compressed: false 63 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE 64 table: 65 input format: org.apache.hadoop.mapred.SequenceFileInputFormat 66 output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat 67 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 68 69 Stage: Stage-0 70 Fetch Operator 71 limit: -1 72 Processor Tree:再使用子查询进行先过滤,再连接。
explain select count(1) from (select * from student where city = 'beijing') t1 inner join student t2 on t1.id = t2.id and t2.name = 'shanghai';
1 STAGE DEPENDENCIES: 2 Stage-5 is a root stage 3 Stage-2 depends on stages: Stage-5 4 Stage-0 depends on stages: Stage-2 5 6 STAGE PLANS: 7 Stage: Stage-5 8 Map Reduce Local Work 9 Alias -> Map Local Tables: 10 t1:student 11 Fetch Operator 12 limit: -1 13 Alias -> Map Local Operator Tree: 14 t1:student 15 TableScan 16 alias: student 17 filterExpr: ((city = 'beijing') and id is not null) (type: boolean) 18 Statistics: Num rows: 3 Data size: 9 Basic stats: COMPLETE Column stats: NONE 19 Filter Operator 20 predicate: ((city = 'beijing') and id is not null) (type: boolean) 21 Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE 22 Select Operator 23 expressions: id (type: int) 24 outputColumnNames: _col0 25 Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE 26 HashTable Sink Operator 27 keys: 28 0 _col0 (type: int) 29 1 id (type: int) 30 31 Stage: Stage-2 32 Map Reduce 33 Map Operator Tree: 34 TableScan 35 alias: t2 36 filterExpr: ((name = 'shanghai') and id is not null) (type: boolean) 37 Statistics: Num rows: 3 Data size: 9 Basic stats: COMPLETE Column stats: NONE 38 Filter Operator 39 predicate: ((name = 'shanghai') and id is not null) (type: boolean) 40 Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE 41 Map Join Operator 42 condition map: 43 Inner Join 0 to 1 44 keys: 45 0 _col0 (type: int) 46 1 id (type: int) 47 Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE 48 Group By Operator 49 aggregations: count(1) 50 mode: hash 51 outputColumnNames: _col0 52 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE 53 Reduce Output Operator 54 sort order: 55 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE 56 value expressions: _col0 (type: bigint) 57 Local Work: 58 Map Reduce Local Work 59 Reduce Operator Tree: 60 Group By Operator 61 aggregations: count(VALUE._col0) 62 mode: mergepartial 63 outputColumnNames: _col0 64 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE 65 File Output Operator 66 compressed: false 67 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE 68 table: 69 input format: org.apache.hadoop.mapred.SequenceFileInputFormat 70 output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat 71 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 72 73 Stage: Stage-0 74 Fetch Operator 75 limit: -1 76 Processor Tree: 77 ListSink
从中我们可以清晰的看到,两者的执行计划和执行效率其实都差不多。应该是Hive 引擎自动做了谓词下推的优化机制。



