栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 前沿技术 > 大数据 > 大数据系统

Spark优化之执行计划

Spark优化之执行计划

1.sparksql 执行计划

spark.sql("select a.uid,count(1) from a join u on a.uid=u.uid where a.actorlevel>10 group by a.uid order by a.uid").explain()

(1)explain() 查看物理计划

== Physical Plan ==
*(4) Sort [uid#84 ASC NULLS FIRST], true, 0
+- Exchange rangepartitioning(uid#84 ASC NULLS FIRST, 200)
   +- *(3) HashAggregate(keys=[uid#84], functions=[count(1)])
      +- Exchange hashpartitioning(uid#84, 200)
         +- *(2) HashAggregate(keys=[uid#84], functions=[partial_count(1)])
            +- *(2) Project [uid#84]
               +- *(2) BroadcastHashJoin [cast(uid#84 as bigint)], [uid#23L], Inner, BuildRight
                  :- *(2) Project [uid#84]
                  :  +- *(2) Filter ((isnotnull(actorlevel#50) && (cast(actorlevel#50 as int) > 10)) && isnotnull(uid#84))
                  :     +- *(2) FileScan json [actorlevel#50,uid#84] Batched: false, Format: JSON, Location: InMemoryFileIndex[file:/D:/workspace/sparktest/datas/actorinfo], PartitionFilters: [], PushedFilters: [IsNotNull(actorlevel), IsNotNull(uid)], ReadSchema: struct
                  +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true]))
                     +- *(1) Project [uid#23L]
                        +- *(1) Filter isnotnull(uid#23L)
                           +- *(1) FileScan json [uid#23L] Batched: false, Format: JSON, Location: InMemoryFileIndex[file:/D:/workspace/sparktest/datas/yhtx_1022], PartitionFilters: [], PushedFilters: [IsNotNull(uid)], ReadSchema: struct

(2)explain(true)查看整个SQL的执行计划,主要分为4个阶段

 --1:解析过程

== Parsed Logical Plan ==
'Sort ['a.uid ASC NULLS FIRST], true
+- 'Aggregate ['a.uid], ['a.uid, unresolvedalias('count(1), None)]
   +- 'Filter ('a.actorlevel > 10)
      +- 'Join Inner, ('a.uid = 'u.uid)
         :- 'UnresolvedRelation `a`
         +- 'UnresolvedRelation `u`

  --2:逻辑阶段

== Analyzed Logical Plan ==
uid: string, count(1): bigint
Project [uid#84, count(1)#181L]
+- Sort [uid#84 ASC NULLS FIRST], true
   +- Aggregate [uid#84], [uid#84, count(1) AS count(1)#181L]
      +- Filter (cast(actorlevel#50 as int) > 10)
         +- Join Inner, (cast(uid#84 as bigint) = uid#23L)
            :- SubqueryAlias a
            :  +- Relation[actorid#48,actorinfoflag#49,actorlevel#50,actvdate#51,adchannel#52,adsubchannel#53,areaid#54,channel#55,city#56,clienLogDate#57,deviceid#58,edition#59,gameid#60,guid#61,imei#62,language#63,lastgamedate#64,logdate#65,logindates#66,logtype#67L,lvlupdate#68,lvluptimes#69L,money#70,netsp#71,... 13 more fields] json
            +- SubqueryAlias u
               +- Relation[actorid#6L,actorname#7,approach#8,areaid#9L,channel#10L,deviceid#11,edition#12L,gameid#13L,imei#14,item#15L,itemcount#16L,itemname#17,logdate#18,logtype#19L,opersys#20L,subchannel#21,type#22L,uid#23L] json

 --3:优化阶段

== Optimized Logical Plan ==
Sort [uid#84 ASC NULLS FIRST], true
+- Aggregate [uid#84], [uid#84, count(1) AS count(1)#181L]
   +- Project [uid#84]
      +- Join Inner, (cast(uid#84 as bigint) = uid#23L)
         :- Project [uid#84]
         :  +- Filter ((isnotnull(actorlevel#50) && (cast(actorlevel#50 as int) > 10)) && isnotnull(uid#84))
         :     +- Relation[actorid#48,actorinfoflag#49,actorlevel#50,actvdate#51,adchannel#52,adsubchannel#53,areaid#54,channel#55,city#56,clienLogDate#57,deviceid#58,edition#59,gameid#60,guid#61,imei#62,language#63,lastgamedate#64,logdate#65,logindates#66,logtype#67L,lvlupdate#68,lvluptimes#69L,money#70,netsp#71,... 13 more fields] json
         +- Project [uid#23L]
            +- Filter isnotnull(uid#23L)
               +- Relation[actorid#6L,actorname#7,approach#8,areaid#9L,channel#10L,deviceid#11,edition#12L,gameid#13L,imei#14,item#15L,itemcount#16L,itemname#17,logdate#18,logtype#19L,opersys#20L,subchannel#21,type#22L,uid#23L] json

--4:物理执行计划

== Physical Plan ==
*(4) Sort [uid#84 ASC NULLS FIRST], true, 0
+- Exchange rangepartitioning(uid#84 ASC NULLS FIRST, 200)
   +- *(3) HashAggregate(keys=[uid#84], functions=[count(1)], output=[uid#84, count(1)#181L])
      +- Exchange hashpartitioning(uid#84, 200)
         +- *(2) HashAggregate(keys=[uid#84], functions=[partial_count(1)], output=[uid#84, count#184L])
            +- *(2) Project [uid#84]
               +- *(2) BroadcastHashJoin [cast(uid#84 as bigint)], [uid#23L], Inner, BuildRight
                  :- *(2) Project [uid#84]
                  :  +- *(2) Filter ((isnotnull(actorlevel#50) && (cast(actorlevel#50 as int) > 10)) && isnotnull(uid#84))
                  :     +- *(2) FileScan json [actorlevel#50,uid#84] Batched: false, Format: JSON, Location: InMemoryFileIndex[file:/D:/workspace/sparktest/datas/actorinfo], PartitionFilters: [], PushedFilters: [IsNotNull(actorlevel), IsNotNull(uid)], ReadSchema: struct
                  +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true]))
                     +- *(1) Project [uid#23L]
                        +- *(1) Filter isnotnull(uid#23L)
                           +- *(1) FileScan json [uid#23L] Batched: false, Format: JSON, Location: InMemoryFileIndex[file:/D:/workspace/sparktest/datas/yhtx_1022], PartitionFilters: [], PushedFilters: [IsNotNull(uid)], ReadSchema: struct
 

转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/747952.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号