依赖关系:两个相邻RDD之间的关系
血缘关系:多个连续的RDD的依赖关系
下图演示了RDD的血缘关系:
- RDD是不会保存数据的,但是每个RDD会保存自己的血缘关系;
- 血缘关系的意义:因为RDD不保存数据,一旦计算失败了,不能从上一个RDD重新计算,必须重头计算,那么RDD必须要知道数据源在哪里,血缘关系就用于追溯数据源,提高了容错性
血缘关系演示
package SparkCore._04_血缘关系
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
object wordcount {
def main(args: Array[String]): Unit = {
val conf: SparkConf = new SparkConf().setAppName("wc").setMaster("local[*]")
val sc = new SparkContext(conf)
val fileRDD: RDD[String] = sc.textFile("SparkCore/target/classes/wc.txt")
println(fileRDD.toDebugString)
println("***********************")
val words: RDD[String] = fileRDD.flatMap(_.split(" "))
println(words.toDebugString)
println("***********************")
val wordToOne: RDD[(String, Int)] = words.map((_, 1))
println(wordToOne.toDebugString)
println("***********************")
val wordToSum: RDD[(String, Int)] = wordToOne.reduceByKey(_ + _)
println(wordToSum.toDebugString)
println("***********************")
val tuples: Array[(String, Int)] = wordToSum.collect()
tuples.foreach(println)
//todo 3.关闭链接
sc.stop()
}
}
执行结果:
(2) SparkCore/target/classes/wc.txt MapPartitionsRDD[1] at textFile at wordcount.scala:17 []
| SparkCore/target/classes/wc.txt HadoopRDD[0] at textFile at wordcount.scala:17 []
***********************
(2) MapPartitionsRDD[2] at flatMap at wordcount.scala:21 []
| SparkCore/target/classes/wc.txt MapPartitionsRDD[1] at textFile at wordcount.scala:17 []
| SparkCore/target/classes/wc.txt HadoopRDD[0] at textFile at wordcount.scala:17 []
***********************
(2) MapPartitionsRDD[3] at map at wordcount.scala:25 []
| MapPartitionsRDD[2] at flatMap at wordcount.scala:21 []
| SparkCore/target/classes/wc.txt MapPartitionsRDD[1] at textFile at wordcount.scala:17 []
| SparkCore/target/classes/wc.txt HadoopRDD[0] at textFile at wordcount.scala:17 []
***********************
(2) ShuffledRDD[4] at reduceByKey at wordcount.scala:29 []
+-(2) MapPartitionsRDD[3] at map at wordcount.scala:25 []
| MapPartitionsRDD[2] at flatMap at wordcount.scala:21 []
| SparkCore/target/classes/wc.txt MapPartitionsRDD[1] at textFile at wordcount.scala:17 []
| SparkCore/target/classes/wc.txt HadoopRDD[0] at textFile at wordcount.scala:17 []
***********************
(hive,1)
(mapreduce,1)
(flink,1)
(spark,1)
(hadoop,2)
Process finished with exit code 0
从上面可以看到一个RDD所经过的算子
并且可以看到这个算子是否有shuffle
±:表示依赖断开,也就是经历了shuffle
(1) 表示分区
package SparkCore._04_血缘关系
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
object wordcountdep {
def main(args: Array[String]): Unit = {
//todo 1. 建立和Spark框架的链接
val conf: SparkConf = new SparkConf().setAppName("wc").setMaster("local[*]")
val sc = new SparkContext(conf)
//todo 2.业务逻辑处理
val fileRDD: RDD[String] = sc.textFile("SparkCore/target/classes/wc.txt")
println(fileRDD.dependencies)
println("***********************")
val words: RDD[String] = fileRDD.flatMap(_.split(" "))
println(words.dependencies)
println("***********************")
val wordToOne: RDD[(String, Int)] = words.map((_, 1))
println(wordToOne.dependencies)
println("***********************")
val wordToSum: RDD[(String, Int)] = wordToOne.reduceByKey(_ + _)
println(wordToSum.dependencies)
println("***********************")
val tuples: Array[(String, Int)] = wordToSum.collect()
tuples.foreach(println)
//todo 3.关闭链接
sc.stop()
}
}
执行结果:
"D:development softwareJavajdk1.8.0_281binjava.exe" "-javaagent:D:development softwareIDEAIntelliJ IDEA 2021.2libidea_rt.jar=9900:D:development softwareIDEAIntelliJ IDEA 2021.2bin" -Dfile.encoding=UTF-8 -classpath "D:development softwareJavajdk1.8.0_281jrelibcharsets.jar;D:development softwareJavajdk1.8.0_281jrelibdeploy.jar;D:development softwareJavajdk1.8.0_281jrelibextaccess-bridge-64.jar;D:development softwareJavajdk1.8.0_281jrelibextcldrdata.jar;D:development softwareJavajdk1.8.0_281jrelibextdnsns.jar;D:development softwareJavajdk1.8.0_281jrelibextjaccess.jar;D:development softwareJavajdk1.8.0_281jrelibextjfxrt.jar;D:development softwareJavajdk1.8.0_281jrelibextlocaledata.jar;D:development softwareJavajdk1.8.0_281jrelibextnashorn.jar;D:development softwareJavajdk1.8.0_281jrelibextsunec.jar;D:development softwareJavajdk1.8.0_281jrelibextsunjce_provider.jar;D:development softwareJavajdk1.8.0_281jrelibextsunmscapi.jar;D:development softwareJavajdk1.8.0_281jrelibextsunpkcs11.jar;D:development softwareJavajdk1.8.0_281jrelibextzipfs.jar;D:development softwareJavajdk1.8.0_281jrelibjavaws.jar;D:development softwareJavajdk1.8.0_281jrelibjce.jar;D:development softwareJavajdk1.8.0_281jrelibjfr.jar;D:development softwareJavajdk1.8.0_281jrelibjfxswt.jar;D:development softwareJavajdk1.8.0_281jrelibjsse.jar;D:development softwareJavajdk1.8.0_281jrelibmanagement-agent.jar;D:development softwareJavajdk1.8.0_281jrelibplugin.jar;D:development softwareJavajdk1.8.0_281jrelibresources.jar;D:development softwareJavajdk1.8.0_281jrelibrt.jar;D:IdeaProjectscom-yato-bigdataSparkCoretargetclasses;D:development softwarescalascala-2.12.11libscala-library.jar;D:development softwarescalascala-2.12.11libscala-parser-combinators_2.12-1.0.7.jar;D:development softwarescalascala-2.12.11libscala-reflect.jar;D:development softwarescalascala-2.12.11libscala-swing_2.12-2.0.3.jar;D:development softwarescalascala-2.12.11libscala-xml_2.12-1.0.6.jar;D:development softwareRepMavenorgapachesparkspark-core_2.123.0.0spark-core_2.12-3.0.0.jar;D:development softwareRepMavencomthoughtworksparanamerparanamer2.8paranamer-2.8.jar;D:development softwareRepMavenorgapacheavroavro1.8.2avro-1.8.2.jar;D:development softwareRepMavenorgcodehausjacksonjackson-core-asl1.9.13jackson-core-asl-1.9.13.jar;D:development softwareRepMavenorgcodehausjacksonjackson-mapper-asl1.9.13jackson-mapper-asl-1.9.13.jar;D:development softwareRepMavenorgapachecommonscommons-compress1.8.1commons-compress-1.8.1.jar;D:development softwareRepMavenorgtukaanixz1.5xz-1.5.jar;D:development softwareRepMavenorgapacheavroavro-mapred1.8.2avro-mapred-1.8.2-hadoop2.jar;D:development softwareRepMavenorgapacheavroavro-ipc1.8.2avro-ipc-1.8.2.jar;D:development softwareRepMavencommons-codeccommons-codec1.9commons-codec-1.9.jar;D:development softwareRepMavencomtwitterchill_2.12 .9.5chill_2.12-0.9.5.jar;D:development softwareRepMavencomesotericsoftwarekryo-shaded4.0.2kryo-shaded-4.0.2.jar;D:development softwareRepMavencomesotericsoftwareminlog1.3.0minlog-1.3.0.jar;D:development softwareRepMavenorgobjenesisobjenesis2.5.1objenesis-2.5.1.jar;D:development softwareRepMavencomtwitterchill-java .9.5chill-java-0.9.5.jar;D:development softwareRepMavenorgapachexbeanxbean-asm7-shaded4.15xbean-asm7-shaded-4.15.jar;D:development softwareRepMavenorgapachehadoophadoop-client2.7.4hadoop-client-2.7.4.jar;D:development softwareRepMavenorgapachehadoophadoop-common2.7.4hadoop-common-2.7.4.jar;D:development softwareRepMavencommons-clicommons-cli1.2commons-cli-1.2.jar;D:development softwareRepMavenxmlencxmlenc .52xmlenc-0.52.jar;D:development softwareRepMavencommons-httpclientcommons-httpclient3.1commons-httpclient-3.1.jar;D:development softwareRepMavencommons-iocommons-io2.4commons-io-2.4.jar;D:development softwareRepMavencommons-collectionscommons-collections3.2.2commons-collections-3.2.2.jar;D:development softwareRepMavenorgmortbayjettyjetty-sslengine6.1.26jetty-sslengine-6.1.26.jar;D:development softwareRepMavenjavaxservletjspjsp-api2.1jsp-api-2.1.jar;D:development softwareRepMavencommons-langcommons-lang2.6commons-lang-2.6.jar;D:development softwareRepMavencommons-configurationcommons-configuration1.6commons-configuration-1.6.jar;D:development softwareRepMavencommons-digestercommons-digester1.8commons-digester-1.8.jar;D:development softwareRepMavencommons-beanutilscommons-beanutils1.7.0commons-beanutils-1.7.0.jar;D:development softwareRepMavencomgoogleprotobufprotobuf-java2.5.0protobuf-java-2.5.0.jar;D:development softwareRepMavencomgooglecodegsongson2.2.4gson-2.2.4.jar;D:development softwareRepMavenorgapachehadoophadoop-auth2.7.4hadoop-auth-2.7.4.jar;D:development softwareRepMavenorgapachehttpcomponentshttpclient4.2.5httpclient-4.2.5.jar;D:development softwareRepMavenorgapachehttpcomponentshttpcore4.2.4httpcore-4.2.4.jar;D:development softwareRepMavenorgapachedirectoryserverapacheds-kerberos-codec2.0.0-M15apacheds-kerberos-codec-2.0.0-M15.jar;D:development softwareRepMavenorgapachedirectoryserverapacheds-i18n2.0.0-M15apacheds-i18n-2.0.0-M15.jar;D:development softwareRepMavenorgapachedirectoryapiapi-asn1-api1.0.0-M20api-asn1-api-1.0.0-M20.jar;D:development softwareRepMavenorgapachedirectoryapiapi-util1.0.0-M20api-util-1.0.0-M20.jar;D:development softwareRepMavenorgapachecuratorcurator-client2.7.1curator-client-2.7.1.jar;D:development softwareRepMavenorgapachehtracehtrace-core3.1.0-incubatinghtrace-core-3.1.0-incubating.jar;D:development softwareRepMavenorgapachehadoophadoop-hdfs2.7.4hadoop-hdfs-2.7.4.jar;D:development softwareRepMavenorgmortbayjettyjetty-util6.1.26jetty-util-6.1.26.jar;D:development softwareRepMavenxercesxercesImpl2.9.1xercesImpl-2.9.1.jar;D:development softwareRepMavenxml-apisxml-apis1.3.04xml-apis-1.3.04.jar;D:development softwareRepMavenorgapachehadoophadoop-mapreduce-client-app2.7.4hadoop-mapreduce-client-app-2.7.4.jar;D:development softwareRepMavenorgapachehadoophadoop-mapreduce-client-common2.7.4hadoop-mapreduce-client-common-2.7.4.jar;D:development softwareRepMavenorgapachehadoophadoop-yarn-client2.7.4hadoop-yarn-client-2.7.4.jar;D:development softwareRepMavenorgapachehadoophadoop-yarn-server-common2.7.4hadoop-yarn-server-common-2.7.4.jar;D:development softwareRepMavenorgapachehadoophadoop-mapreduce-client-shuffle2.7.4hadoop-mapreduce-client-shuffle-2.7.4.jar;D:development softwareRepMavenorgapachehadoophadoop-yarn-api2.7.4hadoop-yarn-api-2.7.4.jar;D:development softwareRepMavenorgapachehadoophadoop-mapreduce-client-core2.7.4hadoop-mapreduce-client-core-2.7.4.jar;D:development softwareRepMavenorgapachehadoophadoop-yarn-common2.7.4hadoop-yarn-common-2.7.4.jar;D:development softwareRepMavenjavaxxmlbindjaxb-api2.2.2jaxb-api-2.2.2.jar;D:development softwareRepMavenjavaxxmlstreamstax-api1.0-2stax-api-1.0-2.jar;D:development softwareRepMavenorgcodehausjacksonjackson-jaxrs1.9.13jackson-jaxrs-1.9.13.jar;D:development softwareRepMavenorgcodehausjacksonjackson-xc1.9.13jackson-xc-1.9.13.jar;D:development softwareRepMavenorgapachehadoophadoop-mapreduce-client-jobclient2.7.4hadoop-mapreduce-client-jobclient-2.7.4.jar;D:development softwareRepMavenorgapachehadoophadoop-annotations2.7.4hadoop-annotations-2.7.4.jar;D:development softwareRepMavenorgapachesparkspark-launcher_2.123.0.0spark-launcher_2.12-3.0.0.jar;D:development softwareRepMavenorgapachesparkspark-kvstore_2.123.0.0spark-kvstore_2.12-3.0.0.jar;D:development softwareRepMavenorgfusesourceleveldbjnileveldbjni-all1.8leveldbjni-all-1.8.jar;D:development softwareRepMavencomfasterxmljacksoncorejackson-core2.10.0jackson-core-2.10.0.jar;D:development softwareRepMavencomfasterxmljacksoncorejackson-annotations2.10.0jackson-annotations-2.10.0.jar;D:development softwareRepMavenorgapachesparkspark-network-common_2.123.0.0spark-network-common_2.12-3.0.0.jar;D:development softwareRepMavenorgapachesparkspark-network-shuffle_2.123.0.0spark-network-shuffle_2.12-3.0.0.jar;D:development softwareRepMavenorgapachesparkspark-unsafe_2.123.0.0spark-unsafe_2.12-3.0.0.jar;D:development softwareRepMavenjavaxactivationactivation1.1.1activation-1.1.1.jar;D:development softwareRepMavenorgapachecuratorcurator-recipes2.7.1curator-recipes-2.7.1.jar;D:development softwareRepMavenorgapachecuratorcurator-framework2.7.1curator-framework-2.7.1.jar;D:development softwareRepMavencomgoogleguavaguava16.0.1guava-16.0.1.jar;D:development softwareRepMavenorgapachezookeeperzookeeper3.4.14zookeeper-3.4.14.jar;D:development softwareRepMavenorgapacheyetusaudience-annotations .5.0audience-annotations-0.5.0.jar;D:development softwareRepMavenjavaxservletjavax.servlet-api3.1.0javax.servlet-api-3.1.0.jar;D:development softwareRepMavenorgapachecommonscommons-lang33.9commons-lang3-3.9.jar;D:development softwareRepMavenorgapachecommonscommons-math33.4.1commons-math3-3.4.1.jar;D:development softwareRepMavenorgapachecommonscommons-text1.6commons-text-1.6.jar;D:development softwareRepMavencomgooglecodefindbugsjsr3053.0.0jsr305-3.0.0.jar;D:development softwareRepMavenorgslf4jslf4j-api1.7.30slf4j-api-1.7.30.jar;D:development softwareRepMavenorgslf4jjul-to-slf4j1.7.30jul-to-slf4j-1.7.30.jar;D:development softwareRepMavenorgslf4jjcl-over-slf4j1.7.30jcl-over-slf4j-1.7.30.jar;D:development softwareRepMavenlog4jlog4j1.2.17log4j-1.2.17.jar;D:development softwareRepMavenorgslf4jslf4j-log4j121.7.30slf4j-log4j12-1.7.30.jar;D:development softwareRepMavencomningcompress-lzf1.0.3compress-lzf-1.0.3.jar;D:development softwareRepMavenorgxerialsnappysnappy-java1.1.7.5snappy-java-1.1.7.5.jar;D:development softwareRepMavenorglz4lz4-java1.7.1lz4-java-1.7.1.jar;D:development softwareRepMavencomgithublubenzstd-jni1.4.4-3zstd-jni-1.4.4-3.jar;D:development softwareRepMavenorgroaringbitmapRoaringBitmap .7.45RoaringBitmap-0.7.45.jar;D:development softwareRepMavenorgroaringbitmapshims .7.45shims-0.7.45.jar;D:development softwareRepMavencommons-netcommons-net3.1commons-net-3.1.jar;D:development softwareRepMavenorgscala-langmodulesscala-xml_2.121.2.0scala-xml_2.12-1.2.0.jar;D:development softwareRepMavenorgscala-langscala-library2.12.10scala-library-2.12.10.jar;D:development softwareRepMavenorgscala-langscala-reflect2.12.10scala-reflect-2.12.10.jar;D:development softwareRepMavenorgjson4sjson4s-jackson_2.123.6.6json4s-jackson_2.12-3.6.6.jar;D:development softwareRepMavenorgjson4sjson4s-core_2.123.6.6json4s-core_2.12-3.6.6.jar;D:development softwareRepMavenorgjson4sjson4s-ast_2.123.6.6json4s-ast_2.12-3.6.6.jar;D:development softwareRepMavenorgjson4sjson4s-scalap_2.123.6.6json4s-scalap_2.12-3.6.6.jar;D:development softwareRepMavenorgglassfishjerseycorejersey-client2.30jersey-client-2.30.jar;D:development softwareRepMavenjakartawsrsjakarta.ws.rs-api2.1.6jakarta.ws.rs-api-2.1.6.jar;D:development softwareRepMavenorgglassfishhk2externaljakarta.inject2.6.1jakarta.inject-2.6.1.jar;D:development softwareRepMavenorgglassfishjerseycorejersey-common2.30jersey-common-2.30.jar;D:development softwareRepMavenjakartaannotationjakarta.annotation-api1.3.5jakarta.annotation-api-1.3.5.jar;D:development softwareRepMavenorgglassfishhk2osgi-resource-locator1.0.3osgi-resource-locator-1.0.3.jar;D:development softwareRepMavenorgglassfishjerseycorejersey-server2.30jersey-server-2.30.jar;D:development softwareRepMavenorgglassfishjerseymediajersey-media-jaxb2.30jersey-media-jaxb-2.30.jar;D:development softwareRepMavenjakartavalidationjakarta.validation-api2.0.2jakarta.validation-api-2.0.2.jar;D:development softwareRepMavenorgglassfishjerseycontainersjersey-container-servlet2.30jersey-container-servlet-2.30.jar;D:development softwareRepMavenorgglassfishjerseycontainersjersey-container-servlet-core2.30jersey-container-servlet-core-2.30.jar;D:development softwareRepMavenorgglassfishjerseyinjectjersey-hk22.30jersey-hk2-2.30.jar;D:development softwareRepMavenorgglassfishhk2hk2-locator2.6.1hk2-locator-2.6.1.jar;D:development softwareRepMavenorgglassfishhk2externalaopalliance-repackaged2.6.1aopalliance-repackaged-2.6.1.jar;D:development softwareRepMavenorgglassfishhk2hk2-api2.6.1hk2-api-2.6.1.jar;D:development softwareRepMavenorgglassfishhk2hk2-utils2.6.1hk2-utils-2.6.1.jar;D:development softwareRepMavenorgjavassistjavassist3.25.0-GAjavassist-3.25.0-GA.jar;D:development softwareRepMavenionettynetty-all4.1.47.Finalnetty-all-4.1.47.Final.jar;D:development softwareRepMavencomclearspringanalyticsstream2.9.6stream-2.9.6.jar;D:development softwareRepMaveniodropwizardmetricsmetrics-core4.1.1metrics-core-4.1.1.jar;D:development softwareRepMaveniodropwizardmetricsmetrics-jvm4.1.1metrics-jvm-4.1.1.jar;D:development softwareRepMaveniodropwizardmetricsmetrics-json4.1.1metrics-json-4.1.1.jar;D:development softwareRepMaveniodropwizardmetricsmetrics-graphite4.1.1metrics-graphite-4.1.1.jar;D:development softwareRepMaveniodropwizardmetricsmetrics-jmx4.1.1metrics-jmx-4.1.1.jar;D:development softwareRepMavencomfasterxmljacksoncorejackson-databind2.10.0jackson-databind-2.10.0.jar;D:development softwareRepMavencomfasterxmljacksonmodulejackson-module-scala_2.122.10.0jackson-module-scala_2.12-2.10.0.jar;D:development softwareRepMavencomfasterxmljacksonmodulejackson-module-paranamer2.10.0jackson-module-paranamer-2.10.0.jar;D:development softwareRepMavenorgapacheivyivy2.4.0ivy-2.4.0.jar;D:development softwareRepMavenorooro2.0.8oro-2.0.8.jar;D:development softwareRepMavennetrazorvinepyrolite4.30pyrolite-4.30.jar;D:development softwareRepMavennetsfpy4jpy4j .10.9py4j-0.10.9.jar;D:development softwareRepMavenorgapachesparkspark-tags_2.123.0.0spark-tags_2.12-3.0.0.jar;D:development softwareRepMavenorgapachecommonscommons-crypto1.0.0commons-crypto-1.0.0.jar;D:development softwareRepMavenorgspark-projectsparkunused1.0.0unused-1.0.0.jar" SparkCore._04_血缘关系.wordcountdep List(org.apache.spark.OneToOneDependency@4d41ba0f) *********************** List(org.apache.spark.OneToOneDependency@59072e9d) *********************** List(org.apache.spark.OneToOneDependency@2924f1d8) *********************** List(org.apache.spark.ShuffleDependency@7dffda8b) *********************** (hive,1) (mapreduce,1) (flink,1) (spark,1) (hadoop,2) Process finished with exit code 0
(1)oneToOneDependency
-
oneTooneDependency又叫做窄依赖
-
窄依赖表示每一个父(上游)RDD的Partition最多被子(下游)RDD的一个Partition使用,窄依赖我们形象的比喻为独生子女。
(2)shuffleDependency
-
shuffleDependency又叫做宽依赖
-
宽依赖表示同一个父(上游)RDD的Partition被多个子(下游)RDD的Partition依赖,会引起Shuffle,总结:宽依赖我们形象的比喻为多生。
(1)窄依赖
窄依赖数据分区到分区,task个数不变,还是n个task
(2)宽依赖
宽依赖前后的Task数量会改变,shuffle前task数量等于分区数n,shuffle后task数量等于分区数m,一共n+m个task
- 对于宽依赖来说,下游RDD分区的数据是经过上游RDD各个分区数据打乱重组的,因此,上游RDD必须每个分区的数据都准备好,下游RDD才能进行运算
- 对于窄依赖来说,分区之间可以相互独立运行,分区1不需要等分区2,因此不需要阶段划分
DAG(Directed Acyclic Graph)有向无环图是由点和线组成的拓扑图形,该图形具有方向,不会闭环。例如,DAG记录了RDD的转换过程和任务的阶段。
阶段和shuffle有必然的联系
(1)从行动算子进去
一直到
(1) 根据触发行动算子的RDD创建ResultStage
(2) 然后由此RDD沿着依赖往前追溯,如果是shuffleDependency,就会创建ShuffleMapStage
(3) 结论就是:
① 阶段数量= shuffle依赖数量+1,这个1其实指的就是ResultStage,因为没有shuffle依赖,也会有ResultStage;
② ResultStage永远只有一个,就是最后需要执行的stage。
四个概念:Application、Job、Stage和Task
- Application:初始化一个SparkContext即生成一个Application;
- Job:一个Action算子就会生成一个Job;
- Stage:Stage等于宽依赖(ShuffleDependency)的个数加1;
- task:
(1)一个Stage阶段中,最后一个RDD的分区个数就是当前Stage的Task的个数。
(2)一个作业(Application)的task个数就是所有stage的task个数总和
Application->Job->Stage->Task每一层都是1对n的关系。
- 一个Application可以有多个行动算子
- 一个job有>=0数量的shuffle依赖
- 一个stage中最后的Rdd有>=1的分区,一个分区对应一个任务
没有阶段划分,任务数量怎么来的
①submitStage(finalStage)
②val missing = getMissingParaentStage(stage)
③submitMissingTasks(stage,jobId.get) 提交没有上一阶段的tasks
④ Seq[Task]:当前阶段中所有的task
1.Task集合的来源:
匹配当前阶段的类型(说明不同阶段的任务是不一样的)
2. 从上面的截图中可以看出partitionsToCompute的map算子里面创建Task,因为map是一一映射,所以task数量取决于partitionsToCompute
结论:
Job.numPartitions来自于阶段中最后一个RDD的分区个数,所以一个阶段中的Task数量=当前阶段中最后一个RDD的分区个数。
Task的种类和stage是挂钩的
Stage分为ShuffleMapStage和 ResultStage



