栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 前沿技术 > 大数据 > 大数据系统

【Spark Core】RDD依赖关系

【Spark Core】RDD依赖关系

1.RDD 血缘关系


依赖关系:两个相邻RDD之间的关系
血缘关系:多个连续的RDD的依赖关系

2.RDD血缘关系的演示

下图演示了RDD的血缘关系:

  • RDD是不会保存数据的,但是每个RDD会保存自己的血缘关系;
  • 血缘关系的意义:因为RDD不保存数据,一旦计算失败了,不能从上一个RDD重新计算,必须重头计算,那么RDD必须要知道数据源在哪里,血缘关系就用于追溯数据源,提高了容错性

血缘关系演示

package SparkCore._04_血缘关系

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}


object wordcount {
  def main(args: Array[String]): Unit = {

    val conf: SparkConf = new SparkConf().setAppName("wc").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val fileRDD: RDD[String] = sc.textFile("SparkCore/target/classes/wc.txt")
    println(fileRDD.toDebugString)
    println("***********************")

    val words: RDD[String] = fileRDD.flatMap(_.split(" "))
    println(words.toDebugString)
    println("***********************")

    val wordToOne: RDD[(String, Int)] = words.map((_, 1))
    println(wordToOne.toDebugString)
    println("***********************")

    val wordToSum: RDD[(String, Int)] = wordToOne.reduceByKey(_ + _)
    println(wordToSum.toDebugString)
    println("***********************")
    
    val tuples: Array[(String, Int)] = wordToSum.collect()
    tuples.foreach(println)


    //todo 3.关闭链接
    sc.stop()

  }
}

执行结果:

(2) SparkCore/target/classes/wc.txt MapPartitionsRDD[1] at textFile at wordcount.scala:17 []
 |  SparkCore/target/classes/wc.txt HadoopRDD[0] at textFile at wordcount.scala:17 []
***********************
(2) MapPartitionsRDD[2] at flatMap at wordcount.scala:21 []
 |  SparkCore/target/classes/wc.txt MapPartitionsRDD[1] at textFile at wordcount.scala:17 []
 |  SparkCore/target/classes/wc.txt HadoopRDD[0] at textFile at wordcount.scala:17 []
***********************
(2) MapPartitionsRDD[3] at map at wordcount.scala:25 []
 |  MapPartitionsRDD[2] at flatMap at wordcount.scala:21 []
 |  SparkCore/target/classes/wc.txt MapPartitionsRDD[1] at textFile at wordcount.scala:17 []
 |  SparkCore/target/classes/wc.txt HadoopRDD[0] at textFile at wordcount.scala:17 []
***********************
(2) ShuffledRDD[4] at reduceByKey at wordcount.scala:29 []
 +-(2) MapPartitionsRDD[3] at map at wordcount.scala:25 []
    |  MapPartitionsRDD[2] at flatMap at wordcount.scala:21 []
    |  SparkCore/target/classes/wc.txt MapPartitionsRDD[1] at textFile at wordcount.scala:17 []
    |  SparkCore/target/classes/wc.txt HadoopRDD[0] at textFile at wordcount.scala:17 []
***********************
(hive,1)
(mapreduce,1)
(flink,1)
(spark,1)
(hadoop,2)

Process finished with exit code 0

从上面可以看到一个RDD所经过的算子
并且可以看到这个算子是否有shuffle
±:表示依赖断开,也就是经历了shuffle
(1) 表示分区

3.RDD依赖关系演示
package SparkCore._04_血缘关系

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}


object wordcountdep {
  def main(args: Array[String]): Unit = {
    //todo 1. 建立和Spark框架的链接
    val conf: SparkConf = new SparkConf().setAppName("wc").setMaster("local[*]")
    val sc = new SparkContext(conf)

    //todo 2.业务逻辑处理
    val fileRDD: RDD[String] = sc.textFile("SparkCore/target/classes/wc.txt")
    println(fileRDD.dependencies)
    println("***********************")

    val words: RDD[String] = fileRDD.flatMap(_.split(" "))
    println(words.dependencies)
    println("***********************")

    val wordToOne: RDD[(String, Int)] = words.map((_, 1))
    println(wordToOne.dependencies)
    println("***********************")

    val wordToSum: RDD[(String, Int)] = wordToOne.reduceByKey(_ + _)
    println(wordToSum.dependencies)
    println("***********************")

    val tuples: Array[(String, Int)] = wordToSum.collect()
    tuples.foreach(println)


    //todo 3.关闭链接
    sc.stop()

  }
}

执行结果:

"D:development softwareJavajdk1.8.0_281binjava.exe" "-javaagent:D:development softwareIDEAIntelliJ IDEA 2021.2libidea_rt.jar=9900:D:development softwareIDEAIntelliJ IDEA 2021.2bin" -Dfile.encoding=UTF-8 -classpath "D:development softwareJavajdk1.8.0_281jrelibcharsets.jar;D:development softwareJavajdk1.8.0_281jrelibdeploy.jar;D:development softwareJavajdk1.8.0_281jrelibextaccess-bridge-64.jar;D:development softwareJavajdk1.8.0_281jrelibextcldrdata.jar;D:development softwareJavajdk1.8.0_281jrelibextdnsns.jar;D:development softwareJavajdk1.8.0_281jrelibextjaccess.jar;D:development softwareJavajdk1.8.0_281jrelibextjfxrt.jar;D:development softwareJavajdk1.8.0_281jrelibextlocaledata.jar;D:development softwareJavajdk1.8.0_281jrelibextnashorn.jar;D:development softwareJavajdk1.8.0_281jrelibextsunec.jar;D:development softwareJavajdk1.8.0_281jrelibextsunjce_provider.jar;D:development softwareJavajdk1.8.0_281jrelibextsunmscapi.jar;D:development softwareJavajdk1.8.0_281jrelibextsunpkcs11.jar;D:development softwareJavajdk1.8.0_281jrelibextzipfs.jar;D:development softwareJavajdk1.8.0_281jrelibjavaws.jar;D:development softwareJavajdk1.8.0_281jrelibjce.jar;D:development softwareJavajdk1.8.0_281jrelibjfr.jar;D:development softwareJavajdk1.8.0_281jrelibjfxswt.jar;D:development softwareJavajdk1.8.0_281jrelibjsse.jar;D:development softwareJavajdk1.8.0_281jrelibmanagement-agent.jar;D:development softwareJavajdk1.8.0_281jrelibplugin.jar;D:development softwareJavajdk1.8.0_281jrelibresources.jar;D:development softwareJavajdk1.8.0_281jrelibrt.jar;D:IdeaProjectscom-yato-bigdataSparkCoretargetclasses;D:development softwarescalascala-2.12.11libscala-library.jar;D:development softwarescalascala-2.12.11libscala-parser-combinators_2.12-1.0.7.jar;D:development softwarescalascala-2.12.11libscala-reflect.jar;D:development softwarescalascala-2.12.11libscala-swing_2.12-2.0.3.jar;D:development softwarescalascala-2.12.11libscala-xml_2.12-1.0.6.jar;D:development softwareRepMavenorgapachesparkspark-core_2.123.0.0spark-core_2.12-3.0.0.jar;D:development softwareRepMavencomthoughtworksparanamerparanamer2.8paranamer-2.8.jar;D:development softwareRepMavenorgapacheavroavro1.8.2avro-1.8.2.jar;D:development softwareRepMavenorgcodehausjacksonjackson-core-asl1.9.13jackson-core-asl-1.9.13.jar;D:development softwareRepMavenorgcodehausjacksonjackson-mapper-asl1.9.13jackson-mapper-asl-1.9.13.jar;D:development softwareRepMavenorgapachecommonscommons-compress1.8.1commons-compress-1.8.1.jar;D:development softwareRepMavenorgtukaanixz1.5xz-1.5.jar;D:development softwareRepMavenorgapacheavroavro-mapred1.8.2avro-mapred-1.8.2-hadoop2.jar;D:development softwareRepMavenorgapacheavroavro-ipc1.8.2avro-ipc-1.8.2.jar;D:development softwareRepMavencommons-codeccommons-codec1.9commons-codec-1.9.jar;D:development softwareRepMavencomtwitterchill_2.12.9.5chill_2.12-0.9.5.jar;D:development softwareRepMavencomesotericsoftwarekryo-shaded4.0.2kryo-shaded-4.0.2.jar;D:development softwareRepMavencomesotericsoftwareminlog1.3.0minlog-1.3.0.jar;D:development softwareRepMavenorgobjenesisobjenesis2.5.1objenesis-2.5.1.jar;D:development softwareRepMavencomtwitterchill-java.9.5chill-java-0.9.5.jar;D:development softwareRepMavenorgapachexbeanxbean-asm7-shaded4.15xbean-asm7-shaded-4.15.jar;D:development softwareRepMavenorgapachehadoophadoop-client2.7.4hadoop-client-2.7.4.jar;D:development softwareRepMavenorgapachehadoophadoop-common2.7.4hadoop-common-2.7.4.jar;D:development softwareRepMavencommons-clicommons-cli1.2commons-cli-1.2.jar;D:development softwareRepMavenxmlencxmlenc.52xmlenc-0.52.jar;D:development softwareRepMavencommons-httpclientcommons-httpclient3.1commons-httpclient-3.1.jar;D:development softwareRepMavencommons-iocommons-io2.4commons-io-2.4.jar;D:development softwareRepMavencommons-collectionscommons-collections3.2.2commons-collections-3.2.2.jar;D:development softwareRepMavenorgmortbayjettyjetty-sslengine6.1.26jetty-sslengine-6.1.26.jar;D:development softwareRepMavenjavaxservletjspjsp-api2.1jsp-api-2.1.jar;D:development softwareRepMavencommons-langcommons-lang2.6commons-lang-2.6.jar;D:development softwareRepMavencommons-configurationcommons-configuration1.6commons-configuration-1.6.jar;D:development softwareRepMavencommons-digestercommons-digester1.8commons-digester-1.8.jar;D:development softwareRepMavencommons-beanutilscommons-beanutils1.7.0commons-beanutils-1.7.0.jar;D:development softwareRepMavencomgoogleprotobufprotobuf-java2.5.0protobuf-java-2.5.0.jar;D:development softwareRepMavencomgooglecodegsongson2.2.4gson-2.2.4.jar;D:development softwareRepMavenorgapachehadoophadoop-auth2.7.4hadoop-auth-2.7.4.jar;D:development softwareRepMavenorgapachehttpcomponentshttpclient4.2.5httpclient-4.2.5.jar;D:development softwareRepMavenorgapachehttpcomponentshttpcore4.2.4httpcore-4.2.4.jar;D:development softwareRepMavenorgapachedirectoryserverapacheds-kerberos-codec2.0.0-M15apacheds-kerberos-codec-2.0.0-M15.jar;D:development softwareRepMavenorgapachedirectoryserverapacheds-i18n2.0.0-M15apacheds-i18n-2.0.0-M15.jar;D:development softwareRepMavenorgapachedirectoryapiapi-asn1-api1.0.0-M20api-asn1-api-1.0.0-M20.jar;D:development softwareRepMavenorgapachedirectoryapiapi-util1.0.0-M20api-util-1.0.0-M20.jar;D:development softwareRepMavenorgapachecuratorcurator-client2.7.1curator-client-2.7.1.jar;D:development softwareRepMavenorgapachehtracehtrace-core3.1.0-incubatinghtrace-core-3.1.0-incubating.jar;D:development softwareRepMavenorgapachehadoophadoop-hdfs2.7.4hadoop-hdfs-2.7.4.jar;D:development softwareRepMavenorgmortbayjettyjetty-util6.1.26jetty-util-6.1.26.jar;D:development softwareRepMavenxercesxercesImpl2.9.1xercesImpl-2.9.1.jar;D:development softwareRepMavenxml-apisxml-apis1.3.04xml-apis-1.3.04.jar;D:development softwareRepMavenorgapachehadoophadoop-mapreduce-client-app2.7.4hadoop-mapreduce-client-app-2.7.4.jar;D:development softwareRepMavenorgapachehadoophadoop-mapreduce-client-common2.7.4hadoop-mapreduce-client-common-2.7.4.jar;D:development softwareRepMavenorgapachehadoophadoop-yarn-client2.7.4hadoop-yarn-client-2.7.4.jar;D:development softwareRepMavenorgapachehadoophadoop-yarn-server-common2.7.4hadoop-yarn-server-common-2.7.4.jar;D:development softwareRepMavenorgapachehadoophadoop-mapreduce-client-shuffle2.7.4hadoop-mapreduce-client-shuffle-2.7.4.jar;D:development softwareRepMavenorgapachehadoophadoop-yarn-api2.7.4hadoop-yarn-api-2.7.4.jar;D:development softwareRepMavenorgapachehadoophadoop-mapreduce-client-core2.7.4hadoop-mapreduce-client-core-2.7.4.jar;D:development softwareRepMavenorgapachehadoophadoop-yarn-common2.7.4hadoop-yarn-common-2.7.4.jar;D:development softwareRepMavenjavaxxmlbindjaxb-api2.2.2jaxb-api-2.2.2.jar;D:development softwareRepMavenjavaxxmlstreamstax-api1.0-2stax-api-1.0-2.jar;D:development softwareRepMavenorgcodehausjacksonjackson-jaxrs1.9.13jackson-jaxrs-1.9.13.jar;D:development softwareRepMavenorgcodehausjacksonjackson-xc1.9.13jackson-xc-1.9.13.jar;D:development softwareRepMavenorgapachehadoophadoop-mapreduce-client-jobclient2.7.4hadoop-mapreduce-client-jobclient-2.7.4.jar;D:development softwareRepMavenorgapachehadoophadoop-annotations2.7.4hadoop-annotations-2.7.4.jar;D:development softwareRepMavenorgapachesparkspark-launcher_2.123.0.0spark-launcher_2.12-3.0.0.jar;D:development softwareRepMavenorgapachesparkspark-kvstore_2.123.0.0spark-kvstore_2.12-3.0.0.jar;D:development softwareRepMavenorgfusesourceleveldbjnileveldbjni-all1.8leveldbjni-all-1.8.jar;D:development softwareRepMavencomfasterxmljacksoncorejackson-core2.10.0jackson-core-2.10.0.jar;D:development softwareRepMavencomfasterxmljacksoncorejackson-annotations2.10.0jackson-annotations-2.10.0.jar;D:development softwareRepMavenorgapachesparkspark-network-common_2.123.0.0spark-network-common_2.12-3.0.0.jar;D:development softwareRepMavenorgapachesparkspark-network-shuffle_2.123.0.0spark-network-shuffle_2.12-3.0.0.jar;D:development softwareRepMavenorgapachesparkspark-unsafe_2.123.0.0spark-unsafe_2.12-3.0.0.jar;D:development softwareRepMavenjavaxactivationactivation1.1.1activation-1.1.1.jar;D:development softwareRepMavenorgapachecuratorcurator-recipes2.7.1curator-recipes-2.7.1.jar;D:development softwareRepMavenorgapachecuratorcurator-framework2.7.1curator-framework-2.7.1.jar;D:development softwareRepMavencomgoogleguavaguava16.0.1guava-16.0.1.jar;D:development softwareRepMavenorgapachezookeeperzookeeper3.4.14zookeeper-3.4.14.jar;D:development softwareRepMavenorgapacheyetusaudience-annotations.5.0audience-annotations-0.5.0.jar;D:development softwareRepMavenjavaxservletjavax.servlet-api3.1.0javax.servlet-api-3.1.0.jar;D:development softwareRepMavenorgapachecommonscommons-lang33.9commons-lang3-3.9.jar;D:development softwareRepMavenorgapachecommonscommons-math33.4.1commons-math3-3.4.1.jar;D:development softwareRepMavenorgapachecommonscommons-text1.6commons-text-1.6.jar;D:development softwareRepMavencomgooglecodefindbugsjsr3053.0.0jsr305-3.0.0.jar;D:development softwareRepMavenorgslf4jslf4j-api1.7.30slf4j-api-1.7.30.jar;D:development softwareRepMavenorgslf4jjul-to-slf4j1.7.30jul-to-slf4j-1.7.30.jar;D:development softwareRepMavenorgslf4jjcl-over-slf4j1.7.30jcl-over-slf4j-1.7.30.jar;D:development softwareRepMavenlog4jlog4j1.2.17log4j-1.2.17.jar;D:development softwareRepMavenorgslf4jslf4j-log4j121.7.30slf4j-log4j12-1.7.30.jar;D:development softwareRepMavencomningcompress-lzf1.0.3compress-lzf-1.0.3.jar;D:development softwareRepMavenorgxerialsnappysnappy-java1.1.7.5snappy-java-1.1.7.5.jar;D:development softwareRepMavenorglz4lz4-java1.7.1lz4-java-1.7.1.jar;D:development softwareRepMavencomgithublubenzstd-jni1.4.4-3zstd-jni-1.4.4-3.jar;D:development softwareRepMavenorgroaringbitmapRoaringBitmap.7.45RoaringBitmap-0.7.45.jar;D:development softwareRepMavenorgroaringbitmapshims.7.45shims-0.7.45.jar;D:development softwareRepMavencommons-netcommons-net3.1commons-net-3.1.jar;D:development softwareRepMavenorgscala-langmodulesscala-xml_2.121.2.0scala-xml_2.12-1.2.0.jar;D:development softwareRepMavenorgscala-langscala-library2.12.10scala-library-2.12.10.jar;D:development softwareRepMavenorgscala-langscala-reflect2.12.10scala-reflect-2.12.10.jar;D:development softwareRepMavenorgjson4sjson4s-jackson_2.123.6.6json4s-jackson_2.12-3.6.6.jar;D:development softwareRepMavenorgjson4sjson4s-core_2.123.6.6json4s-core_2.12-3.6.6.jar;D:development softwareRepMavenorgjson4sjson4s-ast_2.123.6.6json4s-ast_2.12-3.6.6.jar;D:development softwareRepMavenorgjson4sjson4s-scalap_2.123.6.6json4s-scalap_2.12-3.6.6.jar;D:development softwareRepMavenorgglassfishjerseycorejersey-client2.30jersey-client-2.30.jar;D:development softwareRepMavenjakartawsrsjakarta.ws.rs-api2.1.6jakarta.ws.rs-api-2.1.6.jar;D:development softwareRepMavenorgglassfishhk2externaljakarta.inject2.6.1jakarta.inject-2.6.1.jar;D:development softwareRepMavenorgglassfishjerseycorejersey-common2.30jersey-common-2.30.jar;D:development softwareRepMavenjakartaannotationjakarta.annotation-api1.3.5jakarta.annotation-api-1.3.5.jar;D:development softwareRepMavenorgglassfishhk2osgi-resource-locator1.0.3osgi-resource-locator-1.0.3.jar;D:development softwareRepMavenorgglassfishjerseycorejersey-server2.30jersey-server-2.30.jar;D:development softwareRepMavenorgglassfishjerseymediajersey-media-jaxb2.30jersey-media-jaxb-2.30.jar;D:development softwareRepMavenjakartavalidationjakarta.validation-api2.0.2jakarta.validation-api-2.0.2.jar;D:development softwareRepMavenorgglassfishjerseycontainersjersey-container-servlet2.30jersey-container-servlet-2.30.jar;D:development softwareRepMavenorgglassfishjerseycontainersjersey-container-servlet-core2.30jersey-container-servlet-core-2.30.jar;D:development softwareRepMavenorgglassfishjerseyinjectjersey-hk22.30jersey-hk2-2.30.jar;D:development softwareRepMavenorgglassfishhk2hk2-locator2.6.1hk2-locator-2.6.1.jar;D:development softwareRepMavenorgglassfishhk2externalaopalliance-repackaged2.6.1aopalliance-repackaged-2.6.1.jar;D:development softwareRepMavenorgglassfishhk2hk2-api2.6.1hk2-api-2.6.1.jar;D:development softwareRepMavenorgglassfishhk2hk2-utils2.6.1hk2-utils-2.6.1.jar;D:development softwareRepMavenorgjavassistjavassist3.25.0-GAjavassist-3.25.0-GA.jar;D:development softwareRepMavenionettynetty-all4.1.47.Finalnetty-all-4.1.47.Final.jar;D:development softwareRepMavencomclearspringanalyticsstream2.9.6stream-2.9.6.jar;D:development softwareRepMaveniodropwizardmetricsmetrics-core4.1.1metrics-core-4.1.1.jar;D:development softwareRepMaveniodropwizardmetricsmetrics-jvm4.1.1metrics-jvm-4.1.1.jar;D:development softwareRepMaveniodropwizardmetricsmetrics-json4.1.1metrics-json-4.1.1.jar;D:development softwareRepMaveniodropwizardmetricsmetrics-graphite4.1.1metrics-graphite-4.1.1.jar;D:development softwareRepMaveniodropwizardmetricsmetrics-jmx4.1.1metrics-jmx-4.1.1.jar;D:development softwareRepMavencomfasterxmljacksoncorejackson-databind2.10.0jackson-databind-2.10.0.jar;D:development softwareRepMavencomfasterxmljacksonmodulejackson-module-scala_2.122.10.0jackson-module-scala_2.12-2.10.0.jar;D:development softwareRepMavencomfasterxmljacksonmodulejackson-module-paranamer2.10.0jackson-module-paranamer-2.10.0.jar;D:development softwareRepMavenorgapacheivyivy2.4.0ivy-2.4.0.jar;D:development softwareRepMavenorooro2.0.8oro-2.0.8.jar;D:development softwareRepMavennetrazorvinepyrolite4.30pyrolite-4.30.jar;D:development softwareRepMavennetsfpy4jpy4j.10.9py4j-0.10.9.jar;D:development softwareRepMavenorgapachesparkspark-tags_2.123.0.0spark-tags_2.12-3.0.0.jar;D:development softwareRepMavenorgapachecommonscommons-crypto1.0.0commons-crypto-1.0.0.jar;D:development softwareRepMavenorgspark-projectsparkunused1.0.0unused-1.0.0.jar" SparkCore._04_血缘关系.wordcountdep
List(org.apache.spark.OneToOneDependency@4d41ba0f)
***********************
List(org.apache.spark.OneToOneDependency@59072e9d)
***********************
List(org.apache.spark.OneToOneDependency@2924f1d8)
***********************
List(org.apache.spark.ShuffleDependency@7dffda8b)
***********************
(hive,1)
(mapreduce,1)
(flink,1)
(spark,1)
(hadoop,2)

Process finished with exit code 0

(1)oneToOneDependency

  • oneTooneDependency又叫做窄依赖

  • 窄依赖表示每一个父(上游)RDD的Partition最多被子(下游)RDD的一个Partition使用,窄依赖我们形象的比喻为独生子女。

(2)shuffleDependency

  • shuffleDependency又叫做宽依赖

  • 宽依赖表示同一个父(上游)RDD的Partition被多个子(下游)RDD的Partition依赖,会引起Shuffle,总结:宽依赖我们形象的比喻为多生。

4.stage&partition&task 4.1Task数量和分区的关系

(1)窄依赖
窄依赖数据分区到分区,task个数不变,还是n个task

(2)宽依赖
宽依赖前后的Task数量会改变,shuffle前task数量等于分区数n,shuffle后task数量等于分区数m,一共n+m个task

4.2 阶段的划分
  • 对于宽依赖来说,下游RDD分区的数据是经过上游RDD各个分区数据打乱重组的,因此,上游RDD必须每个分区的数据都准备好,下游RDD才能进行运算
  • 对于窄依赖来说,分区之间可以相互独立运行,分区1不需要等分区2,因此不需要阶段划分

    DAG(Directed Acyclic Graph)有向无环图是由点和线组成的拓扑图形,该图形具有方向,不会闭环。例如,DAG记录了RDD的转换过程和任务的阶段。

    阶段和shuffle有必然的联系
4.3 阶段划分源码

(1)从行动算子进去


一直到

(1) 根据触发行动算子的RDD创建ResultStage
(2) 然后由此RDD沿着依赖往前追溯,如果是shuffleDependency,就会创建ShuffleMapStage
(3) 结论就是:
① 阶段数量= shuffle依赖数量+1,这个1其实指的就是ResultStage,因为没有shuffle依赖,也会有ResultStage;
② ResultStage永远只有一个,就是最后需要执行的stage。

4.4 RDD 任务划分

四个概念:Application、Job、Stage和Task

  • Application:初始化一个SparkContext即生成一个Application;
  • Job:一个Action算子就会生成一个Job;
  • Stage:Stage等于宽依赖(ShuffleDependency)的个数加1;
  • task:

(1)一个Stage阶段中,最后一个RDD的分区个数就是当前Stage的Task的个数。
(2)一个作业(Application)的task个数就是所有stage的task个数总和

Application->Job->Stage->Task每一层都是1对n的关系。

  • 一个Application可以有多个行动算子
  • 一个job有>=0数量的shuffle依赖
  • 一个stage中最后的Rdd有>=1的分区,一个分区对应一个任务
4.5 Task的数量

没有阶段划分,任务数量怎么来的
①submitStage(finalStage)
②val missing = getMissingParaentStage(stage)
③submitMissingTasks(stage,jobId.get) 提交没有上一阶段的tasks
④ Seq[Task]:当前阶段中所有的task

1.Task集合的来源:
匹配当前阶段的类型(说明不同阶段的任务是不一样的)


2. 从上面的截图中可以看出partitionsToCompute的map算子里面创建Task,因为map是一一映射,所以task数量取决于partitionsToCompute


结论:
Job.numPartitions来自于阶段中最后一个RDD的分区个数,所以一个阶段中的Task数量=当前阶段中最后一个RDD的分区个数。

4.6 Task种类的划分

Task的种类和stage是挂钩的
Stage分为ShuffleMapStage和 ResultStage

转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/307929.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号