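WordCount Code
To get a hands-on feel for what the Spark framework does, we next implement WordCount, the most common teaching example in big data courses. The Maven configuration (pom.xml) for the project is:
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>3.0.0</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.2</version>
            <executions>
                <execution>
                    <goals>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>2.2.1</version>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>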
package com.muzili.applications

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object wordcount {
  def main(args: Array[String]): Unit = {
    // Create the Spark configuration object
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("WordCount")
    // Create the Spark context (the connection object)
    val sc: SparkContext = new SparkContext(sparkConf)
    // Read the file; forward slashes are used because "\U" is not a valid escape in a Scala string
    val fileRDD: RDD[String] = sc.textFile("C:/Users/muzili/Desktop/word.txt")
    // Split each line into words
    val wordRDD: RDD[String] = fileRDD.flatMap(_.split(" "))
    // Convert the data structure: word => (word, 1)
    val word2OneRDD: RDD[(String, Int)] = wordRDD.map((_, 1))
    // Group the pairs by word and sum the counts
    val word2CountRDD: RDD[(String, Int)] = word2OneRDD.reduceByKey(_ + _)
    // Collect the aggregated result into driver memory
    val word2Count: Array[(String, Int)] = word2CountRDD.collect()
    // Print the result
    word2Count.foreach(println)
    // Close the Spark connection
    sc.stop()
  }
}
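To make the individual transformations easier to follow, here is a minimal sketch of the same pipeline written with plain Scala collections instead of RDDs. It is not part of the original program: the object name LocalWordCount and the helper countWords are hypothetical, and groupBy followed by a sum only stands in for what reduceByKey does on a cluster.

```scala
object LocalWordCount {
  // Count words with plain Scala collections, mirroring the RDD steps one by one.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))   // split every line into words
      .map(word => (word, 1))  // word => (word, 1)
      .groupBy(_._1)           // group the pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the 1s per word

  def main(args: Array[String]): Unit = {
    // Same content as word.txt
    val lines = Seq("hello scala hello spark hello hadoop hello flink")
    countWords(lines).foreach(println)
  }
}
```

For the sample input, hello is counted 4 times and every other word once; the order in which the pairs are printed is not guaranteed.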
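Then create a file named word.txt on the desktop:
hello scala hello spark hello hadoop hello flink
Running the program produces a large amount of log output (see "Log output 1" below). To make the program's result easier to see, create a log4j.properties file in the project's resources directory and add the following logging configuration: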
log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Set the default spark-shell log level to ERROR. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=ERROR
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark_project.jetty=ERROR
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=ERROR
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=ERROR
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
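If you would rather not add a properties file, the log level can also be lowered from code with SparkContext.setLogLevel. The sketch below is an alternative to the configuration above, not what this article uses; the object name QuietWordCount is hypothetical. Messages logged while the SparkContext is starting up are emitted before the call takes effect, so they will still appear.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object QuietWordCount {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("WordCount")
    val sc = new SparkContext(sparkConf)
    // Only ERROR and above are logged from this point on.
    sc.setLogLevel("ERROR")

    // Same word count as before, on an in-memory collection instead of a file.
    sc.parallelize(Seq("hello scala hello spark hello hadoop hello flink"))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collect()
      .foreach(println)

    sc.stop()
  }
}
```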
Log output 1:
D:\developer_tools\Java\jdk1.8.0_251\bin\java.exe "-javaagent:D:\developer_tools\IntelliJ IDEA\IntelliJ IDEA 2020.1.1\lib\idea_rt.jar=51489:D:\developer_tools\IntelliJ IDEA\IntelliJ IDEA 2020.1.1\bin" -Dfile.encoding=UTF-8 -classpath ...
com.sibat.applications.wordcount
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/10/13 17:44:17 INFO SparkContext: Running Spark version 2.4.4
21/10/13 17:44:17 INFO SparkContext: Submitted application: WordCount
21/10/13 17:44:17 INFO SecurityManager: Changing view acls to: muzili
21/10/13 17:44:17 INFO SecurityManager: Changing modify acls to: muzili
21/10/13 17:44:17 INFO SecurityManager: Changing view acls groups to:
21/10/13 17:44:17 INFO SecurityManager: Changing modify acls groups to:
21/10/13 17:44:17 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(muzili); groups with view permissions: Set(); users with modify permissions: Set(muzili); groups with modify permissions: Set()
21/10/13 17:44:18 INFO Utils: Successfully started service 'sparkDriver' on port 51527.
21/10/13 17:44:19 INFO SparkEnv: Registering MapOutputTracker
21/10/13 17:44:19 INFO SparkEnv: Registering BlockManagerMaster
21/10/13 17:44:19 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/10/13 17:44:19 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/10/13 17:44:19 INFO DiskBlockManager: Created local directory at C:\Users\muzili\AppData\Local\Temp\blockmgr-a6f2f260-7970-4a16-82b5-93d659f2c49f
21/10/13 17:44:19 INFO MemoryStore: MemoryStore started with capacity 1975.8 MB
21/10/13 17:44:19 INFO SparkEnv: Registering OutputCommitCoordinator
21/10/13 17:44:19 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/10/13 17:44:19 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://LAPTOP-R0NFMTAH:4040
21/10/13 17:44:19 INFO Executor: Starting executor ID driver on host localhost
21/10/13 17:44:19 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51568.
21/10/13 17:44:19 INFO NettyBlockTransferService: Server created on LAPTOP-R0NFMTAH:51568
21/10/13 17:44:19 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/10/13 17:44:19 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, LAPTOP-R0NFMTAH, 51568, None)
21/10/13 17:44:19 INFO BlockManagerMasterEndpoint: Registering block manager LAPTOP-R0NFMTAH:51568 with 1975.8 MB RAM, BlockManagerId(driver, LAPTOP-R0NFMTAH, 51568, None)
21/10/13 17:44:19 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, LAPTOP-R0NFMTAH, 51568, None)
21/10/13 17:44:19 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, LAPTOP-R0NFMTAH, 51568, None)
21/10/13 17:44:19 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 214.6 KB, free 1975.6 MB)
21/10/13 17:44:19 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 20.4 KB, free 1975.6 MB)
21/10/13 17:44:19 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on LAPTOP-R0NFMTAH:51568 (size: 20.4 KB, free: 1975.8 MB)
21/10/13 17:44:19 INFO SparkContext: Created broadcast 0 from textFile at wordcount.scala:14
21/10/13 17:44:19 INFO FileInputFormat: Total input paths to process : 1
21/10/13 17:44:19 INFO SparkContext: Starting job: collect at wordcount.scala:22
21/10/13 17:44:20 INFO DAGScheduler: Registering RDD 3 (map at wordcount.scala:18)
21/10/13 17:44:20 INFO DAGScheduler: Got job 0 (collect at wordcount.scala:22) with 2 output partitions
21/10/13 17:44:20 INFO DAGScheduler: Final stage: ResultStage 1 (collect at wordcount.scala:22)
21/10/13 17:44:20 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
21/10/13 17:44:20 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
21/10/13 17:44:20 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at wordcount.scala:18), which has no missing parents
21/10/13 17:44:20 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 5.0 KB, free 1975.6 MB)
21/10/13 17:44:20 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.9 KB, free 1975.6 MB)
21/10/13 17:44:20 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on LAPTOP-R0NFMTAH:51568 (size: 2.9 KB, free: 1975.8 MB)
21/10/13 17:44:20 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1161
21/10/13 17:44:20 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at wordcount.scala:18) (first 15 tasks are for partitions Vector(0, 1))
21/10/13 17:44:20 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
21/10/13 17:44:20 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7887 bytes)
21/10/13 17:44:20 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7887 bytes)
21/10/13 17:44:20 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
21/10/13 17:44:20 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
21/10/13 17:44:20 INFO HadoopRDD: Input split: file:/C:/Users/muzili/Desktop/word.txt:0+25
21/10/13 17:44:20 INFO HadoopRDD: Input split: file:/C:/Users/muzili/Desktop/word.txt:25+26
21/10/13 17:44:20 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1157 bytes result sent to driver
21/10/13 17:44:20 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 1157 bytes result sent to driver
21/10/13 17:44:20 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 165 ms on localhost (executor driver) (1/2)
21/10/13 17:44:20 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 181 ms on localhost (executor driver) (2/2)
21/10/13 17:44:20 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
21/10/13 17:44:20 INFO DAGScheduler: ShuffleMapStage 0 (map at wordcount.scala:18) finished in 0.256 s
21/10/13 17:44:20 INFO DAGScheduler: looking for newly runnable stages
21/10/13 17:44:20 INFO DAGScheduler: running: Set()
21/10/13 17:44:20 INFO DAGScheduler: waiting: Set(ResultStage 1)
21/10/13 17:44:20 INFO DAGScheduler: failed: Set()
21/10/13 17:44:20 INFO DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at wordcount.scala:20), which has no missing parents
21/10/13 17:44:20 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.2 KB, free 1975.6 MB)
21/10/13 17:44:20 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2029.0 B, free 1975.6 MB)
21/10/13 17:44:20 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on LAPTOP-R0NFMTAH:51568 (size: 2029.0 B, free: 1975.8 MB)
21/10/13 17:44:20 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1161
21/10/13 17:44:20 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at wordcount.scala:20) (first 15 tasks are for partitions Vector(0, 1))
21/10/13 17:44:20 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
21/10/13 17:44:20 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, executor driver, partition 0, ANY, 7662 bytes)
21/10/13 17:44:20 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, executor driver, partition 1, ANY, 7662 bytes)
21/10/13 17:44:20 INFO Executor: Running task 1.0 in stage 1.0 (TID 3)
21/10/13 17:44:20 INFO Executor: Running task 0.0 in stage 1.0 (TID 2)
21/10/13 17:44:20 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks including 2 local blocks and 0 remote blocks
21/10/13 17:44:20 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks including 2 local blocks and 0 remote blocks
21/10/13 17:44:20 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 8 ms
21/10/13 17:44:20 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 8 ms
21/10/13 17:44:20 INFO Executor: Finished task 0.0 in stage 1.0 (TID 2). 1284 bytes result sent to driver
21/10/13 17:44:20 INFO Executor: Finished task 1.0 in stage 1.0 (TID 3). 1261 bytes result sent to driver
21/10/13 17:44:20 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 60 ms on localhost (executor driver) (1/2)
21/10/13 17:44:20 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 58 ms on localhost (executor driver) (2/2)
21/10/13 17:44:20 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
21/10/13 17:44:20 INFO DAGScheduler: ResultStage 1 (collect at wordcount.scala:22) finished in 0.070 s
21/10/13 17:44:20 INFO DAGScheduler: Job 0 finished: collect at wordcount.scala:22, took 0.561218 s
21/10/13 17:44:20 INFO SparkUI: Stopped Spark web UI at http://LAPTOP-R0NFMTAH:4040
(scala,1)
(flink,1)
(hello,4)
(spark,1)
(hadoop,1)
21/10/13 17:44:20 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/10/13 17:44:20 INFO MemoryStore: MemoryStore cleared
21/10/13 17:44:20 INFO BlockManager: BlockManager stopped
21/10/13 17:44:20 INFO BlockManagerMaster: BlockManagerMaster stopped
21/10/13 17:44:20 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/10/13 17:44:20 INFO SparkContext: Successfully stopped SparkContext
21/10/13 17:44:20 INFO ShutdownHookManager: Shutdown hook called
21/10/13 17:44:20 INFO ShutdownHookManager: Deleting directory C:\Users\muzili\AppData\Local\Temp\spark-74be267d-556a-40a4-9253-55fc0a910290
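The log shows the job being divided into ShuffleMapStage 0 and ResultStage 1, each executed as 2 tasks, with a shuffle in between. The following sketch imitates that structure with plain Scala collections so the two phases can be seen in isolation. It is a simplified model rather than Spark's implementation: the names TwoStageSketch, mapStage, reduceStage and run are hypothetical, the way the words are assigned to the two input splits is made up for illustration, and routing keys by hash code modulo the partition count only approximates hash partitioning.

```scala
object TwoStageSketch {
  // Stage 0 (map side): tokenize one input split, pre-aggregate it locally,
  // then route every (word, count) pair to a reduce partition by the key's hash.
  def mapStage(split: Seq[String], numPartitions: Int): Map[Int, Map[String, Int]] = {
    val localCounts: Map[String, Int] =
      split.flatMap(_.split(" ")).groupBy(identity).map { case (w, ws) => (w, ws.size) }
    localCounts.groupBy { case (word, _) => Math.floorMod(word.hashCode, numPartitions) }
  }

  // Stage 1 (reduce side): one reduce partition merges the partial counts
  // it receives from all map tasks.
  def reduceStage(blocks: Seq[Map[String, Int]]): Map[String, Int] =
    blocks.flatten.groupBy(_._1).map { case (w, cs) => (w, cs.map(_._2).sum) }

  // Run both stages; the result holds one Map per reduce partition.
  def run(splits: Seq[Seq[String]], numPartitions: Int): Seq[Map[String, Int]] = {
    val mapOutputs = splits.map(mapStage(_, numPartitions))
    (0 until numPartitions).map { p =>
      reduceStage(mapOutputs.map(_.getOrElse(p, Map.empty[String, Int])))
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical assignment of the input to two splits.
    val splits = Seq(Seq("hello scala", "hello spark"), Seq("hello hadoop", "hello flink"))
    run(splits, numPartitions = 2).zipWithIndex.foreach { case (counts, p) =>
      println(s"partition $p: $counts")
    }
  }
}
```

Each word ends up in exactly one reduce partition, so concatenating the per-partition maps gives the final counts. This is also why the order of the collected result follows the partitions rather than the order of the words in the file.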



