
Spark Example: WordCount


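Creating the Maven project: adding the Scala plugin
Spark is written in Scala, so the development in this tutorial also uses Scala. The Spark version used here is 3.0.0, which is built against Scala 2.12 by default, so we keep that Scala version for all later development. Before you start, make sure the Scala plugin is installed in IntelliJ IDEA.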

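Adding dependencies
Edit the POM file of the Maven project and add the Spark framework dependency. This tutorial is based on Spark 3.0; make sure the versions match the release you actually use. Both elements below go inside the project element of the POM.
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>3.0.0</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <!-- Compiles Scala sources to class files -->
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <!-- Builds a single jar that bundles all dependencies during the package phase -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.2.1</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
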
WordCount code
To get a direct feel for what the Spark framework does, we next implement WordCount, the most common teaching example in big data courses.
package com.muzili.applications

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object wordcount {
  def main(args: Array[String]): Unit = {

    // Create the Spark configuration object (local mode, using all available cores)
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("WordCount")
    // Create the Spark context (the connection to Spark)
    val sc : SparkContext = new SparkContext(sparkConf)
    // Read the file; forward slashes avoid invalid escapes such as \U in a Scala string
    val fileRDD: RDD[String] = sc.textFile("C:/Users/muzili/Desktop/word.txt")
    // Split each line of the file into words
    val wordRDD: RDD[String] = fileRDD.flatMap( _.split(" ") )
    // Convert the data structure: word => (word, 1)
    val word2OneRDD: RDD[(String, Int)] = wordRDD.map((_,1))
    // Group the pairs by word and sum the counts for each word
    val word2CountRDD: RDD[(String, Int)] = word2OneRDD.reduceByKey(_+_)
    // Collect the aggregated result into the driver's memory
    val word2Count: Array[(String, Int)] = word2CountRDD.collect()
    // Print the result
    word2Count.foreach(println)
    // Close the Spark connection
    sc.stop()

  }

}
Then create the file word.txt on the desktop with the following content:
hello scala
hello spark
hello hadoop
hello flink
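To make the intermediate steps easier to follow, here is a minimal sketch of the same flatMap, map, and per-key sum on an ordinary Scala collection, with no Spark involved. The object name LocalWordCount and its count method are made up for this illustration, and groupBy followed by a sum stands in for what reduceByKey does on an RDD; it is not how Spark implements the operation internally.

```scala
// Hypothetical helper: the WordCount steps on an in-memory Seq instead of an RDD.
object LocalWordCount {
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))   // split each line into words
      .map(word => (word, 1))  // word => (word, 1)
      .groupBy(_._1)           // gather all pairs that share the same word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the ones per word

  def main(args: Array[String]): Unit = {
    // The four lines of word.txt shown above
    val sample = Seq("hello scala", "hello spark", "hello hadoop", "hello flink")
    // Sort by word so the printed order is deterministic
    count(sample).toSeq.sortBy(_._1).foreach(println)
  }
}
```

For the four lines of word.txt this yields hello with a count of 4 and every other word with a count of 1, which is the result the Spark job is expected to print as well.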
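Running the program produces a large amount of log output (see Log Output 1 below). To make the program's result easier to see, create a log4j.properties file in the project's resources directory and add the following logging configuration: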
log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Set the default spark-shell log level to ERROR. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=ERROR
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark_project.jetty=ERROR
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=ERROR
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=ERROR
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR

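Log Output 1 (note that this run was captured on Spark 2.4.4, as the "Running Spark version" line shows, rather than on the 3.0.0 version configured in the POM above):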

D:\developer_tools\Java\jdk1.8.0_251\bin\java.exe ... -Dfile.encoding=UTF-8 -classpath (JDK, Scala, and Maven repository jars omitted; the list includes spark-core_2.11-2.4.4.jar) com.sibat.applications.wordcount
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/10/13 17:44:17 INFO SparkContext: Running Spark version 2.4.4
21/10/13 17:44:17 INFO SparkContext: Submitted application: WordCount
21/10/13 17:44:17 INFO SecurityManager: Changing view acls to: muzili
21/10/13 17:44:17 INFO SecurityManager: Changing modify acls to: muzili
21/10/13 17:44:17 INFO SecurityManager: Changing view acls groups to: 
21/10/13 17:44:17 INFO SecurityManager: Changing modify acls groups to: 
21/10/13 17:44:17 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(muzili); groups with view permissions: Set(); users  with modify permissions: Set(muzili); groups with modify permissions: Set()
21/10/13 17:44:18 INFO Utils: Successfully started service 'sparkDriver' on port 51527.
21/10/13 17:44:19 INFO SparkEnv: Registering MapOutputTracker
21/10/13 17:44:19 INFO SparkEnv: Registering BlockManagerMaster
21/10/13 17:44:19 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/10/13 17:44:19 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/10/13 17:44:19 INFO DiskBlockManager: Created local directory at C:\Users\muzili\AppData\Local\Temp\blockmgr-a6f2f260-7970-4a16-82b5-93d659f2c49f
21/10/13 17:44:19 INFO MemoryStore: MemoryStore started with capacity 1975.8 MB
21/10/13 17:44:19 INFO SparkEnv: Registering OutputCommitCoordinator
21/10/13 17:44:19 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/10/13 17:44:19 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://LAPTOP-R0NFMTAH:4040
21/10/13 17:44:19 INFO Executor: Starting executor ID driver on host localhost
21/10/13 17:44:19 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51568.
21/10/13 17:44:19 INFO NettyBlockTransferService: Server created on LAPTOP-R0NFMTAH:51568
21/10/13 17:44:19 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/10/13 17:44:19 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, LAPTOP-R0NFMTAH, 51568, None)
21/10/13 17:44:19 INFO BlockManagerMasterEndpoint: Registering block manager LAPTOP-R0NFMTAH:51568 with 1975.8 MB RAM, BlockManagerId(driver, LAPTOP-R0NFMTAH, 51568, None)
21/10/13 17:44:19 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, LAPTOP-R0NFMTAH, 51568, None)
21/10/13 17:44:19 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, LAPTOP-R0NFMTAH, 51568, None)
21/10/13 17:44:19 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 214.6 KB, free 1975.6 MB)
21/10/13 17:44:19 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 20.4 KB, free 1975.6 MB)
21/10/13 17:44:19 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on LAPTOP-R0NFMTAH:51568 (size: 20.4 KB, free: 1975.8 MB)
21/10/13 17:44:19 INFO SparkContext: Created broadcast 0 from textFile at wordcount.scala:14
21/10/13 17:44:19 INFO FileInputFormat: Total input paths to process : 1
21/10/13 17:44:19 INFO SparkContext: Starting job: collect at wordcount.scala:22
21/10/13 17:44:20 INFO DAGScheduler: Registering RDD 3 (map at wordcount.scala:18)
21/10/13 17:44:20 INFO DAGScheduler: Got job 0 (collect at wordcount.scala:22) with 2 output partitions
21/10/13 17:44:20 INFO DAGScheduler: Final stage: ResultStage 1 (collect at wordcount.scala:22)
21/10/13 17:44:20 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
21/10/13 17:44:20 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
21/10/13 17:44:20 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at wordcount.scala:18), which has no missing parents
21/10/13 17:44:20 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 5.0 KB, free 1975.6 MB)
21/10/13 17:44:20 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.9 KB, free 1975.6 MB)
21/10/13 17:44:20 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on LAPTOP-R0NFMTAH:51568 (size: 2.9 KB, free: 1975.8 MB)
21/10/13 17:44:20 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1161
21/10/13 17:44:20 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at wordcount.scala:18) (first 15 tasks are for partitions Vector(0, 1))
21/10/13 17:44:20 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
21/10/13 17:44:20 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7887 bytes)
21/10/13 17:44:20 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7887 bytes)
21/10/13 17:44:20 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
21/10/13 17:44:20 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
21/10/13 17:44:20 INFO HadoopRDD: Input split: file:/C:/Users/muzili/Desktop/word.txt:0+25
21/10/13 17:44:20 INFO HadoopRDD: Input split: file:/C:/Users/muzili/Desktop/word.txt:25+26
21/10/13 17:44:20 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1157 bytes result sent to driver
21/10/13 17:44:20 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 1157 bytes result sent to driver
21/10/13 17:44:20 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 165 ms on localhost (executor driver) (1/2)
21/10/13 17:44:20 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 181 ms on localhost (executor driver) (2/2)
21/10/13 17:44:20 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
21/10/13 17:44:20 INFO DAGScheduler: ShuffleMapStage 0 (map at wordcount.scala:18) finished in 0.256 s
21/10/13 17:44:20 INFO DAGScheduler: looking for newly runnable stages
21/10/13 17:44:20 INFO DAGScheduler: running: Set()
21/10/13 17:44:20 INFO DAGScheduler: waiting: Set(ResultStage 1)
21/10/13 17:44:20 INFO DAGScheduler: failed: Set()
21/10/13 17:44:20 INFO DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at wordcount.scala:20), which has no missing parents
21/10/13 17:44:20 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.2 KB, free 1975.6 MB)
21/10/13 17:44:20 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2029.0 B, free 1975.6 MB)
21/10/13 17:44:20 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on LAPTOP-R0NFMTAH:51568 (size: 2029.0 B, free: 1975.8 MB)
21/10/13 17:44:20 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1161
21/10/13 17:44:20 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at wordcount.scala:20) (first 15 tasks are for partitions Vector(0, 1))
21/10/13 17:44:20 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
21/10/13 17:44:20 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, executor driver, partition 0, ANY, 7662 bytes)
21/10/13 17:44:20 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, executor driver, partition 1, ANY, 7662 bytes)
21/10/13 17:44:20 INFO Executor: Running task 1.0 in stage 1.0 (TID 3)
21/10/13 17:44:20 INFO Executor: Running task 0.0 in stage 1.0 (TID 2)
21/10/13 17:44:20 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks including 2 local blocks and 0 remote blocks
21/10/13 17:44:20 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks including 2 local blocks and 0 remote blocks
21/10/13 17:44:20 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 8 ms
21/10/13 17:44:20 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 8 ms
21/10/13 17:44:20 INFO Executor: Finished task 0.0 in stage 1.0 (TID 2). 1284 bytes result sent to driver
21/10/13 17:44:20 INFO Executor: Finished task 1.0 in stage 1.0 (TID 3). 1261 bytes result sent to driver
21/10/13 17:44:20 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 60 ms on localhost (executor driver) (1/2)
21/10/13 17:44:20 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 58 ms on localhost (executor driver) (2/2)
21/10/13 17:44:20 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
21/10/13 17:44:20 INFO DAGScheduler: ResultStage 1 (collect at wordcount.scala:22) finished in 0.070 s
21/10/13 17:44:20 INFO DAGScheduler: Job 0 finished: collect at wordcount.scala:22, took 0.561218 s
21/10/13 17:44:20 INFO SparkUI: Stopped Spark web UI at http://LAPTOP-R0NFMTAH:4040
(scala,1)
(flink,1)
(hello,4)
(spark,1)
(hadoop,1)
21/10/13 17:44:20 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/10/13 17:44:20 INFO MemoryStore: MemoryStore cleared
21/10/13 17:44:20 INFO BlockManager: BlockManager stopped
21/10/13 17:44:20 INFO BlockManagerMaster: BlockManagerMaster stopped
21/10/13 17:44:20 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/10/13 17:44:20 INFO SparkContext: Successfully stopped SparkContext
21/10/13 17:44:20 INFO ShutdownHookManager: Shutdown hook called
21/10/13 17:44:20 INFO ShutdownHookManager: Deleting directory C:\Users\muzili\AppData\Local\Temp\spark-74be267d-556a-40a4-9253-55fc0a910290
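
The log reports two input splits, word.txt:0+25 and word.txt:25+26, and a job with 2 output partitions. The sketch below reproduces that arithmetic under some assumptions: textFile is called without an explicit partition count, so the requested minimum is 2; the split rule is the one used by Hadoop's old-API FileInputFormat (target size = total size / requested splits, with the last split allowed to be up to 10% larger); and the local block size is 32 MB. SplitSketch and its parameter names are hypothetical and exist only for this illustration.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical helper that mimics the split arithmetic; it does not call Hadoop or Spark.
object SplitSketch {
  // The final split may exceed the target size by up to 10%
  val SplitSlop = 1.1

  // Returns (offset, length) pairs, matching the "offset+length" notation in the log.
  def splits(totalSize: Long,
             numSplits: Int,
             blockSize: Long = 32L * 1024 * 1024, // assumed local block size
             minSize: Long = 1L): List[(Long, Long)] = {
    val goalSize = totalSize / math.max(numSplits, 1)
    val splitSize = math.max(minSize, math.min(goalSize, blockSize))
    val result = ListBuffer.empty[(Long, Long)]
    var remaining = totalSize
    // Cut full-size splits while more than 1.1 target sizes remain
    while (remaining.toDouble / splitSize > SplitSlop) {
      result += ((totalSize - remaining, splitSize))
      remaining -= splitSize
    }
    // Whatever is left becomes the last split
    if (remaining != 0) result += ((totalSize - remaining, remaining))
    result.toList
  }
}
```

For the 51-byte word.txt, SplitSketch.splits(51, 2) gives a target size of 25 bytes: the first split covers bytes 0 to 24, and the remaining 26 bytes are within the 10% slack, so they form the second split instead of being cut again. That matches the 0+25 and 25+26 entries in the log.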
