Notes:
Spark version: 3.0.3
Scala version: 2.12.11
1. Create a Maven project and add the Scala plugin:
Create a new Maven project with version 1.0.0; in the next step, keep the project name the same as the ArtifactId.
Then create a package com.sparkcore under src/main/java.
Add the scala-sdk to the project so it has a Scala environment.
Next, add framework support to the project: in the Add Frameworks Support dialog, check Scala.
To verify that Scala is set up correctly, write and run a simple Scala program:
You should see the Scala program run successfully.
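A minimal verification program might look like the following (the object name HelloScala is just an example, not prescribed by the setup):

```scala
// A minimal program to confirm the Scala SDK is wired up.
// The object name HelloScala is arbitrary.
object HelloScala {
  def main(args: Array[String]): Unit = {
    println("Hello, Scala!")
  }
}
```

If this compiles and prints the greeting, the Scala environment is working.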
2. Add the Spark dependency:
Add the following to pom.xml:
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>3.0.3</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.1.0</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

Then write a Spark program, spark01_wordCount, that counts words; the files to be counted sit in the datas directory. The run output is shown in the screenshot (log messages still included at this point).
    package com.sparkcore

    import org.apache.spark.rdd.RDD
    import org.apache.spark.{SparkConf, SparkContext}

    object spark01_wordCount {
      def main(args: Array[String]): Unit = {
        // Run Spark locally with the app name wordCountApp
        val sparkConf = new SparkConf().setMaster("local").setAppName("wordCountApp")
        val sc = new SparkContext(sparkConf)

        // Read every file under the datas directory, one line per record
        val lines = sc.textFile("datas")
        // Split each line into words
        val words = lines.flatMap(_.split(" "))
        // Pair each word with 1, then sum the counts per word
        val wordGroup1 = words.map(word => (word, 1)).reduceByKey((a, b) => a + b)
        // Bring the results back to the driver and print them
        wordGroup1.collect().foreach(println)

        sc.stop()
      }
    }
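To see what the flatMap → map → reduceByKey pipeline computes, the same logic can be sketched with plain Scala collections, no Spark required (the sample lines below are made up, standing in for the files under datas):

```scala
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    // Made-up input standing in for the contents of the datas directory
    val lines = Seq("hello spark", "hello scala")

    val counts = lines
      .flatMap(_.split(" "))       // split lines into words
      .map(word => (word, 1))      // pair each word with 1
      .groupBy(_._1)               // group pairs by word (the "by key" step)
      .map { case (w, ps) => (w, ps.map(_._2).sum) } // sum the 1s per word

    println(counts)
  }
}
```

reduceByKey does the grouping and summing in one distributed step; this sketch splits it into groupBy plus a per-group sum so each stage is visible.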
3. Remove log messages from the console:
Create a log4j.properties file in the project's resources directory and add the following logging configuration:
    log4j.rootCategory=ERROR, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

    # Set the default spark-shell log level to ERROR. When running the spark-shell, the
    # log level for this class is used to overwrite the root logger's log level, so that
    # the user can have different defaults for the shell and regular Spark apps.
    log4j.logger.org.apache.spark.repl.Main=ERROR

    # Settings to quiet third party logs that are too verbose
    log4j.logger.org.spark_project.jetty=ERROR
    log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
    log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=ERROR
    log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=ERROR
    log4j.logger.org.apache.parquet=ERROR
    log4j.logger.parquet=ERROR

    # SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
    log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
    log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR

With the log messages removed, only the program's run results remain in the console.