This error occurs because a transformation or action is nested inside another RDD transformation, which makes the computation fail.
Starting from the line reported in the stack trace, locate the nested transformation or action and pull it out so that it runs on the driver first.
My error was the first of the two cases below; my code is attached further down. The exception:
Caused by: org.apache.spark.SparkException: This RDD lacks a SparkContext. It could happen in the following cases:
(1) RDD transformations and actions are NOT invoked by the driver, but inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.
(2) When a Spark Streaming job recovers from checkpoint, this exception will be hit if a reference to an RDD not defined by the streaming job is used in DStream operations. For more information, See SPARK-13758.
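Case (1) can be illustrated with a minimal sketch (my own example mirroring the pattern quoted in the exception message; the RDD contents and object name are made up, not from the original program):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object Spark5063Demo {
  // Valid version of the pattern from the error message: the values/count
  // action is executed on the driver first, and only the plain Long result
  // is captured by the map closure.
  def scaled(sc: SparkContext): Array[Long] = {
    val rdd1 = sc.parallelize(Seq(1L, 2L, 3L))
    val rdd2 = sc.parallelize(Seq(("a", 10), ("b", 20)))
    // INVALID -- throws SPARK-5063, because count() is an action invoked
    // inside the map closure on the executors, where rdd2 has no SparkContext:
    // rdd1.map(x => rdd2.values.count() * x).collect()
    val cnt = rdd2.values.count() // driver-side action; cnt is a plain Long
    rdd1.map(x => cnt * x).collect()
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("Spark5063Demo").setMaster("local[*]"))
    sc.setLogLevel("ERROR")
    println(scaled(sc).mkString(",")) // 2,4,6
    sc.stop()
  }
}
```

The key point is that `cnt` is an ordinary `Long` on the driver, so serializing the `map` closure no longer drags a second RDD onto the executors.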
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object Practice2 {
  def main(args: Array[String]): Unit = {
    val sparkContext = new SparkContext(new SparkConf().setAppName("Test").setMaster("local"))
    val value: RDD[String] = sparkContext.textFile("file//day1012Practice2")
    sparkContext.setLogLevel("ERROR")
    // Note the escaped regex: "\s+|\." is not a valid Scala string literal
    val value1: RDD[(String, Int)] = value.flatMap(_.split("\\s+|\\.")).map((_, 1)).reduceByKey(_ + _)
    value1.cache()
    // Print the number of occurrences of "Spark"
    value1.filter(_._1 == "Spark").map(_._2).foreach(println)
    // Get the maximum count with a driver-side action, then reuse the result
    val max: Int = value1.map(_._2).max()
    value1.filter(_._2 == max).foreach(println)
  }
}
Run result
Source file format: words separated by spaces and periods
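For comparison, the nested form of the word count above that raises SPARK-5063 can be sketched as follows (a self-contained reconstruction using `parallelize` instead of the original input file; the original failing code is not shown in this post):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object NestedMaxDemo {
  // Word count over an in-memory line, returning the word(s) with the
  // highest count, computed the valid way.
  def topWords(sc: SparkContext): Array[(String, Int)] = {
    val counts = sc.parallelize(Seq("Spark Hadoop Spark"))
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)
    // INVALID -- max() is an action executed inside the filter closure on the
    // executors, where counts has no SparkContext, so it throws SPARK-5063:
    // counts.filter(_._2 == counts.map(_._2).max())
    // VALID -- pull the action out and run it on the driver first:
    val max = counts.map(_._2).max()
    counts.filter(_._2 == max).collect()
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("NestedMaxDemo").setMaster("local[*]"))
    sc.setLogLevel("ERROR")
    topWords(sc).foreach(println) // (Spark,2)
    sc.stop()
  }
}
```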