栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 前沿技术 > 大数据 > 大数据系统

Spark集群Standalone模式下从本地文件系统创建RDD报错找不到本地文件

Spark集群Standalone模式下从本地文件系统创建RDD报错找不到本地文件

scala> val lines = sc.textFile("file:///root/wc.txt")
lines: org.apache.spark.rdd.RDD[String] = file:///root/wc.txt MapPartitionsRDD[11] at textFile at :24

scala> lines.count
[Stage 7:>                                                          (0 + 2) / 2]22/01/07 13:23:31 WARN TaskSetManager: Lost task 1.0 in stage 7.0 (TID 36, 192.168.80.123, executor 1): java.io.FileNotFoundException: File file:/root/wc.txt does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:631)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFilelinkStatusInternal(RawLocalFileSystem.java:857)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:621)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:146)
        at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:914)
        at org.apache.hadoop.mapred.LineRecordReader.(LineRecordReader.java:109)
        at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:267)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:266)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:224)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:95)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:123)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

22/01/07 13:23:32 ERROR TaskSetManager: Task 0 in stage 7.0 failed 4 times; aborting job
[Stage 7:>                                                          (0 + 1) / 2]org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7.0 (TID 41, 192.168.80.123, executor 1): java.io.FileNotFoundException: File file:/root/wc.txt does not exist

原因:本地文件只放到了集群中的一个节点上面,而在Spark集群上,提交完任务,在哪个节点执行不确定,如果在其他节点执行,其他节点没有该文件,则会报错“文件不存在“
解决方案:在Standalone模式下,把需要的本地文件在所有节点上都放一份。

转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/698437.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号