
[Original][Script] Fixing the HDFS openforwrite lease problem with a scheduled check

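Exception details

A query on a Hive external table, or a copy between HDFS clusters, fails with: Cannot obtain block length for LocatedBlock. This usually means a file was never closed properly by its writer, so its write lease is still held and readers cannot determine the length of its last block.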

org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1647235016030_0015_1_00, diagnostics=[Task failed, taskId=task_1647235016030_0015_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1647235016030_0015_1_00_000000_0:java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: java.io.IOException: Cannot obtain block length for LocatedBlock{BP-658896538-172.16.0.231-1618368143316:blk_1074121079_380284; getBlockSize()=1754; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[172.16.0.160:50010,DS-c2b56f6d-70e8-41c2-aa83-752ef9c283de,DISK], DatanodeInfoWithStorage[172.16.0.6:50010,DS-17d41643-588f-4e03-a460-30c3469511f5,DISK]]} at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168) at org.apache.tez.runtime.LogicalIOProcessorRuntimetask.run(LogicalIOProcessorRuntimetask.java:374) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.RuntimeException: java.io.IOException: java.io.IOException: Cannot obtain block length for LocatedBlock{BP-658896538-172.16.0.231-1618368143316:blk_1074121079_380284; getBlockSize()=1754; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[172.16.0.160:50010,DS-c2b56f6d-70e8-41c2-aa83-752ef9c283de,DISK], DatanodeInfoWithStorage[172.16.0.6:50010,DS-17d41643-588f-4e03-a460-30c3469511f5,DISK]]} at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:152) at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185) ... 14 more 

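Finding the affected files with hdfs fsck

hdfs fsck / -openforwrite

This checks whether any files are in the openforwrite state; the / is the root directory to check.

To print only the file names, you can use something like the following:

hdfs fsck / -openforwrite | egrep -v '^\.+$' | egrep "MISSING|OPENFORWRITE" | grep -o "/[^ ]*" 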
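The filter pipeline above can be wrapped in a small function so it is easier to reuse and to try out on captured fsck output. This is only a sketch: the function name extract_open_paths is made up for illustration, and it assumes the fsck report prints each affected file on its own line, starting with the path, followed by a MISSING or OPENFORWRITE marker. The exact report layout may differ between Hadoop versions.

```shell
#!/bin/bash
# extract_open_paths (hypothetical helper): reads an fsck report on stdin and
# prints only the paths of files flagged as MISSING or OPENFORWRITE.
extract_open_paths() {
  # 1. drop the progress lines that consist solely of dots
  # 2. keep only lines that mention MISSING or OPENFORWRITE
  # 3. print the tokens that start with a slash (the file path)
  # 4. strip a trailing colon, which some report formats append to the path
  grep -Ev '^\.+$' \
    | grep -E 'MISSING|OPENFORWRITE' \
    | grep -o '/[^ ]*' \
    | sed -e 's/:$//'
}

# Usage against a live cluster:
#   hdfs fsck / -openforwrite | extract_open_paths
```

Because the function only reads stdin, it can be checked against a saved report file before it is pointed at a production cluster.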

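Releasing the lease with hdfs debug recoverLease:

hdfs debug recoverLease -path <path> [-retries <num-retries>]

Use the command above to repair an affected file. Note that the argument after -path must be the absolute path of a file; it cannot be a directory.
For example: hdfs debug recoverLease -path /ods/events/access/2021-10-20/flume.1634720226917.log -retries 5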
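Since recoverLease only accepts a file path, a small guard can reject directories before calling it. The following is a minimal sketch, not part of the original script: the function name recover_lease and the HDFS_BIN variable are assumptions introduced here (HDFS_BIN only exists so that the hdfs binary can be swapped out for a dry run). The check relies on hdfs dfs -test -f, which returns 0 when the path is a regular file.

```shell
#!/bin/bash
# recover_lease (hypothetical helper): recover the lease on one file,
# refusing anything that is not a regular file.
#   $1 = absolute HDFS path, $2 = number of retries (defaults to 5)
# HDFS_BIN is an assumed override for the hdfs command; it defaults to "hdfs".
recover_lease() {
  local path="$1"
  local retries="${2:-5}"
  local hdfs_cmd="${HDFS_BIN:-hdfs}"

  # hdfs dfs -test -f exits non-zero for directories and missing paths
  if ! "$hdfs_cmd" dfs -test -f "$path"; then
    echo "skip (not a regular file): $path" >&2
    return 1
  fi

  "$hdfs_cmd" debug recoverLease -path "$path" -retries "$retries"
}

# Usage:
#   recover_lease /ods/events/access/2021-10-20/flume.1634720226917.log 5
```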


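Writing the script

In practice, Hive jobs often run on a daily schedule, and a file stuck in this state will make them fail. A daily scheduled script can repair such files automatically.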

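#!/bin/bash
# Today's date, e.g. 2021-10-20
MYDATE=$(date +%F)
# Run fsck and collect the files in openforwrite state. The final grep -v "${MYDATE}" skips paths
# containing today's date, since today's partition is normally still being written
# (this assumes tables partitioned by day; adjust as needed).
FILELIST=$(hdfs fsck /ods/events/ -openforwrite | egrep -v '^\.+$' | egrep "MISSING|OPENFORWRITE" | grep -o "/[^ ]*" | sed -e "s/:$//" | grep -v "${MYDATE}")
# Walk the list and recover the lease on each file in turn
for mypath in ${FILELIST}
do
  hdfs debug recoverLease -path "${mypath}" -retries 5
done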

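Finally, the script can be run on a schedule by crontab or by whatever job orchestration tool you normally use; the details are not covered here.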
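When a script like this runs unattended, it helps to keep a record of each run. The wrapper below is one possible sketch: the function name run_with_log, the script path /opt/scripts/fix_openforwrite.sh, the log path, and the 01:30 schedule in the comment are all made-up examples, not values from the original setup.

```shell
#!/bin/bash
# run_with_log (hypothetical helper): run a command and append its output,
# together with the start time and exit code, to a log file.
#   $1 = log file, remaining arguments = command to run
run_with_log() {
  local log="$1"; shift
  local rc
  {
    echo "=== $(date '+%F %T') start: $*"
    "$@"
    rc=$?
    echo "=== exit code: ${rc}"
  } >>"$log" 2>&1
  return "$rc"
}

# Example call (paths are placeholders):
#   run_with_log /var/log/fix_openforwrite.log /opt/scripts/fix_openforwrite.sh
#
# Example crontab entry that would run a wrapper script daily at 01:30:
#   30 1 * * * /opt/scripts/fix_openforwrite_cron.sh
```

The exit code is passed through, so an orchestration tool that calls the wrapper still sees whether the repair script itself failed.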

If you repost this article, please credit the source: www.mshxw.com
Article URL: https://www.mshxw.com/it/780375.html