
Flink 1.14.4 standalone issues


Official configuration reference: Configuration | Apache Flink

1. The TaskManager (TM) process stops after running for a while

报错信息:org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Task did not exit gracefully within 180 + seconds.
org.apache.flink.util.FlinkRuntimeException: Task did not exit gracefully within 180 + seconds.
        at org.apache.flink.runtime.taskmanager.Task$TaskCancelerWatchDog.run(Task.java:1791) [flink-dist_2.11-1.14.4.jar:1.14.4]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_291]

Cause: task cancellation timed out.

Fix: in the TM config file ${FLINK_HOME}/conf/flink-conf.yaml:

# disable the task-cancellation watchdog
task.cancellation.timeout: 0

Parameter description (from the official docs): Timeout in milliseconds after which a task cancellation times out and leads to a fatal TaskManager error. A value of 0 deactivates the watch dog. Notice that a task cancellation is different from both a task failure and a clean shutdown. Task cancellation timeout only applies to task cancellation and does not apply to task closing/clean-up caused by a task failure or a clean shutdown.
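Note that with the watchdog disabled, a task stuck in user code during cancel will never trigger the fatal-error path, so the slot can stay occupied indefinitely. A less drastic alternative, sketched below, is to keep the watchdog but give cancellation more headroom; the 10-minute value is an assumption to tune per job:

```yaml
# flink-conf.yaml — keep the watchdog, but allow cancellation up to 10 minutes
task.cancellation.timeout: 600000
```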

2. JARs uploaded through the web UI are all lost after the standalone cluster restarts

Cause: by default the files are saved under /tmp, which gets cleaned up.

Fix: in the JM config file ${FLINK_HOME}/conf/flink-conf.yaml:

web.upload.dir: /usr/local/flink/upload
web.tmpdir: /usr/local/flink/tmpdir
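The directories must exist and be writable by the Flink user before the JobManager restarts. A minimal sketch (the base path defaults to a scratch directory here for illustration; point FLINK_DATA at /usr/local/flink in production):

```shell
# Create the persistent upload/tmp directories from the config above.
# FLINK_DATA is an assumed variable, not something Flink reads itself.
FLINK_DATA=${FLINK_DATA:-$PWD/flink-data}
mkdir -p "$FLINK_DATA/upload" "$FLINK_DATA/tmpdir"
# List them so a missing/unwritable path fails loudly here, not at JM startup.
ls -d "$FLINK_DATA/upload" "$FLINK_DATA/tmpdir"
```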

3. stop-cluster.sh cannot stop the standalone cluster

Cause: the PID files are saved under /tmp by default; once /tmp is cleaned up, the script cannot find the PIDs and so cannot kill the processes.

Fix: in the JM config file ${FLINK_HOME}/conf/flink-conf.yaml:

env.pid.dir: /usr/local/flink/piddir
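A sketch of why the failure happens, assuming Flink 1.14's naming scheme (config.sh resolves env.pid.dir into FLINK_PID_DIR, default /tmp, and the JM pid file is named flink-<user>-standalonesession.pid):

```shell
# Reproduce the stop script's pid lookup to see whether it can find anything.
FLINK_PID_DIR=${FLINK_PID_DIR:-/tmp}
pid_file="$FLINK_PID_DIR/flink-$(whoami)-standalonesession.pid"
if [ -f "$pid_file" ]; then
  echo "stop-cluster.sh would kill: $(cat "$pid_file")"
else
  echo "no pid file at $pid_file -> stop-cluster.sh has nothing to stop"
fi
```

If the pid file is gone, the fallback is to find and kill the JVMs by hand (e.g. via jps) and then restart with env.pid.dir pointing at a durable directory.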

4. A value stored in ZooKeeper was too long; the ZooKeeper ensemble went down, taking all TMs down with it. ZooKeeper error:

Unexpected exception causing shutdown while sock still open
java.io.IOException: Unreasonable length = 1970218037

    at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:95)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
    at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:249)


A related Q&A exchange on the same error:

"Zookeeper server went down in HA cluster. Please reply if there is any workaround."

"You can attempt to increase your jute.maxbuffer Java System Property on the ZK servers to a value higher than 2-3 GB (in bytes) to overcome this. It appears a very large record was somehow placed into your ZK by an application, which appears to have then caused this issue."

Fix: raise ZooKeeper's jute.maxbuffer system property to an appropriate length; it should be set consistently on all ZK servers (and generally on the clients as well).
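A minimal sketch, assuming the standard zookeeper-env.sh mechanism on every ZK server; the 4 MB value is an assumption — size it to your largest expected znode, not to gigabytes. The Flink JVMs (ZK clients) may need the same property passed via env.java.opts:

```shell
# conf/zookeeper-env.sh on every ZK server — raise jute.maxbuffer (bytes).
# 4194304 (4 MB) is an assumed value; the stock default is about 1 MB.
export SERVER_JVMFLAGS="-Djute.maxbuffer=4194304 ${SERVER_JVMFLAGS:-}"
```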

5. java.lang.OutOfMemoryError: Metaspace. Full error:

java.lang.OutOfMemoryError: Metaspace. The metaspace out-of-memory error has occurred. This can mean two things: either the job requires a larger size of JVM metaspace to load classes or there is a class loading leak. In the first case 'taskmanager.memory.jvm-metaspace.size' configuration option should be increased. If the error persists (usually in cluster after several job (re-)submissions) then there is probably a class loading leak in user code or some of its dependencies which has to be investigated and fixed. The task executor has to be shutdown...
        at java.lang.ClassLoader.defineClass1(Native Method) ~[?:1.8.0_291]
        at java.lang.ClassLoader.defineClass(ClassLoader.java:756) ~[?:1.8.0_291]
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) ~[?:1.8.0_291]
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) ~[?:1.8.0_291]
        at java.net.URLClassLoader.access$100(URLClassLoader.java:74) ~[?:1.8.0_291]
        at java.net.URLClassLoader$1.run(URLClassLoader.java:369) ~[?:1.8.0_291]
        at java.net.URLClassLoader$1.run(URLClassLoader.java:363) ~[?:1.8.0_291]
        at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_291]
        at java.net.URLClassLoader.findClass(URLClassLoader.java:362) ~[?:1.8.0_291]

Cause: the exact root cause has not been found yet; still under observation. Online reports point to two possibilities: blocking user code, or backpressure. (The error message above also names the two usual suspects: an undersized metaspace, or a class-loading leak that shows up after several job (re-)submissions.)

Short-term fix: in the TM config file ${FLINK_HOME}/conf/flink-conf.yaml, raise the metaspace size (default 256m):

taskmanager.memory.jvm-metaspace.size: 512m
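If the same error ever appears on the JobManager side, there is an analogous option; showing both here as a config sketch (the 512m values are assumptions to tune against your workload):

```yaml
# flink-conf.yaml — raise metaspace from the 256m default
taskmanager.memory.jvm-metaspace.size: 512m
# JM-side counterpart, in case the JobManager hits the same OOM
jobmanager.memory.jvm-metaspace.size: 512m
```

Note this only buys headroom; if the OOM recurs after repeated job (re-)submissions, the message above suggests investigating a class-loading leak in user code or its dependencies rather than raising the limit further.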

Reprint notice: please credit the source, www.mshxw.com.
Original URL: https://www.mshxw.com/it/879989.html
Copyright (c) 2021-2022 MSHXW.COM
