Contents
1. Configure hdfs-site.xml
2. Decommission node list
3. Refresh the NameNode and ResourceManager
4. Check the web UI: the decommissioning node's status shows "Decommission In Progress"
5. Tune parameters to speed up DataNode decommissioning
Administrators of a Hadoop cluster frequently need to add nodes to, or remove nodes from, the cluster. For example, expanding storage capacity means bringing a new node online; conversely, shrinking the cluster means decommissioning nodes; and nodes that misbehave, such as those with an excessive failure rate or very poor performance, also need to be taken offline. To remove a node without shutting down the cluster and without losing any of the data blocks held on that machine, proceed as follows.
1. Configure hdfs-site.xml
On the Active NameNode, add the hostnames of the DataNodes to be decommissioned to the file specified by dfs.hosts.exclude (a property in hdfs-site.xml), one hostname per line. It is advisable to decommission fewer nodes at a time than the HDFS replication factor.
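As a sketch, the property in hdfs-site.xml might look like the following; the file path is an assumption, so point it at whatever exclude file your cluster already uses:

```xml
<!-- hdfs-site.xml on the Active NameNode -->
<!-- The path below is illustrative; it must exist on the NameNode host. -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>
```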
2. Decommission node list
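The exclude file itself is plain text: one hostname per line, matching the names under which the DataNodes registered with the NameNode. A hypothetical example (hostnames made up):

```
dn03.example.com
dn07.example.com
```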
3. Refresh the NameNode and ResourceManager
On the Active NameNode, run the following commands:

[root@winner-offline-namenode ~]# hdfs dfsadmin -refreshNodes
[root@winner-offline-namenode ~]# yarn rmadmin -refreshNodes
4. Check the web UI: the decommissioning node's status shows "Decommission In Progress"
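The same status is also visible from the command line: `hdfs dfsadmin -report` prints one stanza per DataNode, including a "Decommission Status" line. The stanza below is a made-up sample so the filtering can be shown without a live cluster; on a real cluster, run `hdfs dfsadmin -report | grep -B2 'Decommission in progress'` instead.

```shell
# Made-up sample of one DataNode stanza from `hdfs dfsadmin -report`
# (hostname and IP are hypothetical):
report='Name: 10.0.0.12:50010 (dn03.example.com)
Hostname: dn03.example.com
Decommission Status : Decommission in progress'

# Print decommissioning nodes together with the two preceding lines
# (which carry the node's name and address):
echo "$report" | grep -B2 'Decommission in progress'
```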
5. Tune parameters to speed up DataNode decommissioning
| Parameter | Value | Meaning |
|---|---|---|
| dfs.namenode.decommission.interval | 30 | Interval, in seconds, between runs of the monitor thread that processes decommissioning nodes |
| dfs.namenode.decommission.blocks.per.interval | 500000 | Maximum number of blocks processed per batch |
| dfs.namenode.decommission.max.concurrent.tracked.nodes | 100 | Number of nodes that can be decommissioned concurrently |
| dfs.namenode.replication.work.multiplier.per.iteration | 32 | Number of blocks replicated per iteration = number of DataNodes × this value |
| dfs.namenode.replication.max-streams | 64 | Maximum number of replication tasks assigned to a single DataNode |
| dfs.namenode.replication.max-streams-hard-limit | 128 | A DataNode whose replication task count exceeds this value will not be selected as a replication source |
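To apply the tuning, these properties can be raised in hdfs-site.xml on the NameNode (a restart, or a reconfiguration where the version supports it, is then needed). A sketch using the values from the table above:

```xml
<!-- hdfs-site.xml: raise replication throughput for faster decommissioning -->
<property>
  <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
  <value>32</value>
</property>
<property>
  <name>dfs.namenode.replication.max-streams</name>
  <value>64</value>
</property>
<property>
  <name>dfs.namenode.replication.max-streams-hard-limit</name>
  <value>128</value>
</property>
```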
For reference, the official defaults and descriptions of the replication parameters are:

- dfs.namenode.replication.max-streams (default 2): Hard limit for the number of highest-priority replication streams.
- dfs.namenode.replication.max-streams-hard-limit (default 4): Hard limit for all replication streams.
- dfs.namenode.replication.work.multiplier.per.iteration (default 2): *Note*: Advanced property. Change with caution. This determines the total amount of block transfers to begin in parallel at a DN, for replication, when such a command list is being sent over a DN heartbeat by the NN. The actual number is obtained by multiplying this multiplier with the total number of live nodes in the cluster. The result number is the number of blocks to begin transfers immediately for, per DN heartbeat. This number can be any positive, non-zero integer.
dfs.namenode.replication.work.multiplier.per.iteration defaults to 2, i.e., each iteration schedules (number of live DataNodes × 2) blocks.
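A quick back-of-the-envelope check of that formula (the cluster size below is an arbitrary example, not taken from the article):

```shell
# Blocks scheduled for replication per heartbeat cycle ≈ live DataNodes * multiplier
live_datanodes=10   # example cluster size (assumption)
multiplier=2        # default dfs.namenode.replication.work.multiplier.per.iteration
echo $(( live_datanodes * multiplier ))   # prints 20
```

Raising the multiplier to 32 on the same 10-node cluster would schedule 320 blocks per heartbeat, which is why this knob dominates decommissioning speed.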



