In production, 3 replicas are recommended.
Replica 1: if the uploading node is itself a DN, place the replica on that node; otherwise pick a random node whose disk is not too slow and whose CPU is not too busy.
Replica 2: placed on a node in a different rack from the first replica.
Replica 3: placed on a different node in the same rack as the second replica.
Many companies set up a dedicated client node (a gateway) with no DN or NN, holding only the cluster's XML config files; it can talk to the cluster and knows where to submit data.
CDH ships with a default rack, which can be thought of as one big virtual rack; this default rack is generally left unchanged in CDH.
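The placement rules above can be sketched as a toy chooser. This is a simplified model with hypothetical names, not the real BlockPlacementPolicyDefault (which also weighs disk and CPU load):

```python
import random

# Toy sketch of the default placement policy: replica 1 on the writer's
# node (or a random node), replica 2 on a different rack, replica 3 on
# replica 2's rack but on a different node.
def place_replicas(topology, writer=None):
    # topology: {rack_name: [node, node, ...]}
    nodes = [(rack, node) for rack, members in topology.items() for node in members]
    first = writer if writer is not None else random.choice(nodes)
    # second replica: any node whose rack differs from the first replica's rack
    other_racks = [rn for rn in nodes if rn[0] != first[0]]
    second = random.choice(other_racks)
    # third replica: same rack as the second replica, but a different node
    same_rack = [rn for rn in nodes if rn[0] == second[0] and rn != second]
    third = random.choice(same_rack)
    return first, second, third

topo = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
r1, r2, r3 = place_replicas(topo, writer=("rack1", "dn1"))
```

With the writer on rack1, the second and third replicas always land together on the other rack, on two distinct nodes.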
File write flow: hadoop fs -put xxx.log /
The flow is transparent to the user, who never sees what happens underneath; it revolves around the FSDataOutputStream object.
- When hadoop fs -put xxx.log / is executed, the Client calls FileSystem.create(filePath), i.e. talks to the NN over RPC. The NN checks whether the path already exists and whether the user has permission to create it; if the path exists or permission is insufficient, an error is returned.
  If everything is OK, the NN creates a new file, associated with no blocks yet, and returns an FSDataOutputStream object (the core object).
- The Client calls write() on the FSDataOutputStream object. The first replica of the first block is written to the first DN and streamed on to the second DN, and from there to the third DN. Once the third replica is written, the third DN returns an ack packet to the second DN; the second DN, having received that ack and verified its own replica, returns an ack packet to the first DN; the first DN, having received that ack and verified its own replica, returns an ack packet to the FSDataOutputStream object, marking all 3 replicas of the first block as written.
  The remaining blocks are then written the same way in turn.
- When all data has been written, the Client calls FSDataOutputStream.close() to close the output stream.
- Finally the Client calls FileSystem.complete() to tell the NN that the file was written successfully.
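The pipeline-and-ack behavior above can be sketched as a toy simulation (hypothetical names, not the real DFSOutputStream internals):

```python
# Toy model of the HDFS write pipeline: the client hands a packet to DN1,
# each DN stores a replica and forwards downstream, and acks travel back
# DN3 -> DN2 -> DN1 -> client.
def write_packet(pipeline, packet):
    # forward phase: DN1 -> DN2 -> DN3, each DN keeps a replica
    for dn in pipeline:
        dn.append(packet)
    # ack phase: each DN, last one first, confirms it holds the packet
    for dn in reversed(pipeline):
        if packet not in dn:
            return False  # a broken pipeline: no ack reaches the client
    return True  # client receives the final ack: all 3 replicas written

dn1, dn2, dn3 = [], [], []
acked = write_packet([dn1, dn2, dn3], "blk_0:packet_0")
```

When the final ack arrives, all three DN lists hold the packet, which is what "the first block's 3 replicas are written" means above.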
File read flow: equally transparent to the user; it revolves around the FSDataInputStream object.
- The Client calls FileSystem.open(filePath), talks to the NN over RPC, and the NN returns some or all of the file's block locations (i.e. the call returns an FSDataInputStream object).
- The Client calls read() on the FSDataInputStream object:
  a. Read the first block from its nearest DN (preferring the local node); after reading, the data is checked.
     If the check succeeds, the connection to that DN is closed; if it fails, the failed block + DN pair is recorded so the corrupt replica is never read again, and the block's second DN location is read instead.
  b. Then read the second block from its nearest DN and check it the same way.
     If the check succeeds, the connection to that DN is closed; if it fails, the failed block + DN pair is recorded so the corrupt replica is never read again, and another DN location holding that block is read instead.
  c. When the block list is exhausted but the file is not finished, FileSystem is called again to fetch the next batch of block locations from the NN. (To the client this feels like one continuous data stream; the batching is completely transparent.)
- The Client calls FSDataInputStream.close() to close the input stream.
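The fail-and-fall-back behavior in steps a/b can be sketched like this (hypothetical helper, not the real DFSInputStream):

```python
# Toy model of the read path: try each replica location in order,
# remembering (block, DN) pairs that failed the check so they are
# skipped on any later read.
def read_block(block_id, locations, bad_replicas, read_fn):
    for dn in locations:
        if (block_id, dn) in bad_replicas:
            continue  # known-bad replica, never read again
        data = read_fn(dn)
        if data is not None:  # check passed
            return data
        bad_replicas.add((block_id, dn))  # record failed block + DN
    raise IOError("all replicas failed for " + block_id)

bad = set()
data = read_block("blk_1", ["dn1", "dn2"], bad,
                  lambda dn: None if dn == "dn1" else b"payload")
```

Here dn1 fails its check, so the read falls back to dn2 and dn1 is remembered as a bad replica for this block.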
PID files are stored under /tmp by default (but /tmp has a 30-day auto-cleanup mechanism, which also affects jps).
Fix:
mkdir /home/ruoze/tmp
chmod -R 777 /home/ruoze/tmp
In hadoop-env.sh change:
export HADOOP_PID_DIR=/home/ruoze/tmp
In yarn-env.sh change:
export YARN_PID_DIR=/home/ruoze/tmp
Common commands
hadoop fs ==> hdfs dfs (the two are equivalent)
```shell
[mao@JD hadoop]$ hadoop fs
Usage: hadoop fs [generic options]
        [-appendToFile ...
```
Of these, remembering the following is normally enough:

```shell
        [-cat [-ignoreCrc] ...
```
But always check whether the trash (recycle bin) is enabled in the production environment; in CDH the trash is enabled by default.
```shell
[mao@JD hadoop]$ hdfs dfs -rm /wordcount/input/1.log
19/12/06 00:21:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/12/06 00:21:16 INFO fs.TrashPolicyDefault: Moved: 'hdfs://JD:9000/wordcount/input/1.log' to trash at: hdfs://JD:9000/user/mao/.Trash/Current/wordcount/input/1.log
```
With the trash enabled, use -skipTrash with great care.
Never run hdfs dfs -rm -skipTrash /rz.log!
Always use hdfs dfs -rm /rz.log so the file lands in the trash; CDH keeps it for 7 days by default and then deletes it automatically.
fs.trash.interval = 10080 minutes, i.e. 7 days.
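The retention is set in core-site.xml via fs.trash.interval, in minutes. A sketch of the property (10080 minutes = 7 days):

```xml
<property>
  <name>fs.trash.interval</name>
  <!-- minutes a deleted file stays in .Trash; 10080 = 7 days -->
  <value>10080</value>
</property>
```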
When a failed HDFS cluster is started and the NN log shows it entering safe mode, you normally take it out of safe mode manually.
Doing this by hand is rare; usually a switch is set upstream to cut off incoming data first.
To enter safe mode manually: hdfs dfsadmin -safemode enter
```shell
[mao@JD root]$ hdfs dfsadmin -safemode enter
19/12/06 18:54:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Safe mode is ON
```
Safe mode only affects writes; reads still work:
```shell
[mao@JD software]$ hdfs dfs -put mao.log /
19/12/06 19:41:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
put: Cannot create file/mao.log._COPYING_. Name node is in safe mode.
[mao@JD software]$ hdfs dfs -ls /
19/12/06 19:42:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
drwx------   - mao supergroup          0 2019-12-02 19:40 /tmp
drwx------   - mao supergroup          0 2019-12-06 00:21 /user
drwxr-xr-x   - mao supergroup          0 2019-12-02 19:40 /wordcount
[mao@JD software]$ hdfs dfs -cat /wordcount/output1/part-r-00000
19/12/06 19:44:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1       2
2       2
3       2
a       3
aaaa    1
afei    1
b       1
bcd     1
c       1
```
To leave safe mode: hdfs dfsadmin -safemode leave
```shell
[mao@JD root]$ hdfs dfsadmin -safemode leave
19/12/06 19:56:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Safe mode is OFF
[mao@JD root]$ hdfs fsck /
19/12/06 19:57:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://JD:50070/fsck?ugi=mao&path=%2F
FSCK started by mao (auth:SIMPLE) from /192.168.0.3 for path / at Fri Dec 06 19:57:07 CST 2019
......Status: HEALTHY
 Total size:    175079 B
 Total dirs:    18
 Total files:   6
 Total symlinks:        0
 Total blocks (validated):      5 (avg. block size 35015 B)
 Minimally replicated blocks:   5 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    1
 Average block replication:     1.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          1
 Number of racks:               1
FSCK ended at Fri Dec 06 19:57:07 CST 2019 in 4 milliseconds

The filesystem under path '/' is HEALTHY
```
```shell
[mao@JD sbin]$ start-balancer.sh
starting balancer, logging to /home/mao/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-mao-balancer-JD.out
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
```

The actual shell script:

```shell
[mao@JD root]$ cat /home/mao/app/hadoop/sbin/start-balancer.sh
#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`

DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hdfs-config.sh

# Start balancer daemon.
"$HADOOP_PREFIX"/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script "$bin"/hdfs start balancer $@
```

The key line is:
"$HADOOP_PREFIX"/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script "$bin"/hdfs start balancer $@
Note:
There is no "start balancer" command in hdfs itself; the subcommand is just balancer:
```shell
[mao@JD root]$ hdfs
Usage: hdfs [--config confdir] COMMAND
       where COMMAND is one of:
  dfs                  run a filesystem command on the file systems supported in Hadoop.
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  journalnode          run the DFS journalnode
  zkfc                 run the ZK Failover Controller daemon
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  diskbalancer         Distributes data evenly among disks on a given node
  haadmin              run a DFS HA admin client
  fsck                 run a DFS filesystem checking utility
  balancer             run a cluster balancing utility
  jmxget               get JMX exported values from NameNode or DataNode.
  mover                run a utility to move block replicas across storage types
  oiv                  apply the offline fsimage viewer to an fsimage
  oiv_legacy           apply the offline fsimage viewer to an legacy fsimage
  oev                  apply the offline edits viewer to an edits file
  fetchdt              fetch a delegation token from the NameNode
  getconf              get config values from configuration
  groups               get the groups which users belong to
  snapshotDiff         diff two snapshots of a directory or diff the current directory contents with a snapshot
  lsSnapshottableDir   list all snapshottable dirs owned by the current user
                       Use -help to see options
  portmap              run a portmap service
  nfs3                 run an NFS version 3 gateway
  cacheadmin           configure the HDFS cache
  crypto               configure HDFS encryption zones
  storagepolicies      list/get/set block storage policies
  version              print the version

Most commands print help when invoked w/o parameters.
```
Threshold:
threshold = 10.0
Then take the average used% across nodes:
(90 + 60 + 80) / 3 = 230 / 3 ≈ 76%
The difference between each node's disk used% and the cluster average should stay below this threshold:
90 - 76 = 14
60 - 76 = -16
80 - 76 = 4
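The criterion can be written out with the same numbers (a sketch of the decision rule, not the balancer's actual code):

```python
# Balancer criterion: a node needs rebalancing when
# |node used% - cluster average used%| exceeds the threshold.
threshold = 10.0
used = {"dn1": 90, "dn2": 60, "dn3": 80}
avg = sum(used.values()) / len(used)  # 230 / 3, about 76.7%
over = {dn: u - avg for dn, u in used.items() if abs(u - avg) > threshold}
# dn1 (about +13.3) and dn2 (about -16.7) exceed the threshold, so the
# balancer moves blocks off dn1 toward dn2; dn3 (about +3.3) is within bounds.
```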
How much data actually gets moved is decided by the balancer itself; its network bandwidth is capped by:
dfs.datanode.balance.bandwidthPerSec = 30m (30 MB/s)
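A sketch of the corresponding hdfs-site.xml property; plain Apache Hadoop takes the value in bytes per second (30 MB/s = 31457280), while newer versions also accept suffixed values like 30m:

```xml
<property>
  <name>dfs.datanode.balance.bandwidthPerSec</name>
  <!-- cap balancer traffic per DataNode: 30 * 1024 * 1024 bytes/s -->
  <value>31457280</value>
</property>
```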
Run it (start-balancer.sh):
```shell
[mao@JD root]$ start-balancer.sh
starting balancer, logging to /home/mao/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-mao-balancer-JD.out
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
[mao@JD root]$ start-balancer.sh -threshold 5
starting balancer, logging to /home/mao/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-mao-balancer-JD.out
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
```
Schedule it as a crontab job, running in the early morning every day.
Stop it (stop-balancer.sh):
```shell
[mao@JD root]$ stop-balancer.sh
no balancer to stop
```
Purpose:
Scheduled daily, it balances data across nodes, smooths out utilization spikes, and keeps the data within a bounded range.
Balancing data across the multiple disks of a single DN node:
Official docs:
Apache Hadoop 3.3.1 – HDFS Disk Balancer
For example:
df -h
/data01 90%
/data02 60%
/data03 80%
/data04 0%
The dfs.disk.balancer.enabled parameter in hdfs-site.xml must be set to true.
Three steps are needed:
- hdfs diskbalancer -plan JD (generates JD.plan.json)
- hdfs diskbalancer -execute JD.plan.json (executes the plan)
- hdfs diskbalancer -query JD (queries the status)
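Enabling it in hdfs-site.xml looks like the following sketch:

```xml
<property>
  <name>dfs.disk.balancer.enabled</name>
  <!-- required before hdfs diskbalancer commands will run -->
  <value>true</value>
</property>
```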
When is this run, manually or on a schedule?
- When a new disk is added
- When server disk monitoring shows free space below the 10% threshold: send a warning email, then run it manually
Multi-disk configuration:
dfs.datanode.data.dir = /data01,/data02,/data03,/data04 (the default is a single directory on one disk)
comma-delimited (entries separated by commas)
/data01 disk1
/data02 disk2
/data03 disk3
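A sketch of the hdfs-site.xml property with one entry per mounted disk:

```xml
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- comma-delimited: one directory per physical disk mount -->
  <value>/data01,/data02,/data03,/data04</value>
</property>
```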
Why a production DN mounts multiple physical disk directories:
For efficient writes and reads, with 2-3 years of storage capacity planned up front, avoiding the maintenance burden of adding disks later.



