A First Look at HDFS (Big Data Systems)

1. HDFS Write Flow

1.1 Hadoop-client

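Three ways to traverse a file directory with Hadoop's FileSystem class (source code and differences). I won't go over how to operate HDFS through a FileSystem object here: my previous blog post gave an introductory tutorial on operating HDFS, and for anything else please refer to the examples on the official website.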

1.2 Anatomy of a File Write

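The concepts of Block, Packet, and Chunk
1. The Hadoop-client asks the NameNode whether the file can be uploaded; if it can, the client obtains the list of DataNodes that will store the data.
2. Each Block is split into Packets and uploaded to one of those DataNodes; the first DataNode to receive the data forwards it to the next DataNode, and so on down the pipeline.
3. Once a Block has been transferred, the client asks the NameNode where to upload the second Block, and this repeats until the whole file has been transferred.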
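To make the three units concrete, here is a rough shell sketch of how many Blocks, Packets, and Chunks one file produces. It assumes the Hadoop 3 defaults: 128 MB blocks (dfs.blocksize), 64 KB packets (dfs.client-write-packet-size), and 512-byte chunks (dfs.bytes-per-checksum), each chunk followed by a 4-byte CRC. It ignores packet headers, so the packet count is only approximate. The function name split_counts is made up for illustration.

```bash
#!/usr/bin/env bash
# Approximate Block/Packet/Chunk counts for one file written to HDFS.
# Assumed defaults (check your own hdfs-site.xml):
BLOCK_SIZE=$((128 * 1024 * 1024))   # dfs.blocksize: 128 MB
PACKET_SIZE=$((64 * 1024))          # dfs.client-write-packet-size: 64 KB
CHUNK_SIZE=512                      # dfs.bytes-per-checksum: 512 B of data
CHECKSUM_SIZE=4                     # CRC checksum stored per chunk

# Integer ceiling division: ceil_div A B
ceil_div() { echo $(( ($1 + $2 - 1) / $2 )); }

# split_counts FILE_BYTES -> "blocks packets chunks checksum_bytes"
# Packet headers are ignored: every packet is treated as carrying
# PACKET_SIZE bytes of file data (a simplification).
split_counts() {
  local size=$1 blocks packets chunks
  blocks=$(ceil_div "$size" "$BLOCK_SIZE")
  packets=$(ceil_div "$size" "$PACKET_SIZE")
  chunks=$(ceil_div "$size" "$CHUNK_SIZE")
  echo "$blocks $packets $chunks $(( chunks * CHECKSUM_SIZE ))"
}

# Example: a 300 MB file is stored as 3 blocks (128 + 128 + 44 MB)
split_counts $((300 * 1024 * 1024))   # 3 4800 614400 2457600
```

Because 128 MB is an exact multiple of both 64 KB and 512 B, counting packets and chunks over the whole file gives the same totals as counting them block by block.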

1.3 HDFS Rack Awareness, Network Topology, and Node Distance Calculation

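The first replica is placed on the node where the client runs; if the client is outside the cluster, a node is chosen at random.
The second replica is placed on a random node in a different rack.
The third replica is placed on a random node in the same rack as the second replica (a different node from the second).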
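The placement rule above can be sketched in shell (bash 4 or newer, for the associative array). The four-node, two-rack topology and the function name place_replicas are hypothetical. To keep the output deterministic, the sketch takes the first eligible node in list order where HDFS would pick one at random, and it ignores load, free space, and node health, which the real BlockPlacementPolicyDefault also takes into account.

```bash
#!/usr/bin/env bash
# Hypothetical cluster: node names in a fixed order plus the rack of each node.
NODES=(hadoop102 hadoop103 hadoop104 hadoop105)
declare -A RACK=([hadoop102]=/rack1 [hadoop103]=/rack1 [hadoop104]=/rack2 [hadoop105]=/rack2)

# place_replicas WRITER -> "first second third"
# WRITER is the client's host. If it is not a cluster node, the first node in
# the list is used (HDFS would choose a random one).
place_replicas() {
  local writer=$1 first second="" third="" n
  if [[ -n ${RACK[$writer]+x} ]]; then first=$writer; else first=${NODES[0]}; fi
  # Second replica: a node on a different rack from the first
  for n in "${NODES[@]}"; do
    if [[ ${RACK[$n]} != "${RACK[$first]}" ]]; then second=$n; break; fi
  done
  # Third replica: another node on the same rack as the second
  for n in "${NODES[@]}"; do
    if [[ ${RACK[$n]} == "${RACK[$second]}" && $n != "$second" ]]; then third=$n; break; fi
  done
  echo "$first $second $third"
}

place_replicas hadoop102   # client on a DataNode:    hadoop102 hadoop104 hadoop105
place_replicas client01    # client outside cluster:  hadoop102 hadoop104 hadoop105
```

Putting the second and third replicas on one remote rack means a write crosses racks only once, while the data still survives the loss of an entire rack.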

2. HDFS Read Flow

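The Hadoop-client asks the NameNode to download a file and receives the file's metadata, which contains the addresses of the DataNodes holding each block.
The Hadoop-client picks the nearest DataNode and requests the data from it.
The Hadoop-client receives the data, which the DataNode sends in units of Packets, buffers it, and writes it to the target file once enough data has accumulated.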
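The "nearest DataNode" is judged by network-topology distance. Each node is a leaf in a data center / rack / node tree, and the distance between two nodes is the number of hops from each node up to their closest common ancestor: 0 for the same node, 2 for two nodes in the same rack, 4 for different racks in the same data center, and 6 for different data centers. The sketch below computes that distance from topology paths and picks the closest replica. The paths (such as /d1/r1/n1) and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# node_distance PATH_A PATH_B -> hops from each node up to their common ancestor
# Paths look like /datacenter/rack/node, e.g. /d1/r1/n1.
node_distance() {
  local -a a b
  local i=0
  IFS=/ read -ra a <<< "$1"
  IFS=/ read -ra b <<< "$2"
  # Walk down from the root while both paths agree
  while (( i < ${#a[@]} && i < ${#b[@]} )) && [[ ${a[i]} == "${b[i]}" ]]; do
    i=$((i + 1))
  done
  echo $(( ${#a[@]} - i + ${#b[@]} - i ))
}

# closest_replica CLIENT_PATH REPLICA_PATH... -> replica with the smallest distance
closest_replica() {
  local client=$1 best="" best_d="" r d
  shift
  for r in "$@"; do
    d=$(node_distance "$client" "$r")
    if [[ -z $best_d ]] || (( d < best_d )); then best=$r; best_d=$d; fi
  done
  echo "$best"
}

node_distance /d1/r1/n1 /d1/r1/n1   # same node: 0
node_distance /d1/r1/n1 /d1/r1/n2   # same rack: 2
node_distance /d1/r1/n1 /d1/r2/n3   # same data center, other rack: 4
node_distance /d1/r1/n1 /d2/r3/n4   # other data center: 6
closest_replica /d1/r1/n1 /d2/r3/n4 /d1/r2/n3 /d1/r1/n2   # /d1/r1/n2
```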

3. NameNode and SecondaryNameNode

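When the NameNode is formatted, it creates the edit log (Edits) and the image file (FsImage).
The NameNode keeps its metadata in memory and periodically persists it to disk as the FsImage.
The edit log Edits (append-only) records every create, update, and delete of the NameNode's metadata (reads do not change the metadata, so they are not recorded).
Complete in-memory NameNode metadata = FsImage + Edits.
The NameNode's in-memory metadata and the Edits log are updated in sync.
The SecondaryNameNode merges Edits and FsImage (when its timer fires or when enough transactions have accumulated in Edits), producing fsimage.chkpoint.
The new fsimage.chkpoint from the SecondaryNameNode then replaces the NameNode's old FsImage.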

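PS: When the NameNode starts, and when the SecondaryNameNode merges Edits and FsImage, the Edits log is rolled: a new, empty edits_inprogress file is created so that all new operations are written to it. At startup the NameNode loads only the FsImage and the edits that have not yet been merged; edits that have already been merged are contained in the FsImage.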

[atguigu@hadoop102 current]$ pwd
/opt/module/hadoop-3.1.3/data/dfs/name/current
[atguigu@hadoop102 current]$ ll
总用量 12352
-rw-rw-r--. 1 atguigu atguigu      42 12月 12 18:00 edits_0000000000000000001-0000000000000000002
-rw-rw-r--. 1 atguigu atguigu 1048576 12月 12 18:00 edits_0000000000000000003-0000000000000000003
-rw-rw-r--. 1 atguigu atguigu 1048576 12月 12 18:08 edits_0000000000000000004-0000000000000000004
-rw-rw-r--. 1 atguigu atguigu 1048576 12月 12 20:34 edits_0000000000000000005-0000000000000000005
-rw-rw-r--. 1 atguigu atguigu      42 12月 12 21:22 edits_0000000000000000006-0000000000000000007
-rw-rw-r--. 1 atguigu atguigu 1048576 12月 12 21:22 edits_0000000000000000008-0000000000000000008
-rw-rw-r--. 1 atguigu atguigu      42 12月 12 21:26 edits_0000000000000000009-0000000000000000010
-rw-rw-r--. 1 atguigu atguigu 1048576 12月 12 21:26 edits_0000000000000000011-0000000000000000011
-rw-rw-r--. 1 atguigu atguigu     690 12月 12 21:49 edits_0000000000000000012-0000000000000000020
-rw-rw-r--. 1 atguigu atguigu 1048576 12月 12 21:49 edits_0000000000000000021-0000000000000000021
-rw-rw-r--. 1 atguigu atguigu      42 12月 12 22:00 edits_0000000000000000022-0000000000000000023
-rw-rw-r--. 1 atguigu atguigu 1048576 12月 12 22:10 edits_0000000000000000024-0000000000000000186
-rw-rw-r--. 1 atguigu atguigu 1048576 12月 13 22:49 edits_0000000000000000187-0000000000000000187
-rw-rw-r--. 1 atguigu atguigu 1048576 12月 13 22:54 edits_0000000000000000188-0000000000000000188
-rw-rw-r--. 1 atguigu atguigu 1048576 12月 14 20:36 edits_0000000000000000189-0000000000000000189
-rw-rw-r--. 1 atguigu atguigu    3741 12月 14 21:38 edits_0000000000000000190-0000000000000000231
-rw-rw-r--. 1 atguigu atguigu      42 12月 14 22:38 edits_0000000000000000232-0000000000000000233
-rw-rw-r--. 1 atguigu atguigu 1048576 12月 14 23:17 edits_0000000000000000234-0000000000000000248
-rw-rw-r--. 1 atguigu atguigu      42 12月 15 20:51 edits_0000000000000000249-0000000000000000250
-rw-rw-r--. 1 atguigu atguigu     235 12月 15 21:52 edits_0000000000000000251-0000000000000000255
-rw-rw-r--. 1 atguigu atguigu      42 12月 15 22:52 edits_0000000000000000256-0000000000000000257
-rw-rw-r--. 1 atguigu atguigu 1048576 12月 17 21:24 edits_inprogress_0000000000000000258
-rw-rw-r--. 1 atguigu atguigu    2752 12月 15 21:52 fsimage_0000000000000000255
-rw-rw-r--. 1 atguigu atguigu      62 12月 15 21:52 fsimage_0000000000000000255.md5
-rw-rw-r--. 1 atguigu atguigu    2752 12月 15 22:52 fsimage_0000000000000000257
-rw-rw-r--. 1 atguigu atguigu      62 12月 15 22:52 fsimage_0000000000000000257.md5
-rw-rw-r--. 1 atguigu atguigu       4 12月 15 22:52 seen_txid
-rw-rw-r--. 1 atguigu atguigu     216 12月 15 20:50 VERSION
[atguigu@hadoop102 current]$ pwd
/opt/module/hadoop-3.1.3/data/dfs/name/current
[atguigu@hadoop102 current]$ cat seen_txid 
266

3.1 Handling NameNode Failure

[atguigu@hadoop104 current]$ pwd
/opt/module/hadoop-3.1.3/data/dfs/namesecondary/current
[atguigu@hadoop104 current]$ ll
总用量 5180
-rw-rw-r--. 1 atguigu atguigu      42 12月 12 18:00 edits_0000000000000000001-0000000000000000002
-rw-rw-r--. 1 atguigu atguigu 1048576 12月 12 21:22 edits_0000000000000000005-0000000000000000005
-rw-rw-r--. 1 atguigu atguigu      42 12月 12 21:22 edits_0000000000000000006-0000000000000000007
-rw-rw-r--. 1 atguigu atguigu 1048576 12月 12 21:26 edits_0000000000000000008-0000000000000000008
-rw-rw-r--. 1 atguigu atguigu      42 12月 12 21:26 edits_0000000000000000009-0000000000000000010
-rw-rw-r--. 1 atguigu atguigu 1048576 12月 12 21:49 edits_0000000000000000011-0000000000000000011
-rw-rw-r--. 1 atguigu atguigu     690 12月 12 21:49 edits_0000000000000000012-0000000000000000020
-rw-rw-r--. 1 atguigu atguigu 1048576 12月 12 22:00 edits_0000000000000000021-0000000000000000021
-rw-rw-r--. 1 atguigu atguigu      42 12月 12 22:00 edits_0000000000000000022-0000000000000000023
-rw-rw-r--. 1 atguigu atguigu 1048576 12月 14 21:38 edits_0000000000000000189-0000000000000000189
-rw-rw-r--. 1 atguigu atguigu    3741 12月 14 21:38 edits_0000000000000000190-0000000000000000231
-rw-rw-r--. 1 atguigu atguigu      42 12月 14 22:38 edits_0000000000000000232-0000000000000000233
-rw-rw-r--. 1 atguigu atguigu      42 12月 15 20:51 edits_0000000000000000249-0000000000000000250
-rw-rw-r--. 1 atguigu atguigu     235 12月 15 21:52 edits_0000000000000000251-0000000000000000255
-rw-rw-r--. 1 atguigu atguigu      42 12月 15 22:52 edits_0000000000000000256-0000000000000000257
-rw-rw-r--. 1 atguigu atguigu    2752 12月 15 21:52 fsimage_0000000000000000255
-rw-rw-r--. 1 atguigu atguigu      62 12月 15 21:52 fsimage_0000000000000000255.md5
-rw-rw-r--. 1 atguigu atguigu    2752 12月 15 22:52 fsimage_0000000000000000257
-rw-rw-r--. 1 atguigu atguigu      62 12月 15 22:52 fsimage_0000000000000000257.md5
-rw-rw-r--. 1 atguigu atguigu     216 12月 15 22:52 VERSION

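Files the NameNode has that the SecondaryNameNode lacks (the SecondaryNameNode keeps a VERSION file of its own, as the listing above shows):
seen_txid
edits_inprogress_0000000000000000258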

-rw-rw-r--. 1 atguigu atguigu 1048576 12月 17 21:24 edits_inprogress_0000000000000000258
-rw-rw-r--. 1 atguigu atguigu       4 12月 15 22:52 seen_txid
-rw-rw-r--. 1 atguigu atguigu     216 12月 15 20:50 VERSION

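Recovery: copy the SecondaryNameNode's data into the NameNode's storage directory. Changes recorded only in edits_inprogress (transaction 258 onward in this example) do not exist on the SecondaryNameNode, so they are lost.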

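# kill needs a PID: jps prints "PID ClassName", so look up the NameNode's PID first
[atguigu@hadoop102 hadoop-3.1.3]$ kill -9 $(jps | awk '$2 == "NameNode" {print $1}')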
[atguigu@hadoop102 hadoop-3.1.3]$ rm -rf /opt/module/hadoop-3.1.3/data/dfs/name/*
[atguigu@hadoop102 dfs]$ scp -r atguigu@hadoop104:/opt/module/hadoop-3.1.3/data/dfs/namesecondary/* ./name/
[atguigu@hadoop102 hadoop-3.1.3]$ hdfs --daemon start namenode

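Another recovery method: start the NameNode with the -importCheckpoint option (hdfs namenode -importCheckpoint), which loads the checkpoint from the SecondaryNameNode's directory (dfs.namenode.checkpoint.dir) into the NameNode's directory. For this to work, the NameNode's name directory must be empty and the checkpoint directory must be available on the NameNode host.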

3.2 Viewing fsimage and edits with oiv and oev

Why doesn't the Fsimage record which DataNodes hold each block?

[atguigu@hadoop102 current]$ hdfs oiv -p XML -i fsimage_0000000000000000257 -o /opt/data/fsimage.xml
[atguigu@hadoop102 current]$ cat /opt/data/fsimage.xml

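How does the NameNode decide which Edits to merge the next time it starts? It loads the newest fsimage_N and replays only the edits with transaction IDs greater than N; seen_txid records the latest transaction ID the NameNode has seen, so it can check that no edits are missing.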

[atguigu@hadoop102 current]$ hdfs oev -p XML -i edits_0000000000000000256-0000000000000000257 -o /opt/data/edits.xml
[atguigu@hadoop102 current]$ cat /opt/data/edits.xml
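The startup rule can be sketched as a small shell function that reads a directory listing and prints what the NameNode would load: the newest fsimage_N, then every edits segment (finished or in progress) containing transactions after N. This is a simplification: the real NameNode also verifies the .md5 files and uses seen_txid to make sure no edits are missing. The function name files_to_load is hypothetical.

```bash
#!/usr/bin/env bash
# files_to_load: reads file names (one per line) on stdin and prints the newest
# fsimage followed by the edits segments that must be replayed on top of it.
files_to_load() {
  local names f latest=-1 latest_name="" start end
  names=$(cat)
  # 1) Newest checkpoint: the fsimage_<txid> with the highest txid (.md5 files skipped)
  while read -r f; do
    if [[ $f =~ ^fsimage_([0-9]+)$ ]] && (( 10#${BASH_REMATCH[1]} > latest )); then
      latest=$((10#${BASH_REMATCH[1]}))   # 10# avoids octal parsing of leading zeros
      latest_name=$f
    fi
  done <<< "$names"
  echo "$latest_name"
  # 2) Finished segments ending after that txid, plus a newer in-progress segment
  while read -r f; do
    if [[ $f =~ ^edits_([0-9]+)-([0-9]+)$ ]]; then
      end=$((10#${BASH_REMATCH[2]}))
      if (( end > latest )); then echo "$f"; fi
    elif [[ $f =~ ^edits_inprogress_([0-9]+)$ ]]; then
      start=$((10#${BASH_REMATCH[1]}))
      if (( start > latest )); then echo "$f"; fi
    fi
  done <<< "$names"
}

# Usage on a real NameNode: ls /opt/module/hadoop-3.1.3/data/dfs/name/current | files_to_load
printf '%s\n' fsimage_0000000000000000255 fsimage_0000000000000000257 \
  fsimage_0000000000000000257.md5 edits_0000000000000000256-0000000000000000257 \
  edits_inprogress_0000000000000000258 seen_txid VERSION | files_to_load
# fsimage_0000000000000000257
# edits_inprogress_0000000000000000258
```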

When does the SecondaryNameNode run a checkpoint?

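<!-- hdfs-default.xml -->
<!-- Trigger 1: checkpoint once an hour -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600s</value>
</property>

<!-- Trigger 2: checkpoint once 1,000,000 transactions have accumulated -->
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>
  <description>Number of operations</description>
</property>

<property>
  <name>dfs.namenode.checkpoint.check.period</name>
  <value>60s</value>
  <description>Check the number of operations once a minute</description>
</property>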
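The two triggers are combined with OR: every dfs.namenode.checkpoint.check.period (60 s) the SecondaryNameNode checks whether either limit has been reached. A minimal sketch of that decision, with the default values hard-coded and a hypothetical function name:

```bash
#!/usr/bin/env bash
CHECKPOINT_PERIOD_S=3600     # dfs.namenode.checkpoint.period
CHECKPOINT_TXNS=1000000      # dfs.namenode.checkpoint.txns

# should_checkpoint SECONDS_SINCE_LAST_CHECKPOINT UNCHECKPOINTED_TXNS -> yes/no
# Meant to be evaluated once per dfs.namenode.checkpoint.check.period (60 s).
should_checkpoint() {
  if (( $1 >= CHECKPOINT_PERIOD_S || $2 >= CHECKPOINT_TXNS )); then
    echo yes
  else
    echo no
  fi
}

should_checkpoint 1200 5000      # no: neither limit reached
should_checkpoint 3600 10        # yes: an hour has passed
should_checkpoint 600 1000000    # yes: one million uncheckpointed transactions
```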

4. Cluster Safe Mode

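During startup, the NameNode merges FsImage and Edits to build the filesystem metadata image in memory; during this time the NameNode is read-only to clients.
FsImage does not store block locations (which DataNodes hold each block); the NameNode builds this mapping in memory from the block reports that the DataNodes send automatically.
Once 99.9% of the blocks in the filesystem meet the minimum replication level (dfs.namenode.replication.min, formerly dfs.replication.min, default 1), the NameNode leaves safe mode after a further 30 seconds.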
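The exit condition can be sketched with integer arithmetic. The per-mille constant stands in for dfs.namenode.safemode.threshold-pct (0.999); once the threshold is met, the NameNode still waits dfs.namenode.safemode.extension (30000 ms) before leaving. The function names are hypothetical, and because the NameNode computes the threshold in floating point, very large block counts may round slightly differently.

```bash
#!/usr/bin/env bash
THRESHOLD_PERMILLE=999   # dfs.namenode.safemode.threshold-pct = 0.999

# required_blocks TOTAL_BLOCKS -> blocks that must reach the minimum replication
required_blocks() { echo $(( $1 * THRESHOLD_PERMILLE / 1000 )); }

# threshold_met TOTAL_BLOCKS SAFE_BLOCKS -> yes/no
# "yes" means the 30 s extension countdown starts before safe mode is left.
threshold_met() {
  if (( $2 >= $(required_blocks "$1") )); then echo yes; else echo no; fi
}

required_blocks 10000      # 9990
threshold_met 10000 9989   # no
threshold_met 10000 9990   # yes
```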

[atguigu@hadoop102 current]$ hdfs dfsadmin -safemode get     # check whether safe mode is on
Safe mode is OFF
[atguigu@hadoop102 current]$ hdfs dfsadmin -safemode enter   # enter safe mode
Safe mode is ON
[atguigu@hadoop102 current]$ hdfs dfsadmin -safemode wait    # block until safe mode ends, then run what follows
^Z
[1]+  已停止               hdfs dfsadmin -safemode wait
[atguigu@hadoop102 current]$ hdfs dfsadmin -safemode leave   # leave safe mode
Safe mode is OFF
[atguigu@hadoop102 current]$ hdfs dfsadmin -safemode wait    # no longer blocks once safe mode is off
Safe mode is OFF

5. How DataNodes Work

5.1 Safely Adding and Removing Machines While the Cluster Is Running

Check the status of each HDFS node

[atguigu@hadoop102 ~]$ hdfs dfsadmin -report
Configured Capacity: 133819293696 (124.63 GB)
Present Capacity: 110024376320 (102.47 GB)
DFS Remaining: 110022897664 (102.47 GB)
DFS Used: 1478656 (1.41 MB)
DFS Used%: 0.00%
Replicated Blocks:
	Under replicated blocks: 1
	Blocks with corrupt replicas: 0
	Missing blocks: 0
	Missing blocks (with replication factor 1): 0
	Low redundancy blocks with highest priority to recover: 0
	Pending deletion blocks: 0
Erasure Coded Block Groups: 
	Low redundancy block groups: 0
	Block groups with corrupt internal blocks: 0
	Missing block groups: 0
	Low redundancy blocks with highest priority to recover: 0
	Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (3):

Name: 192.168.2.34:9866 (hadoop102)
Hostname: hadoop102
Decommission Status : Normal
Configured Capacity: 43197251584 (40.23 GB)
DFS Used: 495616 (484 KB)
Non DFS Used: 5825630208 (5.43 GB)
DFS Remaining: 35153231872 (32.74 GB)
DFS Used%: 0.00%
DFS Remaining%: 81.38%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Dec 17 22:42:43 CST 2021
Last Block Report: Wed Dec 15 22:40:44 CST 2021
Num of Blocks: 10


Name: 192.168.2.35:9866 (hadoop103)
Hostname: hadoop103
Decommission Status : Normal
Configured Capacity: 45311021056 (42.20 GB)
DFS Used: 487424 (476 KB)
Non DFS Used: 5547835392 (5.17 GB)
DFS Remaining: 37437431808 (34.87 GB)
DFS Used%: 0.00%
DFS Remaining%: 82.62%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Dec 17 22:42:44 CST 2021
Last Block Report: Wed Dec 15 23:02:16 CST 2021
Num of Blocks: 9


Name: 192.168.2.36:9866 (hadoop104)
Hostname: hadoop104
Decommission Status : Normal
Configured Capacity: 45311021056 (42.20 GB)
DFS Used: 495616 (484 KB)
Non DFS Used: 5553025024 (5.17 GB)
DFS Remaining: 37432233984 (34.86 GB)
DFS Used%: 0.00%
DFS Remaining%: 82.61%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Dec 17 22:42:43 CST 2021
Last Block Report: Wed Dec 15 20:51:14 CST 2021
Num of Blocks: 10

Check the status of each YARN node

[atguigu@hadoop102 ~]$ yarn node -list
2021-12-17 22:45:36,642 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.2.35:8032
Total Nodes:3
         Node-Id	     Node-State	Node-Http-Address	Number-of-Running-Containers
 hadoop103:35192	        RUNNING	   hadoop103:8042	                           0
 hadoop104:42409	        RUNNING	   hadoop104:8042	                           0
 hadoop102:37691	        RUNNING	   hadoop102:8042	                           0

5.1.1 Adding DataNode and NodeManager Nodes

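# On hadoop102, hadoop103, hadoop104, and hadoop105, add hadoop105 to the workers file
# Only nodes listed in workers are started and stopped together by the cluster scripts; other nodes must be started and stopped individually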
[atguigu@hadoop102 ~]$ cat /opt/module/hadoop-3.1.3/etc/hadoop/workers
hadoop102
hadoop103
hadoop104
hadoop105
# Start the DataNode and NodeManager on the new node
[atguigu@hadoop105 ~]$ hdfs --daemon start datanode
[atguigu@hadoop105 ~]$ yarn --daemon start nodemanager
# Refresh the NameNode and the ResourceManager
[atguigu@hadoop102 ~]$ hdfs dfsadmin -refreshNodes
[atguigu@hadoop102 ~]$ yarn rmadmin -refreshNodes
# If data is unevenly distributed, rebalance the cluster with this command
[atguigu@hadoop102 ~]$ start-balancer.sh
# Check the cluster status
[atguigu@hadoop102 ~]$ hdfs dfsadmin -report
[atguigu@hadoop102 ~]$ yarn node -list

5.1.2 Dynamically Removing DataNode and NodeManager Nodes

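# Stop the DataNode and NodeManager processes
# (Stopping the DataNode directly does not copy its blocks elsewhere first; the NameNode re-replicates them only after declaring the node dead. Section 5.4 decommissions a node gracefully.)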
[atguigu@hadoop105 ~]$ hdfs --daemon stop datanode
[atguigu@hadoop105 ~]$ yarn --daemon stop nodemanager
# On hadoop102, hadoop103, hadoop104, and hadoop105, remove hadoop105 from the workers file
[atguigu@hadoop102 ~]$ cat /opt/module/hadoop-3.1.3/etc/hadoop/workers
hadoop102
hadoop103
hadoop104
# Refresh the NameNode and the ResourceManager
[atguigu@hadoop102 ~]$ hdfs dfsadmin -refreshNodes
[atguigu@hadoop102 ~]$ yarn rmadmin -refreshNodes
# If data is unevenly distributed, rebalance the cluster with this command
[atguigu@hadoop102 ~]$ start-balancer.sh
# Check the cluster status
[atguigu@hadoop102 ~]$ hdfs dfsadmin -report
[atguigu@hadoop102 ~]$ yarn node -list

5.2 Data Fault Tolerance in Hadoop 3 (Erasure Coding)

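As big data technology has developed, HDFS, one of Hadoop's core modules, has been widely adopted. HDFS guarantees data reliability through replication: by default every piece of data is stored three times (the original plus two replicas), so 1 TB of raw data occupies 3 TB of disk, a storage utilization of only 1/3. Moreover, most data in a system is cold data that is rarely accessed, yet it is stored with three replicas just like hot data, which puts heavy pressure on storage space and network bandwidth. Improving storage utilization without sacrificing reliability has therefore become one of the main problems HDFS faces. Hadoop 3.0 introduced erasure coding (EC), which can improve storage utilization by more than 50% while preserving reliability. Erasure coding is an encoding-based fault-tolerance technique, first used in telecommunications to recover data lost in transmission. It splits data into blocks and computes parity data from them, so that the pieces become interrelated: when some data blocks are lost, they can be reconstructed from the remaining data blocks and the parity blocks.

Data integrity (checking whether stored data has been corrupted) is a separate problem. A DataNode computes a checksum when a block is read, and it also periodically verifies the checksums of its blocks after they are created. Common checksum algorithms include CRC (32-bit), MD5 (128-bit), and SHA-1 (160-bit).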
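The idea of "parity data from which lost blocks can be recomputed" can be shown with the simplest erasure code, XOR parity, which HDFS ships as the XOR-2-1-1024k policy (two data cells, one parity cell); the default policy, RS-6-3-1024k, uses Reed-Solomon coding instead. In this toy sketch each cell is a small integer rather than a 1 MB buffer, any number of data cells is allowed, and the function names are hypothetical. It also computes the storage utilization k/(k+m) that motivates EC.

```bash
#!/usr/bin/env bash
# xor_parity CELL... -> parity cell (bitwise XOR of all data cells)
xor_parity() {
  local p=0 c
  for c in "$@"; do p=$(( p ^ c )); done
  echo "$p"
}

# recover_cell PARITY SURVIVING_CELL... -> the one missing data cell
# XOR-ing the parity with every surviving cell cancels them out.
recover_cell() { xor_parity "$@"; }

# utilization_pct DATA_UNITS REDUNDANT_UNITS -> integer percent of raw space holding data
utilization_pct() { echo $(( $1 * 100 / ($1 + $2) )); }

parity=$(xor_parity 1 2 4)   # data cells 1, 2, 4 -> parity 7
recover_cell "$parity" 1 4   # the cell "2" was lost -> prints 2
utilization_pct 6 3          # RS-6-3: 66 (%)
utilization_pct 1 2          # three replicas = 1 copy + 2 extra: 33 (%)
```

XOR parity survives the loss of only one cell per stripe; Reed-Solomon with m parity cells survives the loss of any m cells, which is why RS-6-3 tolerates three failures at 66% utilization.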

5.3 Setting the Dead-Node Timeout Parameters
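By default the NameNode declares a DataNode dead only after TimeOut = 2 × dfs.namenode.heartbeat.recheck-interval + 10 × dfs.heartbeat.interval. The recheck interval defaults to 300000 ms (5 minutes) and the heartbeat interval to 3 s, giving 10 minutes 30 seconds. The two parameters use different units, which is easy to get wrong. A sketch of the arithmetic, with a hypothetical function name:

```bash
#!/usr/bin/env bash
# dead_node_timeout_s RECHECK_INTERVAL_MS HEARTBEAT_INTERVAL_S -> seconds
# TimeOut = 2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval
# Units: the recheck interval is in milliseconds, the heartbeat interval in seconds.
dead_node_timeout_s() { echo $(( 2 * $1 / 1000 + 10 * $2 )); }

dead_node_timeout_s 300000 3   # defaults: 630 s = 10 min 30 s
```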


5.4 Adding a Whitelist and a Blacklist

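Whitelists and blacklists are a mechanism Hadoop uses to manage the hosts in a cluster.
Hosts on the whitelist are allowed to access the NameNode; hosts not on the whitelist are removed from the cluster. Hosts on the blacklist are not allowed to access the NameNode and are removed after their data has been migrated.
In practice, the whitelist defines which DataNodes may access the NameNode, and its contents are usually the same as the workers file.
The blacklist is used to decommission DataNodes while the cluster is running.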
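A simplified shell sketch of how the two lists combine for a single host. The function name host_status and its result labels are hypothetical. The sketch assumes that an empty whitelist admits every host, and it ignores everything else the NameNode tracks, such as node liveness.

```bash
#!/usr/bin/env bash
# host_status HOST WHITELIST_FILE BLACKLIST_FILE -> rejected | decommission | normal
host_status() {
  local host=$1 white=$2 black=$3
  # A non-empty whitelist admits only the hosts it lists (exact line match)
  if [[ -s $white ]] && ! grep -qxF -- "$host" "$white"; then
    echo rejected
  # Blacklisted hosts are decommissioned: data is migrated off, then the node leaves
  elif [[ -f $black ]] && grep -qxF -- "$host" "$black"; then
    echo decommission
  else
    echo normal
  fi
}

white=$(mktemp); black=$(mktemp)
printf '%s\n' hadoop102 hadoop103 hadoop104 hadoop105 > "$white"
echo hadoop105 > "$black"
host_status hadoop103 "$white" "$black"   # normal
host_status hadoop105 "$white" "$black"   # decommission
host_status hadoop106 "$white" "$black"   # rejected
rm -f "$white" "$black"
```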

5.4.1 Create the Whitelist and Blacklist

[atguigu@hadoop102 hadoop]$ pwd
/opt/module/hadoop-3.1.3/etc/hadoop
[atguigu@hadoop102 hadoop]$ touch whitelist
[atguigu@hadoop102 hadoop]$ touch blacklist
[atguigu@hadoop102 hadoop]$ cat whitelist
hadoop102
hadoop103
hadoop104
hadoop105

5.4.2 Add the dfs.hosts and dfs.hosts.exclude Parameters to hdfs-site.xml


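<!-- Whitelist -->
<property>
	<name>dfs.hosts</name>
	<value>/opt/module/hadoop-3.1.3/etc/hadoop/whitelist</value>
</property>

<!-- Blacklist -->
<property>
	<name>dfs.hosts.exclude</name>
	<value>/opt/module/hadoop-3.1.3/etc/hadoop/blacklist</value>
</property>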

5.4.3 Distribute whitelist, blacklist, and hdfs-site.xml (note: hadoop105 needs a copy as well)

[atguigu@hadoop102 etc]$ xsync hadoop/ 
[atguigu@hadoop102 etc]$ rsync -av hadoop/ atguigu@hadoop105:/opt/module/hadoop-3.1.3/etc/hadoop/

5.4.4 Restart the cluster (note: hadoop105 is not listed in workers, so it must be started and stopped individually)

[atguigu@hadoop102 hadoop-3.1.3]$ stop-dfs.sh
[atguigu@hadoop102 hadoop-3.1.3]$ start-dfs.sh
[atguigu@hadoop105 hadoop-3.1.3]$ hdfs --daemon start datanode
# If data is unevenly distributed, rebalance the cluster with this command
[atguigu@hadoop102 hadoop-3.1.3]$ sbin/start-balancer.sh

5.4.5 Check the DataNodes currently in service in the NameNode web UI (port 9870 by default in Hadoop 3)

5.4.6 Edit the blacklist file in /opt/module/hadoop-3.1.3/etc/hadoop

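PS: The same hostname must not appear in both the whitelist and the blacklist. For example, once hadoop105 has been successfully decommissioned through the blacklist, remove hadoop105 from the whitelist.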

[atguigu@hadoop102 hadoop]$ cat blacklist
hadoop105
[atguigu@hadoop102 hadoop]$ cat whitelist
hadoop102
hadoop103
hadoop104

5.4.7 Distribute the blacklist (and the updated whitelist) to all nodes

[atguigu@hadoop102 etc]$ xsync hadoop/ 
[atguigu@hadoop102 etc]$ rsync -av hadoop/ atguigu@hadoop105:/opt/module/hadoop-3.1.3/etc/hadoop/

5.4.8 Refresh the NameNode and the ResourceManager

[atguigu@hadoop102 hadoop-3.1.3]$ hdfs dfsadmin -refreshNodes
Refresh nodes successful
[atguigu@hadoop102 hadoop-3.1.3]$ yarn rmadmin -refreshNodes
17/06/24 14:55:56 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.1.103:8033

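5.4.9 Check the web UI: the node being retired shows the status "decommission in progress", which means the DataNode is copying its blocks to other nodes.

5.4.10 Wait until the node's status is "decommissioned" (all blocks have been copied), then stop the node's DataNode and NodeManager. Note: if the replication factor is 3 and there are 3 or fewer in-service nodes, decommissioning cannot succeed; lower the replication factor first.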

[atguigu@hadoop105 hadoop-3.1.3]$ hdfs --daemon stop datanode
stopping datanode
[atguigu@hadoop105 hadoop-3.1.3]$ yarn --daemon stop nodemanager
stopping nodemanager
# If data is unevenly distributed, rebalance the cluster with this command
[atguigu@hadoop102 hadoop-3.1.3]$ sbin/start-balancer.sh
