HDFS Shell Operations and Java API Operations

1 HDFS Shell Operations

1.1 The hdfs Command

Basic syntax

hadoop fs <command>
# OR
hdfs dfs <command>

The two commands above are equivalent.

Full command list

$ bin/hadoop fs

        [-appendToFile <localsrc> ... <dst>]
        [-cat [-ignoreCrc] <src> ...]
        [-checksum <src> ...]
        [-chgrp [-R] GROUP PATH...]
        [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
        [-chown [-R] [OWNER][:[GROUP]] PATH...]
        [-copyFromLocal [-f] [-p] <localsrc> ... <dst>]
        [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-count [-q] <path> ...]
        [-cp [-f] [-p] <src> ... <dst>]
        [-createSnapshot <snapshotDir> [<snapshotName>]]
        [-deleteSnapshot <snapshotDir> <snapshotName>]
        [-df [-h] [<path> ...]]
        [-du [-s] [-h] <path> ...]
        [-expunge]
        [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-getfacl [-R] <path>]
        [-getmerge [-nl] <src> <localdst>]
        [-help [cmd ...]]
        [-ls [-d] [-h] [-R] [<path> ...]]
        [-mkdir [-p] <path> ...]
        [-moveFromLocal <localsrc> ... <dst>]
        [-moveToLocal <src> <localdst>]
        [-mv <src> ... <dst>]
        [-put [-f] [-p] <localsrc> ... <dst>]
        [-renameSnapshot <snapshotDir> <oldName> <newName>]
        [-rm [-f] [-r|-R] [-skipTrash] <src> ...]
        [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
        [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
        [-setrep [-R] [-w] <rep> <path> ...]
        [-stat [format] <path> ...]
        [-tail [-f] <file>]
        [-test -[defsz] <path>]
        [-text [-ignoreCrc] <src> ...]
        [-touchz <path> ...]
        [-usage [cmd ...]]

1.2 Upload

First, start HDFS:

start-dfs.sh

Open hadoop102:9870 in a browser to access the HDFS web UI, and create the sanguo directory there.

-copyFromLocal: copy a file from the local file system to an HDFS path

vim liubei.txt # write some text into the file
hadoop fs -copyFromLocal ./liubei.txt /sanguo # afterwards the file is visible in the web UI

-moveFromLocal: cut a file from the local file system and paste it into HDFS

vim guanyu.txt # write some text into the file
hadoop fs -moveFromLocal ./guanyu.txt /sanguo  # visible in the web UI; the local file is deleted after the upload

-appendToFile: append a local file to the end of a file that already exists in HDFS

vim zhangfei.txt # write some text into the file
hadoop fs -appendToFile ./zhangfei.txt /sanguo/liubei.txt
# appends the contents of zhangfei.txt to liubei.txt

-put: equivalent to -copyFromLocal

hadoop fs -put ./zhangfei.txt /sanguo

1.3 Download

-copyToLocal: copy a file from HDFS to the local file system

hadoop fs -copyToLocal /sanguo/guanyu.txt ./
# delete the local guanyu.txt so the next test starts clean
rm -rf guanyu.txt

-get: equivalent to -copyToLocal, i.e. download a file from HDFS to the local file system

hadoop fs -get /sanguo/guanyu.txt ./

-getmerge: download several files merged into one. For example, the HDFS directory /sanguo contains liubei.txt, guanyu.txt and zhangfei.txt; merge them into the local file xiongdi.txt:

hadoop fs -getmerge /sanguo/liubei.txt /sanguo/guanyu.txt /sanguo/zhangfei.txt ./xiongdi.txt
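Conceptually, `-getmerge` downloads the listed files and concatenates them, in the order given, into one local file. As a rough illustration, the Java sketch below models only that concatenation on the local file system, so it runs without a cluster. The class `GetMergeSketch` and its `merge` method are hypothetical names, not part of Hadoop, and the `addNewline` flag imitates the `-nl` option (a newline after each file).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical local model of `hadoop fs -getmerge [-nl] <src...> <localdst>`:
// the sources are concatenated, in order, into a single destination file.
final class GetMergeSketch {

    static void merge(List<Path> sources, Path destination, boolean addNewline) throws IOException {
        // Files.newOutputStream truncates an existing destination file.
        try (OutputStream out = Files.newOutputStream(destination)) {
            for (Path source : sources) {
                Files.copy(source, out);   // append this file's bytes
                if (addNewline) {
                    out.write('\n');       // imitates -nl: newline after each file
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("getmerge");
        Path liubei = Files.writeString(dir.resolve("liubei.txt"), "liubei");
        Path guanyu = Files.writeString(dir.resolve("guanyu.txt"), "guanyu");
        Path zhangfei = Files.writeString(dir.resolve("zhangfei.txt"), "zhangfei");
        Path merged = dir.resolve("xiongdi.txt");
        merge(List.of(liubei, guanyu, zhangfei), merged, true);
        System.out.print(Files.readString(merged));
    }
}
```

Running `main` prints the three names on separate lines, which is how three one-line files would come out of `-getmerge -nl`.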

1.4 Operating on HDFS Directly

-ls: list directory contents

hadoop fs -ls /

-mkdir: create a directory in HDFS

hadoop fs -mkdir /xiyou # create a directory
hadoop fs -mkdir -p /shuihu/liangsan # -p is required to create nested directories

-cat: print the contents of a file

hadoop fs -cat /sanguo/guanyu.txt

-chgrp, -chmod, -chown: same usage as in the Linux file system; change a file's group, permissions and owner

hadoop fs  -chmod  666 /sanguo/guanyu.txt
hadoop fs  -chown  xu1an:xu1an   /sanguo/zhangfei.txt
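The `666` in `-chmod 666` is an octal mode: one digit each for owner, group and others, where 4 = read, 2 = write and 1 = execute. A small Java sketch (a hypothetical helper with no Hadoop dependency, handling only plain modes 000 to 777) that turns such a mode into the nine permission characters `hadoop fs -ls` prints after the leading file-type flag:

```java
// Hypothetical helper: converts an octal mode such as "666" into rwx notation.
final class OctalPermission {

    static String toSymbolic(String octalMode) {
        int mode = Integer.parseInt(octalMode, 8);   // "666" is parsed as 0666
        if (mode < 0 || mode > 0777) {
            throw new IllegalArgumentException("expected a mode between 000 and 777: " + octalMode);
        }
        StringBuilder sb = new StringBuilder(9);
        // Walk the owner, group and other triplets from the highest bits down.
        for (int shift = 6; shift >= 0; shift -= 3) {
            int bits = (mode >> shift) & 7;
            sb.append((bits & 4) != 0 ? 'r' : '-');
            sb.append((bits & 2) != 0 ? 'w' : '-');
            sb.append((bits & 1) != 0 ? 'x' : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSymbolic("666")); // rw-rw-rw-
        System.out.println(toSymbolic("755")); // rwxr-xr-x
    }
}
```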

-cp: copy from one HDFS path to another HDFS path

hadoop fs -cp /sanguo/zhangfei.txt /xiyou # copy zhangfei.txt into /xiyou

-mv: move files within HDFS

hadoop fs -mv /sanguo/guanyu.txt /xiyou # move guanyu.txt into /xiyou
hadoop fs -mv /sanguo/liubei.txt /sanguo/zhugong.txt # rename liubei.txt to zhugong.txt

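-tail: print the last 1 KB of a file (-head prints the first 1 KB)

hadoop fs -tail /sanguo/zhangfei.txt # show the end of the file; -n is not supported
hadoop fs -head /sanguo/zhangfei.txt # show the beginning of the file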
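The byte ranges involved are simple: `-head` reads from offset 0 and `-tail` starts 1 KB before the end, printing a shorter file whole. The Java sketch below reproduces only that range logic on a local file with `RandomAccessFile`; it is an illustrative model, not the FsShell code, the class name `KilobyteSketch` is hypothetical, and 1 KB is assumed to mean 1024 bytes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Local model of `hadoop fs -head` / `hadoop fs -tail`: read at most 1 KB from the start or the end.
final class KilobyteSketch {
    static final int KB = 1024; // assumption: 1 KB = 1024 bytes

    // First min(1 KB, length) bytes, like -head.
    static byte[] head(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] buf = new byte[(int) Math.min(KB, raf.length())];
            raf.readFully(buf);
            return buf;
        }
    }

    // Last min(1 KB, length) bytes, like -tail; a shorter file is returned whole.
    static byte[] tail(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            long start = Math.max(0, length - KB);
            byte[] buf = new byte[(int) (length - start)];
            raf.seek(start);
            raf.readFully(buf);
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("zhangfei", ".txt");
        Files.write(file, new byte[3000]);
        System.out.println(head(file).length + " " + tail(file).length); // 1024 1024
    }
}
```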

-rm: delete a file or directory

hadoop fs -rm /xiyou/zhangfei.txt # delete a file
hadoop fs -rm -r /xiyou # delete a directory

-rmdir: delete an empty directory

hadoop fs -rmdir /shuihu/liangsan

-du: show size information for a directory

hadoop fs -du /
hadoop fs -du -h / # human-readable units; e.g. "52  156  /sanguo": the second column is 3× the first because every block has three replicas
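In the `-du` output the first column is the logical size of the data and the second is the disk space consumed across all replicas, so with three replicas 52 bytes become 156. A minimal Java sketch of that arithmetic; the class `DuSketch` is hypothetical, and the per-file sizes in `main` are made-up values chosen only so that they add up to the 52 bytes of the example.

```java
import java.util.List;

// Hypothetical model of the two numeric columns of `hadoop fs -du`:
// logical size, and space consumed by all replicas.
final class DuSketch {

    static final class FileEntry {
        final long length;      // logical file length in bytes
        final int replication;  // replication factor recorded for the file

        FileEntry(long length, int replication) {
            this.length = length;
            this.replication = replication;
        }
    }

    // Column 1: sum of the logical file lengths.
    static long size(List<FileEntry> files) {
        return files.stream().mapToLong(f -> f.length).sum();
    }

    // Column 2: each file's length multiplied by its replication factor.
    static long spaceConsumed(List<FileEntry> files) {
        return files.stream().mapToLong(f -> f.length * f.replication).sum();
    }

    public static void main(String[] args) {
        // Hypothetical file sizes adding up to 52 bytes, 3 replicas each.
        List<FileEntry> sanguo = List.of(
                new FileEntry(20, 3),
                new FileEntry(17, 3),
                new FileEntry(15, 3));
        System.out.println(size(sanguo) + "  " + spaceConsumed(sanguo) + "  /sanguo"); // 52  156  /sanguo
    }
}
```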

-setrep: set the replication factor of a file in HDFS

hadoop fs -setrep 6 /sanguo/zhangfei.txt # set the replication factor to 6

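The replication factor set here is only recorded in the NameNode's metadata; whether that many replicas actually exist depends on the number of DataNodes. With only 3 machines at the moment there can be at most 3 replicas; the replica count can reach 6 only once the cluster has at least 6 DataNodes.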
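The rule described above amounts to: actual replicas = min(requested replication, number of DataNodes), because a DataNode never stores two replicas of the same block. A tiny Java sketch of that rule; the helper name is hypothetical and the model is simplified (it ignores dead or decommissioning nodes and rack placement).

```java
// Hypothetical helper: how many replicas can actually exist for a requested replication factor.
final class ReplicationSketch {

    static int effectiveReplicas(int requested, int liveDataNodes) {
        if (requested < 1 || liveDataNodes < 0) {
            throw new IllegalArgumentException("requested must be >= 1 and liveDataNodes >= 0");
        }
        // Each DataNode stores at most one replica of a given block.
        return Math.min(requested, liveDataNodes);
    }

    public static void main(String[] args) {
        System.out.println(effectiveReplicas(6, 3));  // 3, although the NameNode records 6
        System.out.println(effectiveReplicas(6, 10)); // 6
    }
}
```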

2 HDFS Client Operations

2.1 Preparing the Windows Development Environment

Configure the environment variables (the procedure is the same as for Java, so it is skipped here).

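Create a Maven project in IntelliJ IDEA (students can use it free for one year) and add the required dependencies to pom.xml:

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-slf4j-impl</artifactId>
        <version>2.12.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.1.3</version>
    </dependency>
</dependencies>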
In the project's src/main/resources directory, create a new file named "log4j2.xml" and fill it with the log4j2 logging configuration.

Create the class com.xu1an.hdfs.HdfsClient:

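package com.xu1an.hdfs;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

public class HdfsClient {

	@Test
	public void testMkdirs() throws IOException, InterruptedException, URISyntaxException {

		// 1 Get the file system
		Configuration configuration = new Configuration();
		// Alternative way to run against the cluster: set the NameNode address in the configuration
		// configuration.set("fs.defaultFS", "hdfs://hadoop102:9820");
		// FileSystem fs = FileSystem.get(configuration);

		// Connect to the NameNode at hadoop102:9820 as user "xu1an"
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9820"), configuration, "xu1an");

		// 2 Create the directory (missing parent directories are created too)
		fs.mkdirs(new Path("/1108/daxian/banzhang"));

		// 3 Release resources
		fs.close();
	}
}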

2.2 HDFS API Operations

2.2.1 Uploading a File to HDFS (Testing Parameter Priority)

Write the source code:

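@Test
public void testCopyFromLocalFile() throws IOException, InterruptedException, URISyntaxException {
	// 1 Get the file system; the replication factor set in code is 5
	Configuration conf = new Configuration();
	conf.set("dfs.replication", "5");
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9820"), conf, "xu1an");

	// 2 Upload the file
	// copyFromLocalFile(delSrc, overwrite, src, dst): keep the local file, overwrite an existing target
	fs.copyFromLocalFile(false, true, new Path("D:/web/bigdataDemo/src/main/resources/hello.txt"), new Path("/client_test"));

	// 3 Release resources
	fs.close();
}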

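Copy hdfs-site.xml into the project's src/main/resources directory so that it is on the classpath:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>
</configuration>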

Parameter priority

The test shows the following priority: values set in code through Configuration > hdfs-site.xml on the project classpath > hdfs-default.xml.
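The priority rule behaves like a layered lookup in which the first layer that defines a key wins. A minimal Java model, using the `dfs.replication` values from this section (5 set in code, 1 in the project's hdfs-site.xml, 3 in hdfs-default.xml). The class `ConfigPrioritySketch` and its layers are hypothetical; this reproduces only the observed outcome, not how Hadoop's `Configuration` actually loads its resources.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the lookup order: code > classpath hdfs-site.xml > hdfs-default.xml.
final class ConfigPrioritySketch {

    private final List<Map<String, String>> layers; // highest priority first

    ConfigPrioritySketch(List<Map<String, String>> layersHighestFirst) {
        this.layers = layersHighestFirst;
    }

    Optional<String> get(String key) {
        for (Map<String, String> layer : layers) {
            String value = layer.get(key);
            if (value != null) {
                return Optional.of(value);  // first layer that defines the key wins
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> setInCode = Map.of("dfs.replication", "5");   // conf.set(...)
        Map<String, String> siteXml = Map.of("dfs.replication", "1");     // project hdfs-site.xml
        Map<String, String> defaultXml = Map.of("dfs.replication", "3");  // hdfs-default.xml
        ConfigPrioritySketch conf = new ConfigPrioritySketch(List.of(setInCode, siteXml, defaultXml));
        System.out.println(conf.get("dfs.replication").orElse("unset")); // 5
    }
}
```

Removing the `conf.set` layer makes the lookup fall through to 1, and removing the project file as well falls through to the default 3, mirroring the three test runs.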

2.2.2 Downloading a File from HDFS

@Test
public void testCopyToLocalFile() throws IOException, InterruptedException, URISyntaxException {

	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9820"), configuration, "xu1an");

	// 2 Download
	// boolean delSrc: whether to delete the source file in HDFS
	// Path src: the HDFS file to download
	// Path dst: the local destination path
	// boolean useRawLocalFileSystem: true skips local checksumming, so no .crc file is written next to dst
	fs.copyToLocalFile(false, new Path("/banzhang.txt"), new Path("e:/banhua.txt"), true);

	// 3 Release resources
	fs.close();
}
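With `useRawLocalFileSystem = false`, the download goes through Hadoop's checksummed local file system, which keeps a hidden checksum file next to the result (for example `.banhua.txt.crc`); with `true`, no such file is written. The idea behind these checksums is one CRC per fixed-size chunk, so corruption can be pinned to a chunk. The Java sketch below only illustrates that idea with `java.util.zip.CRC32C` (Java 9+) and an assumed chunk size of 512 bytes (HDFS's default `dfs.bytes-per-checksum`); the class name is hypothetical and it does not reproduce the `.crc` file format.

```java
import java.util.zip.CRC32C;

// Illustration only: one CRC32C value per fixed-size chunk (chunk size assumed to be 512 bytes).
final class ChunkChecksumSketch {
    static final int BYTES_PER_CHECKSUM = 512;

    static long[] checksums(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        CRC32C crc = new CRC32C();
        for (int i = 0; i < chunks; i++) {
            int offset = i * BYTES_PER_CHECKSUM;
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - offset);
            crc.reset();
            crc.update(data, offset, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Returns the index of the first chunk whose checksum differs, or -1 if all chunks match.
    static int firstCorruptChunk(byte[] data, long[] expected) {
        long[] actual = checksums(data);
        if (actual.length != expected.length) {
            throw new IllegalArgumentException("data length does not match the checksum list");
        }
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != expected[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1300];                        // 3 chunks: 512 + 512 + 276 bytes
        long[] stored = checksums(data);
        data[700] ^= 1;                                      // flip one bit inside the second chunk
        System.out.println(firstCorruptChunk(data, stored)); // 1
    }
}
```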

2.2.3 Deleting HDFS Files and Directories

With recursive = true, the directory is deleted together with all files and subdirectories under it.

@Test
public void testDelete() throws IOException, InterruptedException, URISyntaxException {

	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9820"), configuration, "xu1an");

	// 2 Delete /0508 and everything under it (recursive = true)
	fs.delete(new Path("/0508/"), true);

	// 3 Release resources
	fs.close();
}
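`FileSystem.delete` throws an exception when asked to delete a non-empty directory with `recursive = false`, and returns `false` when the path does not exist. The same contract can be sketched on the local file system with `java.nio`, deleting children before their parents. This is an analogy under those assumptions with a hypothetical class name, not how HDFS implements deletion.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local analogy for delete(path, recursive): a non-empty directory needs recursive = true.
final class RecursiveDeleteSketch {

    static boolean delete(Path path, boolean recursive) throws IOException {
        if (!Files.exists(path)) {
            return false;                                 // nothing to delete
        }
        if (!recursive) {
            Files.delete(path);                           // fails for a non-empty directory
            return true;
        }
        List<Path> all;
        try (Stream<Path> walk = Files.walk(path)) {
            // Reverse lexicographic order puts every child path before its parent directory.
            all = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : all) {
            Files.delete(p);
        }
        return true;
    }
}
```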

2.2.4 Renaming and Moving HDFS Files

@Test
public void testRename() throws IOException, InterruptedException, URISyntaxException {

	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9820"), configuration, "xu1an");

	// 2 Rename the file (a destination in another directory moves it instead)
	fs.rename(new Path("/banzhang.txt"), new Path("/banhua.txt"));

	// 3 Release resources
	fs.close();
}

2.2.5 Viewing HDFS File Details

With recursive = true, the details of the files under the path and all of its subdirectories are listed.

@Test
public void testListFiles() throws IOException, InterruptedException, URISyntaxException {
	// Requires imports of org.apache.hadoop.fs.BlockLocation, LocatedFileStatus and RemoteIterator

	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9820"), configuration, "xu1an");

	// 2 Get file details (files only, recursively from "/")
	RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/"), true);

	while (listFiles.hasNext()) {
		LocatedFileStatus status = listFiles.next();

		// File name
		System.out.println(status.getPath().getName());
		// Length
		System.out.println(status.getLen());
		// Permission
		System.out.println(status.getPermission());
		// Group
		System.out.println(status.getGroup());

		// Block information for this file
		BlockLocation[] blockLocations = status.getBlockLocations();

		for (BlockLocation blockLocation : blockLocations) {
			// Hosts (DataNodes) that store this block
			String[] hosts = blockLocation.getHosts();

			for (String host : hosts) {
				System.out.println(host);
			}
		}

		System.out.println("----------- divider -----------");
	}

	// 3 Release resources
	fs.close();
}
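`listFiles(path, true)` walks the whole subtree but returns only files, never directories. The same traversal over a local directory can be sketched with `Files.walk`; this is an illustrative analogue with a hypothetical class name, and unlike Hadoop's iterator it returns plain paths without block locations.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local analogue of fs.listFiles(path, recursive): files only, optionally descending into subdirectories.
final class ListFilesSketch {

    static List<Path> listFiles(Path root, boolean recursive) throws IOException {
        int maxDepth = recursive ? Integer.MAX_VALUE : 1;   // depth 1 = direct children only
        try (Stream<Path> walk = Files.walk(root, maxDepth)) {
            return walk.filter(Files::isRegularFile)        // directories are skipped, as in listFiles
                       .sorted()
                       .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        for (Path file : listFiles(root, true)) {
            System.out.println(root.relativize(file) + "\t" + Files.size(file) + " bytes");
        }
    }
}
```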

2.2.6 Telling HDFS Files and Directories Apart

listStatus only inspects the direct children of the given directory; it does not recurse into subdirectories.

@Test
public void testListStatus() throws IOException, InterruptedException, URISyntaxException {
	// Requires an import of org.apache.hadoop.fs.FileStatus

	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9820"), configuration, "xu1an");

	// 2 Check whether each direct child of "/" is a file or a directory
	FileStatus[] listStatus = fs.listStatus(new Path("/"));

	for (FileStatus fileStatus : listStatus) {
		if (fileStatus.isFile()) {
			// A file
			System.out.println("f:" + fileStatus.getPath().getName());
		} else {
			// A directory
			System.out.println("d:" + fileStatus.getPath().getName());
		}
	}

	// 3 Release resources
	fs.close();
}
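Because `listStatus` returns files and directories alike, but only one level deep, the loop has to test `isFile()`. A local analogue with `Files.list` (hypothetical class name) that prints the same `f:`/`d:` prefixes; as in the else branch of `testListStatus`, anything that is not a regular file is reported as `d:`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local analogue of fs.listStatus(path): one level only, files and directories both included.
final class ListStatusSketch {

    static List<String> describeChildren(Path dir) throws IOException {
        try (Stream<Path> children = Files.list(dir)) {      // not recursive
            return children
                    .sorted()
                    .map(p -> (Files.isRegularFile(p) ? "f:" : "d:") + p.getFileName())
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        describeChildren(Path.of(args.length > 0 ? args[0] : ".")).forEach(System.out::println);
    }
}
```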

3 Summary

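In this article we used shell commands and the Java API to try out the operations of HDFS (the Hadoop Distributed File System). Working with a distributed file system feels much like using Linux commands, but its internal implementation differs from that of Linux's single-machine file system: a distributed system has to handle many more failure cases (in other words, it needs data-safety mechanisms). Later articles will analyze the relevant HDFS workflows.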
