HDFS Shell Command Practice

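Prerequisites

A working Hadoop 2 environment. For setup, see: Installing Hadoop 2.7.3 in pseudo-distributed mode on CentOS 7

Steps

The most commonly used command for working with HDFS is hdfs dfs.

List all hdfs dfs subcommands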

[hadoop@node1 ~]$ hdfs dfs 
Usage: hadoop fs [generic options]
    [-appendToFile <localsrc> ... <dst>]
    [-cat [-ignoreCrc] <src> ...]
    [-checksum <src> ...]
    [-chgrp [-R] GROUP PATH...]
    [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
    [-chown [-R] [OWNER][:[GROUP]] PATH...]
    [-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
    [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
    [-count [-q] [-h] <path> ...]
    [-cp [-f] [-p | -p[topax]] <src> ... <dst>]
    [-createSnapshot <snapshotDir> [<snapshotName>]]
    [-deleteSnapshot <snapshotDir> <snapshotName>]
    [-df [-h] [<path> ...]]
    [-du [-s] [-h] <path> ...]
    [-expunge]
    [-find <path> ... <expression> ...]
    [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
    [-getfacl [-R] <path>]
    [-getfattr [-R] {-n name | -d} [-e en] <path>]
    [-getmerge [-nl] <src> <localdst>]
    [-help [cmd ...]]
    [-ls [-d] [-h] [-R] [<path> ...]]
    [-mkdir [-p] <path> ...]
    [-moveFromLocal <localsrc> ... <dst>]
    [-moveToLocal <src> <localdst>]
    [-mv <src> ... <dst>]
    [-put [-f] [-p] [-l] <localsrc> ... <dst>]
    [-renameSnapshot <snapshotDir> <oldName> <newName>]
    [-rm [-f] [-r|-R] [-skipTrash] <src> ...]
    [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
    [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
    [-setfattr {-n name [-v value] | -x name} <path>]
    [-setrep [-R] [-w] <rep> <path> ...]
    [-stat [format] <path> ...]
    [-tail [-f] <file>]
    [-test -[defsz] <path>]
    [-text [-ignoreCrc] <src> ...]
    [-touchz <path> ...]
    [-truncate [-w] <length> <path> ...]
    [-usage [cmd ...]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

Start HDFS

start-dfs.sh

1. Help command

Show help for hdfs dfs commands

hdfs dfs -help [cmd ...]

Examples

# Show help for the -ls command
[hadoop@node1 ~]$ hdfs dfs -help ls
-ls [-d] [-h] [-R] [<path> ...] :
  List the contents that match the specified file pattern. If path is not
  specified, the contents of /user/<currentUser> will be listed. Directory entries
  are of the form:
    permissions - userId groupId sizeOfDirectory(in bytes)
  modificationDate(yyyy-MM-dd HH:mm) directoryName
  
  and file entries are of the form:
    permissions numberOfReplicas userId groupId sizeOfFile(in bytes)
  modificationDate(yyyy-MM-dd HH:mm) fileName
                                                                                 
  -d  Directories are listed as plain files.                                     
  -h  Formats the sizes of files in a human-readable fashion rather than a number
      of bytes.                                                                  
  -R  Recursively list the contents of directories.

2. Listing directory contents

hdfs dfs -ls [-d] [-h] [-R] [<path> ...]

Lists the contents of the given path, including file names, permissions, owner, size, and modification time.

[hadoop@node1 ~]$ hdfs dfs -ls /
Found 3 items
drwxr-xr-x   - hadoop supergroup          0 2022-03-11 08:14 /input
drwxr-xr-x   - hadoop supergroup          0 2022-03-11 08:16 /output
drwx------   - hadoop supergroup          0 2022-03-11 08:15 /tmp

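Note: the listing above shows the directories and files under the HDFS root. This example finds three directories; on an HDFS instance that has never been used, the listing will be empty.

# Several paths can be listed at once. /input and /output reflect this particular HDFS instance; if they do not exist on yours, create some directories and upload some files first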
[hadoop@node1 ~]$ hdfs dfs -ls /input /output
Found 1 items
-rw-r--r--   1 hadoop supergroup         25 2022-03-11 08:13 /input/1.txt
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2022-03-11 08:16 /output/_SUCCESS
-rw-r--r--   1 hadoop supergroup         25 2022-03-11 08:16 /output/part-r-00000

# List the directory itself as a plain entry instead of its contents (-d)
[hadoop@node1 ~]$ hdfs dfs -ls -d /
drwxr-xr-x   - hadoop supergroup          0 2022-03-11 18:52 /
[hadoop@node1 ~]$ hdfs dfs -ls -d /input
drwxr-xr-x   - hadoop supergroup          0 2022-03-11 08:14 /input

# List recursively (-R)
[hadoop@node1 ~]$ hdfs dfs -ls -R /tmp
drwx------   - hadoop supergroup          0 2022-03-11 08:15 /tmp/hadoop-yarn
drwx------   - hadoop supergroup          0 2022-03-11 08:15 /tmp/hadoop-yarn/staging
drwx------   - hadoop supergroup          0 2022-03-11 08:15 /tmp/hadoop-yarn/staging/hadoop
drwx------   - hadoop supergroup          0 2022-03-11 08:16 /tmp/hadoop-yarn/staging/hadoop/.staging
drwxr-xr-x   - hadoop supergroup          0 2022-03-11 08:15 /tmp/hadoop-yarn/staging/history
drwxrwxrwt   - hadoop supergroup          0 2022-03-11 08:15 /tmp/hadoop-yarn/staging/history/done_intermediate
drwxrwx---   - hadoop supergroup          0 2022-03-11 08:16 /tmp/hadoop-yarn/staging/history/done_intermediate/hadoop
-rwxrwx---   1 hadoop supergroup      33282 2022-03-11 08:16 /tmp/hadoop-yarn/staging/history/done_intermediate/hadoop/job_1646957562104_0001-1646957751727-hadoop-mywordcount%2D1.0%2DSNAPSHOT.jar-1646957769948-1-1-SUCCEEDED-default-1646957758708.jhist
-rwxrwx---   1 hadoop supergroup        367 2022-03-11 08:16 /tmp/hadoop-yarn/staging/history/done_intermediate/hadoop/job_1646957562104_0001.summary
-rwxrwx---   1 hadoop supergroup     116953 2022-03-11 08:16 /tmp/hadoop-yarn/staging/history/done_intermediate/hadoop/job_1646957562104_0001_conf.xml
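
The listing format is regular enough to post-process with standard tools. The following is a minimal sketch, assuming the eight-column layout shown in the transcripts above and paths that contain no whitespace; the function name ls_files is made up for this illustration.

```shell
# ls_files: read `hdfs dfs -ls` output on stdin and print "size<TAB>path"
# for regular files only (permission string starts with "-").
# Assumption: paths contain no whitespace, so the path is always field 8.
ls_files() {
  awk '$1 ~ /^-/ && NF >= 8 { print $5 "\t" $8 }'
}

# Usage (needs a running HDFS):
#   hdfs dfs -ls -R /output | ls_files
```

The "Found N items" header and directory entries are skipped because their first field does not start with "-".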

3. Creating directories

Use the -mkdir command to create a directory in HDFS. Its syntax and help text are:

hdfs dfs -mkdir [-p] <path> ...

[hadoop@node1 ~]$ hdfs dfs -help mkdir
-mkdir [-p] <path> ... :
  Create a directory in specified location.
                                                  
  -p  Do not fail if the directory already exists

Examples

# Create a directory under an existing parent
[hadoop@node1 ~]$ hdfs dfs -mkdir /test

# Check that it was created
[hadoop@node1 ~]$ hdfs dfs -ls /
Found 4 items
drwxr-xr-x   - hadoop supergroup          0 2022-03-11 08:14 /input
drwxr-xr-x   - hadoop supergroup          0 2022-03-11 08:16 /output
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 09:11 /test
drwx------   - hadoop supergroup          0 2022-03-11 08:15 /tmp

# Creating a directory under a parent that does not exist requires -p
[hadoop@node1 ~]$ hdfs dfs -mkdir /a/b
mkdir: `/a/b': No such file or directory
[hadoop@node1 ~]$ hdfs dfs -mkdir -p /a/b
[hadoop@node1 ~]$ hdfs dfs -ls -R /a
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 09:15 /a/b
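
In scripts it can be handy to report whether a directory was already there. Here is a small sketch that combines -mkdir -p with -test -d (listed in the usage output as -test -[defsz]); the helper name ensure_dir is hypothetical.

```shell
# ensure_dir: create an HDFS directory (and any missing parents) only if it
# does not exist yet. `hdfs dfs -test -d PATH` exits 0 when PATH is a directory.
ensure_dir() {
  if hdfs dfs -test -d "$1"; then
    echo "exists: $1"
  else
    hdfs dfs -mkdir -p "$1" && echo "created: $1"
  fi
}

# Usage (needs a running HDFS):
#   ensure_dir /a/b
```

Since -mkdir -p does not fail on an existing directory, the check is only there to make the outcome visible.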

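4. Upload commands

To upload files from the local file system to HDFS, use either the -put or the -copyFromLocal command:

hdfs dfs -put [-f] [-p] [-l] <localsrc> ... <dst>

hdfs dfs -copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>

# Help for the -put command
[hadoop@node1 ~]$ hdfs dfs -help put
-put [-f] [-p] [-l] <localsrc> ... <dst> :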
  Copy files from the local file system into fs. Copying fails if the file already
  exists, unless the -f flag is given.
  Flags:
                                                                       
  -p  Preserves access and modification times, ownership and the mode. 
  -f  Overwrites the destination if it already exists.                 
  -l  Allow DataNode to lazily persist the file to disk. Forces        
         replication factor of 1. This flag will result in reduced
         durability. Use with care.

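 -p: preserve access and modification times, ownership, and permissions (assuming the file system can carry the permissions over).
 -f: overwrite the destination file if it already exists.
 -l: allow the DataNode to persist the file to disk lazily. This forces a replication factor of 1 and reduces durability, so use it with care.


# Help for the -copyFromLocal command
[hadoop@node1 ~]$ hdfs dfs -help copyFromLocal
-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst> :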
  Identical to the -put command.

Examples

[hadoop@node1 ~]$ cat 1.txt 
hello world
hello hadoop
[hadoop@node1 ~]$ hdfs dfs -put 1.txt /
[hadoop@node1 ~]$ hdfs dfs -ls /
Found 6 items
-rw-r--r--   1 hadoop supergroup         25 2022-03-14 09:23 /1.txt
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 09:15 /a
drwxr-xr-x   - hadoop supergroup          0 2022-03-11 08:14 /input
drwxr-xr-x   - hadoop supergroup          0 2022-03-11 08:16 /output
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 09:11 /test
drwx------   - hadoop supergroup          0 2022-03-11 08:15 /tmp



[hadoop@node1 ~]$ hdfs dfs -copyFromLocal 1.txt /a
[hadoop@node1 ~]$ hdfs dfs -ls /a
Found 2 items
-rw-r--r--   1 hadoop supergroup         25 2022-03-14 09:35 /a/1.txt
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 09:15 /a/b
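
As the help text says, -put fails when the target already exists unless -f is given. The sketch below wraps that choice in a function; the name upload and its "force" argument are invented for this example.

```shell
# upload: copy one local file into an HDFS directory.
# Pass "force" as the third argument to overwrite an existing target (-put -f).
upload() {
  src="$1"; dst_dir="$2"; mode="${3:-}"
  if [ ! -f "$src" ]; then
    echo "upload: local file not found: $src" >&2
    return 1
  fi
  if [ "$mode" = "force" ]; then
    hdfs dfs -put -f "$src" "$dst_dir"
  else
    hdfs dfs -put "$src" "$dst_dir"
  fi
}

# Usage (needs a running HDFS):
#   upload 1.txt /a          # fails if /a/1.txt already exists
#   upload 1.txt /a force    # overwrites /a/1.txt
```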

5. Viewing file contents

Use the -cat or -text command to view file contents:

hdfs dfs -cat [-ignoreCrc] <src> ...

hdfs dfs -text [-ignoreCrc] <src> ...

[hadoop@node1 ~]$ hdfs dfs -help cat
-cat [-ignoreCrc] <src> ... :
  Fetch all files that match the file pattern <src> and display their content on
  stdout.


[hadoop@node1 ~]$ hdfs dfs -help text
-text [-ignoreCrc] <src> ... :
  Takes a source file and outputs the file in text format.
  The allowed formats are zip and TextRecordInputStream and Avro.

Examples

# View file contents with -cat
[hadoop@node1 ~]$ hdfs dfs -cat /1.txt
hello world
hello hadoop

[hadoop@node1 ~]$ hdfs dfs -cat /output/part-r-00000
hadoop  1
hello   2
world   1

# Several files can be printed at once
[hadoop@node1 ~]$ hdfs dfs -cat /1.txt /output/part-r-00000
hello world
hello hadoop
hadoop  1
hello   2
world   1


# View file contents with -text
[hadoop@node1 ~]$ hdfs dfs -text /1.txt
hello world
hello hadoop
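
Because -cat writes to stdout, its output can be piped into ordinary Unix tools. As a rough sketch, the function below ranks word-count lines such as those in /output/part-r-00000; it assumes each line is a word followed by whitespace and a count, and the name top_counts is hypothetical.

```shell
# top_counts: read "word count" lines on stdin and print the N lines with the
# highest counts (default 10); ties are broken alphabetically by word.
top_counts() {
  sort -k2,2nr -k1,1 | head -n "${1:-10}"
}

# Usage (needs a running HDFS):
#   hdfs dfs -cat /output/part-r-00000 | top_counts 2
```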

6. Download commands

Use the -get or -copyToLocal command to copy HDFS files to the local file system:

hdfs dfs -get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>

[hadoop@node1 ~]$ hdfs dfs -help get
-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst> :
  Copy files that match the file pattern <src> to the local name.  <src> is kept. 
  When copying multiple files, the destination must be a directory. Passing -p
  preserves access and modification times, ownership and the mode.
  
  
[hadoop@node1 ~]$ hdfs dfs -help copyToLocal
-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst> :
  Identical to the -get command.

Examples

# The -get command
# If localdst is omitted, the file is downloaded to the current directory
[hadoop@node1 ~]$ hdfs dfs -get /output/part-r-00000
[hadoop@node1 ~]$ ls 
1.txt  2.txt  installfile  mywordcount-1.0-SNAPSHOT.jar  part-r-00000  soft

# Download into a specific local directory
[hadoop@node1 ~]$ hdfs dfs -get /output/part-r-00000 installfile
[hadoop@node1 ~]$ ls installfile/
hadoop-2.7.3.tar.gz  jdk-8u271-linux-x64.tar.gz  part-r-00000

# Rename the file while downloading
[hadoop@node1 ~]$ hdfs dfs -get /output/part-r-00000 installfile/newname-part-r-00000
[hadoop@node1 ~]$ ls installfile/
hadoop-2.7.3.tar.gz  jdk-8u271-linux-x64.tar.gz  newname-part-r-00000  part-r-00000


# The -copyToLocal command
[hadoop@node1 ~]$ hdfs dfs -copyToLocal /1.txt 1-copyToLocal.txt
[hadoop@node1 ~]$ ls
1-copyToLocal.txt  1.txt  2.txt  installfile  mywordcount-1.0-SNAPSHOT.jar  part-r-00000  soft
[hadoop@node1 ~]$
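
The -get usage shown above has no overwrite flag, so a script may want to check the local target before downloading. A minimal sketch under that assumption follows; the helper name fetch_to_dir is made up.

```shell
# fetch_to_dir: download one HDFS file into a local directory, creating the
# directory if needed and skipping the download when the local copy exists.
fetch_to_dir() {
  src="$1"; dir="$2"
  target="$dir/$(basename "$src")"
  if [ -e "$target" ]; then
    echo "skip: $target already exists"
    return 0
  fi
  mkdir -p "$dir" && hdfs dfs -get "$src" "$dir"
}

# Usage (needs a running HDFS):
#   fetch_to_dir /output/part-r-00000 installfile
```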

7. Deleting files

Use the -rm command to delete the specified files:

hdfs dfs -rm [-f] [-r|-R] [-skipTrash] <src> ...

[hadoop@node1 ~]$ hdfs dfs -help rm
-rm [-f] [-r|-R] [-skipTrash] <src> ... :
  Delete all files that match the specified file pattern. Equivalent to the Unix
  command "rm <src>"
                                                                                 
  -skipTrash  option bypasses trash, if enabled, and immediately deletes    
  -f          If the file does not exist, do not display a diagnostic message or 
              modify the exit status to reflect an error.                        
  -[rR]       Recursively deletes directories

Examples

# Plain delete
# List the root directory first
[hadoop@node1 ~]$ hdfs dfs -ls /
Found 6 items
-rw-r--r--   1 hadoop supergroup         25 2022-03-14 09:23 /1.txt
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 09:35 /a
drwxr-xr-x   - hadoop supergroup          0 2022-03-11 08:14 /input
drwxr-xr-x   - hadoop supergroup          0 2022-03-11 08:16 /output
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 09:11 /test
drwx------   - hadoop supergroup          0 2022-03-11 08:15 /tmp

# Delete /1.txt
[hadoop@node1 ~]$ hdfs dfs -rm /1.txt
22/03/14 10:12:45 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /1.txt

# Confirm that /1.txt is gone
[hadoop@node1 ~]$ hdfs dfs -ls /
Found 5 items
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 09:35 /a
drwxr-xr-x   - hadoop supergroup          0 2022-03-11 08:14 /input
drwxr-xr-x   - hadoop supergroup          0 2022-03-11 08:16 /output
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 09:11 /test
drwx------   - hadoop supergroup          0 2022-03-11 08:15 /tmp


# Recursive delete
# /a is a directory with nested contents
[hadoop@node1 ~]$ hdfs dfs -ls /a
Found 2 items
-rw-r--r--   1 hadoop supergroup         25 2022-03-14 09:35 /a/1.txt
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 09:15 /a/b

# Delete it recursively
[hadoop@node1 ~]$ hdfs dfs -rm -r /a
22/03/14 10:14:59 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /a

# Confirm that the delete succeeded
[hadoop@node1 ~]$ hdfs dfs -ls /
Found 4 items
drwxr-xr-x   - hadoop supergroup          0 2022-03-11 08:14 /input
drwxr-xr-x   - hadoop supergroup          0 2022-03-11 08:16 /output
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 09:11 /test
drwx------   - hadoop supergroup          0 2022-03-11 08:15 /tmp
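
The log lines above report a deletion interval of 0 minutes, which means the trash feature is off on this cluster and -rm removes data immediately. With that in mind, here is a cautious sketch that refuses to delete the root and checks that the path exists first; the name safe_rm is hypothetical.

```shell
# safe_rm: recursively delete an HDFS path, but refuse an empty path or "/"
# and fail with a message when the path does not exist.
safe_rm() {
  path="$1"
  case "$path" in
    ""|"/") echo "safe_rm: refusing to delete '$path'" >&2; return 1 ;;
  esac
  if hdfs dfs -test -e "$path"; then
    hdfs dfs -rm -r "$path"
  else
    echo "safe_rm: no such path: $path" >&2
    return 1
  fi
}

# Usage (needs a running HDFS):
#   safe_rm /a
```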

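8. Merging HDFS files into a local file

Use the -getmerge command to merge HDFS files into a single file on the local Linux file system:

hdfs dfs -getmerge [-nl] <src> <localdst>

[hadoop@node1 ~]$ hdfs dfs -help getmerge 
-getmerge [-nl] <src> <localdst> :
  Get all the files in the directories that match the source file pattern and
  merge and sort them to only one file on local fs. <src> is kept.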
                                                        
  -nl  Add a newline character at the end of each file.

Examples

# Prepare the data
[hadoop@node1 ~]$ cat 1.txt 
hello world
hello hadoop

[hadoop@node1 ~]$ cat 2.txt 
hi hadoop
hadoop is funny
[hadoop@node1 ~]$ hdfs dfs -put 1.txt 2.txt /
[hadoop@node1 ~]$ hdfs dfs -ls /
Found 6 items
-rw-r--r--   1 hadoop supergroup         25 2022-03-14 10:27 /1.txt
-rw-r--r--   1 hadoop supergroup         26 2022-03-14 10:27 /2.txt
drwxr-xr-x   - hadoop supergroup          0 2022-03-11 08:14 /input
drwxr-xr-x   - hadoop supergroup          0 2022-03-11 08:16 /output
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 09:11 /test
drwx------   - hadoop supergroup          0 2022-03-11 08:15 /tmp

# Merge files into a local file with -getmerge
# Merge several files
[hadoop@node1 ~]$ hdfs dfs -getmerge /1.txt /2.txt getmerge.txt
[hadoop@node1 ~]$ ls
1-copyToLocal.txt  2.txt         installfile                   part-r-00000
1.txt              getmerge.txt  mywordcount-1.0-SNAPSHOT.jar  soft
[hadoop@node1 ~]$ cat getmerge.txt 
hello world
hello hadoop
hi hadoop
hadoop is funny

# Merge all files in a directory
[hadoop@node1 ~]$ hdfs dfs -put 1.txt 2.txt /test
[hadoop@node1 ~]$ hdfs dfs -getmerge /test getmergedir.txt
[hadoop@node1 ~]$ ls
1-copyToLocal.txt  2.txt            getmerge.txt  mywordcount-1.0-SNAPSHOT.jar  soft
1.txt              getmergedir.txt  installfile   part-r-00000
# View the contents of the merged local file
[hadoop@node1 ~]$ cat getmergedir.txt 
hello world
hello hadoop
hi hadoop
hadoop is funny
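
The examples above do not use the -nl flag. Per the help text it adds a newline after each source file, which helps when a file lacks a trailing newline and its last record would otherwise run into the next file. A small sketch, with the hypothetical helper name merge_dir:

```shell
# merge_dir: merge every file under an HDFS directory into one local file,
# adding a newline after each source file (-nl), then report the line count.
merge_dir() {
  hdfs dfs -getmerge -nl "$1" "$2" || return 1
  echo "merged $1 -> $2 ($(wc -l < "$2" | tr -d ' ') lines)"
}

# Usage (needs a running HDFS):
#   merge_dir /test getmergedir.txt
```

Note that for files that already end with a newline, -nl leaves an empty line after each one.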

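9. Move command

Use the -mv command to move a file from one HDFS directory to another (the move happens entirely within HDFS):

hdfs dfs -mv <src> ... <dst>

[hadoop@node1 ~]$ hdfs dfs -help mv
-mv <src> ... <dst> :
  Move files that match the specified file pattern <src> to a destination <dst>. 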
  When moving multiple files, the destination must be a directory.

Examples

# The root directory contains 2.txt
[hadoop@node1 ~]$ hdfs dfs -ls /
Found 6 items
-rw-r--r--   1 hadoop supergroup         25 2022-03-14 10:27 /1.txt
-rw-r--r--   1 hadoop supergroup         26 2022-03-14 10:27 /2.txt
drwxr-xr-x   - hadoop supergroup          0 2022-03-11 08:14 /input
drwxr-xr-x   - hadoop supergroup          0 2022-03-11 08:16 /output
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 10:32 /test
drwx------   - hadoop supergroup          0 2022-03-11 08:15 /tmp

# /input does not contain 2.txt
[hadoop@node1 ~]$ hdfs dfs -ls /input
Found 1 items
-rw-r--r--   1 hadoop supergroup         25 2022-03-11 08:13 /input/1.txt

# Move /2.txt into /input
[hadoop@node1 ~]$ hdfs dfs -mv /2.txt /input

# The root directory no longer contains 2.txt
[hadoop@node1 ~]$ hdfs dfs -ls /
Found 5 items
-rw-r--r--   1 hadoop supergroup         25 2022-03-14 10:27 /1.txt
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 10:39 /input
drwxr-xr-x   - hadoop supergroup          0 2022-03-11 08:16 /output
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 10:32 /test
drwx------   - hadoop supergroup          0 2022-03-11 08:15 /tmp

# /input now contains 2.txt, so the move succeeded
[hadoop@node1 ~]$ hdfs dfs -ls /input
Found 2 items
-rw-r--r--   1 hadoop supergroup         25 2022-03-11 08:13 /input/1.txt
-rw-r--r--   1 hadoop supergroup         26 2022-03-14 10:27 /input/2.txt
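
The help text notes that when moving multiple files the destination must be a directory. The sketch below checks that up front with -test -d before moving; the function name move_into is invented for this example.

```shell
# move_into: move one or more HDFS paths into an existing HDFS directory.
# The first argument is the destination directory; the rest are sources.
move_into() {
  dest="$1"; shift
  if ! hdfs dfs -test -d "$dest"; then
    echo "move_into: not a directory: $dest" >&2
    return 1
  fi
  hdfs dfs -mv "$@" "$dest"
}

# Usage (needs a running HDFS):
#   move_into /input /1.txt /2.txt
```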

10. Copy command

Use the -cp command to copy an HDFS file to another HDFS directory:

hdfs dfs -cp [-f] [-p | -p[topax]] <src> ... <dst>

[hadoop@node1 ~]$ hdfs dfs -help cp
-cp [-f] [-p | -p[topax]] <src> ... <dst> :
  Copy files that match the file pattern <src> to a destination.  When copying
  multiple files, the destination must be a directory. Passing -p preserves status
  [topax] (timestamps, ownership, permission, ACLs, XAttr). If -p is specified
  with no <arg>, then preserves timestamps, ownership, permission. If -pa is
  specified, then preserves permission also because ACL is a super-set of
  permission. Passing -f overwrites the destination if it already exists. raw
  namespace extended attributes are preserved if (1) they are supported (HDFS
  only) and, (2) all of the source and target pathnames are in the /.reserved/raw
  hierarchy. raw namespace xattr preservation is determined solely by the presence
  (or absence) of the /.reserved/raw prefix and not by the -p option.

Examples

# The HDFS root directory does not contain 2.txt
[hadoop@node1 ~]$ hdfs dfs -ls /
Found 5 items
-rw-r--r--   1 hadoop supergroup         25 2022-03-14 10:27 /1.txt
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 10:39 /input
drwxr-xr-x   - hadoop supergroup          0 2022-03-11 08:16 /output
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 10:32 /test
drwx------   - hadoop supergroup          0 2022-03-11 08:15 /tmp
# /test contains 2.txt
[hadoop@node1 ~]$ hdfs dfs -ls /test
Found 2 items
-rw-r--r--   1 hadoop supergroup         25 2022-03-14 10:32 /test/1.txt
-rw-r--r--   1 hadoop supergroup         26 2022-03-14 10:32 /test/2.txt
# Copy /test/2.txt into the root directory
[hadoop@node1 ~]$ hdfs dfs -cp /test/2.txt /
# The HDFS root directory now contains 2.txt
[hadoop@node1 ~]$ hdfs dfs -ls /
Found 6 items
-rw-r--r--   1 hadoop supergroup         25 2022-03-14 10:27 /1.txt
-rw-r--r--   1 hadoop supergroup         26 2022-03-14 10:48 /2.txt
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 10:39 /input
drwxr-xr-x   - hadoop supergroup          0 2022-03-11 08:16 /output
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 10:32 /test
drwx------   - hadoop supergroup          0 2022-03-11 08:15 /tmp
# /test still contains 2.txt
[hadoop@node1 ~]$ hdfs dfs -ls /test
Found 2 items
-rw-r--r--   1 hadoop supergroup         25 2022-03-14 10:32 /test/1.txt
-rw-r--r--   1 hadoop supergroup         26 2022-03-14 10:32 /test/2.txt
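
In the listing above, the copy /2.txt carries a new timestamp (10:48) while the source keeps 10:32. According to the help text, -p preserves timestamps, ownership, and permission, and -f overwrites an existing destination. The following sketch combines both for a simple backup copy; the name backup_copy and the ".bak" suffix are assumptions for illustration.

```shell
# backup_copy: copy an HDFS file to "<path>.bak" next to the original.
# -f overwrites an older backup; -p keeps timestamps, ownership and permission.
backup_copy() {
  hdfs dfs -cp -f -p "$1" "$1.bak"
}

# Usage (needs a running HDFS):
#   backup_copy /test/2.txt
```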

Done! Enjoy it!
