You need a working Hadoop 2 environment. For setup, see: Installing Hadoop 2.7.3 in pseudo-distributed mode on CentOS 7
Steps
Everyday HDFS file operations are done with the dfs command (hdfs dfs).
List all hdfs dfs commands:
[hadoop@node1 ~]$ hdfs dfs
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] <path> ...]
[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] <path> ...]
[-expunge]
[-find <path> ... <expression> ...]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
[-ls [-d] [-h] [-R] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-truncate [-w] <length> <path> ...]
[-usage [cmd ...]]
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|resourcemanager:port> specify a ResourceManager
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
Start HDFS:
start-dfs.sh

1. Help command
Show help for hdfs dfs commands:
hdfs dfs -help [cmd ...]
Examples
# Show help for the -ls command
[hadoop@node1 ~]$ hdfs dfs -help ls
-ls [-d] [-h] [-R] [<path> ...] :
  List the contents that match the specified file pattern. If path is not specified, the contents of /user/<currentUser> will be listed. Directory entries are of the form:
    permissions - userId groupId sizeOfDirectory(in bytes) modificationDate(yyyy-MM-dd HH:mm) directoryName
  and file entries are of the form:
    permissions numberOfReplicas userId groupId sizeOfFile(in bytes) modificationDate(yyyy-MM-dd HH:mm) fileName
  -d  Directories are listed as plain files.
  -h  Formats the sizes of files in a human-readable fashion rather than a number of bytes.
  -R  Recursively list the contents of directories.

2. View directory contents
hdfs dfs -ls [-d] [-h] [-R] [<path> ...]
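Lists the contents of path. Each entry shows its permissions, replication factor (- for directories), owner, group, size, modification time and name.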
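The help text above fixes the column layout, so -ls output is easy to post-process: directories show - where files show their replica count. A minimal sketch, assuming bash, awk and paths without spaces; summarize_ls is a hypothetical helper, not a Hadoop command.

```bash
# summarize_ls (hypothetical helper): reads `hdfs dfs -ls` output on stdin
# and prints "<type> <size> <path>" for each entry.
# Assumes paths contain no spaces (the path is taken as field 8).
summarize_ls() {
  awk '
    /^Found [0-9]+ items$/ { next }           # skip the "Found N items" header
    NF >= 8 {
      type = ($2 == "-") ? "dir" : "file"     # directories have "-" instead of a replica count
      print type, $5, $8                      # $5 = size in bytes, $8 = path
    }'
}
```

Typical use: hdfs dfs -ls -R /tmp | summarize_ls. For paths that contain spaces you would have to rejoin fields 8 onward.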
[hadoop@node1 ~]$ hdfs dfs -ls /
Found 3 items
drwxr-xr-x - hadoop supergroup 0 2022-03-11 08:14 /input
drwxr-xr-x - hadoop supergroup 0 2022-03-11 08:16 /output
drwx------ - hadoop supergroup 0 2022-03-11 08:15 /tmp
# Note: this lists the directories and files in the HDFS root. Here three directories were found; if you have never used HDFS before, the listing will most likely be empty.
# Several paths can be listed at once. /input and /output exist on this particular cluster; if you don't have them, create some directories and upload a few files to HDFS first.
[hadoop@node1 ~]$ hdfs dfs -ls /input /output
Found 1 items
-rw-r--r-- 1 hadoop supergroup 25 2022-03-11 08:13 /input/1.txt
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2022-03-11 08:16 /output/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 25 2022-03-11 08:16 /output/part-r-00000
# -d: list a directory as a plain entry instead of listing its contents
[hadoop@node1 ~]$ hdfs dfs -ls -d /
drwxr-xr-x - hadoop supergroup 0 2022-03-11 18:52 /
[hadoop@node1 ~]$ hdfs dfs -ls -d /input
drwxr-xr-x - hadoop supergroup 0 2022-03-11 08:14 /input
# -R: list recursively
[hadoop@node1 ~]$ hdfs dfs -ls -R /tmp
drwx------ - hadoop supergroup 0 2022-03-11 08:15 /tmp/hadoop-yarn
drwx------ - hadoop supergroup 0 2022-03-11 08:15 /tmp/hadoop-yarn/staging
drwx------ - hadoop supergroup 0 2022-03-11 08:15 /tmp/hadoop-yarn/staging/hadoop
drwx------ - hadoop supergroup 0 2022-03-11 08:16 /tmp/hadoop-yarn/staging/hadoop/.staging
drwxr-xr-x - hadoop supergroup 0 2022-03-11 08:15 /tmp/hadoop-yarn/staging/history
drwxrwxrwt - hadoop supergroup 0 2022-03-11 08:15 /tmp/hadoop-yarn/staging/history/done_intermediate
drwxrwx--- - hadoop supergroup 0 2022-03-11 08:16 /tmp/hadoop-yarn/staging/history/done_intermediate/hadoop
-rwxrwx--- 1 hadoop supergroup 33282 2022-03-11 08:16 /tmp/hadoop-yarn/staging/history/done_intermediate/hadoop/job_1646957562104_0001-1646957751727-hadoop-mywordcount%2D1.0%2DSNAPSHOT.jar-1646957769948-1-1-SUCCEEDED-default-1646957758708.jhist
-rwxrwx--- 1 hadoop supergroup 367 2022-03-11 08:16 /tmp/hadoop-yarn/staging/history/done_intermediate/hadoop/job_1646957562104_0001.summary
-rwxrwx--- 1 hadoop supergroup 116953 2022-03-11 08:16 /tmp/hadoop-yarn/staging/history/done_intermediate/hadoop/job_1646957562104_0001_conf.xml

3. Create a directory
To create a directory in HDFS, use the -mkdir command. Its syntax and help text:
hdfs dfs -mkdir [-p] <path> ...
[hadoop@node1 ~]$ hdfs dfs -help mkdir
-mkdir [-p] <path> ... :
  Create a directory in specified location.
  -p  Do not fail if the directory already exists
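When scripting, it can help to create a directory only if it is missing and to report which case applied. A sketch, assuming bash; ensure_hdfs_dir is a hypothetical helper name. It relies on -test -d (from the usage list), which exits with status 0 when the path is a directory, and on -mkdir -p, which also creates missing parent directories, as the /a/b example in this section demonstrates.

```bash
# ensure_hdfs_dir (hypothetical helper): create an HDFS directory only when
# it does not exist yet, and report what happened.
ensure_hdfs_dir() {
  local dir="$1"
  if hdfs dfs -test -d "$dir"; then      # exit status 0 means "is a directory"
    echo "exists: $dir"
  else
    # -p also creates missing parents, e.g. /a when asked for /a/b
    hdfs dfs -mkdir -p "$dir" && echo "created: $dir"
  fi
}
```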
Examples
# Create a directory under an existing parent
[hadoop@node1 ~]$ hdfs dfs -mkdir /test
# Check that it was created
[hadoop@node1 ~]$ hdfs dfs -ls /
Found 4 items
drwxr-xr-x - hadoop supergroup 0 2022-03-11 08:14 /input
drwxr-xr-x - hadoop supergroup 0 2022-03-11 08:16 /output
drwxr-xr-x - hadoop supergroup 0 2022-03-14 09:11 /test
drwx------ - hadoop supergroup 0 2022-03-11 08:15 /tmp
# Creating a directory whose parent does not exist requires -p
[hadoop@node1 ~]$ hdfs dfs -mkdir /a/b
mkdir: `/a/b': No such file or directory
[hadoop@node1 ~]$ hdfs dfs -mkdir -p /a/b
[hadoop@node1 ~]$ hdfs dfs -ls -R /a
drwxr-xr-x - hadoop supergroup 0 2022-03-14 09:15 /a/b

4. Upload command
To upload files from the local file system to HDFS, use either -put or -copyFromLocal:
hdfs dfs -put [-f] [-p] [-l] <localsrc> ... <dst>
hdfs dfs -copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>
# Help for -put
[hadoop@node1 ~]$ hdfs dfs -help put
-put [-f] [-p] [-l] <localsrc> ... <dst> :
  Copy files from the local file system into fs. Copying fails if the file already exists, unless the -f flag is given.
  Flags:
  -p  Preserves access and modification times, ownership and the mode.
  -f  Overwrites the destination if it already exists.
  -l  Allow DataNode to lazily persist the file to disk. Forces replication factor of 1. This flag will result in reduced durability. Use with care.

In short: -p keeps the access and modification times, ownership and permissions (assuming permissions can be carried across file systems); -f overwrites the target if it already exists; -l lets the DataNode persist the file to disk lazily and forces a replication factor of 1, which lowers durability, so use it with care.

# Help for -copyFromLocal
[hadoop@node1 ~]$ hdfs dfs -help copyFromLocal
-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst> :
  Identical to the -put command.
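Because -put fails for files that already exist unless -f is given, an upload script that may be re-run typically passes -f. The sketch below, assuming bash, first makes sure the target directory exists; upload_to_hdfs is a hypothetical helper name.

```bash
# upload_to_hdfs DEST LOCALFILE... (hypothetical helper):
# create DEST in HDFS if needed, then upload the local files into it.
upload_to_hdfs() {
  local dest="$1"; shift
  hdfs dfs -mkdir -p "$dest" || return 1   # make sure the target directory exists
  hdfs dfs -put -f "$@" "$dest"            # -f: overwrite files already in DEST
}
```

For example, upload_to_hdfs /test 1.txt 2.txt can be run repeatedly without failing on the second run.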
Examples
[hadoop@node1 ~]$ cat 1.txt
hello world
hello hadoop
[hadoop@node1 ~]$ hdfs dfs -put 1.txt /
[hadoop@node1 ~]$ hdfs dfs -ls /
Found 6 items
-rw-r--r-- 1 hadoop supergroup 25 2022-03-14 09:23 /1.txt
drwxr-xr-x - hadoop supergroup 0 2022-03-14 09:15 /a
drwxr-xr-x - hadoop supergroup 0 2022-03-11 08:14 /input
drwxr-xr-x - hadoop supergroup 0 2022-03-11 08:16 /output
drwxr-xr-x - hadoop supergroup 0 2022-03-14 09:11 /test
drwx------ - hadoop supergroup 0 2022-03-11 08:15 /tmp
[hadoop@node1 ~]$ hdfs dfs -copyFromLocal 1.txt /a
[hadoop@node1 ~]$ hdfs dfs -ls /a
Found 2 items
-rw-r--r-- 1 hadoop supergroup 25 2022-03-14 09:35 /a/1.txt
drwxr-xr-x - hadoop supergroup 0 2022-03-14 09:15 /a/b

5. View file contents
To view a file's contents, use -cat or -text:
hdfs dfs -cat [-ignoreCrc] <src> ...
hdfs dfs -text [-ignoreCrc] <src> ...
[hadoop@node1 ~]$ hdfs dfs -help cat
-cat [-ignoreCrc] <src> ... :
  Fetch all files that match the file pattern <src> and display their content on stdout.
[hadoop@node1 ~]$ hdfs dfs -help text
-text [-ignoreCrc] <src> ... :
  Takes a source file and outputs the file in text format.
  The allowed formats are zip and TextRecordInputStream and Avro.
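Since -cat streams to stdout, it combines with ordinary Unix tools. For instance, a WordCount result such as /output/part-r-00000 can be sanity-checked by recounting the job's input locally. A sketch, assuming bash and the standard tr, grep, sort, uniq and awk tools; wordcount_check is a hypothetical helper, and its tab-separated word/count output only mirrors the default TextOutputFormat layout, which may not match every job.

```bash
# wordcount_check FILE (hypothetical helper): stream an HDFS file with -cat
# and count its words locally, printing "word<TAB>count" sorted by word.
wordcount_check() {
  hdfs dfs -cat "$1" |
    tr -s '[:space:]' '\n' |    # one word per line
    grep -v '^$' |              # drop the empty line left by leading whitespace
    LC_ALL=C sort |             # byte-order sort for a stable result
    uniq -c |                   # "  count word"
    awk '{ print $2 "\t" $1 }'  # reorder to "word<TAB>count"
}
```

Comparing wordcount_check /input/1.txt with hdfs dfs -cat /output/part-r-00000 is a quick check, provided /input/1.txt was the job's only input.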
Examples
# View a file with -cat
[hadoop@node1 ~]$ hdfs dfs -cat /1.txt
hello world
hello hadoop
[hadoop@node1 ~]$ hdfs dfs -cat /output/part-r-00000
hadoop 1
hello 2
world 1
# Several files can be printed in one go
[hadoop@node1 ~]$ hdfs dfs -cat /1.txt /output/part-r-00000
hello world
hello hadoop
hadoop 1
hello 2
world 1
# View a file with -text
[hadoop@node1 ~]$ hdfs dfs -text /1.txt
hello world
hello hadoop

6. Download command
To copy HDFS files to the local file system, use -get or -copyToLocal:
hdfs dfs -get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>
[hadoop@node1 ~]$ hdfs dfs -help get
-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst> :
  Copy files that match the file pattern <src> to the local name. <src> is kept. When copying multiple files, the destination must be a directory. Passing -p preserves access and modification times, ownership and the mode.
[hadoop@node1 ~]$ hdfs dfs -help copyToLocal
-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst> :
  Identical to the -get command.
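As the help says, when several files are fetched at once the local destination must be a directory. A sketch, assuming bash, that creates that directory before calling -get; fetch_from_hdfs is a hypothetical helper name.

```bash
# fetch_from_hdfs LOCALDIR SRC... (hypothetical helper):
# create LOCALDIR if needed, then download every SRC into it with -get.
fetch_from_hdfs() {
  local dir="$1"; shift
  mkdir -p "$dir" || return 1     # -get needs a directory for multiple sources
  hdfs dfs -get "$@" "$dir"
}
```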
Examples
# -get
# Omitting localdst downloads into the current directory
[hadoop@node1 ~]$ hdfs dfs -get /output/part-r-00000
[hadoop@node1 ~]$ ls
1.txt 2.txt installfile mywordcount-1.0-SNAPSHOT.jar part-r-00000 soft
# Download into a given local directory
[hadoop@node1 ~]$ hdfs dfs -get /output/part-r-00000 installfile
[hadoop@node1 ~]$ ls installfile/
hadoop-2.7.3.tar.gz jdk-8u271-linux-x64.tar.gz part-r-00000
# Give the downloaded file a new name
[hadoop@node1 ~]$ hdfs dfs -get /output/part-r-00000 installfile/newname-part-r-00000
[hadoop@node1 ~]$ ls installfile/
hadoop-2.7.3.tar.gz jdk-8u271-linux-x64.tar.gz newname-part-r-00000 part-r-00000
# -copyToLocal
[hadoop@node1 ~]$ hdfs dfs -copyToLocal /1.txt 1-copyToLocal.txt
[hadoop@node1 ~]$ ls
1-copyToLocal.txt 1.txt 2.txt installfile mywordcount-1.0-SNAPSHOT.jar part-r-00000 soft
[hadoop@node1 ~]$

7. Delete command
Use -rm to delete files:
hdfs dfs -rm [-f] [-r|-R] [-skipTrash] <src> ...
[hadoop@node1 ~]$ hdfs dfs -help rm
-rm [-f] [-r|-R] [-skipTrash] <src> ... :
  Delete all files that match the specified file pattern. Equivalent to the Unix command "rm <src>"
  -skipTrash  option bypasses trash, if enabled, and immediately deletes <src>
  -f          If the file does not exist, do not display a diagnostic message or modify the exit status to reflect an error.
  -[rR]       Recursively deletes directories
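In the examples that follow, the log line "Deletion interval = 0 minutes" means the HDFS trash is disabled on this cluster, so -rm removes data immediately. A small guard against the most destructive mistake can therefore be worthwhile. The sketch below, assuming bash, refuses an empty path or /, and passes -f so that a missing path is not reported as an error; remove_hdfs_path is a hypothetical helper name.

```bash
# remove_hdfs_path PATH (hypothetical helper): recursively delete PATH,
# refusing an empty argument or the root directory.
remove_hdfs_path() {
  local p="$1"
  if [ -z "$p" ] || [ "$p" = "/" ]; then
    echo "refusing to delete '$p'" >&2
    return 1
  fi
  hdfs dfs -rm -r -f "$p"   # -r: directories too; -f: no error if PATH is missing
}
```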
Examples
# Plain delete
# List first
[hadoop@node1 ~]$ hdfs dfs -ls /
Found 6 items
-rw-r--r-- 1 hadoop supergroup 25 2022-03-14 09:23 /1.txt
drwxr-xr-x - hadoop supergroup 0 2022-03-14 09:35 /a
drwxr-xr-x - hadoop supergroup 0 2022-03-11 08:14 /input
drwxr-xr-x - hadoop supergroup 0 2022-03-11 08:16 /output
drwxr-xr-x - hadoop supergroup 0 2022-03-14 09:11 /test
drwx------ - hadoop supergroup 0 2022-03-11 08:15 /tmp
# Delete /1.txt
[hadoop@node1 ~]$ hdfs dfs -rm /1.txt
22/03/14 10:12:45 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /1.txt
# /1.txt is gone
[hadoop@node1 ~]$ hdfs dfs -ls /
Found 5 items
drwxr-xr-x - hadoop supergroup 0 2022-03-14 09:35 /a
drwxr-xr-x - hadoop supergroup 0 2022-03-11 08:14 /input
drwxr-xr-x - hadoop supergroup 0 2022-03-11 08:16 /output
drwxr-xr-x - hadoop supergroup 0 2022-03-14 09:11 /test
drwx------ - hadoop supergroup 0 2022-03-11 08:15 /tmp
# Recursive delete
# /a contains a file and a subdirectory
[hadoop@node1 ~]$ hdfs dfs -ls /a
Found 2 items
-rw-r--r-- 1 hadoop supergroup 25 2022-03-14 09:35 /a/1.txt
drwxr-xr-x - hadoop supergroup 0 2022-03-14 09:15 /a/b
# Delete it recursively
[hadoop@node1 ~]$ hdfs dfs -rm -r /a
22/03/14 10:14:59 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /a
# Check that it was deleted
[hadoop@node1 ~]$ hdfs dfs -ls /
Found 4 items
drwxr-xr-x - hadoop supergroup 0 2022-03-11 08:14 /input
drwxr-xr-x - hadoop supergroup 0 2022-03-11 08:16 /output
drwxr-xr-x - hadoop supergroup 0 2022-03-14 09:11 /test
drwx------ - hadoop supergroup 0 2022-03-11 08:15 /tmp

8. Merge HDFS files into a local file
Use -getmerge to merge HDFS files into a single file on the local Linux file system:
hdfs dfs -getmerge [-nl] <src> <localdst>
[hadoop@node1 ~]$ hdfs dfs -help getmerge
-getmerge [-nl] <src> <localdst> :
  Get all the files in the directories that match the source file pattern and merge and sort them to only one file on local fs. <src> is kept.
  -nl  Add a newline character at the end of each file.
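To make the -nl flag concrete, here is a purely local analogue of -getmerge in bash. It illustrates the concatenation behaviour only and is not how Hadoop implements the command; local_getmerge is a hypothetical name. Without -nl the files are appended back to back, so a file that does not end in a newline runs straight into the next one; -nl adds a newline after each file.

```bash
# local_getmerge [-nl] OUT FILE... (hypothetical helper): a local analogue of
# `hdfs dfs -getmerge`, concatenating FILEs into OUT in the order given.
local_getmerge() {
  local nl=0
  if [ "$1" = "-nl" ]; then nl=1; shift; fi
  local out="$1"; shift
  : > "$out"                                   # start from an empty output file
  local f
  for f in "$@"; do
    cat "$f" >> "$out"
    if [ "$nl" -eq 1 ]; then printf '\n' >> "$out"; fi   # -nl: newline after each file
  done
}
```

In the examples, 1.txt and 2.txt already end with a newline, which is why the merged file reads cleanly without -nl.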
Examples
# Prepare data
[hadoop@node1 ~]$ cat 1.txt
hello world
hello hadoop
[hadoop@node1 ~]$ cat 2.txt
hi hadoop
hadoop is funny
[hadoop@node1 ~]$ hdfs dfs -put 1.txt 2.txt /
[hadoop@node1 ~]$ hdfs dfs -ls /
Found 6 items
-rw-r--r-- 1 hadoop supergroup 25 2022-03-14 10:27 /1.txt
-rw-r--r-- 1 hadoop supergroup 26 2022-03-14 10:27 /2.txt
drwxr-xr-x - hadoop supergroup 0 2022-03-11 08:14 /input
drwxr-xr-x - hadoop supergroup 0 2022-03-11 08:16 /output
drwxr-xr-x - hadoop supergroup 0 2022-03-14 09:11 /test
drwx------ - hadoop supergroup 0 2022-03-11 08:15 /tmp
# Merge into a local file with -getmerge
# Merge several files
[hadoop@node1 ~]$ hdfs dfs -getmerge /1.txt /2.txt getmerge.txt
[hadoop@node1 ~]$ ls
1-copyToLocal.txt 2.txt installfile part-r-00000
1.txt getmerge.txt mywordcount-1.0-SNAPSHOT.jar soft
[hadoop@node1 ~]$ cat getmerge.txt
hello world
hello hadoop
hi hadoop
hadoop is funny
# Merge every file in a directory
[hadoop@node1 ~]$ hdfs dfs -put 1.txt 2.txt /test
[hadoop@node1 ~]$ hdfs dfs -getmerge /test getmergedir.txt
[hadoop@node1 ~]$ ls
1-copyToLocal.txt 2.txt getmerge.txt mywordcount-1.0-SNAPSHOT.jar soft
1.txt getmergedir.txt installfile part-r-00000
# View the merged local file
[hadoop@node1 ~]$ cat getmergedir.txt
hello world
hello hadoop
hi hadoop
hadoop is funny

9. Move command
Use -mv to move files from one HDFS directory to another (a move within HDFS):
hdfs dfs -mv <src> ... <dst>
[hadoop@node1 ~]$ hdfs dfs -help mv
-mv <src> ... <dst> :
  Move files that match the specified file pattern <src> to a destination <dst>. When moving multiple files, the destination must be a directory.
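Giving -mv a new file name as the destination renames the file inside HDFS, as with Unix mv. A cautious sketch, assuming bash, first checks with -test -e that the target does not exist yet; rename_in_hdfs is a hypothetical helper name.

```bash
# rename_in_hdfs OLD NEW (hypothetical helper): rename or move OLD to NEW
# within HDFS, refusing to proceed when NEW already exists.
rename_in_hdfs() {
  local old="$1" new="$2"
  if hdfs dfs -test -e "$new"; then     # exit status 0 means NEW exists
    echo "target exists: $new" >&2
    return 1
  fi
  hdfs dfs -mv "$old" "$new"
}
```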
Examples
# The / directory contains 2.txt
[hadoop@node1 ~]$ hdfs dfs -ls /
Found 6 items
-rw-r--r-- 1 hadoop supergroup 25 2022-03-14 10:27 /1.txt
-rw-r--r-- 1 hadoop supergroup 26 2022-03-14 10:27 /2.txt
drwxr-xr-x - hadoop supergroup 0 2022-03-11 08:14 /input
drwxr-xr-x - hadoop supergroup 0 2022-03-11 08:16 /output
drwxr-xr-x - hadoop supergroup 0 2022-03-14 10:32 /test
drwx------ - hadoop supergroup 0 2022-03-11 08:15 /tmp
# /input does not contain 2.txt
[hadoop@node1 ~]$ hdfs dfs -ls /input
Found 1 items
-rw-r--r-- 1 hadoop supergroup 25 2022-03-11 08:13 /input/1.txt
# Move /2.txt into /input
[hadoop@node1 ~]$ hdfs dfs -mv /2.txt /input
# 2.txt is no longer in /
[hadoop@node1 ~]$ hdfs dfs -ls /
Found 5 items
-rw-r--r-- 1 hadoop supergroup 25 2022-03-14 10:27 /1.txt
drwxr-xr-x - hadoop supergroup 0 2022-03-14 10:39 /input
drwxr-xr-x - hadoop supergroup 0 2022-03-11 08:16 /output
drwxr-xr-x - hadoop supergroup 0 2022-03-14 10:32 /test
drwx------ - hadoop supergroup 0 2022-03-11 08:15 /tmp
# /input now contains 2.txt, so the move succeeded
[hadoop@node1 ~]$ hdfs dfs -ls /input
Found 2 items
-rw-r--r-- 1 hadoop supergroup 25 2022-03-11 08:13 /input/1.txt
-rw-r--r-- 1 hadoop supergroup 26 2022-03-14 10:27 /input/2.txt

10. Copy command
Use -cp to copy a file from one HDFS location to another HDFS directory:
hdfs dfs -cp [-f] [-p | -p[topax]] <src> ... <dst>
[hadoop@node1 ~]$ hdfs dfs -help cp
-cp [-f] [-p | -p[topax]] <src> ... <dst> :
  Copy files that match the file pattern <src> to a destination. When copying multiple files, the destination must be a directory. Passing -p preserves status [topax] (timestamps, ownership, permission, ACLs, XAttr). If -p is specified with no <arg>, then preserves timestamps, ownership, permission. If -pa is specified, then preserves permission also because ACL is a super-set of permission. Passing -f overwrites the destination if it already exists. raw namespace extended attributes are preserved if (1) they are supported (HDFS only) and, (2) all of the source and target pathnames are in the /.reserved/raw hierarchy. raw namespace xattr preservation is determined solely by the presence (or absence) of the /.reserved/raw prefix and not by the -p option.
Examples
# / does not contain 2.txt
[hadoop@node1 ~]$ hdfs dfs -ls /
Found 5 items
-rw-r--r-- 1 hadoop supergroup 25 2022-03-14 10:27 /1.txt
drwxr-xr-x - hadoop supergroup 0 2022-03-14 10:39 /input
drwxr-xr-x - hadoop supergroup 0 2022-03-11 08:16 /output
drwxr-xr-x - hadoop supergroup 0 2022-03-14 10:32 /test
drwx------ - hadoop supergroup 0 2022-03-11 08:15 /tmp
# /test contains 2.txt
[hadoop@node1 ~]$ hdfs dfs -ls /test
Found 2 items
-rw-r--r-- 1 hadoop supergroup 25 2022-03-14 10:32 /test/1.txt
-rw-r--r-- 1 hadoop supergroup 26 2022-03-14 10:32 /test/2.txt
# Copy /test/2.txt into /
[hadoop@node1 ~]$ hdfs dfs -cp /test/2.txt /
# / now contains 2.txt
[hadoop@node1 ~]$ hdfs dfs -ls /
Found 6 items
-rw-r--r-- 1 hadoop supergroup 25 2022-03-14 10:27 /1.txt
-rw-r--r-- 1 hadoop supergroup 26 2022-03-14 10:48 /2.txt
drwxr-xr-x - hadoop supergroup 0 2022-03-14 10:39 /input
drwxr-xr-x - hadoop supergroup 0 2022-03-11 08:16 /output
drwxr-xr-x - hadoop supergroup 0 2022-03-14 10:32 /test
drwx------ - hadoop supergroup 0 2022-03-11 08:15 /tmp
# /test still contains 2.txt
[hadoop@node1 ~]$ hdfs dfs -ls /test
Found 2 items
-rw-r--r-- 1 hadoop supergroup 25 2022-03-14 10:32 /test/1.txt
-rw-r--r-- 1 hadoop supergroup 26 2022-03-14 10:32 /test/2.txt
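Note that the copied /2.txt shows 10:48 while /test/2.txt keeps 10:32: without -p the copy gets a new modification time. To confirm that a copy's contents match its source, you can compare HDFS checksums. A sketch, assuming bash and the tab-separated path, algorithm, checksum layout that -checksum prints; same_checksum is a hypothetical helper name. HDFS checksums are only comparable when both files were written with the same block size and bytes-per-checksum settings, which is normally the case for a -cp within one cluster using default settings.

```bash
# same_checksum A B (hypothetical helper): succeed when HDFS reports the same
# checksum algorithm and value for A and B.
# Assumes `hdfs dfs -checksum` prints "path<TAB>algorithm<TAB>checksum".
same_checksum() {
  local a b
  a=$(hdfs dfs -checksum "$1" | cut -f2-)   # drop the path column
  b=$(hdfs dfs -checksum "$2" | cut -f2-)
  [ -n "$a" ] && [ "$a" = "$b" ]            # an empty result counts as a mismatch
}
```

For example: same_checksum /test/2.txt /2.txt && echo "copy verified".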
Done. Enjoy!



