
Hadoop HDFS

1. Getting to Know HDFS

HDFS: Hadoop Distributed File System
Use case: write once, read many times. Files cannot be modified in place, which makes HDFS well suited to data analysis and poorly suited to use as a network drive.
Pros: high fault tolerance, well suited to big-data processing, runs on commodity hardware.
Cons: unsuitable for low-latency access, inefficient for storing many small files, no support for concurrent writers or random file modification.

1.1 HDFS Architecture
  • NameNode: maintains the block-to-file mapping (metadata), manages the replication policy, and handles client read/write requests;
  • DataNode: stores the actual data blocks and performs block read/write operations;
  • Client: splits files into blocks before upload, fetches block locations from the NameNode, reads/writes data directly from/to DataNodes, and both manages HDFS via commands (e.g. formatting the NameNode) and accesses it (create/read/update/delete);
  • Secondary NameNode: not a hot standby for the NameNode, but an assistant that offloads work (periodically merging the Fsimage and Edits files) and can help recover a failed NameNode (see the sketch after this list for the NameNode/DataNode division of labor);
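To make the division of labor concrete, here is a minimal sketch (assuming the hadoop102 cluster, the atguigu user, and the /sanguo/shuguo/kongming.txt file used in the shell examples below): the client asks the NameNode only for block metadata, then reads the blocks themselves from the DataNodes that the metadata names.

package com.atguigu.hdfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class BlockLocationDemo {
    public static void main(String[] args) throws Exception {
        // The NameNode serves only metadata; block data never flows through it
        Configuration configuration = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9820"), configuration, "atguigu");

        // Ask the NameNode where each block of the file lives
        FileStatus status = fs.getFileStatus(new Path("/sanguo/shuguo/kongming.txt"));
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            // Each entry lists the DataNodes a client would read this block from
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + String.join(",", block.getHosts()));
        }
        fs.close();
    }
}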
1.2 HDFS Block Size

The block size is set with dfs.blocksize in hdfs-default.xml. Sizing guideline: keep the seek time down to about 1% of the transfer time, i.e. BlockSize ≈ transfer rate × 100 × seek time, so that seeking and transferring dovetail instead of stalling each other. (Note: HDFS does not allow multi-threaded writes to a file.)
Think about the trade-off: if a block is too large, the time to transfer it dwarfs the seek time, so a program handling one block spends most of its time waiting on data transfer; if a block is too small, seek time dominates and the program spends its time locating blocks rather than reading them. Either extreme leaves HDFS programs waiting.
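For example, assuming a seek time of about 10 ms and a disk transfer rate of about 100 MB/s, keeping the seek at roughly 1% of the transfer implies a transfer time of about 1 s, hence about 1 s × 100 MB/s = 100 MB per block, which rounds up to the 128 MB default shown below.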


<property>
  <name>dfs.blocksize</name>
  <value>134217728</value>
  <description>
      The default block size for new files, in bytes.
      You can use the following suffix (case insensitive):
      k(kilo), m(mega), g(giga), t(tera), p(peta), e(exa) to specify the size (such as 128k, 512m, 1g, etc.),
      Or provide complete size in bytes (such as 134217728 for 128 MB).
  </description>
</property>
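As a sketch of overriding this per client (using the Java client set up in section 2; the 256m value and the file paths here are only illustrative), you can set dfs.blocksize on the client Configuration before writing files:

package com.atguigu.hdfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class BlockSizeOverride {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        // Client-side override; suffixes are accepted, as the description above notes
        configuration.set("dfs.blocksize", "256m");

        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9820"), configuration, "atguigu");
        // Files written through this client get 256 MB blocks; existing files keep theirs
        // (bigfile.dat and /data are hypothetical paths for illustration)
        fs.copyFromLocalFile(new Path("bigfile.dat"), new Path("/data/bigfile.dat"));
        fs.close();
    }
}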

1.3 HDFS Shell Operations

These two command forms are interchangeable: hadoop fs <command> or hdfs dfs <command> (hadoop fs is the generic entry point for any Hadoop-supported filesystem, while hdfs dfs is specific to HDFS).

# List the supported commands
[atguigu@hadoop102 ~]$ hadoop fs
Usage: hadoop fs [generic options]
	[-appendToFile <localsrc> ... <dst>]
	[-cat [-ignoreCrc] <src> ...]
	[-checksum <src> ...]
	[-chgrp [-R] GROUP PATH...]
	[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
	[-chown [-R] [OWNER][:[GROUP]] PATH...]
	[-copyFromLocal [-f] [-p] [-l] [-d] [-t <thread count>] <localsrc> ... <dst>]
	[-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] [-e] <path> ...]
	[-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
	[-createSnapshot <snapshotDir> [<snapshotName>]]
	[-deleteSnapshot <snapshotDir> <snapshotName>]
	[-df [-h] [<path> ...]]
	[-du [-s] [-h] [-v] [-x] <path> ...]
	[-expunge]
	[-find <path> ... <expression> ...]
	[-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-getfacl [-R] <path>]
	[-getfattr [-R] {-n name | -d} [-e en] <path>]
	[-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
	[-head <file>]
	[-help [cmd ...]]
	[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [-e] [<path> ...]]
	[-mkdir [-p] <path> ...]
	[-moveFromLocal <localsrc> ... <dst>]
	[-moveToLocal <src> <localdst>]
	[-mv <src> ... <dst>]
	[-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
	[-renameSnapshot <snapshotDir> <oldName> <newName>]
	[-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
	[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
	[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
	[-setfattr {-n name [-v value] | -x name} <path>]
	[-setrep [-R] [-w] <rep> <path> ...]
	[-stat [format] <path> ...]
	[-tail [-f] [-s <sleep interval>] <file>]
	[-test -[defsz] <path>]
	[-text [-ignoreCrc] <src> ...]
	[-touch [-a] [-m] [-t TIMESTAMP ] [-c] <path> ...]
	[-touchz <path> ...]
	[-truncate [-w] <length> <path> ...]
	[-usage [cmd ...]]

Generic options supported are:
-conf <configuration file>          specify an application configuration file
-D <property=value>                 define a value for a given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <file1,...>                  specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...>                 specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...>            specify a comma-separated list of archives to be unarchived on the compute machines

The general command line syntax is:
command [genericOptions] [commandOptions]
View help for a single command:
[atguigu@hadoop102 ~]$ hadoop fs -help rm
-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ... :
  Delete all files that match the specified file pattern. Equivalent to the Unix
  command "rm <src>"

  -f          If the file does not exist, do not display a diagnostic message or
              modify the exit status to reflect an error.
  -[rR]       Recursively deletes directories.
  -skipTrash  option bypasses trash, if enabled, and immediately deletes <src>.
  -safely     option requires safety confirmation, if enabled, requires
              confirmation before deleting large directory with more than
              <hadoop.shell.delete.limit.num.files> files. Delay is expected when
              walking over large directory recursively to count the number of
              files to be deleted before the confirmation.
Moving files between HDFS and the local filesystem:
# Move (cut) a local file to HDFS
[atguigu@hadoop102 mydata]$ hadoop fs -moveFromLocal ./kongming.txt /sanguo/shuguo
# Copy a local file to HDFS: copyFromLocal = put
[atguigu@hadoop102 mydata]$ hadoop fs -copyFromLocal liubei.txt /sanguo/shuguo
[atguigu@hadoop102 mydata]$ hadoop fs -put liubei.txt /sanguo/shuguo
# Append local file contents to an HDFS file; the HDFS file is created if it does not exist
[atguigu@hadoop102 mydata]$ hadoop fs -appendToFile liubei.txt /sanguo/shuguo/kongming.txt
# Copy a file from HDFS to the local filesystem: copyToLocal = get
[atguigu@hadoop102 mydata]$ hadoop fs -copyToLocal /sanguo/shuguo/kongming.txt /opt/module/hadoop-3.1.3/mydata/
[atguigu@hadoop102 mydata]$ hadoop fs -get /sanguo/shuguo/kongming.txt /opt/module/hadoop-3.1.3/
# Merge and download multiple HDFS files into a single local file
[atguigu@hadoop102 mydata]$ hadoop fs -getmerge /sanguo/shuguo/* /opt/module/hadoop-3.1.3/mydata/text.txt
Operating on HDFS directly:
# ls, mkdir, cat, chgrp, chmod, chown, cp, mv, rm and rmdir work just as they do on a Linux filesystem
[atguigu@hadoop102 mydata]$ hadoop fs -ls /
Found 4 items
drwxr-xr-x   - atguigu supergroup          0 2021-12-12 22:09 /input
drwxr-xr-x   - atguigu supergroup          0 2021-12-12 22:10 /output
drwxr-xr-x   - atguigu supergroup          0 2021-12-14 21:21 /sanguo
drwxrwx---   - atguigu supergroup          0 2021-12-12 22:09 /tmp
# tail prints the last 1 KB of a file
[atguigu@hadoop102 mydata]$ hadoop fs -tail /sanguo/shuguo/kongming.txt
# Report directory sizes (columns: file size, space consumed across all replicas, path)
[atguigu@hadoop102 mydata]$ hadoop fs -du /sanguo/shuguo
35  105  /sanguo/shuguo/kongming.txt
13  39   /sanguo/shuguo/liubei.txt
[atguigu@hadoop102 mydata]$ hadoop fs -du -s /sanguo/shuguo
48  144  /sanguo/shuguo
# Set a file's replication factor: the target is recorded in NameNode metadata, but the real replica count never exceeds the number of DataNodes
[atguigu@hadoop102 mydata]$ hadoop fs -setrep 10 /sanguo/shuguo/kongming.txt
2. Installing and Configuring Hadoop on Windows 10

Download winutils 3.1.0 (running Hadoop on Windows requires these files) and configure the environment variables.
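Typically this means pointing a HADOOP_HOME environment variable at the directory where you unpacked the winutils files and adding %HADOOP_HOME%\bin to Path; the exact paths depend on where you placed them.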

2.1 Create a Maven Project and Import the Dependencies in pom.xml

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-slf4j-impl</artifactId>
        <version>2.12.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.1.3</version>
    </dependency>
</dependencies>
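Keep the hadoop-client version in step with the cluster version (3.1.3 here); mixing client and cluster versions can cause compatibility problems.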
Create log4j2.xml under src/main/resources; a minimal console-logging configuration looks like the following.

See also: "Log4j2 in depth: an annotated XML configuration example" and "Log4j2 configuration file reference".

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="error" strict="true">

    <Appenders>
        <!-- Console appender; its name is referenced by the loggers below -->
        <Appender type="Console" name="STDOUT">
            <!-- Pattern layout controlling the output format -->
            <Layout type="PatternLayout"
                    pattern="%d{yyyy-MM-dd HH:mm:ss} %-5level %class{36} %L %M - %msg%xEx%n"/>
        </Appender>
    </Appenders>

    <Loggers>
        <!-- additivity="false" stops matching events from also reaching the root logger -->
        <Logger name="test" level="info" additivity="false">
            <AppenderRef ref="STDOUT"/>
        </Logger>

        <!-- Root logger: info and above goes to the console -->
        <Root level="info">
            <AppenderRef ref="STDOUT"/>
        </Root>
    </Loggers>
</Configuration>

Your first Hadoop program:
package com.atguigu.hdfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

public class HdfsClient {
    @Test
    public void testMkdirs() throws IOException, InterruptedException, URISyntaxException {

        // 1. Get a FileSystem handle
        Configuration configuration = new Configuration();
        // Alternative: configure the cluster address on the Configuration object
        // configuration.set("fs.defaultFS", "hdfs://hadoop102:9820");
        // FileSystem fs = FileSystem.get(configuration);
        // Arguments: HDFS URI, configuration object, user to operate as
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9820"), configuration, "atguigu");

        // 2. Create the directory
        fs.mkdirs(new Path("/1108/daxian/banzhang"));

        // 3. Close the resource
        fs.close();
    }
}
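Building on this, here is a minimal sketch of how the shell operations from section 1.3 look through the client API (the cluster address and user follow the example above; the paths reuse the ones from the shell section):

package com.atguigu.hdfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class HdfsFileOps {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9820"), configuration, "atguigu");

        // Upload: the API counterpart of "hadoop fs -put"
        fs.copyFromLocalFile(new Path("liubei.txt"), new Path("/sanguo/shuguo"));

        // Download: the counterpart of "hadoop fs -get"
        fs.copyToLocalFile(new Path("/sanguo/shuguo/liubei.txt"), new Path("liubei-copy.txt"));

        // List: the counterpart of "hadoop fs -ls"
        for (FileStatus status : fs.listStatus(new Path("/sanguo/shuguo"))) {
            System.out.println(status.getPath() + " replication=" + status.getReplication());
        }

        // Delete: the counterpart of "hadoop fs -rm -r" (true = recursive)
        fs.delete(new Path("/sanguo/shuguo/liubei.txt"), true);

        fs.close();
    }
}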