
Network File System: Building a Distributed Hadoop Cluster

1. Introduction to Hadoop

Hadoop originated from three seminal Google papers:

GFS: the Google File System, Google's distributed file system
MapReduce: Google's distributed parallel computing framework
BigTable: a large-scale distributed database

Each has an open-source counterpart in the Hadoop ecosystem:

GFS -> HDFS
Google MapReduce -> Hadoop MapReduce
BigTable -> HBase

"Hadoop" is not an acronym: Doug Cutting, the father of Hadoop, named the project after his son's stuffed toy elephant.

Mainstream Hadoop distributions:

Apache Hadoop, from the Apache Software Foundation

The Cloudera distribution (Cloudera's Distribution Including Apache Hadoop, "CDH")

The Hortonworks distribution (Hortonworks Data Platform, "HDP")

A quick comparison:

Apache Hadoop: managed by hand; open source and free.
CDH: managed with Cloudera Manager; the community edition is free, the enterprise edition is paid.
HDP: managed with Ambari; free.

The core of the Hadoop framework is the pairing of HDFS and MapReduce: HDFS provides storage for massive data sets, and MapReduce provides computation over them.
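To make that division of labor concrete, the MapReduce word-count pattern can be mimicked with an ordinary shell pipeline. This is only a conceptual sketch of the map, shuffle, and reduce phases, not how Hadoop actually executes jobs; the file path and sample text are made up for illustration:

```shell
# Sample input standing in for a data set.
printf 'dfs admin\ndfs dfs hdfs\n' > /tmp/mr_demo_input.txt

# Map: emit one "(word, 1)" pair per word.
# Shuffle: sort brings identical keys together.
# Reduce: sum the counts for each key.
tr -s ' ' '\n' < /tmp/mr_demo_input.txt \
  | awk '{print $1 "\t1"}' \
  | sort \
  | awk -F'\t' '{c[$1] += $2} END {for (w in c) print w, c[w]}' \
  | sort
```

In Hadoop, the same three phases run in parallel across the cluster, with HDFS supplying the input splits to the map tasks.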

The Hadoop framework comprises four modules:

Hadoop Common: the Java libraries and utilities required by the other Hadoop modules. They provide filesystem and OS-level abstractions and contain the Java files and scripts needed to start Hadoop.

Hadoop YARN: a framework for job scheduling and cluster resource management.

Hadoop Distributed File System (HDFS): a distributed file system providing high-throughput access to application data.

Hadoop MapReduce: a YARN-based system for parallel processing of large data sets.

Typical Hadoop application scenarios:

online travel, mobile data, e-commerce, energy exploration and energy saving, infrastructure management, image processing, fraud detection, IT security, and healthcare

Official documentation:

https://hadoop.apache.org/docs/

2. Setting Up the Hadoop File System (Single Node)
[hadoop@server1 ~]$ tar zxf jdk-8u181-linux-x64.tar.gz 
[hadoop@server1 ~]$ ls
hadoop-3.2.1.tar.gz  jdk1.8.0_181  jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ ln -s jdk1.8.0_181/ java
[hadoop@server1 ~]$ tar zxf hadoop-3.2.1.tar.gz 
[hadoop@server1 ~]$ ln -s hadoop-3.2.1 hadoop
[hadoop@server1 ~]$ ll
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ ls
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share
[hadoop@server1 hadoop]$ cd etc/
[hadoop@server1 etc]$ ls
hadoop
[hadoop@server1 etc]$ cd hadoop/

 

[hadoop@server1 hadoop]$ vim hadoop-env.sh
export JAVA_HOME=/home/hadoop/java
export HADOOP_HOME=/home/hadoop/hadoop

[hadoop@server1 ~]$ cd hadoop/
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server1 hadoop]$ mkdir input
[hadoop@server1 hadoop]$ ls
bin  etc  include  input  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share
[hadoop@server1 hadoop]$ cp etc/hadoop/*.xml input
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep input output 'dfs[a-z.]+'

[hadoop@server1 hadoop]$ ls
bin  etc  include  input  lib  libexec  LICENSE.txt  NOTICE.txt  output  README.txt  sbin  share
[hadoop@server1 hadoop]$ cd output/
[hadoop@server1 output]$ ls
part-r-00000  _SUCCESS
[hadoop@server1 output]$ cat *
1	dfsadmin
[hadoop@server1 output]$ 
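What the example job did: it scanned every file in input for matches of the regular expression dfs[a-z.]+ and wrote the count of each distinct match to output. Roughly the same computation can be sketched with plain grep on a made-up sample file; the paths are illustrative and this runs locally, not as a MapReduce job:

```shell
# A stand-in for the XML configs copied into input/.
mkdir -p /tmp/grep_demo
printf '<name>dfs.replication</name>\nuse dfsadmin to manage HDFS\n' > /tmp/grep_demo/sample.txt

# -o prints each match on its own line, -h suppresses file names;
# sort + uniq -c then counts the distinct matches.
grep -ohE 'dfs[a-z.]+' /tmp/grep_demo/*.txt | sort | uniq -c
```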

 

[hadoop@server1 hadoop]$ vim core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

[hadoop@server1 hadoop]$ vim hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
With the configuration files edited, set up passwordless SSH login:

[hadoop@server1 hadoop]$ ssh-keygen 
[hadoop@server1 hadoop]$ cd
[hadoop@server1 ~]$ ls
hadoop  hadoop-3.2.1  hadoop-3.2.1.tar.gz  java  jdk1.8.0_181  jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ cd .ssh/
[hadoop@server1 .ssh]$ ls
id_rsa  id_rsa.pub  known_hosts
[hadoop@server1 .ssh]$ cp id_rsa.pub authorized_keys
[hadoop@server1 .ssh]$ chmod 600 authorized_keys 
[hadoop@server1 .ssh]$ ll
[hadoop@server1 .ssh]$ ssh localhost
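Passwordless login works because sshd accepts any key whose public half is listed in ~/.ssh/authorized_keys, provided that file is writable only by its owner, hence the chmod 600. The key-installation steps can be rehearsed in a scratch directory (the path is a throwaway and the key is never used for a real login):

```shell
# Create a scratch directory and a passphrase-less RSA key pair in it,
# then install the public key as authorized_keys with owner-only access,
# mirroring what was done in ~/.ssh on server1.
demo=$(mktemp -d)
ssh-keygen -q -t rsa -N '' -f "$demo/id_rsa"
cp "$demo/id_rsa.pub" "$demo/authorized_keys"
chmod 600 "$demo/authorized_keys"
ls -l "$demo"
```

Because /home/hadoop is later shared over NFS, this single authorized_keys file will also grant server1 passwordless access to the other nodes.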

[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ bin/hdfs namenode -format
[hadoop@server1 hadoop]$ ls /tmp/
hadoop  hadoop-hadoop  hadoop-hadoop-namenode.pid  hsperfdata_hadoop
[hadoop@server1 hadoop]$ ls
bin  include  lib      LICENSE.txt  NOTICE.txt  README.txt  share
etc  input    libexec  logs         output      sbin
[hadoop@server1 hadoop]$ sbin/start-dfs.sh  ## start HDFS
[hadoop@server1 hadoop]$ cd 
[hadoop@server1 ~]$ cd java/
[hadoop@server1 java]$ ls
[hadoop@server1 java]$ cd bin/
[hadoop@server1 bin]$ ls

[hadoop@server1 ~]$ vim .bash_profile 
[hadoop@server1 ~]$ source .bash_profile 
[hadoop@server1 ~]$ jps
4118 NameNode
4438 SecondaryNameNode
4232 DataNode
4600 Jps
[hadoop@server1 ~]$ ps ax

 

Visit http://172.25.52.1:9870 in a browser to open the NameNode web UI.

Browse the root directory of the file system:

View the logs:

[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user  ## create a directory
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server1 hadoop]$ id
uid=1000(hadoop) gid=1000(hadoop) groups=1000(hadoop)
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls   ## list the hadoop user's home directory in HDFS
[hadoop@server1 hadoop]$ bin/hdfs dfs -put input  ## upload the input directory to HDFS

[hadoop@server1 hadoop]$ bin/hdfs dfs -ls
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2021-12-28 10:41 input
[hadoop@server1 hadoop]$ 

[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount input output 

View the result in the web UI:

[hadoop@server1 hadoop]$ bin/hdfs dfs -get output
2021-12-28 10:46:32,661 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
[hadoop@server1 hadoop]$ ls
bin  etc  include  input  lib  libexec  LICENSE.txt  logs  NOTICE.txt  output  README.txt  sbin  share
[hadoop@server1 hadoop]$ cd output/
[hadoop@server1 output]$ ls
part-r-00000  _SUCCESS
[hadoop@server1 output]$ cat part-r-00000  _SUCCESS

[hadoop@server1 output]$ cd ..
[hadoop@server1 hadoop]$ ls
bin  include  lib      LICENSE.txt  NOTICE.txt  README.txt  share
etc  input    libexec  logs         output      sbin
[hadoop@server1 hadoop]$ rm -fr output/
[hadoop@server1 hadoop]$ ls
bin  include  lib      LICENSE.txt  NOTICE.txt  sbin
etc  input    libexec  logs         README.txt  share
[hadoop@server1 hadoop]$ 
3. Setting Up the Distributed Hadoop File System

Create two new virtual machines, server2 and server3:

[root@server2 ~]# useradd hadoop
[root@server2 ~]# id hadoop
uid=1000(hadoop) gid=1000(hadoop) groups=1000(hadoop)
[root@server2 ~]# yum install -y nfs-utils

[root@server3 ~]# useradd hadoop
[root@server3 ~]# id hadoop
uid=1000(hadoop) gid=1000(hadoop) groups=1000(hadoop)
[root@server3 ~]# yum install -y nfs-utils 

On server1:

[hadoop@server1 hadoop]$ jps
4118 NameNode
4438 SecondaryNameNode
4232 DataNode
15337 Jps
[hadoop@server1 hadoop]$ ls
bin  etc  include  input  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share
[hadoop@server1 hadoop]$ sbin/stop-dfs.sh   ## stop HDFS
Stopping namenodes on [localhost]
Stopping datanodes
Stopping secondary namenodes [server1]
[hadoop@server1 hadoop]$ exit
logout
[root@server1 ~]# yum install -y nfs-utils
[root@server1 ~]# vim /etc/exports

/home/hadoop    *(rw,anonuid=1000,anongid=1000)

[root@server1 ~]# id hadoop
uid=1000(hadoop) gid=1000(hadoop) groups=1000(hadoop)
[root@server1 ~]# systemctl start nfs
[root@server1 ~]# showmount -e
Export list for server1:
/home/hadoop *
[root@server1 ~]# 

On server2 and server3:

[root@server2 ~]# showmount -e 172.25.52.1
Export list for 172.25.52.1:
/home/hadoop *
[root@server2 ~]# mount 172.25.52.1:/home/hadoop/ /home/hadoop/
[root@server2 ~]# df
Filesystem               1K-blocks    Used Available Use% Mounted on
/dev/mapper/rhel-root     17811456 1168600  16642856   7% /
devtmpfs                   1011400       0   1011400   0% /dev
tmpfs                      1023464       0   1023464   0% /dev/shm
tmpfs                      1023464   17036   1006428   2% /run
tmpfs                      1023464       0   1023464   0% /sys/fs/cgroup
/dev/vda1                  1038336  135172    903164  14% /boot
tmpfs                       204696       0    204696   0% /run/user/0
172.25.52.1:/home/hadoop  17811456 3003648  14807808  17% /home/hadoop
[root@server2 ~]# su - hadoop 
[hadoop@server2 ~]$ ls
hadoop  hadoop-3.2.1  hadoop-3.2.1.tar.gz  java  jdk1.8.0_181  jdk-8u181-linux-x64.tar.gz
[hadoop@server2 ~]$ 

[root@server1 ~]# su - hadoop 
Last login: Tue Dec 28 10:18:38 CST 2021 from localhost on pts/1
[hadoop@server1 ~]$ ls
hadoop  hadoop-3.2.1  hadoop-3.2.1.tar.gz  java  jdk1.8.0_181  jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ ssh server1
Last login: Tue Dec 28 11:19:15 2021
[hadoop@server1 ~]$ exit
logout
Connection to server1 closed.
[hadoop@server1 ~]$ ssh server2

[hadoop@server1 ~]$ ssh server3

[hadoop@server1 ~]$ cd hadoop/etc/hadoop/
[hadoop@server1 hadoop]$ vim hdfs-site.xml 
[hadoop@server1 hadoop]$ vim core-site.xml 
[hadoop@server1 hadoop]$ vim workers 
server2
server3
[hadoop@server1 hadoop]$ cd
[hadoop@server1 ~]$ cd /tmp/
[hadoop@server1 tmp]$ ls
hadoop  hadoop-hadoop  hsperfdata_hadoop
[hadoop@server1 tmp]$ rm -fr *
[hadoop@server1 tmp]$ ls
[hadoop@server1 tmp]$ 

 [hadoop@server1 hadoop]$ vim hdfs-site.xml

 [hadoop@server1 hadoop]$ vim core-site.xml
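The transcript does not show the re-edited file contents. For a distributed layout, the usual changes are to address the NameNode by hostname instead of localhost and to raise the replication factor to match the DataNodes; the values below are an assumption based on the hostnames used in this walkthrough:

```xml
<!-- core-site.xml: clients and DataNodes reach the NameNode on server1 -->
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://server1:9000</value>
    </property>
</configuration>

<!-- hdfs-site.xml: two DataNodes (server2, server3), so keep two replicas -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>
```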

[hadoop@server1 hadoop]$ bin/hdfs namenode -format 

Check in the web UI whether the distributed file system came up:

The file system contains no files yet:

Start HDFS, create the directories, and upload the files:

[hadoop@server1 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [server1]
Starting datanodes
Starting secondary namenodes [server1]
[hadoop@server1 hadoop]$ jps
16480 NameNode
16824 Jps
16703 SecondaryNameNode
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server1 hadoop]$ bin/hdfs dfs -put input
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2021-12-28 11:41 input
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount input output 

View the result in the web UI:

Now add another node:

[root@server4 ~]# useradd hadoop
[root@server4 ~]# yum install -y nfs-utils
[root@server4 ~]# mount 172.25.52.1:/home/hadoop/ /home/hadoop/
[root@server4 ~]# df
[root@server4 ~]# su - hadoop 
[hadoop@server4 ~]$ ls
hadoop  hadoop-3.2.1  hadoop-3.2.1.tar.gz  java  jdk1.8.0_181  jdk-8u181-linux-x64.tar.gz
[hadoop@server4 ~]$ cd hadoop/etc/hadoop/
[hadoop@server4 hadoop]$ ls

The new node has been added successfully:

[hadoop@server1 ~]$ cd hadoop/etc/
[hadoop@server1 etc]$ ls
hadoop
[hadoop@server1 etc]$ cd hadoop/
[hadoop@server1 hadoop]$ vim mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>

[hadoop@server1 hadoop]$ vim yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
[hadoop@server1 hadoop]$ vim hadoop-env.sh 

export HADOOP_MAPRED_HOME=/home/hadoop/hadoop

[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ sbin/start-yarn.sh
[hadoop@server1 hadoop]$ ssh server4

[hadoop@server4 ~]$ exit
logout
Connection to server4 closed.
[hadoop@server1 hadoop]$ jps

Check on server2, server3, and server4:

View in the web UI:

Source: https://www.mshxw.com/it/695707.html