The Hadoop framework mainly solves two problems for massive data sets: distributed storage and distributed computation.
1. Distributions
- Vanilla (Apache) Hadoop: open source, no commercial support
- CDH (Cloudera's Hadoop distribution): commercial, paid, provides technical support and a web UI that simplifies operations and management
- HDP (Hortonworks Data Platform): open source, provides a web UI
2. Hadoop Version Evolution
Architecture by version:
Hadoop 1.x: HDFS + MapReduce
  HDFS: distributed storage
  MapReduce: resource management + distributed computation
Hadoop 2.x: HDFS + YARN + MapReduce
  HDFS: distributed storage
  YARN: resource management and scheduling
  MapReduce: distributed computation
Hadoop 3.x: same architecture as Hadoop 2.x, with refinements:
  Minimum Java version raised from Java 7 to Java 8
  Erasure coding support, which reduces storage overhead
  Support for more than two NameNodes (Hadoop 2.x allowed only one active and one standby)
  Task-level native optimization in MapReduce: a native implementation of the map output collector was added, improving shuffle performance by about 30%
  Default ports of several services were changed
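The storage saving from erasure coding can be illustrated with simple arithmetic. The sketch below compares 3-way replication with the RS-6-3 policy (6 data blocks + 3 parity blocks, i.e. 1.5x raw storage); the 6 GB file size is an assumed example value:

```shell
data_gb=6                          # logical data size (example value)
rep3=$((data_gb * 3))              # 3-way replication stores 3 full copies -> 18 GB raw
ec_rs63=$((data_gb * 3 / 2))       # RS-6-3: 3 parity blocks per 6 data blocks -> 9 GB raw
echo "replication=3 needs ${rep3} GB raw, RS-6-3 needs ${ec_rs63} GB raw"
```

So for the same fault tolerance class, erasure coding cuts raw storage roughly in half compared with the default 3-way replication.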
3. Hadoop Installation
Basic environment setup
Static IP
Hostname
Firewall
Passwordless SSH login
Generate a key pair: ssh-keygen -t rsa
Copy the public key generated under ~/.ssh into the ~/.ssh directory of each node that should accept passwordless logins, saving it as: authorized_keys
========================================Generate the key pair=======================================
[root@bigdata01 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:8a26A10tNcfiBaw2lBfubOmqizGSelSIawmI9rpIBsU root@bigdata01
The key's randomart image is:
+---[RSA 2048]----+
| oo+             |
| . o.* +         |
|o E. . .. *.=    |
|+o. . . o*+o.    |
|o..o . .So.o*    |
|. +.... . +      |
| +..o o. . .     |
|+. ... +.. .     |
|..o. . ==.       |
+----[SHA256]-----+
========================================Append the public key=======================================
[root@bigdata01 ~]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
========================================Passwordless login test=======================================
[root@bigdata01 ~]# ssh bigdata01
Last login: Fri Feb 25 18:28:27 2022 from fe80::ceb4:7878:9280:960c%ens33
[root@bigdata01 ~]#
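The generate-and-append steps above can be scripted. The sketch below reproduces them locally with a throwaway key pair under /tmp (an illustrative path, not a real node's ~/.ssh), and also sets the file permissions that sshd requires before it will honor the key:

```shell
demo=/tmp/demo_ssh                             # illustrative directory standing in for ~/.ssh
mkdir -p "$demo" && chmod 700 "$demo"
# Non-interactive key generation: -N "" means an empty passphrase, -q suppresses output.
ssh-keygen -t rsa -N "" -f "$demo/id_rsa" -q
cat "$demo/id_rsa.pub" >> "$demo/authorized_keys"
chmod 600 "$demo/authorized_keys"              # sshd ignores group/world-writable key files
```

On a real node the same idea applies to ~/.ssh directly; `ssh-copy-id` wraps the copy-and-append step in one command.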
Pseudo-distributed cluster
Architecture
Upload and extract the Hadoop archive
[root@bigdata01 ~]# cd /data/soft
[root@bigdata01 soft]# tar -zxvf hadoop-3.2.0.tar.gz
========================================Directory layout after extraction=======================================
[root@bigdata01 soft]# ll hadoop-3.2.0
total 184
drwxr-xr-x. 2 1001 1002    203 Jan  8  2019 bin
drwxr-xr-x. 3 1001 1002     20 Jan  8  2019 etc
drwxr-xr-x. 2 1001 1002    106 Jan  8  2019 include
drwxr-xr-x. 3 1001 1002     20 Jan  8  2019 lib
drwxr-xr-x. 4 1001 1002   4096 Jan  8  2019 libexec
-rw-rw-r--. 1 1001 1002 150569 Oct 19  2018 LICENSE.txt
-rw-rw-r--. 1 1001 1002  22125 Oct 19  2018 NOTICE.txt
-rw-rw-r--. 1 1001 1002   1361 Oct 19  2018 README.txt
drwxr-xr-x. 3 1001 1002   4096 Jan  8  2019 sbin
drwxr-xr-x. 4 1001 1002     31 Jan  8  2019 share
The bin directory holds the hadoop, hdfs, yarn and related commands; the sbin directory holds scripts such as start-all.sh and stop-all.sh that start or stop the Hadoop components.
Edit the system environment variables to add Hadoop's bin and sbin directories to PATH, then reload the profile:
[root@bigdata01 soft]# vi /etc/profile
========================================Append the following=======================================
export HADOOP_HOME=/data/soft/hadoop-3.2.0
export PATH=.:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$PATH
[root@bigdata01 hadoop]# source /etc/profile
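A quick sanity check that the PATH change took effect can be scripted; the sketch below uses the same HADOOP_HOME as above and tests PATH membership with a `case` pattern:

```shell
export HADOOP_HOME=/data/soft/hadoop-3.2.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
ok=0
# Wrap PATH in colons so the pattern matches whole entries, not substrings.
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) ok=1 ;;
esac
echo "hadoop bin on PATH: $ok"
```

On a real node, `hadoop version` should now print the release information.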
Edit the Hadoop configuration files
hadoop-env.sh
[root@bigdata01 hadoop]# vi hadoop-env.sh
========================================Append the following=======================================
export JAVA_HOME=/data/soft/jdk1.8
export HADOOP_LOG_DIR=/data/hadoop-repo/logs/hadoop
core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://bigdata01:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop_repo</value>
    </property>
</configuration>
hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
workers
bigdata01
Format HDFS
[root@bigdata01 hadoop]# hdfs namenode -format
Edit the start/stop scripts to add the user settings
start-dfs.sh
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
stop-dfs.sh
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
start-yarn.sh
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
stop-yarn.sh
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
Start the cluster
[root@bigdata01 sbin]# start-all.sh
Verify the startup: run jps; besides Jps itself, five processes should be present: DataNode, ResourceManager, NameNode, SecondaryNameNode and NodeManager.
[root@bigdata01 sbin]# jps
1732 DataNode
2197 ResourceManager
1622 NameNode
2664 Jps
2333 NodeManager
1951 SecondaryNameNode
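The five-daemon check can be scripted with grep. The sketch below runs against a captured sample of jps output (a live cluster is needed for the real command, as noted in the comment):

```shell
jps_out="1732 DataNode
2197 ResourceManager
1622 NameNode
2664 Jps
2333 NodeManager
1951 SecondaryNameNode"               # on a real node: jps_out=$(jps)
missing=0
for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
  # -w matches whole words so e.g. "NameNode" does not also match "SecondaryNameNode".
  echo "$jps_out" | grep -qw "$d" || { echo "missing: $d"; missing=1; }
done
if [ "$missing" -eq 0 ]; then echo "all five daemons are running"; fi
```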
Fully distributed cluster
Architecture
Configure /etc/hosts so each node can resolve the other nodes' hostnames
[root@bigdata01 data]# vi /etc/hosts
=====================================Append the following=======================================
192.168.214.100 bigdata01
192.168.214.101 bigdata02
192.168.214.102 bigdata03
Time synchronization between nodes
Install the ntpdate command
[root@bigdata01 data]# yum install -y ntpdate
Synchronize the time
[root@bigdata01 data]# ntpdate -u ntp.sjtu.edu.cn
Add a crontab entry so it runs every minute:
* * * * * root /usr/sbin/ntpdate -u ntp.sjtu.edu.cn
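Once the cron job is in place, it is worth checking the remaining skew between nodes. On a real cluster the remote timestamp would come from `ssh bigdata02 date +%s`; the sketch below simulates a 2-second skew locally:

```shell
t_local=$(date +%s)
t_remote=$((t_local + 2))            # simulated skew; really: t_remote=$(ssh bigdata02 date +%s)
drift=$((t_remote - t_local))
abs_drift=${drift#-}                 # strip a leading minus sign, if any
if [ "$abs_drift" -le 30 ]; then
  echo "clocks agree within 30s (drift ${drift}s)"
else
  echo "WARNING: clock skew ${drift}s"
fi
```

Large skew between nodes can cause confusing failures later (e.g. HDFS token and log-timestamp mismatches), so catching it early is cheap insurance.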
Passwordless SSH login: copy the master node's public key to each worker node and append it to the worker's authorized_keys file
=====================================On the master node=======================================
[root@bigdata01 ~]# scp ~/.ssh/authorized_keys bigdata02:~/
[root@bigdata01 ~]# scp ~/.ssh/authorized_keys bigdata03:~/
=====================================On the worker nodes=======================================
[root@bigdata02 ~]# cat ~/authorized_keys >> ~/.ssh/authorized_keys
[root@bigdata03 ~]# cat ~/authorized_keys >> ~/.ssh/authorized_keys
Install and configure Hadoop on the master node
The configuration differs from the pseudo-distributed setup in a few places
1. hadoop/etc/hadoop/hadoop-env.sh: same as the pseudo-distributed setup
2. hadoop/etc/hadoop/core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://bigdata01:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop_repo</value>
    </property>
</configuration>
3. hadoop/etc/hadoop/hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>bigdata01:50090</value>
    </property>
</configuration>
4. hadoop/etc/hadoop/mapred-site.xml:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
5. hadoop/etc/hadoop/yarn-site.xml:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>bigdata01</value>
    </property>
</configuration>
6. hadoop/etc/hadoop/workers: list the worker hostnames
bigdata02
bigdata03
7. Edit hadoop/sbin/start-dfs.sh:
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
8. Edit hadoop/sbin/stop-dfs.sh:
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
9. Edit hadoop/sbin/start-yarn.sh:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
10. Edit hadoop/sbin/stop-yarn.sh:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
Copy the Hadoop installation from the master node to the worker nodes
[root@bigdata01 soft]# scp -rq hadoop-3.2.0 bigdata02:/data/soft/
[root@bigdata01 soft]# scp -rq hadoop-3.2.0 bigdata03:/data/soft/
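With more workers the copy is easier as a loop. The sketch below is a dry run that only prints the commands (drop the leading `echo` to actually execute them); the worker list matches the workers file above:

```shell
workers="bigdata02 bigdata03"        # same hosts as in the workers file
count=0
for h in $workers; do
  echo scp -rq /data/soft/hadoop-3.2.0 "$h:/data/soft/"   # dry run: prints the command
  count=$((count + 1))
done
echo "$count copy commands generated"
```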
Format HDFS on the master node
[root@bigdata01 soft]# hdfs namenode -format
Start the cluster from the master node
[root@bigdata01 soft]# start-all.sh
Verify the startup
==========================Master node==========================
[root@bigdata01 soft]# jps
2553 ResourceManager
2873 Jps
2317 SecondaryNameNode
2063 NameNode
==========================Worker nodes==========================
[root@bigdata02 soft]# jps
2576 Jps
2453 NodeManager
2348 DataNode
[root@bigdata03 soft]# jps
2369 NodeManager
2498 Jps
2264 DataNode
Stop the cluster: run stop-all.sh on the master node



