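1. Check status: hadoop fsck /
1.2 Common Linux commands:
1. List excluded ports (run on the Windows host): netsh interface ipv4 show excludedportrange protocol=tcp
1.2 Basic environment preparation
1.2.1 Create a base centos7 image
Pull the official centos7 image:
docker pull centos:centos7
Build a centos image with SSH support from a Dockerfile.
Create the Dockerfile:
vi Dockerfile
Write the following into the Dockerfile: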
FROM centos:centos7
MAINTAINER mwf
RUN yum install -y openssh-server sudo
RUN sed -i 's/UsePAM yes/UsePAM no/g' /etc/ssh/sshd_config
RUN yum install -y openssh-clients
RUN echo "root:root123" | chpasswd
RUN echo "root ALL=(ALL) ALL" >> /etc/sudoers
RUN ssh-keygen -t dsa -f /etc/ssh/ssh_host_dsa_key
RUN ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key
RUN mkdir /var/run/sshd
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
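In short: starting from the centos image, this installs the SSH server, sets the root password to root123, and starts sshd.
Build the image:
docker build -t="centos7-ssh" .
This produces an image named centos7-ssh, which you can confirm with docker images.
1.2.2 Build an image with hadoop and the jdk
Put the prepared packages, hadoop-2.7.7.tar.gz and jdk-8u202-linux-x64.tar.gz, in the current directory.
Build a centos image with hadoop and the jdk from a Dockerfile.
A Dockerfile was already created in the previous step, so move it out of the way first:
mv Dockerfile Dockerfile.bak
Create a new Dockerfile:
vi Dockerfile
Write the following into it: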
FROM centos7-ssh
ADD jdk-8u202-linux-x64.tar.gz /usr/local/
RUN mv /usr/local/jdk1.8.0_202 /usr/local/jdk1.8
ENV JAVA_HOME /usr/local/jdk1.8
ENV PATH $JAVA_HOME/bin:$PATH
ADD hadoop-2.7.7.tar.gz /usr/local
RUN mv /usr/local/hadoop-2.7.7 /usr/local/hadoop
ENV HADOOP_HOME /usr/local/hadoop
ENV PATH $HADOOP_HOME/bin:$PATH
ENV TIME_ZONE=Asia/Shanghai
RUN ln -snf /usr/share/zoneinfo/$TIME_ZONE /etc/localtime && echo $TIME_ZONE > /etc/timezone
RUN yum install -y which sudo
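In short: starting from the centos7-ssh image built above, this adds the hadoop and jdk packages and sets the environment variables.
Build the image:
docker build -t="hadoop" .
This produces an image named hadoop.
1.3 Configure the network and start the docker containers
The cluster nodes must be able to reach each other over the network, so set up the network first.
Create the network:
docker network create --driver bridge hadoop-br
This creates a bridge network named hadoop-br.
Specify the network when starting the containers: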
docker run -itd --network hadoop-br --name hadoop1 -p 9000:9000 -p 5070:50070 -p 9870:9870 -p 8088:8088 --privileged hadoop /usr/sbin/init
docker run -itd --network hadoop-br --name hadoop2 --privileged hadoop /usr/sbin/init
docker run -itd --network hadoop-br --name hadoop3 --privileged hadoop /usr/sbin/init
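These commands start 3 machines, all attached to hadoop-br; hadoop1 also has port mappings.
A host port chosen here may turn out to be unavailable, in which case docker reports:
docker: Error response from daemon: Ports are not available: listen tcp 0.0.0.0:50070: bind: An attempt was made to access a socket in a way forbidden by its access permissions.
On Windows, list the unavailable port ranges with
netsh interface ipv4 show excludedportrange protocol=tcp
and change the mapped host port to one outside those ranges.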
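Checking candidate host ports against that list by eye is error-prone. The sketch below assumes the start/end pairs from the netsh output have been saved as plain "start end" lines and tests whether a port falls inside any of them. The helper name port_is_excluded and the sample ranges are illustrative only, not real netsh output.

```shell
# Returns 0 (true) if PORT lies inside any "start end" range read from stdin.
# port_is_excluded is a hypothetical helper, not part of netsh or docker.
port_is_excluded() {
    port=$1
    while read -r start end _rest; do
        # Skip headers, separators, and blank lines (anything non-numeric).
        case "$start" in ''|*[!0-9]*) continue ;; esac
        case "$end" in ''|*[!0-9]*) continue ;; esac
        if [ "$port" -ge "$start" ] && [ "$port" -le "$end" ]; then
            return 0
        fi
    done
    return 1
}

# Illustrative ranges only; paste the real ones from the netsh output.
sample_ranges='49990 50089
50090 50189'

if printf '%s\n' "$sample_ranges" | port_is_excluded 50070; then
    echo "50070 is excluded, pick another host port"
fi
```

A port for which the function returns non-zero is outside every listed range, though it may still be in use by another process.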
Inspect the network:
docker network inspect hadoop-br
The command prints the network details:
[
    {
        "Name": "hadoop-br",
        "Id": "62c0518d40ab5b52cc0eb469ab4eb0b76e64a70f04f6ab2c2fa783eac660d8fb",
        "Created": "2022-04-12T08:47:19.7065376Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.18.0.0/16",
                    "Gateway": "172.18.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "84612a464c146cd60bfbf36fcb8282782da11531e5376e1a533d870ff3639905": {
                "Name": "hadoop3",
                "EndpointID": "f7e20cfa78d24ed2c131be0fc9ba8339439334a935b8f7db4c53e483c67eb942",
                "MacAddress": "02:42:ac:12:00:04",
                "IPv4Address": "172.18.0.4/16",
                "IPv6Address": ""
            },
            "95c0aa8e782af5a8a95fd48ab3d76cb6a268bc0a74a666141d2d6690a5db090e": {
                "Name": "hadoop2",
                "EndpointID": "16d4951c9333baeb4e3097fc180fcc7f1a5901c427ee26793629414e8b20dc70",
                "MacAddress": "02:42:ac:12:00:03",
                "IPv4Address": "172.18.0.3/16",
                "IPv6Address": ""
            },
            "dbe1e7ffdca98f312b0163e76b29d56d7b646250b5af3e408b1ebf7b6a38b981": {
                "Name": "hadoop1",
                "EndpointID": "72d826ff126f8a3874203c3ee49dafea1b6b4c08f2a8d49ed60822305ce823ab",
                "MacAddress": "02:42:ac:12:00:02",
                "IPv4Address": "172.18.0.2/16",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {}
    }
]
From this we can read off the IP of each machine:
- 172.18.0.2 hadoop1
- 172.18.0.3 hadoop2
- 172.18.0.4 hadoop3
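Rather than copying the addresses by hand, they can be pulled out of Docker with a Go template. This is a minimal sketch assuming the network is named hadoop-br as above; to_hosts_lines is a hypothetical helper that strips the /16 suffix so the result has the "IP name" shape used in /etc/hosts.

```shell
# Turn "IP/prefix name" lines into sorted "IP name" lines.
# to_hosts_lines is a hypothetical helper name.
to_hosts_lines() {
    # Drop the "/16"-style prefix length and any empty lines, then sort.
    sed -e 's|/[0-9]*||' -e '/^$/d' | sort
}

# Intended use on the Docker host (not run here):
#   docker network inspect \
#     -f '{{range .Containers}}{{.IPv4Address}} {{.Name}}{{println}}{{end}}' \
#     hadoop-br | to_hosts_lines

# The same transformation applied to the addresses listed above.
printf '%s\n' '172.18.0.4/16 hadoop3' '172.18.0.3/16 hadoop2' '172.18.0.2/16 hadoop1' \
    | to_hosts_lines
```

The addresses Docker assigns can differ between machines and runs, so regenerate the lines rather than reusing the ones printed here.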
Log in to the containers; they can now ping each other.
- docker exec -it hadoop1 bash
- docker exec -it hadoop2 bash
- docker exec -it hadoop3 bash
Edit the hosts file on each machine:
vi /etc/hosts
Add the following (note: the IPs docker assigns may differ for you; use your own):
172.18.0.2 hadoop1
172.18.0.3 hadoop2
172.18.0.4 hadoop3
1.4.2 Passwordless ssh login
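The ssh service was already installed in the image, so run the following directly on each machine:
ssh-keygen
Press Enter at every prompt.
ssh-copy-id -i /root/.ssh/id_rsa -p 22 root@hadoop1
Enter the password; if you followed along it is root123 (the root password set in the Dockerfile).
ssh-copy-id -i /root/.ssh/id_rsa -p 22 root@hadoop2
Enter the password (root123 if you followed along).
ssh-copy-id -i /root/.ssh/id_rsa -p 22 root@hadoop3
Enter the password (root123 if you followed along).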
1.4.3 Test that the configuration works
- ping hadoop1
- ping hadoop2
- ping hadoop3
- ssh hadoop1
- ssh hadoop2
- ssh hadoop3
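Running the same check against three hosts on three machines gets repetitive, so it can be wrapped in a loop. A rough sketch follows; check_hosts is a hypothetical helper, and the probe command is passed in as a string so the loop itself does not depend on ping being installed.

```shell
# Run a probe command against every host and report the result.
# check_hosts is a hypothetical helper; the host name is appended as the
# last argument of the probe command.
check_hosts() {
    probe=$1
    shift
    failed=0
    for host in "$@"; do
        # $probe is intentionally unquoted so "ping -c 1" splits into words.
        if $probe "$host" >/dev/null 2>&1; then
            echo "$host ok"
        else
            echo "$host FAILED"
            failed=1
        fi
    done
    return "$failed"
}

# Intended use inside a container (not run here):
#   check_hosts "ping -c 1 -W 2" hadoop1 hadoop2 hadoop3
```

The function returns non-zero if any host fails, so it can gate later steps in a script.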
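Enter hadoop1:
docker exec -it hadoop1 bash
Create a few directories that the configuration will refer to shortly:
mkdir /home/hadoop
mkdir /home/hadoop/tmp /home/hadoop/hdfs_name /home/hadoop/hdfs_data
Change to the hadoop configuration directory:
cd $HADOOP_HOME/etc/hadoop/
Edit core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hadoop/tmp</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131702</value>
    </property>
</configuration>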
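The Hadoop site files in this walkthrough all share one shape: a configuration element holding property elements, each with a name and a value. They can therefore be generated from "name value" pairs instead of typed by hand. A sketch, in which hadoop_property and hadoop_site_xml are hypothetical helpers and the values are assumed to contain no XML special characters:

```shell
# Print one <property> element. Assumes name and value need no XML escaping.
hadoop_property() {
    printf '    <property>\n'
    printf '        <name>%s</name>\n' "$1"
    printf '        <value>%s</value>\n' "$2"
    printf '    </property>\n'
}

# Wrap "name value" pairs read from stdin in a <configuration> element.
hadoop_site_xml() {
    printf '<configuration>\n'
    while read -r name value; do
        if [ -n "$name" ]; then
            hadoop_property "$name" "$value"
        fi
    done
    printf '</configuration>\n'
}

# The core-site.xml settings from this section, printed to stdout.
hadoop_site_xml <<'EOF'
fs.defaultFS hdfs://hadoop1:9000
hadoop.tmp.dir file:/home/hadoop/tmp
io.file.buffer.size 131702
EOF
```

Redirecting the output to core-site.xml would replace the file, so back up the existing one first.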
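Edit hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/hdfs_name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/hdfs_data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop1:9001</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.client.use.datanode.hostname</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.datanode.hostname</name>
        <value>hadoop2</value>
    </property>
    <property>
        <name>dfs.datanode.hostname</name>
        <value>hadoop3</value>
    </property>
</configuration>
Note: dfs.datanode.hostname is a per-DataNode setting. After copying this file to the other nodes, keep only the hadoop2 entry on hadoop2 and only the hadoop3 entry on hadoop3.
Edit mapred-site.xml:
mapred-site.xml does not exist by default; create it with cp mapred-site.xml.template mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop1:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop1:19888</value>
    </property>
</configuration>
Edit yarn-site.xml:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>hadoop1:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>hadoop1:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>hadoop1:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>hadoop1:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>hadoop1:8088</value>
    </property>
</configuration>
Edit slaves:
Here hadoop1 is the master node and hadoop2 and hadoop3 are the worker nodes:
hadoop2
hadoop3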
Copy the files to hadoop2 and hadoop3.
Run the following commands in order:
scp -r $HADOOP_HOME/ hadoop2:/usr/local/
scp -r $HADOOP_HOME/ hadoop3:/usr/local/
scp -r /home/hadoop hadoop2:/
scp -r /home/hadoop hadoop3:/
1.5.2 Steps to run on every machine
Connect to each machine:
docker exec -it hadoop1 bash
docker exec -it hadoop2 bash
docker exec -it hadoop3 bash
Add the hadoop sbin directory to PATH.
The hadoop bin directory was added to PATH when the image was built, but the sbin directory was not, so it has to be configured separately on each machine:
vi ~/.bashrc
Append the following:
export PATH=$PATH:$HADOOP_HOME/sbin
Then run:
source ~/.bashrc
1.5.3 Start hadoop
Cluster mode requires JAVA_HOME to be declared again:
echo "export HADOOP_HOME=/usr/local/hadoop" >> /etc/bashrc
echo 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> /etc/bashrc
Note: after running these two lines, run source ~/.bashrc; if echo $HADOOP_HOME prints /usr/local/hadoop, it worked.
echo "export JAVA_HOME=/usr/local/jdk1.8" >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh
echo "export HADOOP_HOME=/usr/local/hadoop" >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh
echo "export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop/" >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh
source $HADOOP_HOME/etc/hadoop/hadoop-env.sh
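Two details of appending export lines with echo >> are easy to miss: inside double quotes $PATH is expanded at the moment the line is written rather than when the file is sourced, and running the command twice appends the line twice. The sketch below shows an idempotent append. append_once is a hypothetical helper, and the demo writes to a temporary file rather than /etc/bashrc.

```shell
# Append LINE to FILE only if FILE does not already contain that exact line.
# append_once is a hypothetical helper name.
append_once() {
    file=$1
    line=$2
    touch "$file"
    # -x: whole-line match, -F: fixed string (no regex), -q: quiet.
    if ! grep -qxF -e "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Demo on a scratch file; in the containers the target would be /etc/bashrc.
rc=$(mktemp)
append_once "$rc" 'export HADOOP_HOME=/usr/local/hadoop'
# Single quotes keep $PATH literal, so it is expanded when the file is sourced.
append_once "$rc" 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin'
append_once "$rc" 'export HADOOP_HOME=/usr/local/hadoop'   # no-op: already present
cat "$rc"
```

Because the check is an exact whole-line match, a line that differs only in whitespace would still be appended.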
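Important: run the following commands on hadoop1 only.
Format hdfs:
hdfs namenode -format
If this fails with bash: hdfs: command not found, fix it as follows:
1. Check that the paths in /etc/profile are correct (do not forget the $ sign, and the separator is a colon, not a semicolon).
2. Add:
export JAVA_HOME=/usr/local/jdk1.8
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
3. source /etc/profile
Start everything with one command:
start-all.sh
If nothing goes wrong, go ahead and celebrate. If something does, hang in there.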
1.6 Test hadoop
Check the status:
jps
hadoop1
1748 Jps
490 NameNode
846 ResourceManager
686 SecondaryNameNode
hadoop2
400 DataNode
721 Jps
509 NodeManager
hadoop3
425 NodeManager
316 DataNode
591 Jps
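Comparing jps listings by eye across three nodes is easy to get wrong, so the check can be scripted. A small sketch, assuming jps prints one "pid name" pair per line as in the listings above; missing_daemons is a hypothetical helper that prints each expected daemon not found in the input.

```shell
# Print every expected daemon that is missing from jps output on stdin.
# missing_daemons is a hypothetical helper name.
missing_daemons() {
    # Keep only the second column (the process name).
    running=$(awk '{print $2}')
    for daemon in "$@"; do
        # -x avoids "NameNode" matching inside "SecondaryNameNode".
        if ! printf '%s\n' "$running" | grep -qxF -e "$daemon"; then
            echo "$daemon"
        fi
    done
}

# Intended use on a node (not run here):
#   jps | missing_daemons NameNode SecondaryNameNode ResourceManager

# Demo with the hadoop1 listing; DataNode is included deliberately to show
# a miss, since hadoop1 does not run one in this setup.
printf '%s\n' '1748 Jps' '490 NameNode' '846 ResourceManager' '686 SecondaryNameNode' \
    | missing_daemons NameNode SecondaryNameNode ResourceManager DataNode
```

Empty output means every expected daemon was found.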
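Upload a file:
# List files and subdirectories under the root directory / (absolute path)
hadoop fs -ls /
# Create a directory (absolute path)
hadoop fs -mkdir /hello
# Upload a file
hadoop fs -put hello.txt /hello/
# Download a file
hadoop fs -get /hello/hello.txt
# Print a file's contents
hadoop fs -cat /hello/hello.txt
1.7 Troubleshooting
Error 1: clusterID mismatch after formatting several times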
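File /hello/hello.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
Cause: the clusterID is inconsistent across the cluster (the namenode was formatted more than once).
Fix: delete the current directory under the namenode and every datanode storage directory (the paths configured in hdfs-site.xml, repeated here), then reformat with hdfs namenode -format.
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hdfs_name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hdfs_data</value>
</property>
Then restart with start-all.sh.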
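Before deleting anything, the mismatch can be confirmed by comparing the clusterID lines in the VERSION files that Hadoop keeps under each storage directory's current folder. A sketch under that assumption; cluster_id_of is a hypothetical helper, and the demo uses scratch directories with made-up IDs rather than the real storage paths.

```shell
# Print the clusterID recorded in a Hadoop storage directory.
# cluster_id_of is a hypothetical helper; it reads DIR/current/VERSION.
cluster_id_of() {
    sed -n 's/^clusterID=//p' "$1/current/VERSION"
}

# Demo with scratch directories and made-up IDs. On the cluster, compare
# /home/hadoop/hdfs_name on hadoop1 with /home/hadoop/hdfs_data on each datanode.
demo=$(mktemp -d)
mkdir -p "$demo/name/current" "$demo/data/current"
printf 'namespaceID=1\nclusterID=CID-aaaa\n' > "$demo/name/current/VERSION"
printf 'storageID=DS-1\nclusterID=CID-bbbb\n' > "$demo/data/current/VERSION"

if [ "$(cluster_id_of "$demo/name")" != "$(cluster_id_of "$demo/data")" ]; then
    echo "clusterID mismatch"
fi
```

If the IDs match on every node, the "0 datanode(s) running" error has a different cause and reformatting will not help.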
Error 2: Failed to connect to hadoop2/ip:port for block
11:19:58.517 [main] WARN org.apache.hadoop.hdfs.DFSClient - Failed to connect to hadoop2/172.18.0.3:50010 for block BP-1446294434-172.18.0.2-1649818239147:blk_1073741825_1001, add to deadNodes and continue. java.net.ConnectException: Connection timed out: no further information
Looking at the DataNode source code, we find that Hadoop uses the getHostName method to look up the hostname:
private static String getHostName(Configuration config)
    throws UnknownHostException {
  String name = config.get(DFS_DATANODE_HOST_NAME_KEY);
  if (name == null) {
    name = DNS.getDefaultHost(
        config.get(DFS_DATANODE_DNS_INTERFACE_KEY,
                   DFS_DATANODE_DNS_INTERFACE_DEFAULT),
        config.get(DFS_DATANODE_DNS_NAMESERVER_KEY,
                   DFS_DATANODE_DNS_NAMESERVER_DEFAULT));
  }
  return name;
}
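This leads to the static constant DFS_DATANODE_HOST_NAME_KEY:
public static final String DFS_DATANODE_HOST_NAME_KEY = "dfs.datanode.hostname";
The code also carries the comment: //Following keys have no defaults
-> https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
So we can add the following fragment to $HADOOP_HOME/etc/hadoop/hdfs-site.xml, with the node's own hostname as the value (hadoop3 shown here):
<property>
    <name>dfs.datanode.hostname</name>
    <value>hadoop3</value>
</property>
Error 3: time zone problem
Run the following command to switch the time zone:
ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
Error 4: ClusterId mismatch
Delete the directories configured in core-site.xml and hdfs-site.xml, then recreate them (note: do this on all three machines):
rm -rf /home/hadoop/
mkdir -p /home/hadoop/tmp /home/hadoop/hdfs_name /home/hadoop/hdfs_data
1.8 Management UI
1. hadoop management page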
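http://<ip address>:8088/cluster/nodes
2. hdfs management page
http://<ip address>:50070/ (use the host port you mapped; the docker run command earlier maps host port 5070 to container port 50070)
Click Datanodes to see the list of data nodes.
Browse the file system:
By default there are two folders, and the files inside them cannot be viewed.
Safe mode is enabled by default, and with it on there is no permission to view the files. Safe mode has to be turned off first.
To turn off safe mode, enter the hadoop-master container (hadoop1 in this setup) and run:
hdfs dfsadmin -safemode leave
Grant permissions on /tmp:
hdfs dfs -chmod -R 755 /tmp
Refresh the page and click tmp.
Go back up one level and open /user/root/input to see the 2 files created by the script.
Note: the hdfs storage directory is /root/hdfs. For persistence, map this directory out of the container.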



