HDFS (distributed storage system)
NameNode: master node of the HDFS cluster
SecondaryNameNode: checkpoint helper for the NameNode (it periodically merges the edit log into the fsimage; often loosely called a "cold backup", but it is not a hot standby)
DataNode: worker node of the HDFS cluster
YARN (resource manager) ---- runs computation frameworks such as MapReduce, Storm, Spark, and Flink
ResourceManager
NodeManager
MapReduce (distributed parallel computing framework)
1) HDFS architecture
2) HDFS characteristics
Write once, read many; not suited to low-latency data access; cannot efficiently store large numbers of small files; does not support concurrent writers or arbitrary in-place file modification.
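A rough back-of-the-envelope sketch of why many small files are costly (a Python illustration, assuming the default 128 MiB block size): the NameNode keeps in-memory metadata for every block, and a file occupies at least one block entry no matter how small it is.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size (128 MiB); assumed here

def num_blocks(file_size: int) -> int:
    """Number of block entries a file costs the NameNode (at least one)."""
    return max(1, -(-file_size // BLOCK_SIZE))  # ceiling division

# One 1 GiB file costs 8 block entries...
big = num_blocks(1024 ** 3)
# ...but the same 1 GiB stored as 8192 files of 128 KiB costs 8192 entries.
small = sum(num_blocks(128 * 1024) for _ in range(8192))
print(big, small)  # 8 8192
```

The data volume is identical, but the small-file layout inflates NameNode metadata a thousandfold, which is the practical limit long before disk space is.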
3) High availability (fault-tolerance design)
Heartbeats are maintained between the NameNode and the DataNodes; if the NameNode stops receiving a DataNode's heartbeats, that node is considered dead. The NameNode then scans for blocks whose replica count has fallen below the configured value, creates new replicas, and distributes them to other DataNodes. Block integrity checking: HDFS records a checksum for every block of each newly created file, and on reads it prefers replicas whose checksum matches the recorded value. Cluster load balancing: when a DataNode's free space exceeds a threshold, HDFS automatically migrates data to it from other DataNodes.
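The re-replication step above can be sketched as a toy Python illustration (not Hadoop's actual code): once a DataNode's heartbeats stop, its replicas no longer count as live, and any block below the target replication is scheduled for copying.

```python
def blocks_to_replicate(live_replicas, target=3):
    """Return {block_id: missing_copies} for blocks below the target replication.

    live_replicas maps each block id to its count of live replicas,
    i.e. replicas on DataNodes that are still heartbeating.
    """
    return {b: target - n for b, n in live_replicas.items() if n < target}

# After a dead DataNode's replicas are discounted:
live = {"blk_1": 3, "blk_2": 2, "blk_3": 1}
print(blocks_to_replicate(live))  # {'blk_2': 1, 'blk_3': 2}
```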
4) Basic operations
Basic CLI operations
Create a directory
hdfs dfs -mkdir -p /user/test/input
Upload a file
hdfs dfs -put <local-file> /user/test/input
2) SSH configuration
Disable pam_loginuid in the sshd PAM config and set the system locale:
sed -i 's/session    required     pam_loginuid.so/session optional pam_loginuid.so/g' /etc/pam.d/sshd
/bin/echo -e 'LANG="en_US.UTF-8"' > /etc/default/locale
Start the sshd service
/usr/sbin/sshd -D
Check that you can SSH to localhost without a password; if not, run the following commands:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
3) Hosts configuration
Edit the hosts file and add the hostnames:
Command:
vi /etc/hosts
Contents:
192.168.31.10 master
192.168.31.11 slave01
192.168.31.12 slave02
4) JDK installation
1. Download the JDK 1.8 package, upload it to the server, and unpack it.
2. Configure the environment variables
vim /etc/profile
Append at the end of the file:
export JAVA_HOME=/home/hadoop/java/jdk1.8.0_161
export PATH=$PATH:$JAVA_HOME/bin
3. Reload the profile to apply the changes
source /etc/profile
5) Hadoop installation
Download the package and unpack it on the server.
Download link:
https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
The official download is very slow; the mirror below is recommended:
https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/stable/
6) Hadoop configuration
hadoop-env.sh
export JAVA_HOME=/usr/local/software/java8/jdk1.8.0_311/
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
-----------------------------------
core-site.xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
</configuration>
-----------------------------------
hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/software/hadoop/hadoop-3.3.1/namenode_dir</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/software/hadoop/hadoop-3.3.1/datanode_dir</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.http.address</name>
        <value>master:50070</value>
    </property>
</configuration>
----------------------------------------
mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
</configuration>
----------------------------------------
yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>20480</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>2048</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
    </property>
</configuration>
------------------------------------
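With the yarn-site.xml values above, the per-node container arithmetic works out as follows (a quick sanity check in Python, not an official formula):

```python
nm_memory_mb = 20480     # yarn.nodemanager.resource.memory-mb
min_alloc_mb = 2048      # yarn.scheduler.minimum-allocation-mb
vmem_pmem_ratio = 2.1    # yarn.nodemanager.vmem-pmem-ratio

# At most 10 minimum-size containers fit on one NodeManager, and each
# 2048 MB container may use up to ~4300 MB of virtual memory before
# the NodeManager kills it for exceeding the vmem limit.
max_containers = nm_memory_mb // min_alloc_mb
vmem_limit_mb = min_alloc_mb * vmem_pmem_ratio
print(max_containers, round(vmem_limit_mb, 1))  # 10 4300.8
```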
Configure worker nodes
Edit etc/hadoop/workers on the master node, replacing localhost with the hostnames of the two slaves:
slave01
slave02
7) Starting the Hadoop cluster
Format the NameNode
hdfs namenode -format
Start the HDFS daemons
sbin/start-dfs.sh
Start the YARN daemons
sbin/start-yarn.sh
8) Pitfalls
1. Port 50070 is not accessible:
1) Add the port setting to hdfs-site.xml:
<property>
    <name>dfs.http.address</name>
    <value>master:50070</value>
</property>
2) Delete the node data directories
3) Re-format the NameNode
4) Start the HDFS daemons
5) Start the YARN daemons
2. Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster. Add to mapred-site.xml:
<property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
3. Browsing files at http://192.168.31.10:50070 fails. On the client machine, edit C:\Windows\System32\drivers\etc\hosts and add the cluster hostnames:
192.168.31.10 master
192.168.31.11 slave01
192.168.31.12 slave02
4. File preview fails with: Couldn't preview the file. NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 'http://slave1:9864/webhdfs/v1/HelloHadoop.txt?op=OPEN&namenoderpcaddress=master:9820&offset=0&_=1609724219001'.
Enable WebHDFS in hdfs-site.xml (vim /usr/bigdata/hadoop-3.3.0/etc/hadoop/hdfs-site.xml) and restart the cluster:
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
6. Building a Windows development environment with IDEA
1) Prerequisites
JDK 1.8 and Maven are installed.
2) Downloads
Hadoop binary package
winutils
3) Deployment
Unpack the Hadoop package to a chosen directory with 7-Zip (two extractions: .tar.gz, then .tar).
Unpack winutils and overwrite Hadoop's bin directory with the bin directory for the matching version.
4) Hadoop configuration
Hadoop install directory: D:\SOFT\BigData\hadoop-3.3.1
Append one line at the end of hadoop-env.cmd:
set HADOOP_IDENT_STRING="Administrator"
----------------------------------------------
core-site.xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/D:/SOFT/BigData/hadoop-3.3.1/workplace/tmp</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/D:/SOFT/BigData/hadoop-3.3.1/workplace/name</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
----------------------------------------------
hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/D:/SOFT/BigData/hadoop-3.3.1/workplace/data</value>
    </property>
</configuration>
----------------------------------------------
mapred-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/D:/Environment/hadoop/workplace/data</value>
    </property>
</configuration>
----------------------------------------------
yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>
5) Project development
pom dependencies
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>3.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>3.3.1</version>
</dependency>
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-log4j12</artifactId>
    <version>1.7.30</version>
</dependency>
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-nop</artifactId>
    <version>1.7.30</version>
</dependency>
log4j.properties
log4j.rootLogger=WARN, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
WordCount.java
package com.xxx.mapreducer;
import org.apache.commons.io.FileUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.File;
import java.io.IOException;
public class WordCount {
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        // Delete the local output directory if it already exists (local-mode runs)
        File file = new File(args[1]);
        if (file.exists()) {
            FileUtils.deleteDirectory(file);
        }
        System.setProperty("HADOOP_USER_NAME", "root");
        Configuration configuration = new Configuration();
        configuration.set("hadoop.tmp.dir", "D:/SOFT/BigData/hadoop-3.3.1/workplace/tmp");
        Job job = Job.getInstance(configuration, "wordCount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(MyMapper.class);
        job.setCombinerClass(MyCombiner.class);
        job.setReducerClass(MyCombiner.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
MyMapper.java
package com.xxx.mapreducer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;
import java.util.StringTokenizer;

public class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // Emit (word, 1) for every whitespace-separated token in the line
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
MyCombiner.java
package com.xxx.mapreducer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class MyCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum the counts for one word; used both as combiner and reducer
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
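For intuition, the same map → shuffle → reduce dataflow in plain Python (a conceptual mirror of the job above, not part of it):

```python
from collections import defaultdict

def wordcount(lines):
    mapped = [(w, 1) for line in lines for w in line.split()]  # MyMapper: emit (word, 1)
    groups = defaultdict(list)
    for word, one in mapped:                                   # shuffle: group by key
        groups[word].append(one)
    return {w: sum(ones) for w, ones in groups.items()}        # MyCombiner: sum the ones

print(wordcount(["hello hadoop", "hello world"]))  # {'hello': 2, 'hadoop': 1, 'world': 1}
```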
6) Debugging
7) Pitfalls:
Runtime error: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z. Check that the environment variables are configured correctly and that hadoop.dll and winutils.exe are present in Hadoop's bin directory; then copy hadoop.dll to C:\Windows\System32.



