GFS: Google File System, Google's distributed file system
MapReduce: Google's distributed parallel computing framework
BigTable: Google's large-scale distributed database

GFS -> HDFS
Google MapReduce -> Hadoop MapReduce
BigTable -> HBase
Hadoop is not an acronym: Doug Cutting, Hadoop's creator, named it after his son's stuffed toy elephant.
The three main distributions are:
- Apache Hadoop, maintained by the Apache Software Foundation
- The Cloudera distribution (Cloudera's Distribution Including Apache Hadoop, "CDH")
- The Hortonworks distribution (Hortonworks Data Platform, "HDP")
| | Apache Hadoop | CDH | HDP |
| --- | --- | --- | --- |
| Management tool | Manual | Cloudera Manager | Ambari |
| Licensing | Open source | Community edition free, enterprise edition paid | Free |
The two core pieces of the Hadoop framework are HDFS and MapReduce: HDFS provides storage for massive data sets, and MapReduce provides computation over them.
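The MapReduce model itself (map, shuffle, reduce) can be illustrated with ordinary shell tools; this toy word count is only a sketch of the idea, not how Hadoop is invoked:

```shell
#!/bin/sh
# A toy "word count" in the MapReduce style, for illustration only.
printf 'hdfs stores data\nmapreduce computes data\n' |
    tr ' ' '\n' |      # map: emit one key (word) per line
    sort |             # shuffle: bring identical keys together
    uniq -c |          # reduce: count each group of identical keys
    sort -rn           # order by frequency; "2 data" comes out on top
```

The real `wordcount` example run later in this article does the same thing, except that the map and reduce steps run as distributed tasks over HDFS blocks.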
The Hadoop framework consists of four modules:
- Hadoop Common: the Java libraries and utilities required by the other modules. They provide file-system and OS-level abstractions and contain the Java files and scripts needed to start Hadoop.
- Hadoop YARN: a framework for job scheduling and cluster resource management.
- Hadoop Distributed File System (HDFS): a distributed file system that provides high-throughput access to application data.
- Hadoop MapReduce: a YARN-based system for parallel processing of large data sets.

Typical Hadoop application areas: online travel, mobile data, e-commerce, energy exploration and conservation, infrastructure management, image processing, fraud detection, IT security, and healthcare.
Official documentation: https://hadoop.apache.org/docs/
2. Setting up the Hadoop file system (single node)

[hadoop@server1 ~]$ tar zxf jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ ls
hadoop-3.2.1.tar.gz  jdk1.8.0_181  jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ ln -s jdk1.8.0_181/ java
[hadoop@server1 ~]$ tar zxf hadoop-3.2.1.tar.gz
[hadoop@server1 ~]$ ln -s hadoop-3.2.1 hadoop
[hadoop@server1 ~]$ ll
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ ls
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share
[hadoop@server1 hadoop]$ cd etc/
[hadoop@server1 etc]$ ls
hadoop
[hadoop@server1 etc]$ cd hadoop/
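The `ln -s` layout used here (`java -> jdk1.8.0_181`, `hadoop -> hadoop-3.2.1`) is a common convention: configuration references the stable paths, and an upgrade is just repointing the link. A minimal sketch of the pattern, run in a scratch directory with made-up version numbers:

```shell
#!/bin/sh
# Demonstrate the "stable symlink" upgrade pattern in a throwaway directory.
cd "$(mktemp -d)"
mkdir hadoop-3.2.1 hadoop-3.3.6   # two hypothetical release trees
ln -s hadoop-3.2.1 hadoop         # stable path points at the current release
readlink hadoop                   # prints: hadoop-3.2.1
ln -sfn hadoop-3.3.6 hadoop       # "upgrade": repoint the link in one step
readlink hadoop                   # prints: hadoop-3.3.6
```

The `-n` flag matters: without it, `ln -sf` would follow the existing link and create the new link *inside* the old release directory.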
[hadoop@server1 hadoop]$ vim hadoop-env.sh
export JAVA_HOME=/home/hadoop/java
export HADOOP_HOME=/home/hadoop/hadoop
[hadoop@server1 ~]$ cd hadoop/
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server1 hadoop]$ mkdir input
[hadoop@server1 hadoop]$ ls
bin  etc  include  input  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share
[hadoop@server1 hadoop]$ cp etc/hadoop/*.xml input
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep input output 'dfs[a-z.]+'
[hadoop@server1 hadoop]$ ls
bin  etc  include  input  lib  libexec  LICENSE.txt  NOTICE.txt  output  README.txt  sbin  share
[hadoop@server1 hadoop]$ cd output/
[hadoop@server1 output]$ ls
part-r-00000  _SUCCESS
[hadoop@server1 output]$ cat *
1	dfsadmin
[hadoop@server1 output]$
[hadoop@server1 hadoop]$ vim core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
[hadoop@server1 hadoop]$ vim hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
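All of Hadoop's *-site.xml files share this same shape: a `<configuration>` element containing `<name>`/`<value>` property pairs. A quick sanity check can pull a value back out with sed; this sketch writes a minimal config to a temp file (not the real config path) and extracts `fs.defaultFS` from it:

```shell
#!/bin/sh
# Write a minimal Hadoop-style config and extract one property value.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF
# Print the <value> on the line following the matching <name> line.
# Crude (assumes one value per line), but fine for a quick check.
sed -n '/<name>fs.defaultFS<\/name>/{n;s/.*<value>\(.*\)<\/value>.*/\1/p;}' "$conf"
```

On a live installation, `bin/hdfs getconf -confKey fs.defaultFS` asks Hadoop itself for the resolved value, which also accounts for defaults not written in the file.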
Edit the configuration files:
[hadoop@server1 hadoop]$ ssh-keygen
[hadoop@server1 hadoop]$ cd
[hadoop@server1 ~]$ ls
hadoop  hadoop-3.2.1  hadoop-3.2.1.tar.gz  java  jdk1.8.0_181  jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ cd .ssh/
[hadoop@server1 .ssh]$ ls
id_rsa  id_rsa.pub  known_hosts
[hadoop@server1 .ssh]$ cp id_rsa.pub authorized_keys
[hadoop@server1 .ssh]$ chmod 600 authorized_keys
[hadoop@server1 .ssh]$ ll
[hadoop@server1 .ssh]$ ssh localhost
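What makes the passwordless login work is just the last two commands: the public key lands in `~/.ssh/authorized_keys`, and sshd will refuse the file unless it is private to the user. The permission part can be sketched in a scratch directory (the key string below is a placeholder, not a real key):

```shell
#!/bin/sh
# Sketch of the authorized_keys setup; key content is a dummy placeholder.
home=$(mktemp -d)
mkdir -p "$home/.ssh"
chmod 700 "$home/.ssh"                 # sshd also checks the directory mode
echo 'ssh-rsa AAAA...placeholder hadoop@server1' > "$home/.ssh/id_rsa.pub"
cp "$home/.ssh/id_rsa.pub" "$home/.ssh/authorized_keys"
chmod 600 "$home/.ssh/authorized_keys" # owner read/write only
stat -c %a "$home/.ssh/authorized_keys"   # prints: 600
```

Because the whole home directory is later shared over NFS, this one key pair is what lets server1 reach server2/server3/server4 without further setup.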
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ bin/hdfs namenode -format
[hadoop@server1 hadoop]$ ls /tmp/
hadoop  hadoop-hadoop  hadoop-hadoop-namenode.pid  hsperfdata_hadoop
[hadoop@server1 hadoop]$ ls
bin  etc  include  input  lib  libexec  LICENSE.txt  logs  NOTICE.txt  output  README.txt  sbin  share
[hadoop@server1 hadoop]$ sbin/start-dfs.sh    ## start HDFS
[hadoop@server1 hadoop]$ cd
[hadoop@server1 ~]$ cd java/
[hadoop@server1 java]$ ls
[hadoop@server1 java]$ cd bin/
[hadoop@server1 bin]$ ls
[hadoop@server1 ~]$ vim .bash_profile
[hadoop@server1 ~]$ source .bash_profile
[hadoop@server1 ~]$ jps
4118 NameNode
4438 SecondaryNameNode
4232 DataNode
4600 Jps
[hadoop@server1 ~]$ ps ax
Open http://172.25.52.1:9870 in a browser.
View the root of the file system:
View the logs:
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user            ## create a directory
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server1 hadoop]$ id
uid=1000(hadoop) gid=1000(hadoop) groups=1000(hadoop)
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls                     ## list the hadoop user's home directory in HDFS
[hadoop@server1 hadoop]$ bin/hdfs dfs -put input              ## upload files into HDFS
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2021-12-28 10:41 input
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount input output
View it in the web UI:
[hadoop@server1 hadoop]$ bin/hdfs dfs -get output
2021-12-28 10:46:32,661 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
[hadoop@server1 hadoop]$ ls
bin  etc  include  input  lib  libexec  LICENSE.txt  logs  NOTICE.txt  output  README.txt  sbin  share
[hadoop@server1 hadoop]$ cd output/
[hadoop@server1 output]$ ls
part-r-00000  _SUCCESS
[hadoop@server1 output]$ cat part-r-00000 _SUCCESS
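The `part-r-00000` file the reducer writes is plain text, one `word<TAB>count` pair per line, so ordinary tools can post-process it after `dfs -get`. For example, summing all counts with awk; since the real output was not shown above, this sketch uses a fabricated sample file:

```shell
#!/bin/sh
# Sum the count column of a wordcount-style output file.
out=$(mktemp)
printf 'dfs\t3\nhadoop\t5\nyarn\t1\n' > "$out"   # fake part-r-00000 content
awk -F'\t' '{ total += $2 } END { print total }' "$out"   # prints: 9
```

(`_SUCCESS` is an empty marker file the job writes on completion, which is why `cat` shows nothing for it.)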
[hadoop@server1 output]$ cd ..
[hadoop@server1 hadoop]$ ls
bin  etc  include  input  lib  libexec  LICENSE.txt  logs  NOTICE.txt  output  README.txt  sbin  share
[hadoop@server1 hadoop]$ rm -fr output/
[hadoop@server1 hadoop]$ ls
bin  etc  include  input  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share
[hadoop@server1 hadoop]$

3. Setting up a distributed Hadoop file system
Create two new virtual machines, server2 and server3:
[root@server2 ~]# useradd hadoop
[root@server2 ~]# id hadoop
uid=1000(hadoop) gid=1000(hadoop) groups=1000(hadoop)
[root@server2 ~]# yum install -y nfs-utils
[root@server3 ~]# useradd hadoop
[root@server3 ~]# id hadoop
uid=1000(hadoop) gid=1000(hadoop) groups=1000(hadoop)
[root@server3 ~]# yum install -y nfs-utils
On server1:
[hadoop@server1 hadoop]$ jps
4118 NameNode
4438 SecondaryNameNode
4232 DataNode
15337 Jps
[hadoop@server1 hadoop]$ ls
bin  etc  include  input  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share
[hadoop@server1 hadoop]$ sbin/stop-dfs.sh    ## stop HDFS
Stopping namenodes on [localhost]
Stopping datanodes
Stopping secondary namenodes [server1]
[hadoop@server1 hadoop]$ exit
logout
[root@server1 ~]# yum install -y nfs-utils
[root@server1 ~]# vim /etc/exports
/home/hadoop    *(rw,anonuid=1000,anongid=1000)
[root@server1 ~]# id hadoop
uid=1000(hadoop) gid=1000(hadoop) groups=1000(hadoop)
[root@server1 ~]# systemctl start nfs
[root@server1 ~]# showmount -e
Export list for server1:
/home/hadoop *
[root@server1 ~]#
On server2 and server3:
[root@server2 ~]# showmount -e 172.25.52.1
Export list for 172.25.52.1:
/home/hadoop *
[root@server2 ~]# mount 172.25.52.1:/home/hadoop/ /home/hadoop/
[root@server2 ~]# df
Filesystem                1K-blocks    Used Available Use% Mounted on
/dev/mapper/rhel-root      17811456 1168600  16642856   7% /
devtmpfs                    1011400       0   1011400   0% /dev
tmpfs                       1023464       0   1023464   0% /dev/shm
tmpfs                       1023464   17036   1006428   2% /run
tmpfs                       1023464       0   1023464   0% /sys/fs/cgroup
/dev/vda1                   1038336  135172    903164  14% /boot
tmpfs                        204696       0    204696   0% /run/user/0
172.25.52.1:/home/hadoop   17811456 3003648  14807808  17% /home/hadoop
[root@server2 ~]# su - hadoop
[hadoop@server2 ~]$ ls
hadoop  hadoop-3.2.1  hadoop-3.2.1.tar.gz  java  jdk1.8.0_181  jdk-8u181-linux-x64.tar.gz
[hadoop@server2 ~]$
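Because every node mounts the same NFS export at /home/hadoop, the Hadoop tree, the JDK, and the SSH keys are automatically identical everywhere. A startup script can be defensive about this and verify the mount before starting daemons; a small sketch (the helper name `is_mounted` is made up for the example, and the check reads /proc/mounts, where any Linux mount appears):

```shell
#!/bin/sh
# Succeed only if the given path is a mount point listed in /proc/mounts.
is_mounted() {
    # Field 2 of /proc/mounts is the mount point.
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' /proc/mounts
}

if is_mounted /; then
    echo "/ is mounted"
fi
# On a cluster node you would check: is_mounted /home/hadoop
```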
[root@server1 ~]# su - hadoop
Last login: Tue Dec 28 10:18:38 CST 2021 from localhost on pts/1
[hadoop@server1 ~]$ ls
hadoop  hadoop-3.2.1  hadoop-3.2.1.tar.gz  java  jdk1.8.0_181  jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ ssh server1
Last login: Tue Dec 28 11:19:15 2021
[hadoop@server1 ~]$ exit
logout
Connection to server1 closed.
[hadoop@server1 ~]$ ssh server2
[hadoop@server1 ~]$ ssh server3
[hadoop@server1 ~]$ cd hadoop/etc/hadoop/
[hadoop@server1 hadoop]$ vim hdfs-site.xml
[hadoop@server1 hadoop]$ vim core-site.xml
[hadoop@server1 hadoop]$ vim workers
server2
server3
[hadoop@server1 hadoop]$ cd
[hadoop@server1 ~]$ cd /tmp/
[hadoop@server1 tmp]$ ls
hadoop  hadoop-hadoop  hsperfdata_hadoop
[hadoop@server1 tmp]$ rm -fr *    ## clear the old single-node data before reformatting
[hadoop@server1 tmp]$ ls
[hadoop@server1 tmp]$
[hadoop@server1 hadoop]$ vim hdfs-site.xml
[hadoop@server1 hadoop]$ vim core-site.xml
[hadoop@server1 hadoop]$ bin/hdfs namenode -format
Check in the web UI whether the distributed file system came up successfully:
The file system contains no files yet:
Start the file system, create the directories, and upload files:
[hadoop@server1 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [server1]
Starting datanodes
Starting secondary namenodes [server1]
[hadoop@server1 hadoop]$ jps
16480 NameNode
16824 Jps
16703 SecondaryNameNode
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server1 hadoop]$ bin/hdfs dfs -put input
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2021-12-28 11:41 input
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount input output
View it in the web UI:
Add another node:
[root@server4 ~]# useradd hadoop
[root@server4 ~]# yum install -y nfs-utils
[root@server4 ~]# mount 172.25.52.1:/home/hadoop/ /home/hadoop/
[root@server4 ~]# df
[root@server4 ~]# su - hadoop
[hadoop@server4 ~]$ ls
hadoop  hadoop-3.2.1  hadoop-3.2.1.tar.gz  java  jdk1.8.0_181  jdk-8u181-linux-x64.tar.gz
[hadoop@server4 ~]$ cd hadoop/etc/hadoop/
[hadoop@server4 hadoop]$ ls
The new node was added successfully:
[hadoop@server1 ~]$ cd hadoop/etc/
[hadoop@server1 etc]$ ls
hadoop
[hadoop@server1 etc]$ cd hadoop/
[hadoop@server1 hadoop]$ vim mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>
[hadoop@server1 hadoop]$ vim yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
[hadoop@server1 hadoop]$ vim hadoop-env.sh
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ sbin/start-yarn.sh
[hadoop@server1 hadoop]$ ssh server4
[hadoop@server4 ~]$ exit
logout
Connection to server4 closed.
[hadoop@server1 hadoop]$ jps
Check on server2, server3, and server4:
View in the web UI:



