The Most Detailed Guide on the Web to Building a Hadoop Big Data Cluster and Running a Project Analysis (Fully Distributed) --- Part 2

## All required materials have been uploaded to Baidu Netdisk; please download them yourself ##

Get the image: https://pan.baidu.com/s/1ho4hMrvIu1V6W4wWdH8nIA (extraction code: ygyg)
Get Xshell: https://pan.baidu.com/s/1xWRle9chuNtBpE0fDa7DHA (extraction code: u3s6)
Get Hadoop: https://pan.baidu.com/s/1a5M23KlUMtqKOoWqDnZBHQ (extraction code: y1y3)
Get the JDK: https://pan.baidu.com/s/1ftofkxBKIYuOhooe2tj_1A (extraction code: z9y4)
Get MySQL: https://pan.baidu.com/s/19wa564c6Pln1ReJOmHbh-g (extraction code: y4k7)
Get the MySQL JDBC driver jar: https://pan.baidu.com/s/1vFCKEttZNnd5ZfyeompcqQ (extraction code: dsj8)
Get Hive: https://pan.baidu.com/s/1YcnL07UVg_Czr1mMFfgJsQ (extraction code: n4i7)
Get Sqoop: https://pan.baidu.com/s/1wY5NcbI6hwKDt6r9BWu0Hg (extraction code: u3x9)
Get Zeppelin: https://pan.baidu.com/s/1xjqbw3FO1sNClLSgd1iFhw (extraction code: yw52)

Part 2: Building a Fully Distributed Big Data Cluster (four parts in total)

Chapter 7: Installing and Configuring MySQL

1. Uninstall the MariaDB bundled with CentOS 7
2. Create a directory for the MySQL installation packages
3. Upload the mysql-5.7.29 package to that directory and extract it
4. Run the installation
5. Initialize MySQL
6. Change the owner and group of the data directory
7. Start MySQL
8. Find the generated temporary root password
9. The temporary password is at the end of this log line
10. Change the MySQL root password and grant remote access
11. Set the root password to hadoop
12. Grant privileges
13. Start, stop, and check the status of MySQL
14. Enable start at boot (recommended)
15. Check that start at boot is enabled

Chapter 8: Installing and Configuring Hive

1. Extract the Hive archive
2. Resolve the guava version mismatch between Hadoop and Hive
3. Add the MySQL JDBC driver to Hive's lib/ directory
4. Edit the Hive environment file and set HADOOP_HOME
5. Create hive-site.xml with the MySQL connection settings
6. Initialize the metastore schema
7. Install and configure Hive on node3
8. Set up the logs directory
9. Fix the beeline connection error
10. Start the cluster and Hive
11. Hive database and table commands

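Part 2: Building a Fully Distributed Big Data Cluster (four parts in total)

Chapter 7: Installing and Configuring MySQL

1. Uninstall the MariaDB bundled with CentOS 7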

# List the installed MariaDB packages
rpm -qa|grep mariadb
# Example output: mariadb-libs-5.5.56-2.el7.x86_64
rpm -e mariadb-libs-5.5.56-2.el7.x86_64 --nodeps

2. Create a directory for the MySQL installation packages

mkdir /export/software/mysql

3. Upload the mysql-5.7.29 package to that directory and extract it

cd /export/software/mysql
tar xvf mysql-5.7.29-1.el7.x86_64.rpm-bundle.tar

4. Run the installation

yum -y install libaio
yum -y install net-tools 
rpm -ivh mysql-community-common-5.7.29-1.el7.x86_64.rpm mysql-community-libs-5.7.29-1.el7.x86_64.rpm mysql-community-client-5.7.29-1.el7.x86_64.rpm mysql-community-server-5.7.29-1.el7.x86_64.rpm

5. Initialize MySQL

mysqld --initialize

6. Change the owner and group of the data directory

chown mysql:mysql /var/lib/mysql -R

7. Start MySQL

systemctl start mysqld.service

8. Find the generated temporary root password

cat /var/log/mysqld.log

9. The randomly generated temporary password is at the end of this log line

[Note] A temporary password is generated for root@localhost: /JOFe7,c&jj0
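
Rather than scanning the whole log by eye, the password can be pulled out with a short filter. The following is only a sketch: it assumes the log line has the format shown above, and `extract_temp_password` is a hypothetical helper name, not something shipped with MySQL.

```shell
# Hypothetical helper: print the temporary root password recorded in a mysqld log.
# $1 is the log file; it defaults to /var/log/mysqld.log, the file read in step 8.
extract_temp_password() {
    local log_file="${1:-/var/log/mysqld.log}"
    # Take the most recent matching line and drop everything up to "root@localhost: ".
    grep 'temporary password' "$log_file" | tail -n 1 | sed 's/.*root@localhost: //'
}

# Example (run as root on the MySQL node):
# extract_temp_password
```

Taking the last match matters if MySQL has been initialized more than once, since each initialization writes a new password line.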

10. Change the MySQL root password and grant remote access

mysql -u root -p
Enter password:     # enter the temporary password from the log here

11. Set the root password to hadoop

mysql> alter user user() identified by "hadoop";
Query OK, 0 rows affected (0.00 sec)

12. Grant privileges

mysql> use mysql;
mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'hadoop' WITH GRANT OPTION;
mysql> FLUSH PRIVILEGES;

13. Start, stop, and check the status of MySQL

systemctl stop mysqld
systemctl status mysqld
systemctl start mysqld

14. Enable start at boot (recommended)

systemctl enable mysqld

15. Check that start at boot is enabled

systemctl list-unit-files | grep mysqld

Chapter 8: Installing and Configuring Hive

1. Extract the Hive archive

# Extract the archive
tar zxvf apache-hive-3.1.2-bin.tar.gz -C /export/server
# Rename the extracted directory
mv /export/server/apache-hive-3.1.2-bin /export/server/hive

2. Resolve the guava version mismatch between Hadoop and Hive

cd /export/server/hive
rm -rf lib/guava-19.0.jar
cp /export/server/hadoop-3.1.4/share/hadoop/common/lib/guava-27.0-jre.jar ./lib/
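
Since the same swap is repeated on node3 later, it can be wrapped in a small function that does not hard-code the jar versions. This is a minimal sketch under the assumption that each directory holds at most one `guava-*.jar`; `sync_guava` is a hypothetical helper name.

```shell
# Hypothetical helper: replace Hive's guava jar with the one shipped by Hadoop.
# $1 = Hive lib directory, $2 = Hadoop common lib directory.
sync_guava() {
    local hive_lib="$1" hadoop_lib="$2" jar hadoop_guava=""
    # Locate Hadoop's guava jar without hard-coding its version.
    for jar in "$hadoop_lib"/guava-*.jar; do
        if [ -e "$jar" ]; then hadoop_guava="$jar"; fi
    done
    if [ -z "$hadoop_guava" ]; then
        echo "no guava jar found in $hadoop_lib" >&2
        return 1
    fi
    # Remove whatever guava version Hive bundled, then copy Hadoop's in.
    rm -f "$hive_lib"/guava-*.jar
    cp "$hadoop_guava" "$hive_lib"/
}

# Example with the paths used in this guide:
# sync_guava /export/server/hive/lib /export/server/hadoop-3.1.4/share/hadoop/common/lib
```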

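3. Add the MySQL JDBC driver to Hive's lib/ directory

mysql-connector-java-5.1.32.jar

Download the MySQL JDBC driver jar, mysql-connector-java-5.1.32.jar, from the Baidu Netdisk link listed at the top (extraction code: dsj8) and place it in /export/server/hive/lib/.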

4. Edit the Hive environment file and set HADOOP_HOME

cd /export/server/hive/conf/
mv hive-env.sh.template hive-env.sh
vim hive-env.sh
# Add the following lines to the file:
export HADOOP_HOME=/export/server/hadoop-3.1.4
export HIVE_CONF_DIR=/export/server/hive/conf
export HIVE_AUX_JARS_PATH=/export/server/hive/lib
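
When hive-env.sh has to be prepared on more than one node, the three export lines can be appended from a script instead of typed into vim. The following is a minimal sketch; `append_once` is a hypothetical helper, and the paths in the commented example are the ones used above.

```shell
# Hypothetical helper: append a line to a file only if that exact line is not already present.
append_once() {
    local file="$1" line="$2"
    # -x matches whole lines, -F treats the text literally, -q suppresses output.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example for hive-env.sh (uncomment on a node where Hive is installed):
# env_file=/export/server/hive/conf/hive-env.sh
# append_once "$env_file" 'export HADOOP_HOME=/export/server/hadoop-3.1.4'
# append_once "$env_file" 'export HIVE_CONF_DIR=/export/server/hive/conf'
# append_once "$env_file" 'export HIVE_AUX_JARS_PATH=/export/server/hive/lib'
```

Because the check is on the exact line, running the script twice leaves the file unchanged the second time.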

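5. Create hive-site.xml with the MySQL connection settings

Run vim hive-site.xml and add the following:

<configuration>
    <!-- MySQL connection settings for the metastore database -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://node1:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hadoop</value>
    </property>
    <!-- Host that HiveServer2 binds to -->
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>node1</value>
    </property>
    <!-- Address of the remote metastore service -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://node1:9083</value>
    </property>
    <!-- Disable authorization for the metastore event notification API -->
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
    <!-- Disable metastore schema version verification -->
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
</configuration>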
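
After editing hive-site.xml it can be useful to confirm which value a node will actually pick up, for example the metastore URI. The sketch below is a rough text filter, not an XML parser: it assumes the usual Hadoop layout where `<name>` and `<value>` sit on consecutive lines, and `get_property` is a hypothetical helper.

```shell
# Hypothetical helper: print the <value> of a property from a Hadoop-style XML file.
# Assumes the <value> line directly follows the matching <name> line.
get_property() {
    local file="$1" name="$2"
    grep -F -A 1 "<name>${name}</name>" "$file" \
        | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Example with the file created in this step:
# get_property /export/server/hive/conf/hive-site.xml hive.metastore.uris
```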

6. Initialize the metastore schema

cd /export/server/hive
bin/schematool -initSchema -dbType mysql -verbose

7. Install and configure Hive on node3

1. Log in to node3 and extract the Hive archive:
cd /export/software/
tar zxvf apache-hive-3.1.2-bin.tar.gz -C /export/server
mv /export/server/apache-hive-3.1.2-bin /export/server/hive
2. Resolve the guava version mismatch between Hadoop and Hive:
rm -rf /export/server/hive/lib/guava-19.0.jar
cp /export/server/hadoop-3.1.4/share/hadoop/common/lib/guava-27.0-jre.jar /export/server/hive/lib/

3. Add the MySQL JDBC driver to Hive's lib/ directory:
mysql-connector-java-5.1.32.jar

4. Edit the Hive environment file and set HADOOP_HOME:
cd /export/server/hive/conf/
mv hive-env.sh.template hive-env.sh
vim hive-env.sh
export HADOOP_HOME=/export/server/hadoop-3.1.4
export HIVE_CONF_DIR=/export/server/hive/conf
export HIVE_AUX_JARS_PATH=/export/server/hive/lib
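5. Create hive-site.xml; on node3 it only needs the address of the metastore on node1:
Run vim hive-site.xml and add the following:

<configuration>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://node1:9083</value>
    </property>
</configuration>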

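8. Set up the logs directory

1. Create a logs directory under /export (mkdir logs), then run the following commands from inside it:
nohup /export/server/hive/bin/hive --service metastore > ./metastore.log 2>&1 &
nohup /export/server/hive/bin/hive --service hiveserver2 > ./hiveserver2.log 2>&1 &
2. With metastore and hiveserver2 running in the background, beeline can connect without errors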
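
The two nohup commands can be folded into one function so that each service gets its own log and PID file. This is a sketch under assumptions: the log directory /export/logs and the names `HIVE_BIN`, `LOG_DIR`, and `start_hive_service` are illustrative, not part of Hive.

```shell
# Assumed locations; adjust them to your own layout.
HIVE_BIN="${HIVE_BIN:-/export/server/hive/bin/hive}"
LOG_DIR="${LOG_DIR:-/export/logs}"

# Hypothetical helper: start one Hive service in the background with its own log file.
start_hive_service() {
    local service="$1"
    mkdir -p "$LOG_DIR"
    nohup "$HIVE_BIN" --service "$service" > "$LOG_DIR/$service.log" 2>&1 &
    # Record the PID so the service can later be stopped with kill.
    echo "$!" > "$LOG_DIR/$service.pid"
}

# Typical order: the metastore first, then hiveserver2.
# start_hive_service metastore
# start_hive_service hiveserver2
```

Keeping the PID files makes it possible to stop a service later with `kill "$(cat /export/logs/metastore.pid)"` instead of searching the process list.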

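9. Fix the beeline connection error

1. beeline fails to connect with: root is not allowed to impersonate root (state=08S01,code=0)
Edit the Hadoop configuration file /export/server/hadoop-3.1.4/etc/hadoop/core-site.xml and add the following properties inside its <configuration> element:
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>

<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>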

2. Distribute the updated core-site.xml to node2 and node3:
scp /export/server/hadoop-3.1.4/etc/hadoop/core-site.xml root@node2:/export/server/hadoop-3.1.4/etc/hadoop/
scp /export/server/hadoop-3.1.4/etc/hadoop/core-site.xml root@node3:/export/server/hadoop-3.1.4/etc/hadoop/
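
With more nodes, the per-node scp lines are easier to manage as a loop. The sketch below copies one file to the same path on every listed node; `distribute_config` and the `DRY_RUN` switch are hypothetical names introduced here, and passwordless SSH as root is assumed.

```shell
# Hypothetical helper: copy one file to the same path on each listed node.
# With DRY_RUN=1 it only prints the scp commands it would run.
distribute_config() {
    local file="$1" node
    shift
    for node in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "scp $file root@$node:$file"
        else
            scp "$file" "root@$node:$file"
        fi
    done
}

# Preview the copies for the two other nodes used in this guide.
DRY_RUN=1 distribute_config /export/server/hadoop-3.1.4/etc/hadoop/core-site.xml node2 node3
```

Dropping the `DRY_RUN=1` prefix performs the actual copies.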

10. Start the cluster and Hive

start-all.sh
Change to /export/server/hive/bin and run ./beeline; then enter
!connect jdbc:hive2://node1:10000
Type root as the user name and press Enter at the remaining prompts
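
The same connection can also be made without the interactive prompt by passing the URL, user name, and statement to beeline through its `-u`, `-n`, and `-e` options. A minimal sketch, assuming the endpoint used above; `BEELINE_BIN`, `HIVE_JDBC_URL`, and `run_hive_sql` are illustrative names.

```shell
# Assumed location and endpoint; adjust them to your cluster.
BEELINE_BIN="${BEELINE_BIN:-/export/server/hive/bin/beeline}"
HIVE_JDBC_URL="${HIVE_JDBC_URL:-jdbc:hive2://node1:10000}"

# Hypothetical helper: run one HiveQL statement through beeline as user root.
run_hive_sql() {
    "$BEELINE_BIN" -u "$HIVE_JDBC_URL" -n root -e "$1"
}

# Example (requires the metastore and hiveserver2 to be running):
# run_hive_sql 'show databases;'
```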

11. Hive database and table commands

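1. List databases: show databases;
2. Create a database: create database if not exists myhive;
3. Switch to the database: use myhive;
4. List the tables in the database: show tables;
5. Databases are stored on HDFS under /user/hive/warehouse
6. Drop a database: drop database myhive; (this fails if the database is not empty)
7. Force-drop a database together with all of its tables:
drop database myhive2 cascade;
8. Show a table's structure: desc stu1;
9. Show a table's contents: select * from stu1;
10. Insert rows into a table:
insert into stu values(1,'zhangsan');
insert into stu values(2,'lisi');
-- Create the tab-delimited table stu4 used in the following steps
create table if not exists stu4(id int ,name string) row format delimited fields terminated by '\t';
11. Download the stu4 file on Windows and upload it into the data directory with rz -E
12. Create a directory on HDFS: hadoop fs -mkdir -p /mytest
13. From the data directory, upload the file directly to the table's path on HDFS:
hadoop fs -put stu4.txt /user/hive/warehouse/mytest.db/stu4/
14. Load data: load data inpath '/hivedatas/stu.txt' into table stu4;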
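
Since stu4 is meant to be tab-delimited, each line of stu4.txt has to hold the id and the name separated by a single tab character. The following sketch writes such a file; the two rows reuse the values inserted in step 10, and `make_stu4_sample` is a hypothetical helper.

```shell
# Hypothetical helper: write a small tab-separated sample file for the stu4 table.
# $1 is the output path; it defaults to stu4.txt in the current directory.
make_stu4_sample() {
    local out="${1:-stu4.txt}"
    # printf reuses the format for each pair of arguments, producing one row per pair.
    printf '%s\t%s\n' 1 zhangsan 2 lisi > "$out"
}

# Example: create the file in the data directory before uploading it.
# make_stu4_sample /export/data/stu4.txt
```

If the file uses spaces or commas instead of tabs, Hive will read the whole line into the first column and leave the second one NULL.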

15. Create the student table:
create external table student (sid string,sname string,sbirth string , ssex string) row format delimited fields terminated by '\t' location '/hive_table/student';
Load data into the student table:
load data local inpath '/export/data/student.txt' into table student;

16. Create the teacher table:
create external table teacher (tid string,tname string) row format delimited fields terminated by '\t' location '/hive_table/teacher';
Load data into the teacher table, overwriting any existing data:
load data local inpath '/export/data/teacher.txt' overwrite into table teacher;

17. Create the score table, partitioned by month:
create table score(sid string,cid string, sscore int) partitioned by (month string) row format delimited fields terminated by '\t';
Load data:
load data local inpath '/export/data/score.txt' into table score partition (month='202006');

18. Create a second score table, partitioned by year, month, and day:
create table score2(sid string,cid string, sscore int) partitioned by (year string,month string, day string)
row format delimited fields terminated by '\t';
Load data:
load data local inpath '/export/data/score.txt' into table score2 partition(year='2020',month='06',day='01');
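
Each partition is stored as its own directory under the table's location, with one `key=value` level per partition column. The sketch below only builds the expected path string; the base directory is an assumption (the default warehouse path with the myhive database), and `partition_path` is a hypothetical helper.

```shell
# Hypothetical helper: join a table directory with its partition key=value pairs.
partition_path() {
    local path="$1" kv
    shift
    for kv in "$@"; do
        path="$path/$kv"
    done
    echo "$path"
}

# Assumed table location: the default warehouse directory with the myhive database.
partition_path /user/hive/warehouse/myhive.db/score2 year=2020 month=06 day=01
# prints /user/hive/warehouse/myhive.db/score2/year=2020/month=06/day=01
```

On a running cluster, passing the printed path to `hadoop fs -ls` should list the score.txt file loaded into that partition, provided the table really lives at the assumed location.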

19. Query and partition commands:
select * from score2 where year = '2020' and month = '06' and day = '01';
show partitions score;
alter table score add partition(month='202008');
alter table score add partition(month='202009') partition(month = '202010');
alter table score drop partition(month = '202010');

20. Create the hive_array table:
create external table hive_array(name string, work_locations array<string>) row format delimited fields terminated by '\t'
collection items terminated by ',';
Load data:
load data local inpath '/export/data/array_data.txt' overwrite into table hive_array;
-- First element of the work_locations array
select name, work_locations[0] location from hive_array;
-- Number of elements in the work_locations array
select name, size(work_locations) location from hive_array;
-- Rows whose work_locations array contains tianjin
select * from hive_array where array_contains(work_locations,'tianjin');
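
The input file for hive_array is expected to hold the name and the locations separated by a tab, with the individual locations separated by commas. The awk sketch below mimics `work_locations[0]` and `size(work_locations)` on such a file outside Hive, which can help check the data before loading it. The sample rows are made up for illustration, and `summarize_array_file` is a hypothetical helper.

```shell
# Hypothetical helper: for each row print the name, the first location, and the location count.
summarize_array_file() {
    awk -F '\t' '{
        n = split($2, locations, ",")   # n plays the role of size(work_locations)
        print $1, locations[1], n       # awk arrays start at 1, Hive arrays at 0
    }' "$1"
}

# Made-up sample rows in the expected layout.
sample=$(mktemp)
printf '%s\t%s\n' zhangsan 'beijing,shanghai,tianjin' lisi 'changchun,chengdu' > "$sample"
summarize_array_file "$sample"
rm -f "$sample"
```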

PS: This is Part 2 of the project. For the remaining parts, please see my profile page. Apologies for anything that is not done well!
