Get the system image: https://pan.baidu.com/s/1ho4hMrvIu1V6W4wWdH8nIA (extraction code: ygyg)
Get Xshell: https://pan.baidu.com/s/1xWRle9chuNtBpE0fDa7DHA (extraction code: u3s6)
Get Hadoop: https://pan.baidu.com/s/1a5M23KlUMtqKOoWqDnZBHQ (extraction code: y1y3)
Get the JDK: https://pan.baidu.com/s/1ftofkxBKIYuOhooe2tj_1A (extraction code: z9y4)
Get MySQL: https://pan.baidu.com/s/19wa564c6Pln1ReJOmHbh-g (extraction code: y4k7)
Get the MySQL JDBC driver jar: https://pan.baidu.com/s/1vFCKEttZNnd5ZfyeompcqQ (extraction code: dsj8)
Get Hive: https://pan.baidu.com/s/1YcnL07UVg_Czr1mMFfgJsQ (extraction code: n4i7)
Get Sqoop: https://pan.baidu.com/s/1wY5NcbI6hwKDt6r9BWu0Hg (extraction code: u3x9)
Get Zeppelin: https://pan.baidu.com/s/1xjqbw3FO1sNClLSgd1iFhw (extraction code: yw52)
All required materials have been uploaded to Baidu Netdisk; please download them yourself.

Part 2: Building a Fully Distributed Big Data Cluster (four parts in total)
Chapter 7. Install and Configure MySQL
1. Remove the mariadb bundled with CentOS 7
2. Create a directory for the MySQL installation packages
3. Upload the mysql-5.7.29 bundle to that directory and extract it
4. Run the installation
5. Initialize MySQL
6. Change the owner and group
7. Start MySQL
8. View the generated temporary root password
9. The temporary password is at the end of the log line
10. Change the MySQL root password and allow remote access
11. Set the root password to hadoop
12. Grant privileges
13. Start, stop, and check the status of MySQL
14. Enable MySQL at boot (recommended)
15. Check that start at boot is enabled

Chapter 8. Install and Configure Hive
1. Extract the Hive archive
2. Resolve the guava version mismatch between Hadoop and Hive
3. Add the MySQL JDBC driver to Hive's lib/ directory
4. Edit the Hive environment file and add HADOOP_HOME
5. Create hive-site.xml and configure MySQL and related settings
6. Initialize the metadata
7. Install and configure Hive on node3
8. Configure the logs directory
9. Fix the beeline connection error
10. Start the cluster and Hive
11. Hive database and table commands

Chapter 7. Install and Configure MySQL

1. Remove the mariadb bundled with CentOS 7
Find the installed package, then remove it (the second line below is the output of the first command):
    rpm -qa | grep mariadb
    mariadb-libs-5.5.56-2.el7.x86_64
    rpm -e mariadb-libs-5.5.56-2.el7.x86_64 --nodeps

2. Create a directory for the MySQL installation packages
    mkdir /export/software/mysql

3. Upload the mysql-5.7.29 bundle to that directory and extract it
    tar xvf mysql-5.7.29-1.el7.x86_64.rpm-bundle.tar

4. Run the installation
    yum -y install libaio
    yum -y install net-tools
    rpm -ivh mysql-community-common-5.7.29-1.el7.x86_64.rpm mysql-community-libs-5.7.29-1.el7.x86_64.rpm mysql-community-client-5.7.29-1.el7.x86_64.rpm mysql-community-server-5.7.29-1.el7.x86_64.rpm

5. Initialize MySQL
    mysqld --initialize

6. Change the owner and group
    chown mysql:mysql /var/lib/mysql -R

7. Start MySQL
    systemctl start mysqld.service

8. View the generated temporary root password
    cat /var/log/mysqld.log

9. The randomly generated temporary password is at the end of this log line
    [Note] A temporary password is generated for root@localhost: /JOFe7,c&jj0

10. Change the MySQL root password and allow remote access
    mysql -u root -p
    Enter password:     # enter the temporary password from the log here

11. Set the root password to hadoop
    mysql> alter user user() identified by "hadoop";
    Query OK, 0 rows affected (0.00 sec)

12. Grant privileges
    mysql> use mysql;
    mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'hadoop' WITH GRANT OPTION;
    mysql> FLUSH PRIVILEGES;

13. Start, stop, and check the status of MySQL
    systemctl stop mysqld
    systemctl status mysqld
    systemctl start mysqld

14. Enable MySQL at boot (recommended)
    systemctl enable mysqld

15. Check that start at boot is enabled
    systemctl list-unit-files | grep mysqld

Chapter 8. Install and Configure Hive

1. Extract the Hive archive
    # Extract the archive
    tar zxvf apache-hive-3.1.2-bin.tar.gz -C /export/server
    # Rename the extracted directory
    mv /export/server/apache-hive-3.1.2-bin /export/server/hive

2. Resolve the guava version mismatch between Hadoop and Hive
    cd /export/server/hive
    rm -rf lib/guava-19.0.jar
    cp /export/server/hadoop-3.1.4/share/hadoop/common/lib/guava-27.0-jre.jar ./lib/

3. Add the MySQL JDBC driver to Hive's lib/ directory
Upload mysql-connector-java-5.1.32.jar to /export/server/hive/lib/. The jar is in the download list at the top of this post (extraction code: dsj8).

4. Edit the Hive environment file and add HADOOP_HOME
    cd /export/server/hive/conf/
    mv hive-env.sh.template hive-env.sh
    vim hive-env.sh
Add the following lines:
    export HADOOP_HOME=/export/server/hadoop-3.1.4
    export HIVE_CONF_DIR=/export/server/hive/conf
    export HIVE_AUX_JARS_PATH=/export/server/hive/lib
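
The guava swap in step 2 is worth double-checking, because a leftover guava-19.0.jar typically only shows up as an error later, when Hive is started. Below is a minimal sketch of such a check, assuming the directory layout used in this tutorial. The helper names guava_jars and same_guava are made up for this illustration and are not part of Hadoop or Hive.

```shell
# Print the guava jar file names found directly inside a lib directory, one per line.
guava_jars() {
    find "$1" -maxdepth 1 -name 'guava-*.jar' -exec basename {} \; | sort
}

# Succeed only when the first directory has at least one guava jar
# and both directories contain exactly the same guava jar names.
same_guava() {
    [ -n "$(guava_jars "$1")" ] && [ "$(guava_jars "$1")" = "$(guava_jars "$2")" ]
}

# Example with the paths from this tutorial (left commented out because it
# needs the real installation directories):
# same_guava /export/server/hadoop-3.1.4/share/hadoop/common/lib /export/server/hive/lib \
#     && echo "guava versions match"
```

If the check fails, listing both directories with guava_jars shows which jar is stale or missing.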
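
5. Create hive-site.xml and configure MySQL and related settings
    vim hive-site.xml
Add the following:
    <configuration>
      <!-- MySQL connection used to store the metadata -->
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://node1:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hadoop</value>
      </property>
      <!-- Host that HiveServer2 binds to -->
      <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>node1</value>
      </property>
      <!-- Address of the remote metastore service -->
      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://node1:9083</value>
      </property>
      <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
      </property>
      <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
      </property>
    </configuration>

6. Initialize the metadata
    cd /export/server/hive
    bin/schematool -initSchema -dbType mysql -verbose

7. Install and configure Hive on node3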
(1) Log in to node3 and extract the Hive archive:
    cd /export/software/
    tar zxvf apache-hive-3.1.2-bin.tar.gz -C /export/server
    mv /export/server/apache-hive-3.1.2-bin /export/server/hive
(2) Resolve the guava version mismatch between Hadoop and Hive:
    rm -rf /export/server/hive/lib/guava-19.0.jar
    cp /export/server/hadoop-3.1.4/share/hadoop/common/lib/guava-27.0-jre.jar /export/server/hive/lib/
(3) Add the MySQL JDBC driver to Hive's lib/ directory:
    mysql-connector-java-5.1.32.jar
(4) Edit the Hive environment file and add HADOOP_HOME:
    cd /export/server/hive/conf/
    mv hive-env.sh.template hive-env.sh
    vim hive-env.sh
    export HADOOP_HOME=/export/server/hadoop-3.1.4
    export HIVE_CONF_DIR=/export/server/hive/conf
    export HIVE_AUX_JARS_PATH=/export/server/hive/lib
(5) Create hive-site.xml. On node3 it only needs to point at the metastore running on node1:
    vim hive-site.xml
Add the following:
    <configuration>
      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://node1:9083</value>
      </property>
    </configuration>

8. Configure the logs directory
(1) Create a logs directory under /export and start both services from inside it:
    cd /export
    mkdir logs
    cd logs
    nohup /export/server/hive/bin/hive --service metastore > ./metastore.log 2>&1 &
    nohup /export/server/hive/bin/hive --service hiveserver2 > ./hiveserver2.log 2>&1 &
(2) With metastore and hiveserver2 both running in the background, beeline can connect without errors.
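
The two nohup lines in step 8 follow the same pattern: run a command in the background, detach it from the terminal, and send its output to a log file. Here is a minimal sketch that wraps this pattern in a reusable function. The name start_bg, the .pid files, and the default /export/logs location are assumptions made for this illustration and are not part of Hive.

```shell
# Directory for log and PID files; /export/logs is this tutorial's location
# and can be overridden by setting LOG_DIR beforehand.
LOG_DIR="${LOG_DIR:-/export/logs}"

# start_bg NAME CMD [ARGS...]
# Run CMD in the background under nohup, write stdout and stderr to
# $LOG_DIR/NAME.log, and save the process ID in $LOG_DIR/NAME.pid.
start_bg() {
    name="$1"
    shift
    mkdir -p "$LOG_DIR"
    nohup "$@" > "$LOG_DIR/$name.log" 2>&1 &
    echo $! > "$LOG_DIR/$name.pid"
}

# Example for the two Hive services (left commented out because it needs
# a working Hive installation):
# start_bg metastore   /export/server/hive/bin/hive --service metastore
# start_bg hiveserver2 /export/server/hive/bin/hive --service hiveserver2
```

Keeping the PID in a file makes it possible to stop a service later with kill "$(cat /export/logs/metastore.pid)" instead of searching the process list.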
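
9. Fix the beeline connection error
(1) beeline fails to connect with this error:
    root is not allowed to impersonate root (state=08S01,code=0)
Edit the Hadoop configuration file /export/server/hadoop-3.1.4/etc/hadoop/core-site.xml and add the following properties:
    <property>
      <name>hadoop.proxyuser.root.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.root.groups</name>
      <value>*</value>
    </property>
(2) Distribute the updated core-site.xml to node2 and node3:
    scp /export/server/hadoop-3.1.4/etc/hadoop/core-site.xml root@node2:/export/server/hadoop-3.1.4/etc/hadoop
    scp /export/server/hadoop-3.1.4/etc/hadoop/core-site.xml root@node3:/export/server/hadoop-3.1.4/etc/hadoop

10. Start the cluster and Hive
Start the Hadoop cluster:
    start-all.sh
Then change to /export/server/hive/bin and start beeline:
    cd /export/server/hive/bin
    ./beeline
At the beeline prompt, connect to HiveServer2:
    !connect jdbc:hive2://node1:10000
Enter root as the user name, then press Enter at the remaining prompts.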
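
The interactive !connect above can also be done in one command, which is handy in scripts. The sketch below only builds the JDBC URL; the function name hive_jdbc_url is made up for this illustration. The commented-out example uses beeline's -u (URL), -n (user name), and -e (query) options, and assumes HiveServer2 is already running on node1.

```shell
# hive_jdbc_url HOST [PORT] [DATABASE]
# Print a HiveServer2 JDBC URL; PORT defaults to 10000, DATABASE may be empty.
hive_jdbc_url() {
    host="$1"
    port="${2:-10000}"
    db="${3:-}"
    printf 'jdbc:hive2://%s:%s/%s\n' "$host" "$port" "$db"
}

# Example (left commented out because it needs a running HiveServer2):
# /export/server/hive/bin/beeline -u "$(hive_jdbc_url node1)" -n root -e 'show databases;'
```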

11. Hive database and table commands
(1) List the databases:
    show databases;
(2) Create a database:
    create database if not exists myhive;
(3) Switch to the database:
    use myhive;
(4) List the tables in the database:
    show tables;
(5) Databases are stored on HDFS under /user/hive/warehouse (for example, /user/hive/warehouse/myhive.db).
(6) Drop a database. This fails if the database still contains tables:
    drop database myhive;
(7) Force-drop a database together with all the tables in it:
    drop database myhive2 cascade;
(8) Show the structure of a table:
    desc stu1;
(9) Show the contents of a table:
    select * from stu1;
(10) Insert rows into a table, and create a tab-delimited table:
    insert into stu values(1,'zhangsan');
    insert into stu values(2,'lisi');
    create table if not exists stu4(id int, name string) row format delimited fields terminated by '\t';
(11) Download the stu4 file on Windows and upload it to the data directory with rz -E.
(12) Create a directory on HDFS:
    hadoop fs -mkdir -p /mytest
(13) From the data directory, upload the file straight to the table's path on HDFS:
    hadoop fs -put stu4.txt /user/hive/warehouse/mytest.db/stu4/
(14) Load data from HDFS into the table:
    load data inpath '/hivedatas/stu.txt' into table stu4;
(15) Create the student table:
    create external table student (sid string, sname string, sbirth string, ssex string) row format delimited fields terminated by '\t' location '/hive_table/student';
Load data into the student table:
    load data local inpath '/export/data/student.txt' into table student;
(16) Create the teacher table:
    create external table teacher (tid string, tname string) row format delimited fields terminated by '\t' location '/hive_table/teacher';
Load data into the teacher table, overwriting any existing data:
    load data local inpath '/export/data/teacher.txt' overwrite into table teacher;
(17) Create the score table, partitioned by month:
    create table score(sid string, cid string, sscore int) partitioned by (month string) row format delimited fields terminated by '\t';
Load data:
    load data local inpath '/export/data/score.txt' into table score partition (month='202006');
(18) Create a second score table with multi-level partitions:
    create table score2(sid string, cid string, sscore int) partitioned by (year string, month string, day string) row format delimited fields terminated by '\t';
Load data:
    load data local inpath '/export/data/score.txt' into table score2 partition(year='2020', month='06', day='01');
(19) Query and partition commands:
    select * from score2 where year = '2020' and month = '06' and day = '01';
    show partitions score;
    alter table score add partition(month='202008');
    alter table score add partition(month='202009') partition(month='202010');
    alter table score drop partition(month='202010');
(20) Create the hive_array table:
    create external table hive_array(name string, work_locations array<string>) row format delimited fields terminated by '\t' collection items terminated by ',';
Load data:
    load data local inpath '/export/data/array_data.txt' overwrite into table hive_array;
    -- first element of the work_locations array
    select name, work_locations[0] location from hive_array;
    -- number of elements in the work_locations array
    select name, size(work_locations) location from hive_array;
    -- rows whose work_locations array contains tianjin
    select * from hive_array where array_contains(work_locations, 'tianjin');
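
The tables above expect data files whose fields are separated by a real tab character, and hive_array additionally expects commas between array elements. The sketch below shows one way to produce such rows from the shell. The function name join_fields and the sample values are illustrative assumptions, not the contents of the author's data files.

```shell
# join_fields SEP FIELD...
# Print all FIELD arguments on one line, joined by SEP.
join_fields() {
    sep="$1"
    shift
    out="$1"
    shift
    for f in "$@"; do
        out="$out$sep$f"
    done
    printf '%s\n' "$out"
}

# A literal tab character, matching: fields terminated by '\t'
TAB="$(printf '\t')"

# Rows for a table such as stu4(id int, name string):
join_fields "$TAB" 1 zhangsan
join_fields "$TAB" 2 lisi

# A row for hive_array(name string, work_locations array<string>):
# the array elements are joined with ',' first, then placed in the second field.
join_fields "$TAB" zhangsan "$(join_fields , beijing tianjin)"
```

Redirecting the output to a file, for example into /export/data/stu4.txt, gives a file that load data local inpath can read.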

PS: This is the second part of this project. The remaining parts can be found on my profile page. Apologies for any shortcomings!



