1. Prerequisites
apache-hive-3.1.2-bin.tar.gz
mysql-server, mysql-client
A compatible Hadoop and JDK installed in advance, with Hadoop able to start
The JAR matching your MySQL version: mysql-connector-java-8.0.28.jar
2. Install MySQL and create the metastore user; for reference see:
"Installing MySQL 8.0 on Ubuntu 20.04" (weixin_42332638's blog, CSDN)
Create the hive user:
create user 'hive'@'%' IDENTIFIED WITH mysql_native_password BY '123';
Grant privileges:
GRANT ALL PRIVILEGES ON *.* TO hive@'%' WITH GRANT OPTION;
flush privileges;  # reload the privilege tables
Edit /etc/mysql/mysql.conf.d/mysqld.cnf:
bind-address = 0.0.0.0          # originally 127.0.0.1, change to 0.0.0.0
mysqlx-bind-address = 127.0.0.1
Restart the MySQL service:
sudo service mysql restart
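To confirm the bind-address change took effect, a quick check (a sketch; assumes the ss tool and the mysql client are available, and the password '123' created above):

```shell
# MySQL should now listen on 0.0.0.0:3306 rather than 127.0.0.1:3306
ss -ltn | grep 3306
# The hive user should be able to log in over TCP
mysql -h 127.0.0.1 -P 3306 -u hive -p123 -e "SELECT VERSION();"
```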
3. Hive installation; normally installing on the master host is enough:
3.1 Extract, rename, move
tar -zxvf ~/apache-hive-3.1.2-bin.tar.gz -C ~/
mv ~/apache-hive-3.1.2-bin ~/hive
sudo mv ~/hive /usr/local/
3.2 Modify the configuration files
hive-default.xml
hive-site.xml
the environment-variable file /etc/profile
move the JAR used for the MySQL connection
3.2.1 Rename the default template
cd /usr/local/hive/conf
ls -la
mv hive-default.xml.template hive-default.xml  # rename the default template config to enable it
3.2.2 Configure hive-site.xml
sudo vi hive-site.xml
Enter the following:
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;allowPublicKeyRetrieval=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>datanucleus.autoCreateTables</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
    <description>port hiveserver2 listens on for thrift connections; default 10000, needed for the pyhive connection</description>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>192.168.1.210</value>
    <description>host IP hiveserver2 binds to; defaults to the local host</description>
  </property>
</configuration>
:wq
Save and exit.
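A common slip when hand-editing this file is malformed XML, e.g. a raw & in the JDBC URL instead of &amp;. A quick well-formedness check can catch that before Hive does (a sketch; assumes xmllint from libxml2-utils is installed):

```shell
# Fails with a parse error and line number if the XML is broken
xmllint --noout /usr/local/hive/conf/hive-site.xml && echo "hive-site.xml is well-formed"
```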
3.2.3 Configure environment variables
sudo vi /etc/profile
Add the following:
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HADOOP_HOME/bin:$JAVA_HOME/bin:$JRE_HOME/bin:$ZOOKEEPER_HOME/bin:$HIVE_HOME/bin
source /etc/profile
3.2.4 Move the JDBC JAR used to connect to the MySQL metastore:
mv ~/mysql-connector-java-8.0.28.jar /usr/local/hive/lib/
4. Start Hive
Start the Hadoop cluster before starting Hive.
4.1 Start Hadoop
cd /usr/local/hadoop/sbin/
./start-all.sh  # start the cluster on the master host
jps             # check cluster status
################
8080 DataNode
9088 Jps
8327 SecondaryNameNode
8520 ResourceManager
8696 NodeManager
7917 NameNode
################
4.2 Run Hive
The PATH has already been configured, so the hive command can be run directly:
hive
Error handling:
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
Check the guava JAR versions shipped with Hadoop and Hive:
ls -la /usr/local/hadoop/share/hadoop/common/lib | grep guava
#####
-rwxr-xr-x 1 hadoop hadoop 2747878 Sep 10  2019 guava-27.0-jre.jar
-rwxr-xr-x 1 hadoop hadoop    2199 Sep 10  2019 listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
#####
ls -la /usr/local/hive/lib | grep guava
#####
-rw-r--r-- 1 hadoop hadoop 2308517 Sep 27  2018 guava-19.0.jar
-rw-r--r-- 1 hadoop hadoop  971309 May 21  2019 jersey-guava-2.25.1.jar
#####
Delete Hive's older guava JAR and copy Hadoop's into the corresponding directory:
rm /usr/local/hive/lib/guava-19.0.jar
cp /usr/local/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar /usr/local/hive/lib/
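After the swap, only the 27.0 guava should remain under Hive's lib directory (a quick check; jersey-guava is unrelated and can stay):

```shell
ls /usr/local/hive/lib | grep guava
# expect guava-27.0-jre.jar, and no guava-19.0.jar
```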
Re-run hive and enter the Hive shell:
show tables;
This errors out because the Hive metastore database in MySQL has not been initialized yet.
Exit the Hive shell:
exit;
Then run the following in the Linux shell:
schematool -initSchema -dbType mysql
Initialization script completed
schemaTool completed
OK
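To confirm the schema actually landed in MySQL, a quick look at the metastore tables (a sketch; assumes the hive user and password created earlier):

```shell
mysql -u hive -p123 -e "USE hive; SHOW TABLES;" | head
# metastore tables such as DBS, TBLS and VERSION should be listed
```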
Check: try creating a table and writing some data.
hive (default)> show databases;
OK
database_name
default
hive (default)> create database hive;
OK
Time taken: 0.12 seconds
hive (default)> use hive;
OK
Time taken: 0.041 seconds
hive (hive)> create table test_hive(id int,name string,age int,phone string,email string,home_address string) partitioned by (dt string) stored as orc;
OK
Time taken: 0.324 seconds
hive (hive)> insert into test_hive partition (dt='2022-03-08') select 1,'mahuateng',18,'13888888888','123456@qq.com','shenzhen';
Query ID = hadoop_20220308191934_89cd2fb5-08d5-4add-b0d4-c58b3328ec25
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1646734785518_0001, Tracking URL = http://master-msi:8088/proxy/application_1646734785518_0001/
Kill Command = /usr/local/hadoop/bin/mapred job -kill job_1646734785518_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2022-03-08 19:19:48,985 Stage-1 map = 0%, reduce = 0%
2022-03-08 19:20:00,363 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.5 sec
2022-03-08 19:20:07,613 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.16 sec
MapReduce Total cumulative CPU time: 7 seconds 160 msec
Ended Job = job_1646734785518_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://master-msi:9000/user/hive/warehouse/hive.db/test_hive/dt=2022-03-08/.hive-staging_hive_2022-03-08_19-19-34_546_8561434546202741399-1/-ext-10000
Loading data to table hive.test_hive partition (dt=2022-03-08)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1  Cumulative CPU: 7.16 sec  HDFS Read: 19917 HDFS Write: 1135 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 160 msec
OK
_c0 _c1 _c2 _c3 _c4 _c5
Time taken: 36.266 seconds
hive (hive)> select * from test_hive;
OK
test_hive.id  test_hive.name  test_hive.age  test_hive.phone  test_hive.email  test_hive.home_address  test_hive.dt
1  mahuateng  18  13888888888  123456@qq.com  shenzhen  2022-03-08
Time taken: 0.244 seconds, Fetched: 1 row(s)
5. Start hiveserver2
hiveserver2
Wait for it to start:
hadoop@master-msi:/usr/local/hive/conf$ hiveserver2
2022-03-08 20:01:26: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = b65ea6db-fdc8-48d8-9d72-55f338be672b
Hive Session ID = 216af5bc-8b98-4cff-b394-191ebdc818fa
OK
The browser can reach the hiveserver2 web UI on the default port 10002:
192.168.1.210:10002
Connecting to Hive from Python requires the hiveserver2 service to be running; it uses port 10000.
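Before wiring up Python, it can be worth confirming that hiveserver2 accepts connections using beeline, which ships with Hive (host and port as configured in hive-site.xml above):

```shell
beeline -u jdbc:hive2://192.168.1.210:10000 -n hive -e "show databases;"
```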
Install pyhive and the packages it depends on:
pip install sasl
pip install thrift
pip install thrift-sasl  # this one has pitfalls
pip install pyhive
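One of the pitfalls: pip install sasl compiles a C extension and commonly fails without the SASL development headers. Installing the system packages first usually helps (an assumption for Ubuntu 20.04; package names may differ on other distros):

```shell
sudo apt-get install -y libsasl2-dev libsasl2-modules
```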
The /tmp directory in Hadoop's HDFS needs at least write permission (it has it by default).
To make it accessible from the web end as well, enter in the Linux command line:
hadoop fs -chmod -R 777 /tmp
Try out pyhive:
from pyhive import hive

conn = hive.Connection(host='192.168.1.210', port=10000, username='hive', database='hive')
cur = conn.cursor()
sql = 'select * from test_hive'
# a trailing ';' on the sql string raises an error
cur.execute(sql)
lines = cur.fetchall()
for line in lines:
    print(line)           # the whole row as a tuple
    for i in range(len(line)):
        print(line[i])    # each field individually
conn.close()
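The fetched rows are plain tuples; the column names are available on cursor.description (DB-API 2.0 tuples whose first element is the name). A small helper, with the hypothetical name rows_to_dicts, sketches turning the result into dicts:

```python
def rows_to_dicts(description, rows):
    """Pair each row tuple with the column names from cursor.description."""
    cols = [d[0] for d in description]  # first element of each entry is the column name
    return [dict(zip(cols, row)) for row in rows]

# against pyhive this would be: rows_to_dicts(cur.description, cur.fetchall())
```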
Done~



