
[Hive] Installing and configuring Hive 3.1.2 on Ubuntu 20.04


1. Prepare the installation packages

apache-hive-3.1.2-bin.tar.gz

mysql-server, mysql-client

Install compatible versions of Hadoop and the JDK beforehand, and make sure Hadoop can start.

The JDBC driver JAR matching your MySQL version: mysql-connector-java-8.0.28.jar

2. Install MySQL and create the metastore user

Reference post:

Installing MySQL 8.0 on Ubuntu 20.04 (weixin_42332638's blog, CSDN)

Create the hive user:

create user 'hive'@'%' IDENTIFIED WITH mysql_native_password BY '123';

Grant privileges:

GRANT ALL PRIVILEGES ON *.* TO hive@'%' WITH GRANT OPTION;
# reload the grant tables
flush privileges;

Modify /etc/mysql/mysql.conf.d/mysqld.cnf:

bind-address            = 0.0.0.0  # originally 127.0.0.1; change to 0.0.0.0
mysqlx-bind-address     = 127.0.0.1

Restart the MySQL service:

sudo service mysql restart
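After the restart, it is worth confirming that MySQL is actually reachable on port 3306 from another machine (the bind-address change above is what makes this possible). A minimal sketch, my addition rather than part of the original post, using only the standard library:

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# From another host, after the restart (the IP is this tutorial's example address):
# port_open("192.168.1.210", 3306)
```

If this returns False, re-check the bind-address setting and any firewall rules before moving on.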

3. Install Hive

Installing Hive on the master host is usually sufficient:

3.1 Extract, rename, and move
tar -zxvf ~/apache-hive-3.1.2-bin.tar.gz -C ~/
mv ~/apache-hive-3.1.2-bin ~/hive
sudo mv ~/hive /usr/local/
3.2 Modify the configuration files

Four things need changing: hive-default.xml, hive-site.xml, the environment variable file /etc/profile, and moving the MySQL connector JAR into place.

3.2.1 Rename the default template

cd /usr/local/hive/conf
# ls -la
mv hive-default.xml.template hive-default.xml
# rename the default template configuration file so it takes effect
3.2.2 Configure hive-site.xml
sudo vi hive-site.xml

Enter the following:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;allowPublicKeyRetrieval=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123</value>
    <description>password to use against metastore database</description>
  </property>

  <property>
    <name>datanucleus.autoCreateTables</name>
    <value>true</value>
  </property>

  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>

  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>

  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>

  <!-- HiveServer2 Thrift port; defaults to 10000 and is needed for pyhive connections -->
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>

  <!-- Host IP that HiveServer2 binds to; defaults to the local host IP -->
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>192.168.1.210</value>
  </property>
</configuration>

:wq

Save and quit.
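As a quick sanity check (my addition, not part of the original steps), the file can be parsed with Python's standard library to confirm the properties are well-formed and readable. The helper below works on XML text; to check the real file, pass in open('/usr/local/hive/conf/hive-site.xml').read():

```python
import xml.etree.ElementTree as ET

def hive_props(xml_text):
    """Parse hive-site.xml-style text into a {property-name: value} dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}

# Inline sample shaped like the configuration above.
sample = """<configuration>
  <property><name>hive.server2.thrift.port</name><value>10000</value></property>
  <property><name>hive.metastore.schema.verification</name><value>false</value></property>
</configuration>"""

props = hive_props(sample)
assert props["hive.server2.thrift.port"] == "10000"
```

A parse error here (for example from an unescaped & in the JDBC URL) would also make Hive fail at startup, so catching it early saves a debugging round-trip.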

3.2.3 Configure environment variables
sudo vi /etc/profile

Add the following:

export HIVE_HOME=/usr/local/hive

export PATH=$PATH:$HADOOP_HOME/bin:$JAVA_HOME/bin:$JRE_HOME/bin:$ZOOKEEPER_HOME/bin:$HIVE_HOME/bin
 

source  /etc/profile
3.2.4 Move the MySQL JDBC connector JAR into Hive's lib directory:
 mv ~/mysql-connector-java-8.0.28.jar /usr/local/hive/lib/
4. Start Hive

Start the Hadoop cluster before starting Hive.

4.1 Start Hadoop
cd /usr/local/hadoop/sbin/
./start-all.sh
# start the cluster on the master host

# check the cluster status with jps
jps
################
8080 DataNode
9088 Jps
8327 SecondaryNameNode
8520 ResourceManager
8696 NodeManager
7917 NameNode
################
All the expected daemons are running.
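As a convenience (my addition; the daemon names are taken from the jps listing above), a short helper can flag any daemons missing from jps output before you try to start Hive:

```python
# Required daemon names, as seen in the jps listing above.
REQUIRED = {"NameNode", "DataNode", "SecondaryNameNode", "ResourceManager", "NodeManager"}

def missing_daemons(jps_output, required=REQUIRED):
    """Return the required daemon names that do not appear in `jps` output."""
    running = set()
    for line in jps_output.strip().splitlines():
        parts = line.split()
        if len(parts) == 2:  # jps prints "<pid> <name>"
            running.add(parts[1])
    return set(required) - running

sample = """8080 DataNode
9088 Jps
8327 SecondaryNameNode
8520 ResourceManager
8696 NodeManager
7917 NameNode"""
print(missing_daemons(sample))  # set() when everything is up
```

In practice you would feed it the real output, e.g. missing_daemons(subprocess.run(["jps"], capture_output=True, text=True).stdout).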
4.2 Run Hive

Since the PATH environment variable is already configured, you can run the hive command directly:

hive

Troubleshooting:

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
 

This is a Guava version conflict. Check the Guava JAR versions shipped with Hadoop and with Hive:

ls -la /usr/local/hadoop/share/hadoop/common/lib |grep guava
#####
-rwxr-xr-x 1 hadoop hadoop 2747878 Sep 10  2019 guava-27.0-jre.jar
-rwxr-xr-x 1 hadoop hadoop    2199 Sep 10  2019 listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
#####
ls -la /usr/local/hive/lib |grep guava
#####
-rw-r--r--  1 hadoop hadoop  2308517 Sep 27  2018 guava-19.0.jar
-rw-r--r--  1 hadoop hadoop   971309 May 21  2019 jersey-guava-2.25.1.jar
#####

Delete Hive's older Guava JAR and copy Hadoop's into Hive's lib directory:

rm /usr/local/hive/lib/guava-19.0.jar 
cp /usr/local/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar /usr/local/hive/lib/
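The rule of thumb is to keep whichever Guava is newer. A small sketch (my addition; it assumes filenames follow the guava-<version>[-jre].jar pattern seen above) makes the comparison explicit:

```python
import re

def guava_version(jar_name):
    """Extract a comparable version tuple from a Guava jar filename."""
    m = re.match(r"guava-(\d+(?:\.\d+)*)", jar_name)
    if not m:
        raise ValueError(f"not a guava jar: {jar_name}")
    return tuple(int(x) for x in m.group(1).split("."))

hadoop_jar = "guava-27.0-jre.jar"  # from Hadoop's common/lib
hive_jar = "guava-19.0.jar"        # from Hive's lib
# Hadoop's jar is newer, so it replaces Hive's copy.
assert guava_version(hadoop_jar) > guava_version(hive_jar)
```

Tuple comparison avoids the string-comparison trap where "9" would sort after "27".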

Run hive again and enter the Hive shell:

show tables;

This fails because the Hive metastore database in MySQL has not been initialized yet.

Exit the Hive shell with exit; and run the following in the Linux shell:

schematool -initSchema -dbType mysql

Initialization script completed
schemaTool completed

OK 

Check that everything works by creating a table and writing some data:

hive (default)> show databases;
OK
database_name
default
##########################
hive (default)> create database hive;
OK
Time taken: 0.12 seconds
###############################
hive (default)> use hive;
OK
Time taken: 0.041 seconds
############################
hive (hive)> create table test_hive(id int,name string,age int,phone string,email string,home_address string)partitioned by (dt string) stored as orc;
OK
Time taken: 0.324 seconds
################################

hive (hive)> insert into test_hive partition (dt ='2022-03-08') select 1,'mahuateng',18,'13888888888','123456@qq.com','shenzhen';
########################################################################
Query ID = hadoop_20220308191934_89cd2fb5-08d5-4add-b0d4-c58b3328ec25
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1646734785518_0001, Tracking URL = http://master-msi:8088/proxy/application_1646734785518_0001/
Kill Command = /usr/local/hadoop/bin/mapred job  -kill job_1646734785518_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2022-03-08 19:19:48,985 Stage-1 map = 0%,  reduce = 0%
2022-03-08 19:20:00,363 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.5 sec
2022-03-08 19:20:07,613 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.16 sec
MapReduce Total cumulative CPU time: 7 seconds 160 msec
Ended Job = job_1646734785518_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://master-msi:9000/user/hive/warehouse/hive.db/test_hive/dt=2022-03-08/.hive-staging_hive_2022-03-08_19-19-34_546_8561434546202741399-1/-ext-10000
Loading data to table hive.test_hive partition (dt=2022-03-08)
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 7.16 sec   HDFS Read: 19917 HDFS Write: 1135 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 160 msec
OK
_c0	_c1	_c2	_c3	_c4	_c5
Time taken: 36.266 seconds
##############################################################################
hive (hive)> select * from test_hive;
OK
test_hive.id	test_hive.name	test_hive.age	test_hive.phone	test_hive.email	test_hive.home_address	test_hive.dt
1	mahuateng	18	13888888888	123456@qq.com	shenzhen	2022-03-08
Time taken: 0.244 seconds, Fetched: 1 row(s)
5. Start HiveServer2
hiveserver2

Wait for it to come up:

hadoop@master-msi:/usr/local/hive/conf$ hiveserver2 
2022-03-08 20:01:26: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = b65ea6db-fdc8-48d8-9d72-55f338be672b
Hive Session ID = 216af5bc-8b98-4cff-b394-191ebdc818fa
OK

The HiveServer2 web UI is then reachable in a browser on the default port 10002:

192.168.1.210:10002

 

Connecting from Python requires the HiveServer2 service to be running; it connects on port 10000.

Install pyhive and its dependent packages:

pip install sasl

pip install thrift

pip install thrift-sasl  # this one can be troublesome

pip install pyhive

The /tmp directory in HDFS needs at least write permission (it has it by default).

If not, open it up from the Linux command line:

hadoop fs -chmod -R 777 /tmp

Try out pyhive:

from pyhive import hive

conn = hive.Connection(host='192.168.1.210', port=10000, username='hive', database='hive')
cur = conn.cursor()
sql = 'select * from test_hive'
# a trailing ; in the SQL string causes an error
cur.execute(sql)
lines = cur.fetchall()
for line in lines:
    print(line)
    for i in range(len(line)):
        print(line[i])
conn.close()
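Since pyhive follows the Python DB-API, the fetched rows are plain tuples and the column names are available via cursor.description. A generic helper (a sketch of my own, independent of Hive) pairs them up into dicts, which are easier to work with than positional tuples:

```python
def rows_to_dicts(description, rows):
    """Pair DB-API cursor.description column names with row values."""
    cols = [d[0] for d in description]
    return [dict(zip(cols, row)) for row in rows]

# With a live cursor: rows_to_dicts(cur.description, cur.fetchall())
# Illustration with hand-made data shaped like the test_hive table above:
desc = [("test_hive.id",), ("test_hive.name",), ("test_hive.age",)]
rows = [(1, "mahuateng", 18)]
print(rows_to_dicts(desc, rows))
# → [{'test_hive.id': 1, 'test_hive.name': 'mahuateng', 'test_hive.age': 18}]
```

Note that Hive prefixes column names with the table name, as the select output above shows.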

Done!
