Basic Usage of Hive

1. Hive Installation

1.1 Installation Links

1) Hive official website

http://hive.apache.org/

2) Documentation

https://cwiki.apache.org/confluence/display/Hive/GettingStarted

3) Downloads

http://archive.apache.org/dist/hive/

4) GitHub repository

https://github.com/apache/hive

1.2 Installing and Deploying Hive

1.2.0 Modify the Hadoop Configuration

Configure core-site.xml

[atguigu@hadoop102 ~]$ cd $HADOOP_HOME/etc/hadoop

[atguigu@hadoop102 hadoop]$ vim core-site.xml

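Add the following configuration so that the atguigu user may act as a proxy user from any host, for any group, and for any user: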

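    <property>
        <name>hadoop.proxyuser.atguigu.hosts</name>
        <value>*</value>
    </property>

    <property>
        <name>hadoop.proxyuser.atguigu.groups</name>
        <value>*</value>
    </property>

    <property>
        <name>hadoop.proxyuser.atguigu.users</name>
        <value>*</value>
    </property>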
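To confirm that the proxy-user entries were saved as intended, the file can be inspected from the shell. The following is a minimal sketch, assuming each <name> and <value> element sits on a single line as in the properties above. get_property is a hypothetical helper written for this illustration and is not part of Hadoop.

```shell
#!/bin/sh
# get_property FILE NAME
# Hypothetical helper (not a Hadoop command): prints the <value> that follows
# <name>NAME</name> in a Hadoop-style XML configuration file.
# Assumes each <name> and <value> element fits on one line.
get_property() {
    awk -v n="$2" '
        index($0, "<name>" n "</name>") { found = 1 }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Example usage (path taken from the steps above):
# get_property "$HADOOP_HOME/etc/hadoop/core-site.xml" hadoop.proxyuser.atguigu.hosts
```

If the property is missing, the helper prints nothing, which makes a forgotten edit easy to spot before the file is distributed.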

Configure yarn-site.xml

    <property>
        <description>Amount of physical memory, in MB, that can be allocated for containers. If set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically calculated (in case of Windows and Linux). In other cases, the default is 8192MB.</description>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
    </property>

    <property>
        <description>The minimum allocation for every container request at the RM in MBs. Memory requests lower than this will be set to the value of this property. Additionally, a node manager that is configured to have less memory than this value will be shut down by the resource manager.</description>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>512</value>
    </property>

    <property>
        <description>The maximum allocation for every container request at the RM in MBs. Memory requests higher than this will throw an InvalidResourceRequestException.</description>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>4096</value>
    </property>

    <property>
        <description>Whether virtual memory limits will be enforced for containers.</description>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
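The three memory values depend on one another, so it can help to check them together before restarting the cluster. The sketch below is a hypothetical check, not a YARN tool. The first rule comes from the property descriptions above. The rules that the minimum must not exceed the maximum and that the maximum should not exceed the node's memory are assumptions added here as rules of thumb.

```shell
#!/bin/sh
# check_yarn_memory NODE_MB MIN_MB MAX_MB
# Hypothetical sanity check for the three memory settings above.
# Prints "ok" and returns 0 if the values are consistent, otherwise
# prints the reason and returns 1.
check_yarn_memory() {
    node_mb=$1; min_mb=$2; max_mb=$3
    # A NodeManager with less memory than the minimum allocation is shut down.
    if [ "$node_mb" -lt "$min_mb" ]; then
        echo "node memory ${node_mb}MB is below the minimum allocation ${min_mb}MB"
        return 1
    fi
    # Assumption: the minimum allocation must not exceed the maximum.
    if [ "$min_mb" -gt "$max_mb" ]; then
        echo "minimum allocation ${min_mb}MB exceeds maximum allocation ${max_mb}MB"
        return 1
    fi
    # Assumption: one container should not request more than a node offers.
    if [ "$max_mb" -gt "$node_mb" ]; then
        echo "maximum allocation ${max_mb}MB exceeds node memory ${node_mb}MB"
        return 1
    fi
    echo "ok"
}

# Values from the configuration above: 4096 / 512 / 4096
check_yarn_memory 4096 512 4096
```

With the values used in this article the check prints ok.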

After both files have been modified, remember to distribute them to the other nodes and then restart the cluster.

1.2.1 Install Hive

1) Upload apache-hive-3.1.2-bin.tar.gz to the /opt/software directory on the Linux host

2) Extract apache-hive-3.1.2-bin.tar.gz into the /opt/module/ directory

[root@localhost software]$ tar -zxvf /opt/software/apache-hive-3.1.2-bin.tar.gz -C /opt/module/

3) Rename the extracted directory apache-hive-3.1.2-bin to hive

[root@localhost software]$ mv /opt/module/apache-hive-3.1.2-bin/ /opt/module/hive

4) Edit /etc/profile.d/my_env.sh to add the environment variables

[root@localhost software]$ sudo vim /etc/profile.d/my_env.sh

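5) Add the following content, then reload the file (for example with source /etc/profile.d/my_env.sh) so that the variables take effect in the current shell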

#HIVE_HOME

export HIVE_HOME=/opt/module/hive

export PATH=$PATH:$HIVE_HOME/bin
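Editing the file by hand works, but repeating the step can leave duplicate entries. As a minimal sketch, the hypothetical helper below appends the same three lines only when no HIVE_HOME export is present yet. The function name add_hive_env is an assumption made for this illustration.

```shell
#!/bin/sh
# add_hive_env FILE
# Hypothetical helper: appends the HIVE_HOME settings to FILE unless an
# "export HIVE_HOME=" line is already present, so the step can be repeated.
add_hive_env() {
    env_file=$1
    if grep -q '^export HIVE_HOME=' "$env_file" 2>/dev/null; then
        return 0
    fi
    cat >> "$env_file" <<'EOF'
#HIVE_HOME
export HIVE_HOME=/opt/module/hive
export PATH=$PATH:$HIVE_HOME/bin
EOF
}

# Example usage (run as a user allowed to write the file):
# add_hive_env /etc/profile.d/my_env.sh
# . /etc/profile.d/my_env.sh    # reload so HIVE_HOME is set in this shell
```

The quoted 'EOF' delimiter keeps $PATH and $HIVE_HOME literal in the file, so they are expanded at login time and not when the helper runs.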

6) Resolve the logging JAR conflict

[root@localhost software]$ mv $HIVE_HOME/lib/log4j-slf4j-impl-2.10.0.jar $HIVE_HOME/lib/log4j-slf4j-impl-2.10.0.bak
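The command above hard-codes version 2.10.0 of the JAR. A version-independent variant could look like the following sketch, which renames any log4j-slf4j-impl-*.jar found in a given lib directory. disable_slf4j_binding is a hypothetical helper name. That the conflict comes from Hadoop shipping its own SLF4J binding is an assumption based on the usual "multiple SLF4J bindings" warning.

```shell
#!/bin/sh
# disable_slf4j_binding LIB_DIR
# Hypothetical helper: renames every log4j-slf4j-impl-*.jar in LIB_DIR to
# *.bak so that it is no longer loaded from the classpath. Matching by
# pattern avoids depending on the exact version number.
disable_slf4j_binding() {
    for jar in "$1"/log4j-slf4j-impl-*.jar; do
        [ -e "$jar" ] || continue    # the pattern matched nothing
        mv "$jar" "${jar%.jar}.bak"
    done
}

# Example usage:
# disable_slf4j_binding "$HIVE_HOME/lib"
```

Renaming the file, as opposed to deleting it, keeps the step easy to undo.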

7) Initialize the metastore database

[root@localhost hive]$ bin/schematool -dbType derby -initSchema

1.2.2 Start and Use Hive

1) Start Hive

[root@localhost hive]$ bin/hive

2) Use Hive

hive> show databases;

hive> show tables;

hive> create table test(id int);

hive> insert into test values(1);

hive> select * from test;

3) Open a second Xshell window and start Hive there as well, then watch the hive.log file in the /tmp/atguigu directory. The following error appears:

Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /opt/module/hive/metastore_db.

        at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)

        at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)

        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown Source)

        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown Source)

...

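The cause is that Hive uses Derby as its default metastore database. Once a Hive session starts, it takes exclusive hold of the embedded Derby database and does not share it with other clients, so the second session cannot open it. For this reason we need to switch Hive's metastore to MySQL.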
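Before starting a second session, it is possible to check whether the embedded database is likely in use. The sketch below relies on Derby writing a db.lck file into the database directory while it is booted. derby_lock_present is a hypothetical helper, and a lock file left behind by a crashed session would also trigger it, so the result is only a hint.

```shell
#!/bin/sh
# derby_lock_present HIVE_DIR
# Hypothetical helper: returns 0 if HIVE_DIR/metastore_db contains Derby's
# db.lck lock file, which suggests another session has booted the database.
derby_lock_present() {
    [ -e "$1/metastore_db/db.lck" ]
}

# Example usage (directory taken from the steps above):
# if derby_lock_present /opt/module/hive; then
#     echo "metastore_db appears to be in use by another Hive session"
# fi
```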

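  1. In the Hive installation directory, delete derby.log and metastore_db, and also delete the directory Hive created on HDFS. Note that the second command removes everything under /user, including Hive's default warehouse at /user/hive/warehouse, so run it only on a cluster that holds no other data.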

[root@localhost hive]$ rm -rf derby.log metastore_db

[root@localhost hive]$ hadoop fs -rm -r /user
