
Running Hive Standalone Metastore in Docker to Manage MinIO (S3) for Presto

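Starting with Hive 3.0.0, the Hive metastore can run on its own, independently of the rest of Hive, and serve as a metadata center for various data sources. This article shows how to run the Hive Standalone Metastore in Docker and, using Presto's Hive connector as the example, how to manage data stored in MinIO (an S3-compatible object store) through the metastore.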

Components and versions used in this article:

Component                    Version
Hive Standalone Metastore    3.1.2
Hadoop                       3.2.2
MySQL                        5.7.35
Presto                       0.261
MinIO                        8.3.3

If you have not installed MinIO yet, see: https://min.io/download

For MySQL installation, see: https://lrting-top.blog.csdn.net/article/details/120424755

For Presto installation, see: https://blog.csdn.net/weixin_39636364/article/details/120518455

Building the Dockerfile

The Hive metastore needs a relational database to store its metadata. This article uses MySQL with the following settings:

  • MySQL version: 5.7.35
  • hostname: 192.168.1.15
  • port: 3306
  • username: root
  • password: Pass-123-root
  • database: metastore
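
As a minimal sketch (not part of the original setup), the parameters above can be combined into the JDBC URL that the metastore uses to reach MySQL. The helper name mysql_jdbc_url is hypothetical; the useSSL and serverTimezone options are the ones that appear in this article's metastore-site.xml.

```python
from urllib.parse import urlencode


def mysql_jdbc_url(host: str, port: int, database: str, **options: str) -> str:
    """Build a MySQL JDBC URL (hypothetical helper, for illustration only)."""
    url = f"jdbc:mysql://{host}:{port}/{database}"
    if options:
        # Options are appended as a query string, in the order they were given.
        url += "?" + urlencode(options)
    return url


URL = mysql_jdbc_url("192.168.1.15", 3306, "metastore",
                     useSSL="false", serverTimezone="UTC")
print(URL)
# prints: jdbc:mysql://192.168.1.15:3306/metastore?useSSL=false&serverTimezone=UTC
```

Note that the database itself (metastore) must already exist in MySQL, since the URL above does not ask the driver to create it.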

In addition, because this Hive metastore will manage the metadata for data stored in MinIO, you also need the MinIO connection settings:

  • fs.s3a.endpoint: http://192.168.1.15:9000
  • fs.s3a.path.style.access: true
  • fs.s3a.connection.ssl.enabled: false
  • fs.s3a.access.key: minio
  • fs.s3a.secret.key: minio123

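Use the settings above to write the Hive metastore configuration file, metastore-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>fs.s3a.access.key</name>
        <value>minio</value>
    </property>
    <property>
        <name>fs.s3a.secret.key</name>
        <value>minio123</value>
    </property>
    <property>
        <name>fs.s3a.connection.ssl.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
    </property>
    <property>
        <name>fs.s3a.endpoint</name>
        <value>http://192.168.1.15:9000</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://192.168.1.15:3306/metastore?useSSL=false&amp;serverTimezone=UTC</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>Pass-123-root</value>
    </property>
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
    <property>
        <name>metastore.thrift.uris</name>
        <value>thrift://localhost:9083</value>
        <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
    </property>
    <property>
        <name>metastore.task.threads.always</name>
        <value>org.apache.hadoop.hive.metastore.events.EventCleanerTask</value>
    </property>
    <property>
        <name>metastore.expression.proxy</name>
        <value>org.apache.hadoop.hive.metastore.DefaultPartitionExpressionProxy</value>
    </property>
    <property>
        <name>metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
</configuration>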
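
If you would rather generate this file than edit it by hand, the following is a minimal sketch using only the Python standard library. The function name build_site_xml is hypothetical and the property values are the ones listed in this article. One benefit of generating the file is that special characters such as the & in the JDBC URL are escaped as &amp; automatically, which a Hadoop-style XML configuration file requires.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties: dict) -> str:
    """Render a <configuration> document from a name -> value mapping."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root, space="    ")
    return ET.tostring(root, encoding="unicode")


# Only a subset of the properties is shown here; add the rest the same way.
SITE_XML = build_site_xml({
    "fs.s3a.endpoint": "http://192.168.1.15:9000",
    "fs.s3a.path.style.access": "true",
    "javax.jdo.option.ConnectionURL":
        "jdbc:mysql://192.168.1.15:3306/metastore?useSSL=false&serverTimezone=UTC",
})
print(SITE_XML)
```

Writing SITE_XML to a file named metastore-site.xml next to the Dockerfile gives the same kind of file as the handwritten one.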

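To build the Hive metastore image, you also need to download the following packages and JAR file:

  • hive-standalone-metastore-3.1.2-bin.tar.gz
  • hadoop-3.2.2.tar.gz
  • mysql-connector-java-5.1.49.jar

This article assumes that these files are served by an HTTP server (192.168.1.15:11180 in the Dockerfile, under the /downloads/ path).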
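
If no HTTP server is at hand, a throwaway one can be sketched with the Python standard library. This is an illustration under assumptions, not the setup the original author used: the function name serve_directory is hypothetical, and the served directory is assumed to contain a downloads/ subdirectory holding the three files, so that they are reachable under /downloads/.

```python
import functools
import http.server
import threading


def serve_directory(directory: str, port: int = 11180, host: str = "0.0.0.0"):
    """Serve `directory` over HTTP in a background thread and return the server."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer((host, port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Example use (kept as comments so that importing this file starts nothing):
#   server = serve_directory("/path/to/packages")   # contains downloads/
#   ... run `docker build` while the server is up ...
#   server.shutdown()
```

The same effect can be had from a shell with `python3 -m http.server 11180` run in the directory that contains downloads/.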

The complete Dockerfile:

FROM centos:centos7

RUN yum install -y wget java-1.8.0-openjdk-devel && yum clean all

# host:port of the HTTP server that hosts the packages
ARG HTTP_SERVER_HOSTNAME_PORT=192.168.1.15:11180

WORKDIR /install

# Hive standalone metastore
RUN wget http://${HTTP_SERVER_HOSTNAME_PORT}/downloads/hive-standalone-metastore-3.1.2-bin.tar.gz
RUN tar zxvf hive-standalone-metastore-3.1.2-bin.tar.gz
RUN rm -rf hive-standalone-metastore-3.1.2-bin.tar.gz
RUN mv apache-hive-metastore-3.1.2-bin metastore

# Hadoop (provides the S3A file system and its dependencies)
RUN wget http://${HTTP_SERVER_HOSTNAME_PORT}/downloads/hadoop-3.2.2.tar.gz
RUN tar zxvf hadoop-3.2.2.tar.gz
RUN rm -rf hadoop-3.2.2.tar.gz
RUN mv hadoop-3.2.2 hadoop

# MySQL JDBC driver
RUN wget http://${HTTP_SERVER_HOSTNAME_PORT}/downloads/mysql-connector-java-5.1.49.jar
RUN cp mysql-connector-java-5.1.49.jar ./metastore/lib

ENV JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
ENV HADOOP_HOME=/install/hadoop

# replace the bundled guava with Hadoop's version and add the S3A JARs
RUN rm -f /install/metastore/lib/guava-19.0.jar \
  && cp ${HADOOP_HOME}/share/hadoop/common/lib/guava-27.0-jre.jar /install/metastore/lib \
  && cp ${HADOOP_HOME}/share/hadoop/tools/lib/hadoop-aws-3.2.2.jar /install/metastore/lib \
  && cp ${HADOOP_HOME}/share/hadoop/tools/lib/aws-java-sdk-bundle-*.jar /install/metastore/lib

# copy Hive metastore configuration file
COPY metastore-site.xml /install/metastore/conf/

# Hive metastore data folder
VOLUME ["/user/hive/warehouse"]

WORKDIR /install/metastore

# initialize the metastore schema; MySQL must be reachable while the image is built
RUN bin/schematool -initSchema -dbType mysql

CMD ["/install/metastore/bin/start-metastore"]

Building the Docker image

Put metastore-site.xml and the Dockerfile in the same directory, change into that directory, and run:

docker build . -t minio-hive-standalone-metastore:v1.0

Running the Hive metastore

docker run -d -p 9083:9083/tcp --name minio-hive-metastore minio-hive-standalone-metastore:v1.0
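
Before pointing Presto at the metastore, it can help to check that the Thrift port answers. The following is a minimal sketch with a hypothetical helper, wait_for_port, that retries a plain TCP connection until it succeeds or a timeout expires. A successful connection only shows that something is listening on the port (with a published Docker port this may be Docker's own proxy), so treat it as a first check and look at `docker logs minio-hive-metastore` if Presto still cannot connect.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Return True once a TCP connection succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example use against the container started above:
#   wait_for_port("127.0.0.1", 9083)
```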

Testing the Hive metastore with Presto

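If you have not installed Presto yet, first follow https://blog.csdn.net/weixin_39636364/article/details/120518455 to install it. Then set the Hive catalog configuration as follows (replace URL with the address of the corresponding host) and start the Presto server: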

connector.name=hive-hadoop2

hive.metastore.uri=thrift://URL:9083
hive.metastore.username=metastore

hive.s3.aws-access-key=minio
hive.s3.aws-secret-key=minio123
hive.s3.endpoint=http://URL:9000
hive.s3.path-style-access=true
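
To avoid editing the URL placeholders by hand, the catalog file can be rendered from the actual host addresses. This is a minimal sketch with a hypothetical function, render_hive_catalog; the property names are exactly the ones shown above. It assumes the result is saved as etc/catalog/hive.properties in the Presto installation, since the queries in this article refer to the catalog as hive.

```python
def render_hive_catalog(metastore_host: str, minio_host: str,
                        access_key: str, secret_key: str) -> str:
    """Return the contents of a Presto Hive catalog properties file."""
    lines = [
        "connector.name=hive-hadoop2",
        f"hive.metastore.uri=thrift://{metastore_host}:9083",
        "hive.metastore.username=metastore",
        f"hive.s3.aws-access-key={access_key}",
        f"hive.s3.aws-secret-key={secret_key}",
        f"hive.s3.endpoint=http://{minio_host}:9000",
        "hive.s3.path-style-access=true",
    ]
    return "\n".join(lines) + "\n"


# Both services run on 192.168.1.15 in this article.
CATALOG = render_hive_catalog("192.168.1.15", "192.168.1.15", "minio", "minio123")
print(CATALOG)
```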

Open the Presto CLI and list the catalogs:

show catalogs;

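The hive catalog should appear in the output.

Create a schema:

Assuming a bucket named hive-storage already exists in MinIO, run the following command to create the schema: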

create schema hive.hive_storage with (location = 's3a://hive-storage/');

Create a table in that schema:

CREATE TABLE hive.hive_storage.sample_table (
   col1 varchar, 
   col2 varchar);

Insert data into the table:

insert into hive.hive_storage.sample_table select 'value1', 'value2';

Query the data:

select * from hive.hive_storage.sample_table;
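
The same statements can also be run without the CLI, through Presto's HTTP client protocol: a statement is posted to /v1/statement and the client then follows each nextUri until the query finishes. The following is a minimal sketch under assumptions: the function name run_query is hypothetical, and the coordinator address http://localhost:8080 and the user name test are placeholders that depend on your Presto configuration.

```python
import json
import urllib.request


def run_query(sql: str, server: str = "http://localhost:8080",
              user: str = "test") -> list:
    """Submit `sql` to the Presto coordinator and return all result rows."""
    headers = {"X-Presto-User": user}
    request = urllib.request.Request(
        f"{server}/v1/statement",
        data=sql.encode("utf-8"),
        headers=headers,
        method="POST",
    )
    rows = []
    while request is not None:
        with urllib.request.urlopen(request) as response:
            page = json.load(response)
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        rows.extend(page.get("data", []))
        # Keep polling while the server hands back a nextUri.
        next_uri = page.get("nextUri")
        request = (urllib.request.Request(next_uri, headers=headers)
                   if next_uri else None)
    return rows


# Example use:
#   run_query("select * from hive.hive_storage.sample_table")
```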

After these steps, the query returns the single inserted row (value1, value2).

If you repost this article, please credit the source: www.mshxw.com
Article address: https://www.mshxw.com/it/673423.html