
Hive Setup Notes

Installation

Download, upload, and extract the package, then configure the environment variables:

vi ~/.bashrc
vi ~/app/hive/conf/hive-env.sh  [HADOOP_HOME is already set as an environment variable, so it need not be set again in hive-env.sh]
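The ~/.bashrc additions can be sketched as below; the ~/app/hive install path follows the directories used later in these notes, and the exact variable names are an assumption.

```shell
# Sketch of the ~/.bashrc additions (install path ~/app/hive is an assumption).
export HIVE_HOME="$HOME/app/hive"
export PATH="$HIVE_HOME/bin:$PATH"
```

After editing, run `source ~/.bashrc` to apply the changes to the current shell.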

Integrating MySQL: hive-site.xml

Hive does not ship this file; create it yourself under conf/ with the JDBC url, driver, user, and password:

[liqiang@Gargantua conf]$ pwd
/home/liqiang/app/hive/conf
[liqiang@Gargantua conf]$ vi hive-site.xml




	 
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://ip:3306/my_hive?createDatabaseIfNotExist=true</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>username</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
  </property>
</configuration>

mysql-connector-java-5.1.49.jar

[Hive does not ship the MySQL driver; copy it from your local Maven repository into ~/app/hive/lib/]
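One way to pull the jar out of a local Maven repository is sketched below; the default ~/.m2 layout and the ~/app/hive path are assumptions.

```shell
# Find the MySQL connector in the local Maven repository (default ~/.m2
# layout assumed); copy it into Hive's lib directory once found.
jar=$(find "$HOME/.m2/repository" -name 'mysql-connector-java-5.1.49.jar' 2>/dev/null | head -n 1)
echo "connector: ${jar:-not found}"
# cp "$jar" "$HOME/app/hive/lib/"    # copy step, once the jar is located
```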

Initializing the metastore schema in MySQL

Fixed invocation: ./schematool -dbType mysql -initSchema
Without this init step, no metastore tables are created in MySQL; Hive keeps its metadata in MySQL.

[liqiang@Gargantua bin]$ ./schematool -dbType mysql -initSchema
Starting hive
[liqiang@Gargantua ~]$ cd ~/app/hive/bin
[liqiang@Gargantua bin]$ hive

It errors out with: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)

[liqiang@Gargantua bin]$ hive
which: no hbase in (/home/liqiang/app/hive/bin:/home/liqiang/app/hadoop/bin:/home/liqiang/app/hadoop/sbin:/usr/java/jdk1.8.0_121/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/liqiang/.local/bin:/home/liqiang/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/liqiang/app/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/liqiang/app/hadoop-3.2.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
	at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
// ...
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:236)

Resolved by following https://www.cnblogs.com/syq816/p/12632028.html:
copy guava-27.0-jre.jar from Hadoop (hadoop/share/hadoop/common/lib) into hive/lib so the guava jar versions match.
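The jar swap can be sketched as follows; it is shown against placeholder files in a scratch directory so the steps can be tried anywhere, and guava-19.0.jar as the version Hive bundles is an assumption.

```shell
# Simulate the guava swap in a scratch directory; on a real install use
# $HOME/app/hive/lib and $HOME/app/hadoop/share/hadoop/common/lib instead.
tmp=$(mktemp -d)
mkdir -p "$tmp/hive/lib" "$tmp/hadoop/share/hadoop/common/lib"
touch "$tmp/hive/lib/guava-19.0.jar"                 # Hive's bundled guava (assumed version)
touch "$tmp/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar"

rm "$tmp"/hive/lib/guava-*.jar                       # remove Hive's older guava
cp "$tmp/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar" "$tmp/hive/lib/"
ls "$tmp/hive/lib"                                   # only guava-27.0-jre.jar remains
```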

Start again: [liqiang@Gargantua bin]$ hive

This time Hive starts successfully and enters the shell.

Exiting Hive

hive> quit;
Starting hiveserver2

Foreground start of hiveserver2 [the service stops once the SSH terminal is closed]:

[liqiang@Gargantua bin]$ ./hiveserver2
or
[liqiang@Gargantua bin]$ hive --service hiveserver2

Background start of hiveserver2

[liqiang@Gargantua bin]$ nohup sh hiveserver2 &

nohup: short for "no hang up"; the process keeps running, immune to the SIGHUP sent when the session ends.
&: runs the command in the background, so it is not interrupted by the terminal's Ctrl+C (SIGINT).
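The nohup/& behavior can be exercised with a harmless stand-in; `sleep` takes the place of hiveserver2 here, and the log-redirection pattern (file name is an assumption) is what you would use for the real service.

```shell
# Real use would be:  nohup ./hiveserver2 > "$HOME/hiveserver2.log" 2>&1 &
# Demonstrated with `sleep` standing in for hiveserver2:
log=$(mktemp)
nohup sleep 1 > "$log" 2>&1 &
pid=$!
kill -0 "$pid"      # the process is alive in the background
wait "$pid"         # demo only; a real server keeps running after logout
```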

Once hiveserver2 is up, you can connect with beeline:

[liqiang@Gargantua bin]$ ./beeline

beeline> !connect jdbc:hive2://localhost:10000 liqiang
Connecting to jdbc:hive2://localhost:10000

Enter password for jdbc:hive2://localhost:10000: 

Then it errors out:

21/12/31 00:43:00 [main]: WARN jdbc.HiveConnection: Failed to connect to localhost:10000
Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000: 	Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): 
User: liqiang is not allowed to impersonate hadoop (state=08S01,code=0)
beeline> 
beeline> !quit;

Go back and check Hadoop's core-site.xml; the proxy-user settings are needed:


     
<property>
  <name>hadoop.proxyuser.liqiang.hosts</name>
  <value>*</value>
</property>

<property>
  <name>hadoop.proxyuser.liqiang.groups</name>
  <value>*</value>
</property>

Restart Hadoop

# from hadoop/sbin
$ ./stop-all.sh
$ ./start-all.sh

Restart hiveserver2

./hiveserver2 
Viewing databases

In hive:

hive> show databases;
	OK
	default
	Time taken: 0.551 seconds, Fetched: 1 row(s)
hive> 

Hive has a built-in database named default. Create a table, then list all tables:

hive> use default;
hive> create table my_table(id string);
hive> show tables;
	OK
	my_table
	Time taken: 0.027 seconds, Fetched: 1 row(s)
hive> 

Via hiveserver2 (beeline):

0: jdbc:hive2://localhost:10000> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+

Hive data

Hive stores:

  • Data in HDFS:
    hdfs://Gargantua:9000/user/hive/warehouse

  • Metadata in MySQL:
    TBLS: tables
    DBS: databases
    COLUMNS_V2: column information for each table

Data:

[liqiang@Gargantua ~]$ hdfs dfs -ls /user/hive/warehouse

Metadata:

Enter MySQL and inspect the my_hive database:

mysql> use my_hive
mysql> show tables;
+-------------------------------+
| Tables_in_my_hive             |
+-------------------------------+
| aux_table                     |
| bucketing_cols                |
| cds                           |
| columns_v2                    | [COLUMNS_V2: column information for each table]
| compaction_queue              |
| completed_compactions         |
| completed_txn_components      |
| ctlgs                         |
| database_params               |
| db_privs                      |
| dbs                           |  [DBS: databases]
| delegation_tokens             |
| func_ru                       |
| funcs                         |
| global_privs                  |
| hive_locks                    |
| i_schema                      |
| idxs                          |
| index_params                  |
| key_constraints               |
| master_keys                   |
| materialization_rebuild_locks |
| metastore_db_properties       |
| min_history_level             |
| mv_creation_metadata          |
| mv_tables_used                |
| next_compaction_queue_id      |
| next_lock_id                  |
| next_txn_id                   |
| next_write_id                 |
| notification_log              |
| notification_sequence         |
| nucleus_tables                |
| part_col_privs                |
| part_col_stats                |
| part_privs                    |
| partition_events              |
| partition_key_vals            |
| partition_keys                |
| partition_params              |
| partitions                    |
| repl_txn_map                  |
| role_map                      |
| roles                         |
| runtime_stats                 |
| schema_version                |
| sd_params                     |
| sds                           |
| sequence_table                |
| serde_params                  |
| serdes                        |
| skewed_col_names              |
| skewed_col_value_loc_map      |
| skewed_string_list            |
| skewed_string_list_values     |
| skewed_values                 |
| sort_cols                     |
| tab_col_stats                 |
| table_params                  |
| tbl_col_privs                 |
| tbl_privs                     |
| tbls                          |  [TBLS: tables]
| txn_components                |
| txn_to_write_id               |
| txns                          |
| type_fields                   |
| types                         |
| version                       |
| wm_mapping                    |
| wm_pool                       |
| wm_pool_to_trigger            |
| wm_resourceplan               |
| wm_trigger                    |
| write_set                     |
+-------------------------------+

Look up the table just created in hive:

mysql> select * from tbls where tbl_name = 'my_table';
+--------+-------------+-------+------------------+---------+------------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER   | OWNER_TYPE | RETENTION | SD_ID | TBL_NAME | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | IS_REWRITE_ENABLED |
+--------+-------------+-------+------------------+---------+------------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
|      1 |  1640862757 |     1 |                0 | liqiang | USER       |         0 |     1 | my_table | MANAGED_TABLE | NULL               | NULL               |                    |
+--------+-------------+-------+------------------+---------+------------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
1 row in set (0.00 sec)

TBL_ID: table ID
DB_ID: database ID

So look up the database that my_table belongs to:

mysql> select * from dbs where db_id = 1;
mysql> select * from dbs where db_id = (select db_id from tbls where tbl_name = 'my_table');

Look up the column information for my_table:

mysql> select * from columns_v2; 
[The rows here do not say which table these columns belong to; there is no TBL_ID-like field]
+-------+---------+-------------+-----------+-------------+
| CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |
+-------+---------+-------------+-----------+-------------+
|     1 | NULL    | id          | string    |           0 |
+-------+---------+-------------+-----------+-------------+

columns_v2 only has a CD_ID, while tbls only has an SD_ID; neither links the two directly.

The cds table likewise contains only a CD_ID and no TBL_ID either:

mysql> select * from cds;
+-------+
| CD_ID |
+-------+
|     1 |
+-------+

Next, the sds table:

mysql> select * from sds;
+-------+-------+------------------------------------------+---------------+---------------------------+----------------------------------------------------+-------------+------------------------------------------------------------+----------+
| SD_ID | CD_ID | INPUT_FORMAT                             | IS_COMPRESSED | IS_STOREDASSUBDIRECTORIES | LOCATION                                           | NUM_BUCKETS | OUTPUT_FORMAT                                              | SERDE_ID |
+-------+-------+------------------------------------------+---------------+---------------------------+----------------------------------------------------+-------------+------------------------------------------------------------+----------+
|     1 |     1 | org.apache.hadoop.mapred.TextInputFormat |               |                           | hdfs://Gargantua:9000/user/hive/warehouse/my_table |          -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |        1 |
+-------+-------+------------------------------------------+---------------+---------------------------+----------------------------------------------------+-------------+------------------------------------------------------------+----------+
1 row in set (0.00 sec)

That completes the chain: tbls carries SD_ID, sds maps SD_ID to CD_ID, and columns_v2 hangs off CD_ID.

mysql> select *  from  columns_v2 where cd_id = (
    -> select sds.cd_id from sds join tbls on sds.sd_id = tbls.sd_id where tbls.tbl_name = 'my_table'
    -> );
+-------+---------+-------------+-----------+-------------+
| CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |
+-------+---------+-------------+-----------+-------------+
|     1 | NULL    | id          | string    |           0 |
+-------+---------+-------------+-----------+-------------+
1 row in set (0.00 sec)
Setting Hive properties

Via the config file

Global, but requires restarting Hive to take effect. For example, add to hive-site.xml:


<property>
  <name>hive.cli.print.header</name>
  <value>true</value>
</property>

<property>
  <name>hive.cli.print.current.db</name>
  <value>true</value>
</property>

Starting hive with command-line parameters

	[liqiang@Gargantua bin]$ hive --hiveconf hive.cli.print.current.db=true

Temporary; applies only to the current session.

Via set

View a parameter:

	set hive.cli.print.current.db;

Change a parameter:

	set hive.cli.print.current.db=true;

Temporary; applies only to the current session.

Other hive command options

hive -e "<sql>"  [runs SQL without entering the hive shell]

hive -e "select * from my_table;"

hive -f <sql file>
hive -i <init file>
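A minimal sketch of the -f and -i variants; the hive invocations themselves are commented out since they need the environment set up above, and the file names are hypothetical.

```shell
# Write a SQL script, then run it with hive -f (file names are hypothetical).
cat > /tmp/query.sql <<'EOF'
use default;
select * from my_table;
EOF
# hive -f /tmp/query.sql    # execute the file and exit
# hive -i /tmp/init.sql     # execute the file first, then stay in the hive shell
cat /tmp/query.sql
```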

JDBC connection to HS2
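A non-interactive connection over the JDBC URL can go through beeline, sketched below; host, port, and user follow the values used earlier, and the command is commented out because it needs a running hiveserver2.

```shell
# One-shot query over the hiveserver2 JDBC URL; -n sets the user, -e the SQL.
# beeline -u "jdbc:hive2://localhost:10000" -n liqiang -e "show databases;"
URL="jdbc:hive2://localhost:10000"
echo "would connect to $URL"
```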