Integrating with MySQL

hive-site.xml

vi ~/.bashrc
vi ~/app/hive/conf/hive-env.sh  [HADOOP_HOME is already set as an environment variable, so it need not be set again in hive-env.sh]

Hive does not ship hive-site.xml; you have to add it under conf/ yourself, filling in the url, driver, user, and password:

[liqiang@Gargantua conf]$ pwd
/home/liqiang/app/hive/conf
[liqiang@Gargantua conf]$ vi hive-site.xml

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://ip:3306/my_hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>username</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
  </property>
</configuration>

mysql-connector-java-5.1.49.jar

[Hive does not ship the MySQL JDBC driver either; you can copy it from your local Maven repository into ~/app/hive/lib/]
Fixed invocation to initialize the metastore database in MySQL: ./schematool -dbType mysql -initSchema
Without this init step, the tables will not be created in MySQL. Hive stores its metadata in MySQL.
[liqiang@Gargantua bin]$ ./schematool -dbType mysql -initSchema

Starting hive
[liqiang@Gargantua ~]$ cd ~/app/hive/bin
[liqiang@Gargantua bin]$ hive
Error: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)
[liqiang@Gargantua bin]$ hive
which: no hbase in (/home/liqiang/app/hive/bin:/home/liqiang/app/hadoop/bin:/home/liqiang/app/hadoop/sbin:/usr/java/jdk1.8.0_121/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/liqiang/.local/bin:/home/liqiang/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/liqiang/app/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/liqiang/app/hadoop-3.2.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
        // ...
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Resolved by following: https://www.cnblogs.com/syq816/p/12632028.html
That is, copy guava-27.0-jre.jar from Hadoop (hadoop/share/hadoop/common/lib) into hive/lib, so that both use the same jar version.
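The step above can be sketched as a couple of shell commands. The paths and the guava version follow the install layout used in these notes (~/app/hadoop, ~/app/hive, guava-27.0-jre.jar); adjust them to your own installation:

```shell
# Assumed paths from these notes; adjust to your installation.
HADOOP_LIB="$HOME/app/hadoop/share/hadoop/common/lib"
HIVE_LIB="$HOME/app/hive/lib"

if [ -d "$HADOOP_LIB" ] && [ -d "$HIVE_LIB" ]; then
    # Remove the older guava bundled with Hive, then copy Hadoop's newer one,
    # so both frameworks load the same guava version.
    rm -f "$HIVE_LIB"/guava-*.jar
    cp "$HADOOP_LIB"/guava-27.0-jre.jar "$HIVE_LIB"/
    msg="guava jar replaced"
else
    msg="directories not found; set HADOOP_LIB/HIVE_LIB for your install"
fi
echo "$msg"
```

The guard keeps the sketch from failing on machines with a different layout; on the actual server the two cp/rm lines are all that is needed.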
Start again:
[liqiang@Gargantua bin]$ hive
Hive starts successfully and drops into the Hive shell.
Stopping Hive
hive> quit;

Starting hiveserver2
Foreground start of hiveserver2 [the service stops once the ssh terminal is closed]
[liqiang@Gargantua bin]$ ./hiveserver2
or
[liqiang@Gargantua bin]$ hive --service hiveserver2
Background start of hiveserver2
[liqiang@Gargantua bin]$ nohup sh hiveserver2 &
nohup: run without hanging up, i.e. immune to the session's SIGHUP signal; short for "no hang up".
&: run in the background, immune to the (Ctrl + C) SIGINT signal.
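The effect of nohup and & can be demonstrated with any long-running command; in this sketch, sleep stands in for hiveserver2:

```shell
# nohup makes the command ignore SIGHUP (the signal sent when the terminal
# or ssh session closes); '&' runs it in the background so the prompt
# returns immediately. For a real service you would also redirect output,
# e.g. > hs2.log 2>&1.
nohup sleep 1 > /dev/null 2>&1 &
pid=$!
echo "started background job with pid $pid"
wait "$pid"   # only for this demo; hiveserver2 would keep running instead
echo "job $pid finished"
```

With hiveserver2 you would of course not wait; the point is that the process survives the terminal closing.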
Once hiveserver2 is up, you can connect with beeline:
[liqiang@Gargantua bin]$ ./beeline
beeline> !connect jdbc:hive2://localhost:10000 liqiang
Connecting to jdbc:hive2://localhost:10000
Enter password for jdbc:hive2://localhost:10000:
Then an error:
21/12/31 00:43:00 [main]: WARN jdbc.HiveConnection: Failed to connect to localhost:10000
Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: liqiang is not allowed to impersonate hadoop (state=08S01,code=0)
beeline> !quit;
Go back and check Hadoop's core-site.xml:
<property>
  <name>hadoop.proxyuser.liqiang.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.liqiang.groups</name>
  <value>*</value>
</property>
Restart hadoop:
# cd into hadoop/sbin
$ ./stop-all.sh
$ ./start-all.sh
Restart hiveserver2:
./hiveserver2

Viewing databases
In hive:
hive> show databases;
OK
default
Time taken: 0.551 seconds, Fetched: 1 row(s)
hive>
Hive has a default database named default. Create a table, then list all tables:
hive> use default;
hive> create table my_table(id string);
hive> show tables;
OK
my_table
Time taken: 0.027 seconds, Fetched: 1 row(s)
hive>
In hiveserver2 (via beeline):
0: jdbc:hive2://localhost:10000> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+

Hive data
Data

The data itself is stored on HDFS:
hdfs://Gargantua:9000/user/hive/warehouse

Metadata is stored in MySQL:
TBLS: tables
DBS: databases
COLUMNS_V2: column information for tables

[liqiang@Gargantua ~]$ hdfs dfs -ls /user/hive/warehouse

Metadata
Go into MySQL and look at the my_hive database:
mysql> use my_hive
mysql> show tables;
+-------------------------------+
| Tables_in_my_hive             |
+-------------------------------+
| aux_table                     |
| bucketing_cols                |
| cds                           |
| columns_v2                    |   [COLUMNS_V2: column information for tables]
| compaction_queue              |
| completed_compactions         |
| completed_txn_components      |
| ctlgs                         |
| database_params               |
| db_privs                      |
| dbs                           |   [DBS: databases]
| delegation_tokens             |
| func_ru                       |
| funcs                         |
| global_privs                  |
| hive_locks                    |
| i_schema                      |
| idxs                          |
| index_params                  |
| key_constraints               |
| master_keys                   |
| materialization_rebuild_locks |
| metastore_db_properties       |
| min_history_level             |
| mv_creation_metadata          |
| mv_tables_used                |
| next_compaction_queue_id      |
| next_lock_id                  |
| next_txn_id                   |
| next_write_id                 |
| notification_log              |
| notification_sequence         |
| nucleus_tables                |
| part_col_privs                |
| part_col_stats                |
| part_privs                    |
| partition_events              |
| partition_key_vals            |
| partition_keys                |
| partition_params              |
| partitions                    |
| repl_txn_map                  |
| role_map                      |
| roles                         |
| runtime_stats                 |
| schema_version                |
| sd_params                     |
| sds                           |
| sequence_table                |
| serde_params                  |
| serdes                        |
| skewed_col_names              |
| skewed_col_value_loc_map      |
| skewed_string_list            |
| skewed_string_list_values     |
| skewed_values                 |
| sort_cols                     |
| tab_col_stats                 |
| table_params                  |
| tbl_col_privs                 |
| tbl_privs                     |
| tbls                          |   [TBLS: tables]
| txn_components                |
| txn_to_write_id               |
| txns                          |
| type_fields                   |
| types                         |
| version                       |
| wm_mapping                    |
| wm_pool                       |
| wm_pool_to_trigger            |
| wm_resourceplan               |
| wm_trigger                    |
| write_set                     |
+-------------------------------+
Look at the table we just created in hive:
mysql> select * from tbls where tbl_name = 'my_table';
+--------+-------------+-------+------------------+---------+------------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER   | OWNER_TYPE | RETENTION | SD_ID | TBL_NAME | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | IS_REWRITE_ENABLED |
+--------+-------------+-------+------------------+---------+------------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
|      1 |  1640862757 |     1 |                0 | liqiang | USER       |         0 |     1 | my_table | MANAGED_TABLE | NULL               | NULL               |                    |
+--------+-------------+-------+------------------+---------+------------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
1 row in set (0.00 sec)
TBL_ID: table ID
DB_ID: database ID
So, to look up the database that my_table belongs to:
mysql> select * from dbs where db_id = 1;
mysql> select * from dbs where db_id = (select db_id from tbls where tbl_name = 'my_table');
Look at the column information for my_table:
mysql> select * from columns_v2;  [the rows do not reference which table the columns come from; there is no TBL_ID here]
+-------+---------+-------------+-----------+-------------+
| CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |
+-------+---------+-------------+-----------+-------------+
|     1 | NULL    | id          | string    |           0 |
+-------+---------+-------------+-----------+-------------+
There is only a CD_ID, while tbls only has an SD_ID; the two are not directly related.
And the cds table only has a cd_id, with no tbl_id either:
mysql> select * from cds;
+-------+
| CD_ID |
+-------+
|     1 |
+-------+
Next, look in the sds table:
mysql> select * from sds;
+-------+-------+------------------------------------------+---------------+---------------------------+----------------------------------------------------+-------------+------------------------------------------------------------+----------+
| SD_ID | CD_ID | INPUT_FORMAT                             | IS_COMPRESSED | IS_STOREDASSUBDIRECTORIES | LOCATION                                           | NUM_BUCKETS | OUTPUT_FORMAT                                              | SERDE_ID |
+-------+-------+------------------------------------------+---------------+---------------------------+----------------------------------------------------+-------------+------------------------------------------------------------+----------+
|     1 |     1 | org.apache.hadoop.mapred.TextInputFormat |               |                           | hdfs://Gargantua:9000/user/hive/warehouse/my_table |          -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |        1 |
+-------+-------+------------------------------------------+---------------+---------------------------+----------------------------------------------------+-------------+------------------------------------------------------------+----------+
1 row in set (0.00 sec)
That links them up: tbls joins to sds via SD_ID, and sds joins to columns_v2 via CD_ID.
mysql> select * from columns_v2 where cd_id = (
-> select sds.cd_id from sds join tbls on sds.sd_id = tbls.sd_id where tbls.tbl_name = 'my_table'
-> );
+-------+---------+-------------+-----------+-------------+
| CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |
+-------+---------+-------------+-----------+-------------+
| 1 | NULL | id | string | 0 |
+-------+---------+-------------+-----------+-------------+
1 row in set (0.00 sec)
Setting Hive properties

Configuration file

Global, but takes effect only after restarting Hive. For example, add to hive-site.xml:
<property>
  <name>hive.cli.print.header</name>
  <value>true</value>
</property>
<property>
  <name>hive.cli.print.current.db</name>
  <value>true</value>
</property>

Starting hive with command-line parameters
[liqiang@Gargantua bin]$ hive --hiveconf hive.cli.print.current.db=true
Temporary; applies only to the current session.
set

View a parameter:
set hive.cli.print.current.db;
Modify a parameter:
set hive.cli.print.current.db=true;
Temporary; applies only to the current session.
Other hive command-line options

hive -e <sql statement>  [runs SQL without entering the hive shell]
hive -e "select * from my_table";
hive -f <sql file>  [runs the SQL statements in the file]
hive -i <init file>  [runs the file first, e.g. for settings, then enters the interactive shell]
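A small sketch of the three styles. The table name my_table comes from earlier in these notes; the hive commands only run if hive is actually on the PATH, otherwise the script just shows what would be executed:

```shell
# Write a small SQL script to feed to hive -f / hive -i.
sql_file="$(mktemp)"
echo "select * from my_table;" > "$sql_file"

if command -v hive > /dev/null 2>&1; then
    hive -e "select * from my_table"   # -e: run an inline statement and exit
    hive -f "$sql_file"                # -f: run statements from a file and exit
    # -i runs the file first and then stays in the interactive shell,
    # so it is not executed in this non-interactive sketch:
    # hive -i "$sql_file"
else
    echo "hive not on PATH; commands shown for reference only"
fi
```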



