栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 前沿技术 > 大数据 > 大数据系统

Hive Transaction(Hive 事务管理)

Hive Transaction(Hive 事务管理)

Hive 事务在 Hive 3 得到增强。

hive-site.xml 配置

   hive.txn.manager
   org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
   
     Set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager as part of turning on Hive
     transactions, which also requires appropriate settings for hive.compactor.initiator.on,
     hive.compactor.worker.threads, hive.support.concurrency (true),
     and hive.exec.dynamic.partition.mode (nonstrict).
     The default DummyTxnManager replicates pre-Hive-0.13 behavior and provides
     no transactions.
   
 
 
   hive.support.concurrency
   true
 
 
   hive.compactor.initiator.on
   true
   
     Whether to run the initiator and cleaner threads on this metastore instance or not.
     Set this to true on one instance of the Thrift metastore service as part of turning
     on Hive transactions. For a complete list of parameters required for turning on
     transactions, see hive.txn.manager.
   
 
 
   hive.compactor.worker.threads
   2
   
     How many compactor worker threads to run on this metastore instance. Set this to a
     positive number on one or more instances of the Thrift metastore service as part of
     turning on Hive transactions. For a complete list of parameters required for turning
     on transactions, see hive.txn.manager.
     Worker threads spawn MapReduce jobs to do compactions. They do not do the compactions
     themselves. Increasing the number of worker threads will decrease the time it takes
     tables or partitions to be compacted once they are determined to need compaction.
     It will also increase the background load on the Hadoop cluster as more MapReduce jobs
     will be running in the background.
   
 
 
   hive.exec.dynamic.partition.mode
   nonstrict
   
     In strict mode, the user must specify at least one static partition
     in case the user accidentally overwrites all partitions.
     In nonstrict mode all partitions are allowed to be dynamic.
   
 
测试 创建事务表
create table t1(c1 int,c2 int) stored as orc tblproperties('transactional'='true');

执行以下操作。执行之后,可以看到每个操作在表的目录下生成相应的 delta 目录。

insert into t1 values(1,1),(2,2),(3,3);
insert into t1 values(4,4);
insert into t1 values(5,5);
insert into t1 values(6,6);
insert into t1 values(7,7);
insert into t1 values(8,8);
insert into t1 values(9,9);
insert into t1 values(10,10);
insert into t1 values(11,11);
insert into t1 values(12,12);
insert into t1 values(13,13);

delete from t1 where c1=13;
insert into t1 values(13,14);
delete from t1 where c1=13;
insert into t1 values(13,15);
insert_only 事务

insert_only 事务部要求表必须是 orc 格式。
t2 表仅支持 insert,不支持 delete, update。
insert 语句成功, delete 失败。

create table t2(c1 int,c2 int) stored as orc tblproperties('transactional'='true','transactional_properties'='insert_only');

insert into t2 values(1,1),(2,2),(3,3);
delete from t2 where c1=3;
修改现有表支持事务 全部事务支持

表 t3 从创建的时候没有支持事务,用 alter table 支持事务,之后可以执行 insert, delete 操作。

create table t3(c1 int,c2 int) stored as orc;
alter table t3 set  tblproperties('transactional'='true','transactional_properties'='default');
insert into t3 values(1,1),(2,2),(3,3);
delete from t3 where c1=3;
insert_only 事务支持
create table t4(c1 int,c2 int) stored as orc;
alter table t4 set  tblproperties('transactional'='true','transactional_properties'='insert_only');
insert into t4 values(1,1),(2,2),(3,3);

Hive 官方文档https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions

常见问题
    作业报错
    如果在执行的时候,发生了 compaction,compaction 之后删除了原来的文件,导致抛出 FileNotFoundException: File does not exist:.2 个会话同时执行 insert overwrite 抛出 LockException

创建表

create table t1(c1 int) stored as orc tblproperties('transactional'='true');

在两个窗口里分别用 beeline 连接 hiveserver。

在会话1,执行以下命令,函数 timesleep 会 sleep 10秒钟,返回 10001。

insert overwrite table t1 select default.timesleep(10000);

在会话2,执行以下命令,会话2 的命令在会话1之前执行完毕。

insert overwrite table t1 select 1;

会话1抛出以下异常

ERROR : FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.lockmgr.LockException(Transaction manager has aborted the transaction txnid:126.  Reason: Aborting [txnid:126,128] due to a write conflict on test/t1 committed by [txnid:127,127] u/u)
org.apache.hadoop.hive.ql.lockmgr.LockException: Transaction manager has aborted the transaction txnid:126.  Reason: Aborting [txnid:126,128] due to a write conflict on test/t1 committed by [txnid:127,127] u/u
	at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.commitTxn(DbTxnManager.java:670)

由于函数在编译阶段给出具体值。

select * from t1;
+--------+
| t1.c1  |
+--------+
| 1      |
| 2      |
| 3      |
+--------+

以下两个语句:
会话1:

insert overwrite table t2 select c1,default.timesleep(10000 * c1) from t1;

会话2:

insert overwrite table t2 select c1,default.timesleep(10 * c1) from t1;

会话2 会在 commit 的时候失败。

ERROR : FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.lockmgr.LockException(Transaction manager has aborted the transaction txnid:163.  Reason: Aborting [txnid:163,163] due to a write conflict on test/t2 committed by [txnid:162,163] u/u)
org.apache.hadoop.hive.ql.lockmgr.LockException: Transaction manager has aborted the transaction txnid:163.  Reason: Aborting [txnid:163,163] due to a write conflict on test/t2 committed by [txnid:162,163] u/u
	at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.commitTxn(DbTxnManager.java:670)

    在执行 insert overwrite 时,在另外的会话执行drop table,导致第 1 个会话抛出表找不到异常。示例如下:
0: jdbc:hive2://localhost:10000/default> insert overwrite table t1 select default.timesleep(10000);
Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:23 Table not found 't1' (state=42S02,code=10001)

show transactions 可以显示 aborted 的 transaction。

show transactions;
+-----------------+--------------------+----------------+----------------------+-------------+------------------------+
|      txnid      |       state        |  startedtime   |  lastheartbeattime   |    user     |          host          |
+-----------------+--------------------+----------------+----------------------+-------------+------------------------+
| Transaction ID  | Transaction State  | Started Time   | Last Heartbeat Time  | User        | Hostname               |
| 126             | ABORTED            | 1646189665000  | 1646189665000        | houzhizhen  | localhost.localdomain  |
| 130             | OPEN               | 1646189960000  | 1646189973000        | houzhizhen  | localhost.localdomain  |
+-----------------+--------------------+----------------+----------------------+-------------+------------------------+

    两个会话同时 insert 不会有冲突。

参考资料:
官方文档:Hive+Transactions
PPT Transactional Operations in Apache Hive:Present and Future

转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/752247.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号