Hive transactions were significantly enhanced in Hive 3.
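Turning transactions on requires several hive-site.xml settings; a sketch collecting them is shown below (the values match the descriptions that follow), with each property explained next.

```xml
<!-- hive-site.xml: settings for enabling Hive ACID transactions -->
<configuration>
  <property>
    <name>hive.txn.manager</name>
    <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
  </property>
  <property>
    <name>hive.support.concurrency</name>
    <value>true</value>
  </property>
  <!-- Set to true on one instance of the Thrift metastore service. -->
  <property>
    <name>hive.compactor.initiator.on</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.compactor.worker.threads</name>
    <value>2</value>
  </property>
  <property>
    <name>hive.exec.dynamic.partition.mode</name>
    <value>nonstrict</value>
  </property>
</configuration>
```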
hive-site.xml configuration

hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
Set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager as part of turning on Hive transactions; this also requires appropriate settings for hive.compactor.initiator.on, hive.compactor.worker.threads, hive.support.concurrency (true), and hive.exec.dynamic.partition.mode (nonstrict). The default DummyTxnManager replicates pre-Hive-0.13 behavior and provides no transactions.

hive.support.concurrency = true

hive.compactor.initiator.on = true
Whether to run the initiator and cleaner threads on this metastore instance. Set this to true on one instance of the Thrift metastore service as part of turning on Hive transactions.

hive.compactor.worker.threads = 2
How many compactor worker threads to run on this metastore instance. Set this to a positive number on one or more instances of the Thrift metastore service. Worker threads spawn MapReduce jobs to do compactions; they do not do the compactions themselves. Increasing the number of worker threads decreases the time it takes to compact tables or partitions once they are determined to need compaction, but it also increases the background load on the Hadoop cluster, since more MapReduce jobs run in the background.

hive.exec.dynamic.partition.mode = nonstrict
In strict mode the user must specify at least one static partition, in case the user accidentally overwrites all partitions; in nonstrict mode all partitions are allowed to be dynamic.

Testing

Create a transactional table:
create table t1(c1 int,c2 int) stored as orc tblproperties('transactional'='true');
Run the following statements. After each one, you can see a corresponding delta directory created under the table's directory.
insert into t1 values(1,1),(2,2),(3,3);
insert into t1 values(4,4);
insert into t1 values(5,5);
insert into t1 values(6,6);
insert into t1 values(7,7);
insert into t1 values(8,8);
insert into t1 values(9,9);
insert into t1 values(10,10);
insert into t1 values(11,11);
insert into t1 values(12,12);
insert into t1 values(13,13);
delete from t1 where c1=13;
insert into t1 values(13,14);
delete from t1 where c1=13;
insert into t1 values(13,15);

insert_only transactions
insert_only transactions do not require the table to be in ORC format.
Table t2 supports only insert; delete and update are not supported.
The insert statement succeeds; the delete fails.
create table t2(c1 int,c2 int) stored as orc tblproperties('transactional'='true','transactional_properties'='insert_only');
insert into t2 values(1,1),(2,2),(3,3);
delete from t2 where c1=3;
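Because insert_only tables are not restricted to ORC, the same kind of table can also be created in, for example, text format. A minimal sketch (the table name t2_text is illustrative, not from the original test):

```sql
-- Hypothetical example: an insert_only transactional table stored as text.
create table t2_text(c1 int, c2 int)
stored as textfile
tblproperties('transactional'='true','transactional_properties'='insert_only');

insert into t2_text values(1,1);          -- allowed
-- delete from t2_text where c1=1;       -- would fail: insert_only tables reject delete/update
```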
Enabling transactions on an existing table
Full transaction support
Table t3 was created without transaction support. After enabling transactions with alter table, both insert and delete can be executed.
create table t3(c1 int,c2 int) stored as orc;
alter table t3 set tblproperties('transactional'='true','transactional_properties'='default');
insert into t3 values(1,1),(2,2),(3,3);
delete from t3 where c1=3;
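To confirm that the alter table took effect, the table parameters can be inspected with a standard Hive statement; a sketch (output omitted):

```sql
-- The Table Parameters section of the output should list
-- transactional=true and transactional_properties=default.
describe formatted t3;
```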
insert_only transaction support
create table t4(c1 int,c2 int) stored as orc;
alter table t4 set tblproperties('transactional'='true','transactional_properties'='insert_only');
insert into t4 values(1,1),(2,2),(3,3);
Official Hive documentation: https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions
Common problems

- Job errors: if a compaction runs while the job is executing and deletes the original files after compacting, the job throws FileNotFoundException: File does not exist:.
- Two sessions executing insert overwrite at the same time throw LockException.
Create the table:
create table t1(c1 int) stored as orc tblproperties('transactional'='true');
In two separate windows, connect to the hiveserver with beeline.
In session 1, run the following command. The function timesleep sleeps for 10 seconds and returns 10001.
insert overwrite table t1 select default.timesleep(10000);
In session 2, run the following command. Session 2's statement finishes before session 1's.
insert overwrite table t1 select 1;
Session 1 throws the following exception:
ERROR : FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.lockmgr.LockException(Transaction manager has aborted the transaction txnid:126. Reason: Aborting [txnid:126,128] due to a write conflict on test/t1 committed by [txnid:127,127] u/u)
org.apache.hadoop.hive.ql.lockmgr.LockException: Transaction manager has aborted the transaction txnid:126. Reason: Aborting [txnid:126,128] due to a write conflict on test/t1 committed by [txnid:127,127] u/u
	at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.commitTxn(DbTxnManager.java:670)
Because a function called with a constant argument is evaluated at the compile stage, the next test passes a column-dependent argument to timesleep so that the sleep happens at run time. Table t1 now contains:
select * from t1;
+--------+
| t1.c1  |
+--------+
| 1      |
| 2      |
| 3      |
+--------+
Run the following two statements:
Session 1:
insert overwrite table t2 select c1,default.timesleep(10000 * c1) from t1;
Session 2:
insert overwrite table t2 select c1,default.timesleep(10 * c1) from t1;
Session 2 fails at commit time:
ERROR : FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.lockmgr.LockException(Transaction manager has aborted the transaction txnid:163. Reason: Aborting [txnid:163,163] due to a write conflict on test/t2 committed by [txnid:162,163] u/u)
org.apache.hadoop.hive.ql.lockmgr.LockException: Transaction manager has aborted the transaction txnid:163. Reason: Aborting [txnid:163,163] due to a write conflict on test/t2 committed by [txnid:162,163] u/u
	at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.commitTxn(DbTxnManager.java:670)
- While one session is executing insert overwrite, running drop table in another session causes the first session to throw a table-not-found exception. For example:
0: jdbc:hive2://localhost:10000/default> insert overwrite table t1 select default.timesleep(10000);
Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:23 Table not found 't1' (state=42S02,code=10001)
show transactions also displays aborted transactions.
show transactions;
+-----------------+--------------------+----------------+----------------------+-------------+------------------------+
|      txnid      |       state        |  startedtime   |  lastheartbeattime   |    user     |          host          |
+-----------------+--------------------+----------------+----------------------+-------------+------------------------+
| Transaction ID  | Transaction State  | Started Time   | Last Heartbeat Time  | User        | Hostname               |
| 126             | ABORTED            | 1646189665000  | 1646189665000        | houzhizhen  | localhost.localdomain  |
| 130             | OPEN               | 1646189960000  | 1646189973000        | houzhizhen  | localhost.localdomain  |
+-----------------+--------------------+----------------+----------------------+-------------+------------------------+
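When diagnosing these lock and compaction problems, two related standard Hive statements are also useful; a sketch (output omitted):

```sql
-- Show the locks currently held or requested by transactions.
show locks;
-- Show pending, running, and completed compactions for ACID tables.
show compactions;
```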
- Two sessions inserting at the same time do not conflict.
References:
Official documentation: Hive+Transactions
PPT Transactional Operations in Apache Hive:Present and Future



