
[Hive] Basic Operations on Hive Partitioned Tables and Buckets



Table of Contents

I. Hive Partitioned Tables
  1. Creating a Partitioned Table
  2. Inserting Data
  3. Viewing Partitions
  4. Renaming a Partition
  5. Dropping a Partition
II. Hive Buckets
  1. Creating a Bucketed Table
  2. Inserting Data
  3. Sampling the Bucketed Table


Environment Setup

A fully distributed Hadoop cluster (one master and two workers is enough), plus working MySQL and Hive installations.


I. Hive Partitioned Tables

1. Creating a Partitioned Table

In Hive, create a partitioned table named goods with two columns, goods_id and goods_status, both of type string, partitioned by cat_id (also string), using '\t' (tab) as the field delimiter:

hive> create table if not exists goods(goods_id string,goods_status string)
    > partitioned by (cat_id string)
    > row format delimited fields terminated by '\t';
OK
Time taken: 0.424 seconds

View the structure of the goods table:

hive> desc goods;
OK
goods_id            	string              	                    
goods_status        	string              	                    
cat_id              	string              	                    
	 	 
# Partition Information	 	 
# col_name            	data_type           	comment             
	 	 
cat_id              	string              	                    
Time taken: 0.226 seconds, Fetched: 8 row(s)

And with that, the partitioned table is created.
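Under the hood, each partition of a Hive table is stored as its own subdirectory beneath the table's warehouse directory, named `column=value` (the job log later in this article shows exactly such a path). A minimal Python sketch of that layout, with an illustrative warehouse root:

```python
# Sketch of how Hive lays out partitioned data on HDFS. The warehouse
# root below is illustrative; the real one comes from the
# hive.metastore.warehouse.dir configuration property.

def partition_path(warehouse, db, table, **partition_cols):
    """Build the directory Hive uses for one partition of a table."""
    parts = "/".join(f"{k}={v}" for k, v in partition_cols.items())
    return f"{warehouse}/{db}.db/{table}/{parts}"

print(partition_path("/user/hive/warehouse", "db", "goods", cat_id="52052"))
# -> /user/hive/warehouse/db.db/goods/cat_id=52052
```

Because each partition is just a directory, queries filtering on cat_id can skip every directory except the matching one.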



2. Inserting Data

Next, insert data into the partitioned table by loading the goods file from the local directory /home/data/hive-data.
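The goods file is assumed to be plain text with one record per line and tab-separated goods_id, goods_status, and cat_id fields, matching the '\t' delimiter in the table definitions. A small Python sketch of writing and parsing rows in that shape (the sample values mirror the query output shown later):

```python
import csv
import io

# Rows shaped like the goods file: goods_id, goods_status, cat_id,
# one record per line, fields separated by a tab character.
rows = [
    ("1020405", "6", "52052"),
    ("1024641", "1", "52052"),
]

buf = io.StringIO()
csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
data = buf.getvalue()  # what the file on disk would contain

# Hive's 'fields terminated by' delimiter splits each line the same way:
parsed = [tuple(line.split("\t")) for line in data.splitlines()]
assert parsed == rows
```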

First, create a non-partitioned table goods1 in Hive to hold the data from the local goods file:

hive> create table goods1(goods_id string,goods_status string,cat_id string)
    > row format delimited fields terminated by '\t';
OK
Time taken: 0.084 seconds

Load the local data into goods1:

hive> load data local inpath '/home/data/hive-data/goods' into table goods1;
Loading data to table db.goods1
OK
Time taken: 1.316 seconds

Then insert the data from goods1 into the partitioned table goods:

hive> insert into table goods partition(cat_id='52052') select goods_id,goods_status from goods1 where cat_id='52052';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220306171401_5343b66f-8685-487a-be77-537fd35ccb3a
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1646556082284_0003, Tracking URL = http://server:8088/proxy/application_1646556082284_0003/
Kill Command = /usr/local/src/hadoop/bin/hadoop job  -kill job_1646556082284_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2022-03-06 17:14:52,886 Stage-1 map = 0%,  reduce = 0%
2022-03-06 17:14:59,984 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.7 sec
MapReduce Total cumulative CPU time: 1 seconds 700 msec
Ended Job = job_1646556082284_0003
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://192.168.64.183:9000/user/hive/warehouse/db.db/goods/cat_id=52052/.hive-staging_hive_2022-03-06_17-14-01_837_747076726384884371-1/-ext-10000
Loading data to table db.goods partition (cat_id=52052)
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.7 sec   HDFS Read: 213175 HDFS Write: 378 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 700 msec
OK
Time taken: 60.848 seconds

Check the result:

hive> select * from goods limit 10;
OK
1020405	6	52052
1020406	6	52052
1020408	6	52052
1020326	6	52052
1020407	6	52052
1020404	6	52052
1020327	6	52052
1020409	6	52052
1020325	6	52052
1024641	1	52052
Time taken: 0.136 seconds, Fetched: 10 row(s)



3. Viewing Partitions

List the partitions of the goods table:

hive> show partitions goods;
OK
cat_id=52052
Time taken: 0.128 seconds, Fetched: 1 row(s)



4. Renaming a Partition

Rename the partition of goods from cat_id=52052 to cat_id=52051, then verify the new partition name:

hive> alter table goods partition(cat_id=52052) rename to partition(cat_id=52051);
OK
Time taken: 0.391 seconds
hive> show partitions goods;
OK
cat_id=52051
Time taken: 0.061 seconds, Fetched: 1 row(s)



5. Dropping a Partition

Drop the cat_id=52051 partition from the goods table:

hive> alter table goods drop if exists partition (cat_id='52051');
Dropped the partition cat_id=52051
OK
Time taken: 0.456 seconds
hive> show partitions goods;
OK
Time taken: 0.059 seconds



II. Hive Buckets

1. Creating a Bucketed Table

Enable bucketing enforcement (required on Hive 1.x; in Hive 2.x bucketing is always enforced and this property was removed):

hive> set hive.enforce.bucketing=true;

Create a table named goods_t with two columns, goods_id and goods_status, both of type string, partitioned by cat_id (string), clustered by the goods_status column, sorted by the goods_id column, and divided into two buckets:

hive> create table goods_t(goods_id string,goods_status string) 
    > partitioned by (cat_id string)
    > clustered by (goods_status)
    > sorted by (goods_id)
    > into 2 buckets;
OK
Time taken: 0.08 seconds
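Which bucket a row lands in is determined by hashing the clustering column and taking the result modulo the number of buckets. The sketch below mimics Hive's default behavior for string keys (Java's String.hashCode, with the sign bit masked off before the modulo); it is an illustration of the idea, not Hive's actual code path:

```python
def java_string_hash(s: str) -> int:
    """Reimplementation of Java's String.hashCode, with 32-bit overflow."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Interpret as a signed 32-bit integer, like Java does.
    return h - 0x100000000 if h >= 0x80000000 else h

def bucket_for(value: str, num_buckets: int) -> int:
    # The sign bit is masked off, so the bucket index is never negative.
    return (java_string_hash(value) & 0x7FFFFFFF) % num_buckets

# goods_status values from the sample data:
print(bucket_for("6", 2))  # hash("6") = 54 -> bucket 0
print(bucket_for("1", 2))  # hash("1") = 49 -> bucket 1
```

Rows with the same goods_status therefore always land in the same bucket file, which is what makes bucket-based sampling and joins possible.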



2. Inserting Data

Insert the data from table goods1 into goods_t:

hive> insert overwrite table goods_t partition(cat_id='52063') select goods_id,goods_status from goods1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220306173319_d8682df4-91ba-42cc-b3cd-76ced5fdf898
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1646556082284_0004, Tracking URL = http://server:8088/proxy/application_1646556082284_0004/
Kill Command = /usr/local/src/hadoop/bin/hadoop job  -kill job_1646556082284_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 2
2022-03-06 17:33:45,839 Stage-1 map = 0%,  reduce = 0%
2022-03-06 17:33:51,130 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.31 sec
2022-03-06 17:33:58,420 Stage-1 map = 100%,  reduce = 50%, Cumulative CPU 3.31 sec
2022-03-06 17:34:12,914 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.1 sec
MapReduce Total cumulative CPU time: 5 seconds 100 msec
Ended Job = job_1646556082284_0004
Loading data to table db.goods_t partition (cat_id=52063)
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 2   Cumulative CPU: 5.1 sec   HDFS Read: 220296 HDFS Write: 112278 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 100 msec
OK
Time taken: 54.668 seconds



3. Sampling the Bucketed Table

Query a sample of the table data:

hive> select * from goods_t tablesample(bucket 1 out of 2 on goods_status) limit 10;
OK
1000016	4	52063
1000018	4	52063
1000096	4	52063
1000218	4	52063
1000219	4	52063
1000223	4	52063
1000310	4	52063
1000368	4	52063
1000371	4	52063
1000380	4	52063
Time taken: 0.131 seconds, Fetched: 10 row(s)
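TABLESAMPLE(BUCKET x OUT OF y ON col) returns the rows that would fall into bucket x-1 if the data were hashed into y buckets on col, which is why the sample above contains only some goods_status values. A small Python sketch of that selection rule, using the same Java-style string hash as the bucketing sketch (illustrative, not Hive internals):

```python
def java_string_hash(s: str) -> int:
    """Java's String.hashCode, as Hive applies it to string columns."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

def tablesample(rows, x, y, key):
    """Keep rows whose hash lands in bucket x-1 when split into y buckets."""
    return [r for r in rows
            if (java_string_hash(key(r)) & 0x7FFFFFFF) % y == x - 1]

# (goods_id, goods_status) pairs; statuses "4" and "6" hash to bucket 0
# of 2 while "1" hashes to bucket 1, so BUCKET 1 OUT OF 2 keeps the
# first and third rows only.
rows = [("1000016", "4"), ("1024641", "1"), ("1020405", "6")]
print(tablesample(rows, 1, 2, key=lambda r: r[1]))
```

Because the table was bucketed on goods_status, Hive can serve such a sample by reading just the matching bucket file instead of scanning the whole table.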



Reprinted from www.mshxw.com. Original article: https://www.mshxw.com/it/752889.html