The difference between Hive partitions and buckets (Hive partitioned table data migration)

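A partitioned table corresponds to separate directories on the HDFS file system, and each directory holds all the data files of one partition. A partition in Hive is simply a subdirectory: a large dataset is split into smaller ones according to business needs. At query time, the expressions in the WHERE clause select only the partitions the query needs, so Hive reads just those directories instead of scanning the whole table, which makes the query much more efficient.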

Basic operations on partitioned tables

Sample partition data

dept_20200401.log
dept_20200402.log
dept_20200403.log

Basic syntax for creating a partitioned table

create table if not exists dept_partition(dept int,
dname string,loc string) partitioned by (day string)
row format delimited fields terminated by " ";

dept_20200401.log

10 ACCOUNTING 1700
20 RESEARCH 1800

dept_20200402.log

30 SALES 1900
40 OPERATIONS 1700

dept_20200403.log

50 TEST 2000
60 DEV 1900

hive (default)> load data local inpath 
'/opt/module/hive/datas/dept_20200401.log' into table dept_partition 
partition(day='20200401');
hive (default)> load data local inpath 
'/opt/module/hive/datas/dept_20200402.log' into table dept_partition 
partition(day='20200402');
hive (default)> load data local inpath 
'/opt/module/hive/datas/dept_20200403.log' into table dept_partition 
partition(day='20200403');

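Tip: when loading data into a partitioned table, you must specify the target partition.
The partition column of a partitioned table can be used in queries like an ordinary column,
for example: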

select * from dept_partition where day = "20200403";
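
To check which partitions exist and to read more than one of them in a single query, something like the following could be used. This is a minimal sketch that assumes the three load statements above have been run; no output is shown because it depends on the loaded data.

```sql
-- List the partitions registered in the metastore for this table.
show partitions dept_partition;

-- Read two partitions in one query; the filter on the partition column
-- lets Hive restrict the scan to the matching partition directories.
select * from dept_partition
where day = '20200401' or day = '20200402';

-- The same filter written with IN.
select * from dept_partition
where day in ('20200401', '20200402');
```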

Adding a partition
As with changing a table's columns, this is done with alter table

alter table dept_partition add partition (day = "xxx");

Adding multiple partitions

alter table dept_partition add partition (day = "xxx") 
partition (day = "xxx");

Tip: when adding partitions, the partition clauses are separated by spaces

Dropping a partition

alter table dept_partition drop partition (day = "xxx");

Dropping multiple partitions

alter table dept_partition drop partition (day = "xxx"),
partition (day = "xxx");

Tip: when dropping partitions, the partition clauses are separated by commas
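
The add and drop statements can be tried end to end with concrete values. The sketch below uses the hypothetical partition values 20200404 and 20200405 (they are not part of the sample data) and adds the optional `if not exists` / `if exists` guards so the statements can be re-run without errors.

```sql
-- Add two empty partitions. The partition clauses are separated by
-- whitespace only, with no comma between them.
alter table dept_partition add if not exists
  partition (day = '20200404')
  partition (day = '20200405');

-- Confirm that both partitions are now registered.
show partitions dept_partition;

-- Drop them again. Here the partition clauses are separated by a comma.
-- For a managed (non-external) table such as dept_partition, dropping a
-- partition also deletes the data files under that partition directory.
alter table dept_partition drop if exists
  partition (day = '20200404'),
  partition (day = '20200405');
```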

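Two-level partitions

What if a single day's log data is itself very large; how can it be split further?
The answer is to partition once more, for example by hour within each day.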

hive (default)> create table dept_partition2(
 deptno int, dname string, loc string
 )
 partitioned by (day string, hour string)
 row format delimited fields terminated by '\t';

Loading data

hive (default)> load data local inpath 
'/opt/module/hive/datas/dept_20200401.log' into table
dept_partition2 partition(day='20200401', hour='12');

Querying partition data

hive (default)> select * from dept_partition2 where day='20200401' and 
hour='12';
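
With two partition columns, each partition becomes a nested directory (day first, then hour). A minimal sketch for inspecting this follows; the HDFS path assumes the table lives in the default database under the default warehouse directory /user/hive/warehouse, and the `dfs` command assumes you are running inside the Hive CLI.

```sql
-- Each partition is listed as a nested spec such as day=20200401/hour=12.
show partitions dept_partition2;

-- Filtering on the first-level column alone reads every hour of that day.
select * from dept_partition2 where day = '20200401';

-- Assumed location (default database, default warehouse directory):
-- the hour=... subdirectories sit inside the day=... directory.
dfs -ls /user/hive/warehouse/dept_partition2/day=20200401;
```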

**Three ways to associate data uploaded directly into a partition directory with the partitioned table**

Upload the file with hadoop

hadoop fs -mkdir /user/hive/warehouse/dept_par/day=2020-10-28

hadoop fs -put dept1.txt /user/hive/warehouse/dept_par/day=2020-10-28

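Method 1:
Run msck repair table table_name; to repair the metadata
Method 2:
Manually add the corresponding partition in Hive (alter table ... add partition)
Method 3:
Use the load command
load data inpath "/user/hive/warehouse/dept_par/day=2020-10-28" into table table_name partition(day="xxx",hour="xxx");
Tip: the load command also updates the metadata, so the partition is registered as part of the load.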
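
The three methods can be sketched side by side against the dept_partition2 table defined earlier. This is an illustration under stated assumptions rather than a definitive procedure: the hour values 13, 14 and 15 are hypothetical, the warehouse path assumes the default database and the default /user/hive/warehouse location, and the `dfs` commands assume the Hive CLI.

```sql
-- Method 1: upload into a new partition directory, then repair the metadata.
dfs -mkdir -p /user/hive/warehouse/dept_partition2/day=20200401/hour=13;
dfs -put /opt/module/hive/datas/dept_20200401.log /user/hive/warehouse/dept_partition2/day=20200401/hour=13;
-- msck scans the table directory and registers partitions it finds on HDFS.
msck repair table dept_partition2;

-- Method 2: upload, then register the partition explicitly.
dfs -mkdir -p /user/hive/warehouse/dept_partition2/day=20200401/hour=14;
dfs -put /opt/module/hive/datas/dept_20200401.log /user/hive/warehouse/dept_partition2/day=20200401/hour=14;
alter table dept_partition2 add partition (day = '20200401', hour = '14');

-- Method 3: let load copy the file and register the partition in one step.
load data local inpath '/opt/module/hive/datas/dept_20200401.log'
into table dept_partition2 partition (day = '20200401', hour = '15');

-- Each of the three partitions should now be queryable, for example:
select * from dept_partition2 where day = '20200401' and hour = '13';
```

Until one of these steps has run, the uploaded files exist on HDFS but the metastore has no record of the partition, so queries filtering on that partition return no rows.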
