Hive is a Hadoop-based ... (mapping HBase to Hive external tables)

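Contents

1. Internal table operations
   Specifying the HDFS location at table creation
   Creating from an existing table
      Copying the data
      Copying the structure without the data
2. External table operations
   Uploading data to an external table
   Loading data into an external table
3. Partitioned table operations
   Creating a partitioned table
   Loading data into a partitioned table
   Loading data into a multi-level partitioned table
   Querying a partitioned table
4. Bucketing operations
5. Modifying table structure
   Renaming a table
   Adding columns

1. Internal table operations

Specifying the HDFS location at table creation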

hive> create table if not exists stu3(id int, name string) row format delimited fields terminated by '\t' location '/user/stu3';

Creating from an existing table

Copying the data

create table stu4 as select * from stu3;

Copying the structure without the data

create table stu4 like stu3;
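The difference between the two forms is easy to check. A minimal sketch, assuming stu3 already holds some rows; stu4_copy and stu4_empty are hypothetical names, used here so that the two statements do not collide on stu4:

```sql
-- CTAS: copies both the columns and the rows of stu3
create table if not exists stu4_copy as select * from stu3;
-- LIKE: copies only the table definition; the new table starts empty
create table if not exists stu4_empty like stu3;

select count(*) from stu4_copy;   -- same row count as stu3
select count(*) from stu4_empty;  -- 0
```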

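2. External table operations

Definition: an external table maps data stored at some other HDFS path into the table. Hive does not take exclusive ownership of that data, so when the Hive table is dropped the data stays in HDFS.

Use case: log data that arrives every day and is shared with other systems. Run the statistical analysis on top of the external table and store the results in internal tables.

After a drop, the table is gone but the data files still exist.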

hive> create external table if not exists stu3_ex(id int, name string) row format delimited fields terminated by '\t' location '/user/stu3_ex';
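The difference from an internal table shows up when the table is dropped. A minimal sketch, run from the Hive CLI (its dfs command passes the arguments through to the HDFS shell), assuming /user/stu3_ex already contains data files:

```sql
drop table stu3_ex;
-- the metadata is gone: this returns no rows
show tables like 'stu3_ex';
-- the files under the external location are untouched
dfs -ls /user/stu3_ex;
```

Re-running the create external table statement against the same location makes the data queryable again.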

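Uploading data to an external table

A Hive insert also runs as a MapReduce job, so it is inefficient and not a good fit here.
The data file can be uploaded directly instead: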

[root@node02 ~]# vim teacher.txt
[root@node02 ~]# hdfs dfs -put teacher.txt /user/teacher_ex

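Note: the contents of teacher.txt must match the table's column types.

Loading data into an external table

Loading from HDFS is a move operation: the source file is cut from its original path and pasted into the table directory.

Loading local data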

load data local inpath '/export/servers/teacher.csv' into table teacher_ex;

Loading and overwriting

load data local inpath '/export/servers/teacher.csv' overwrite into table teacher_ex;
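For comparison, a load from HDFS omits the local keyword. A sketch, assuming a file has already been uploaded to the hypothetical HDFS path /tmp/teacher.csv:

```sql
-- no "local": the path is resolved in HDFS and the file is moved, not copied
load data inpath '/tmp/teacher.csv' into table teacher_ex;
-- the source path should now be reported as missing
dfs -ls /tmp/teacher.csv;
```

A load with local leaves the source file on the local disk, which is why the two variants behave differently when re-run.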

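3. Partitioned table operations

A large data set is split across separate directories according to some condition.

Creating a partitioned table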

hive> create table score2(s_id string, c_id string, s_score int) partitioned by (year string, month string) row format delimited fields terminated by '\t';

Loading data into a partitioned table

load data local inpath '/export/servers/score.csv' into table score partition(month='201801') ;

Loading data into a multi-level partitioned table

hive> load data local inpath '/root/score.csv' into table score2 partition(year='2022',month='202203');
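Each partition column becomes one directory level under the table directory. A sketch for checking this after the load above, assuming the default database and the common warehouse path /user/hive/warehouse (both depend on the installation):

```sql
show partitions score2;
-- lists year=2022/month=202203
dfs -ls -R /user/hive/warehouse/score2;
```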

Querying a partitioned table

Establish the mapping between the table and data files that already exist in its directory:

msck repair table score4;
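msck repair is needed when partition directories are created directly in HDFS, bypassing Hive. The original does not show how score4 is defined, so the sketch below assumes an external table partitioned by month; the columns, the location /scoredatas and the partition value are hypothetical:

```sql
create external table if not exists score4(s_id string, c_id string, s_score int)
partitioned by (month string)
row format delimited fields terminated by '\t'
location '/scoredatas';

-- place a file into a partition directory without telling Hive
dfs -mkdir -p /scoredatas/month=201806;
dfs -put /export/servers/score.csv /scoredatas/month=201806;

-- the new directory is not listed yet
show partitions score4;
-- scan the table location and register the missing partition
msck repair table score4;
show partitions score4;
```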

Listing partitions

show partitions score;

Querying all partitions

select * from score ;

Querying a single partition

select * from score  where month = '202201';
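The same filter pattern extends to several partitions and to the two-level table. A sketch, assuming the partitions named here have been loaded:

```sql
-- several partitions of score in one query
select * from score where month in ('201801', '202201');
-- score2 has two partition columns; filtering on both reads a single directory
select * from score2 where year = '2022' and month = '202203';
```

Because the filters are on partition columns, Hive can skip the directories of the other partitions instead of scanning the whole table.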

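4. Bucketing operations

Bucketing splits the data into multiple files according to a specified column. It corresponds to partitioning in MapReduce.

Set the bucketing parameter and the number of reducers: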

hive> set hive.enforce.bucketing=true;
hive> set mapreduce.job.reduces=3;

Creating a bucketed table

create external table score5(s_id string, c_id string, s_score int) clustered by (c_id) into 3 buckets row format delimited fields terminated by '\t' location '/score_data2';

A bucketed table has to be populated from an intermediate table with insert ... select:

insert overwrite table score5 select s_id,c_id,s_score from score4 cluster by (c_id);
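To check that the insert really produced the buckets, the table directory and a bucket sample can be inspected. A sketch, assuming the insert above has completed:

```sql
-- one file per bucket is expected, typically named 000000_0, 000001_0, 000002_0
dfs -ls /score_data2;
-- read only the first of the three buckets
select * from score5 tablesample(bucket 1 out of 3 on c_id);
```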

5. Modifying table structure

Renaming a table

alter table score4 rename to score5;

Adding columns

alter table score5 add columns(mycol string, mysco int);
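The result can be confirmed by describing the table. A sketch, assuming score5 is a text-format table that still has the s_id column used earlier:

```sql
-- mycol and mysco are listed after the existing data columns
desc score5;
-- rows written before the change return NULL for the new columns
select s_id, mycol, mysco from score5 limit 5;
```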
