Getting Started with Impala

Introduction to Impala

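Impala is a massively parallel processing (MPP) SQL query engine for large volumes of data stored in a Hadoop cluster. It offers high performance and low latency.
It is written in C++.
Advantages
- Data stored in Hadoop can be processed in place, without being transformed or moved
- Data can be queried with familiar SQL
- Supports the Parquet file format
Disadvantages
- No support for custom serialization and deserialization
- Cannot read custom binary files; it reads text files and the built-in formats it supports, such as Parquet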

Architecture

Impala daemon

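Also known as impalad, this daemon runs on every node of the cluster. It reads and writes data, accepts query requests from the client interfaces, works with the other nodes to execute queries in parallel, and returns its node's results to the central coordinator node.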

Impala Statestore

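Checks the health of the impalad nodes in the cluster so that requests are not sent to unavailable nodes.
It also synchronizes state information across the nodes, acting as a monitoring service.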

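Impala Catalog Service

Also known as catalogd. When a SQL statement executed in the Impala cluster changes metadata, the catalog service pushes the change to the other impalad nodes.
The statestore and catalog services are usually placed on the same node.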

Query interfaces

Impala-shell: command line
Hue: browser-based
ODBC/JDBC drivers

Statements

Database commands

-- Create a database
create database if not exists my_database;

-- Select a database
use my_database;

-- Drop a database
drop database if exists my_database;

Table commands

Basic create, insert, update, and query operations

-- Create a table
create table if not exists my_database.my_table(col1 type1, col2 type2);
create table table_copy as
select * from my_table;

-- Insert data (append)
insert into my_table values (v1, v2);

-- Insert data (overwrite existing data)
insert overwrite my_table values (v1, v2);

-- Query data
select col1, col2 from my_table;

-- Show the table description
describe my_table;

-- Rename a table
alter table my_table rename to table_new;

-- Add columns
alter table my_table add columns (col3 type3, col4 type4);

-- Drop a column; the column keyword is optional
alter table my_table drop [column] col4;

-- Change a column's name and data type (renames col3 to col4 with type type4)
alter table my_table change col3 col4 type4;

-- Drop a table: drop removes the whole table, truncate removes only its data
drop table if exists table_copy;
truncate table if exists table_copy;

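-- List all tables in a database
use my_database;
show tables;

-- Create a view. A view is similar to a temporary table, but it is a virtual table rather than a physical one. It mainly simplifies queries and leaves the structure of the underlying table unchanged
create view if not exists table_view as select col1, col2 from my_table;

-- Alter a view
alter view table_view as select col1, col3 from my_table;

-- Drop a view
drop view table_view;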

Other basic operations

-- Sorting: asc is ascending, desc is descending; nulls first puts NULL values at the start of the result, nulls last puts them at the end
select * from my_table order by col1 [asc|desc] [nulls first|nulls last];

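-- Aggregation: group by col1 and apply an aggregate function such as count(), sum(), or max(). The first statement groups by col1 and sums the col2 values for each col1. The groups produced by group by cannot be filtered with where; use having instead, which works like where but on aggregated results
select col1, sum(col2) from my_table group by col1;
select col1, sum(col2) from my_table group by col1 having sum(col2) > 0;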
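The difference between where and having is easy to check on a few rows. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Impala (an assumption made so the example runs anywhere; the sample rows are invented for illustration). It shows that where filters rows before grouping, while having filters groups after aggregation.

```python
import sqlite3

# In-memory SQLite database standing in for Impala (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("create table my_table (col1 text, col2 integer)")
conn.executemany(
    "insert into my_table values (?, ?)",
    [("a", 5), ("a", -2), ("b", -4), ("b", 1), ("c", 7)],  # invented sample rows
)

# where filters individual rows before they are grouped.
where_rows = conn.execute(
    "select col1, sum(col2) from my_table "
    "where col2 > 0 group by col1 order by col1"
).fetchall()

# having filters whole groups after the aggregate has been computed.
having_rows = conn.execute(
    "select col1, sum(col2) from my_table "
    "group by col1 having sum(col2) > 0 order by col1"
).fetchall()

print(where_rows)   # [('a', 5), ('b', 1), ('c', 7)]
print(having_rows)  # [('a', 3), ('c', 7)]
```

Group b survives the where query because one of its rows is positive, but it is dropped by the having query because its sum is negative.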

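-- Limit the result to n rows. limit can be combined with offset to skip rows first: the first statement returns rows 1 to n, the second skips m rows and returns rows m+1 to m+n. Impala requires order by when offset is used
select * from my_table limit n;
select * from my_table order by col1 limit n offset m;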
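Which rows a limit/offset pair returns can be verified with a small experiment. This sketch again uses Python's sqlite3 module as a stand-in for Impala (an assumption); the helper name page and the ten sample rows are hypothetical. The query is ordered so that the row positions are well defined.

```python
import sqlite3

# In-memory SQLite database standing in for Impala (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("create table my_table (col1 integer)")
conn.executemany("insert into my_table values (?)", [(i,) for i in range(1, 11)])


def page(n, m=0):
    """Return col1 for n rows after skipping the first m rows, ordered by col1."""
    rows = conn.execute(
        "select col1 from my_table order by col1 limit ? offset ?", (n, m)
    ).fetchall()
    return [row[0] for row in rows]


first_three = page(3)    # limit 3: rows 1 to 3
after_four = page(3, 4)  # limit 3 offset 4: rows 5 to 7

print(first_three)  # [1, 2, 3]
print(after_four)   # [5, 6, 7]
```

With n = 3 and m = 4, the query skips four rows and returns rows 5 through 7, that is, rows m+1 through m+n.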

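-- Combine the results of two queries; union removes duplicate rows, union all keeps them. Parentheses apply order by and limit to each query separately
(select * from my_table order by col1 limit 1)
union
(select * from table_copy order by col1 limit 4 offset 4);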

-- with ... as: names a query result so it can be used like a temporary table
with t1 as (select * from customers where age>25), 
t2 as (select * from employee where age>25) 
(select * from t1 union select * from t2);
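To see how the with clause and union interact, here is a minimal sketch using Python's sqlite3 module as a stand-in for Impala (an assumption). The customers and employee rows are invented for illustration. It runs the same pair of named subqueries with union and with union all.

```python
import sqlite3

# In-memory SQLite database standing in for Impala (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("create table customers (name text, age integer)")
conn.execute("create table employee (name text, age integer)")
conn.executemany("insert into customers values (?, ?)", [("Ann", 30), ("Bob", 20)])
conn.executemany("insert into employee values (?, ?)", [("Ann", 30), ("Cid", 40)])

# Two named subqueries, each keeping only people older than 25.
cte = (
    "with t1 as (select * from customers where age > 25), "
    "t2 as (select * from employee where age > 25) "
)

# union removes rows that appear in both t1 and t2.
union_rows = conn.execute(
    cte + "select * from t1 union select * from t2 order by name"
).fetchall()

# union all keeps every row, including duplicates.
union_all_rows = conn.execute(
    cte + "select * from t1 union all select * from t2 order by name"
).fetchall()

print(union_rows)      # [('Ann', 30), ('Cid', 40)]
print(union_all_rows)  # [('Ann', 30), ('Ann', 30), ('Cid', 40)]
```

Ann appears in both tables, so union returns her once while union all returns her twice.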

-- Remove duplicate rows; list specific columns to deduplicate on those columns
select distinct col1, col2 from my_table;
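The columns listed after distinct decide what counts as a duplicate. A minimal sketch, using Python's sqlite3 module as a stand-in for Impala (an assumption) and invented sample rows, compares distinct over all columns with distinct over two columns.

```python
import sqlite3

# In-memory SQLite database standing in for Impala (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("create table my_table (col1 text, col2 integer, col3 text)")
conn.executemany(
    "insert into my_table values (?, ?, ?)",
    [("a", 1, "x"), ("a", 1, "y"), ("a", 1, "x"), ("b", 2, "z")],  # invented rows
)

# distinct * compares entire rows, so only the exact repeat is removed.
whole_rows = conn.execute(
    "select distinct * from my_table order by col1, col2, col3"
).fetchall()

# distinct col1, col2 compares only the listed columns.
two_cols = conn.execute(
    "select distinct col1, col2 from my_table order by col1, col2"
).fetchall()

print(whole_rows)  # [('a', 1, 'x'), ('a', 1, 'y'), ('b', 2, 'z')]
print(two_cols)    # [('a', 1), ('b', 2)]
```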
