【HBase Trail】(2) Operating HBase with hbase shell commands and the Java API (including complex queries with filters and an HBase utility class)

Contents

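0. Prerequisites
1. Using the command line
	① DDL: creating, listing, altering and dropping tables
	② DML: inserting, deleting, updating and querying data
	③ Atomic increment with incr
2. Filters: complex queries
	① Built-in filters
	② Comparators
	③ Worked examples
3. Using the Java API
	① HBase utility class
	② Usage
4. Final words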

0. Prerequisites

The previous post covered how to set up HBase with Docker:
【HBase Trail】(1) Building an HBase cluster with Docker

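Now we start using HBase through hbase shell commands and the Java API.
An introduction to HBase, how it works internally and its architecture will be published together in the final roundup post.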

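This post covers a series of HBase shell commands, including creating, listing, altering and dropping tables, and inserting, deleting, updating and querying data.
It also lists the filters and comparators mentioned on the official website together with what they do, and then uses them both in the shell and in Java.
For the Java API, I also put together a utility class and show how to use it.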

1. Using the command line

First, enter the shell:

./hbase shell

① DDL: creating, listing, altering and dropping tables

While running these commands, you can watch the Tables section of the HBase web UI to see information about the tables you create.

1)	Create a table
	create <table>,<column family 1>,<column family 2>...
e.g.	create 'user','address','info'

2)	List all user tables
	list

3)	Show table details, including the attributes of each column family
	describe <table>
e.g.	describe 'user'

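4)	Change an attribute of a column family
	alter <table>,{NAME=><column family>,<attribute>=><value>}
e.g.	alter 'user',{NAME=>'address',VERSIONS=>3}
	This example sets the maximum number of versions of the address column family to 3,
	so the column family keeps the 3 most recent versions of each cell and discards older ones.
	The default is 1: only the newest version is kept, and whenever new data is written, all older versions are discarded.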

5)	Drop a table
	first disable <table>, then drop <table>
e.g.	disable 'user'
	drop 'user'

6)	Create a namespace
	create_namespace <namespace>
e.g.	create_namespace 'iceclean'

7)	List all namespaces
	list_namespace

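8)	Create a table in a specific namespace
	create <namespace:table>,<column family 1>,<column family 2>...
e.g.	create 'iceclean:skill','live','program'
	The earlier examples did not name a namespace, so those tables were created in the default namespace.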

9)	Drop a namespace
	The namespace must be empty (contain no tables); if it is not, drop its tables first.
	drop_namespace <namespace>
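The effect of the VERSIONS attribute set with alter in 4) can be pictured with a small in-memory model of a single column. This is only a sketch, not HBase code: the class `VersionedColumn` and its methods are hypothetical names chosen for illustration. It keeps at most `maxVersions` values, returns them newest first the way `get ... {VERSIONS => n}` does, and on delete removes only the newest version, so an older retained version becomes visible again. Real HBase removes surplus versions physically during compactions; the model simply drops them immediately.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of ONE column with a VERSIONS limit (not HBase code)
class VersionedColumn {
    private final int maxVersions;
    // timestamp -> value, ordered newest first
    private final NavigableMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());

    VersionedColumn(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // Like put with an explicit timestamp; versions beyond the limit are dropped, oldest first
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry = smallest (oldest) timestamp
        }
    }

    // Like get ... {VERSIONS => n}: at most n values, newest first
    List<String> get(int requestedVersions) {
        List<String> out = new ArrayList<>();
        for (String value : versions.values()) {
            if (out.size() == requestedVersions) {
                break;
            }
            out.add(value);
        }
        return out;
    }

    // Deleting the column removes only the newest version; an older retained one shows up again
    void deleteLatest() {
        versions.pollFirstEntry();
    }

    public static void main(String[] args) {
        VersionedColumn address = new VersionedColumn(3); // VERSIONS => 3
        for (long ts = 1; ts <= 5; ts++) {
            address.put(ts, "v" + ts);
        }
        System.out.println(address.get(10)); // [v5, v4, v3]
        address.deleteLatest();
        System.out.println(address.get(10)); // [v4, v3]
    }
}
```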

② DML: inserting, deleting, updating and querying data

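Note: put and delete accept an optional timestamp as the last argument to set the time explicitly; if it is omitted, the system generates one.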

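1)	Insert data (also used for updates: writing to the same column again stores a newer value, which is the update)
	put <table>,<row key>,<column family:column>,<value>
e.g.	put 'user','1','info:name','iceclean'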

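2)	Delete a column family / column
	delete <table>,<row key>,<column family>
	delete <table>,<row key>,<column family:column>
	Note: if a column has several versions, only the newest one is deleted, so the previous version becomes visible again.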

3)	Delete a whole row
	deleteall <table>,<row key>

4)	Remove all data from a table
	truncate <table>

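5)	Query by row key, optionally narrowed down to a column family or a column
	get <table>,<row key>
	get <table>,<row key>,<column family>
	get <table>,<row key>,<column family:column>
	Note: if values contain Chinese (non-ASCII) text, add the parameter {FORMATTER => "toString"} so it is displayed as text instead of escaped bytes.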

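6)	Query a column, up to a given number of versions
	get <table>,<row key>,{COLUMN=><column family:column>,VERSIONS=><count>}
e.g.	get 'user','101c',{COLUMN=>'info:name',VERSIONS=>10}
	Note: even though 10 versions are requested here, and the column really has been updated many times,
	only 1 version is shown if the column family was created to keep just 1 version,
	because the older versions were never retained, so naturally they cannot be found.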

7)	Scan the whole table
	scan <table>

8)	Count the rows in a table
	count <table>

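9)	Range query, left-closed and right-open; if STARTROW is omitted the scan starts at the first row, and if STOPROW is omitted it runs to the end of the table
	scan <table>,{STARTROW=><row key>,STOPROW=><row key>}
e.g.	scan 'user',{STARTROW=>'101a',STOPROW=>'103f'}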

10)	Query specific columns
	add the parameter: {COLUMNS => ['<column family 1>:<column 1>', '<column family 2>:<column 2>', ...]}

11)	Limit the number of rows returned
	add the parameter: {LIMIT => 3}

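12)	View the raw operation history (including older versions and delete markers)
	VERSIONS sets the maximum number of versions shown for each cell
	scan <table>,{RAW=>true,VERSIONS=>10}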
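The left-closed, right-open range in 9) and the row order it relies on can be sketched with a sorted map. Assumption: row keys are modelled as Java strings in a `TreeMap`, which orders ASCII keys the same way HBase orders row keys (byte by byte); `RowRangeScan` is a hypothetical helper, not part of HBase, and passing `null` stands for an omitted STARTROW or STOPROW.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of scan {STARTROW => a, STOPROW => b} over sorted row keys (not HBase code)
class RowRangeScan {
    static List<String> scan(NavigableMap<String, String> table, String startRow, String stopRow) {
        NavigableMap<String, String> range = table;
        if (startRow != null) {
            range = range.tailMap(startRow, true);  // STARTROW is included
        }
        if (stopRow != null) {
            range = range.headMap(stopRow, false);  // STOPROW is excluded
        }
        return new ArrayList<>(range.keySet());
    }

    public static void main(String[] args) {
        TreeMap<String, String> user = new TreeMap<>();
        for (String rowKey : new String[] {"101a", "101c", "102b", "103f", "104a"}) {
            user.put(rowKey, "...");
        }
        System.out.println(scan(user, "101a", "103f")); // [101a, 101c, 102b]

        // Lexicographic order: "10" sorts between "1" and "2"
        TreeMap<String, String> ids = new TreeMap<>();
        for (int i = 1; i <= 10; i++) {
            ids.put(String.valueOf(i), "...");
        }
        System.out.println(scan(ids, null, null)); // [1, 10, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

The second output is why numeric row keys are often zero-padded (01, 02, ... 10) when they need to sort in numeric order.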

③ Atomic increment with incr

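In many cases a column value needs to be incremented, for example like, favorite and view counts.
A column written with put in the shell stores its value as a string, so it cannot be incremented; use incr instead (if the column does not exist yet, incr creates it as a counter). The syntax is:

incr <table>, <row key>, <column family:column>, [increment, default 1]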
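Why a string written with put cannot be incremented can be shown with a small model. Assumption: a counter cell holds an 8-byte big-endian long, the layout `Bytes.toBytes(long)` produces in the Java client and the one incr expects; `CounterCell` and the column `info:likes` are hypothetical and only illustrate the idea.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical model of an incr on a single cell (not HBase code)
class CounterCell {
    static byte[] incr(byte[] current, long delta) {
        long value;
        if (current == null) {
            value = 0L;                                   // missing cell: counter starts at 0
        } else if (current.length == Long.BYTES) {
            value = ByteBuffer.wrap(current).getLong();   // ByteBuffer is big-endian by default
        } else {
            throw new IllegalArgumentException(
                    "not a counter: " + current.length + " bytes instead of 8");
        }
        return ByteBuffer.allocate(Long.BYTES).putLong(value + delta).array();
    }

    static long read(byte[] cell) {
        return ByteBuffer.wrap(cell).getLong();
    }

    public static void main(String[] args) {
        byte[] likes = incr(null, 1);       // like: incr 'stardust','1','info:likes'
        likes = incr(likes, 5);             // like: incr 'stardust','1','info:likes',5
        System.out.println(read(likes));    // 6

        // put ... 'info:store-int', 60 stores the 2-byte text "60", not an 8-byte long
        byte[] fromPut = "60".getBytes(StandardCharsets.UTF_8);
        try {
            incr(fromPut, 1);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // not a counter: 2 bytes instead of 8
        }
    }
}
```

In the shell, a counter created by incr can be read back as a number with get_counter <table>, <row key>, <column family:column>.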

2. Filters: complex queries

get can only look up data by row key.
When you need to query data by conditions, get cannot do it, and you need filters.

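Filters are usually combined with scan.
Under the hood the shell also calls the HBase Java API; later, in the Spring Boot integration, the filter classes are simply instantiated with new.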

① Built-in filters

Each of the following filters has a corresponding Java implementation class.

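Category	Filter	Function
Row key filters	RowFilter	Compares and filters on the row key
	PrefixFilter	Filters by row key prefix
	KeyOnlyFilter	Returns only the key of each cell, without its value
	FirstKeyOnlyFilter	Returns only the first cell (key-value pair) of each row
	InclusiveStopFilter	Stops at the given row and includes it (an inclusive alternative to STOPROW)
Column filters	FamilyFilter	Column family filter; only shows data from matching column families
	QualifierFilter	Column qualifier filter; only shows data from matching column names
	ColumnPrefixFilter	Filters on a prefix of the column name
	MultipleColumnPrefixFilter	Filters column names against several prefixes
	ColumnRangeFilter	Keeps columns whose names fall within a range
Value filters	ValueFilter	Keeps the cells whose value satisfies the condition
	SingleColumnValueFilter	Filters rows by the value of one given column, like WHERE column [operator] value
	SingleColumnValueExcludeFilter	Like SingleColumnValueFilter, but the tested column itself is left out of the returned cells
Other filters	ColumnPaginationFilter	Paginates the columns of each row, returning only the columns in [offset, offset+limit)
	PageFilter	Paginates the results by row (limits how many rows are returned)
	TimestampsFilter	Keeps cells whose timestamp equals one of the given timestamps (several can be set)
	ColumnCountGetFilter	Limits the number of cells returned per row; intended for use with get
	DependentColumnFilter	Uses a reference column to filter other columns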
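The [offset, offset+limit) window of ColumnPaginationFilter can be illustrated on one row. This is a hypothetical model, not HBase code: it assumes a single column family, so the row's columns are simply ordered by column name, and `ColumnPage.page(qualifiers, limit, offset)` takes its arguments in the same order as the `ColumnPaginationFilter(limit, offset)` constructor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical model of ColumnPaginationFilter(limit, offset) on a single row (not HBase code)
class ColumnPage {
    static List<String> page(SortedSet<String> qualifiers, int limit, int offset) {
        List<String> sorted = new ArrayList<>(qualifiers); // already in column-name order
        int from = Math.min(offset, sorted.size());
        int to = Math.min(offset + limit, sorted.size());  // end index is exclusive
        return new ArrayList<>(sorted.subList(from, to));
    }

    public static void main(String[] args) {
        SortedSet<String> row = new TreeSet<>(List.of("name", "kind", "belong", "x", "y", "width", "height"));
        System.out.println(row);             // [belong, height, kind, name, width, x, y]
        System.out.println(page(row, 3, 2)); // [kind, name, width]
    }
}
```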

② Comparators

Whether in Java or in the shell, filters need a comparator to do the actual comparison.

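Comparator	Description	Shorthand in filter strings
BinaryComparator	Compares against the complete byte array (string), byte by byte	binary:value
BinaryPrefixComparator	Compares against a byte-array prefix (string prefix)	binaryprefix:value
BitComparator	Matches using bitwise operations (AND / OR / XOR)	none (Java API only)
NullComparator	Matches empty (null) values	none (Java API only)
RegexStringComparator	Matches a regular expression	regexstring:value
SubstringComparator	Matches a substring (case-insensitive)	substring:value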
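How the four shorthands decide a match with the = operator can be sketched in plain Java. This is a simplified model, not the HBase implementation, and `ComparatorShorthand.matches` is a hypothetical helper. It reflects two details of the real comparators: SubstringComparator lower-cases both sides, and RegexStringComparator searches with `find()`, so an unanchored pattern also matches inside a longer value.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical model of "comparatorType:value" matching with the = operator (not HBase code)
class ComparatorShorthand {
    static boolean matches(String expr, String cellValue) {
        int colon = expr.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected type:value, got " + expr);
        }
        String type = expr.substring(0, colon);
        String arg = expr.substring(colon + 1);
        switch (type) {
            case "binary":       // the whole value must be equal
                return cellValue.equals(arg);
            case "binaryprefix": // the value must start with the argument
                return cellValue.startsWith(arg);
            case "substring":    // case-insensitive containment
                return cellValue.toLowerCase(Locale.ROOT).contains(arg.toLowerCase(Locale.ROOT));
            case "regexstring":  // find(), not a full match
                return Pattern.compile(arg).matcher(cellValue).find();
            default:
                throw new IllegalArgumentException("unsupported comparator: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(matches("binary:button", "button"));                  // true
        System.out.println(matches("substring:TE", "text1"));                    // true
        System.out.println(matches("regexstring:[3-9][0-9][0-9]px", "1350px"));  // true, "350px" is found inside
        System.out.println(matches("regexstring:^[3-9][0-9][0-9]px$", "1350px")); // false
    }
}
```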

③ Worked examples

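First, create the table and some demo data.
The table describes front-end UI elements. Since each kind of element can have its own attributes, it makes good use of HBase's ability to add columns freely, storing each element's attributes as columns.
Only two rows are inserted here; add more yourself as you like.

The example table uses only one column family, because multiple column families reduce HBase performance; unless you really need more, one column family is enough.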

create 'stardust', 'info'

# Sample data
put 'stardust', '1', 'info:name', 'btn1'
put 'stardust', '1', 'info:kind', 'button'
put 'stardust', '1', 'info:belong', 'root'
put 'stardust', '1', 'info:x', '100px'
put 'stardust', '1', 'info:y', '200px'
put 'stardust', '1', 'info:width', '50px'
put 'stardust', '1', 'info:height', '100px'
put 'stardust', '1', 'info:store-int', 60

put 'stardust', '2', 'info:name', 'text1'
put 'stardust', '2', 'info:kind', 'label'
put 'stardust', '2', 'info:belong', 'root'
put 'stardust', '2', 'info:x', '100px'
put 'stardust', '2', 'info:y', '150px'
put 'stardust', '2', 'info:width', '30px'
put 'stardust', '2', 'info:height', '100px'
put 'stardust', '2', 'info:store-string', 'label content'

Query examples:

# Element whose row key is 1
scan 'stardust', {FILTER => "RowFilter(=, 'binary:1')"}

# Elements that can store a string (only the matching column is returned)
scan 'stardust', {FILTER => "QualifierFilter(=, 'binary:store-string')"}

# Elements of kind button
scan 'stardust', {FILTER => "SingleColumnValueFilter('info', 'kind', =, 'binary:button')"}

# Cells whose value is 100px (only the matching columns are returned)
scan 'stardust', {FILTER => "ValueFilter(=, 'binary:100px')"}

# Elements whose x coordinate is 100px and whose name contains te
scan 'stardust', {FILTER => "SingleColumnValueFilter('info', 'x', =, 'binary:100px') AND SingleColumnValueFilter('info', 'name', =, 'substring:te')"}
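One behaviour of SingleColumnValueFilter is easy to trip over in queries like the last one: by default, a row that does not contain the tested column at all is kept rather than dropped, so a misspelled family or column name silently lets every row through. The model below is a hypothetical sketch, not HBase code; its `filterIfMissing` flag mirrors `SingleColumnValueFilter.setFilterIfMissing(boolean)` in the Java client, whose default is false, and the column name `detail:x` is a deliberately wrong name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of SingleColumnValueFilter with "=" and a binary comparator (not HBase code)
class SingleColumnValueModel {
    // rows: row key -> ("family:qualifier" -> value)
    static List<String> scan(Map<String, Map<String, String>> rows, String column,
                             String expected, boolean filterIfMissing) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            String value = row.getValue().get(column);
            // Missing column: kept unless filterIfMissing is true; otherwise compare with "="
            boolean keep = (value == null) ? !filterIfMissing : value.equals(expected);
            if (keep) {
                kept.add(row.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> stardust = new TreeMap<>();
        stardust.put("1", Map.of("info:name", "btn1", "info:x", "100px"));
        stardust.put("2", Map.of("info:name", "text1", "info:x", "100px"));

        System.out.println(scan(stardust, "detail:x", "100px", false)); // [1, 2]: no row has detail:x
        System.out.println(scan(stardust, "detail:x", "100px", true));  // []
        System.out.println(scan(stardust, "info:name", "text1", true)); // [2]
    }
}
```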

3. Using the Java API

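A Connection is heavyweight and thread-safe, so create it once and reuse it.
A Table (HTable) is lightweight and not thread-safe, so close it after each use and get a new one next time.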

① HBase utility class

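The class below is my own rewrite based on other utility classes found online; it wraps the HBase Java API.
The only thing you need to change is the ZooKeeper configuration in the static initializer.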

import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Pair;

public class HbaseUtils {

    // Connection is heavyweight and thread-safe: create it once and share it
    private static Connection connection;

    static {
        Configuration configuration = HBaseConfiguration.create();
        // Replace with your own ZooKeeper quorum
        configuration.set("hbase.zookeeper.quorum", "hadoop001:12181,hadoop002:12182,hadoop003:12183");
        try {
            connection = ConnectionFactory.createConnection(configuration);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /** Creates a table whose column families keep 1 version; false if it already exists or creation fails. */
    public static boolean createTable(String tableName, String... columnFamilies) {
        try (Admin admin = connection.getAdmin()) {
            TableName name = TableName.valueOf(tableName);
            if (admin.tableExists(name)) {
                return false;
            }
            TableDescriptorBuilder tableDescriptor = TableDescriptorBuilder.newBuilder(name);
            Arrays.stream(columnFamilies).forEach(columnFamily ->
                    tableDescriptor.setColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes(columnFamily))
                            .setMaxVersions(1)
                            .build()));
            admin.createTable(tableDescriptor.build());
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }

    /** Disables and deletes a table; false if it does not exist or deletion fails. */
    public static boolean deleteTable(String tableName) {
        try (Admin admin = connection.getAdmin()) {
            TableName name = TableName.valueOf(tableName);
            if (!admin.tableExists(name)) {
                return false;
            }
            // A table must be disabled before it can be deleted
            if (admin.isTableEnabled(name)) {
                admin.disableTable(name);
            }
            admin.deleteTable(name);
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }

    /** Writes one cell; writing the same column again stores a newer version. */
    public static boolean putRow(String tableName, String rowKey, String columnFamilyName, String qualifier, String value) {
        // Table is lightweight and not thread-safe: open it per call, try-with-resources closes it
        try (Table table = connection.getTable(TableName.valueOf(tableName))) {
            Put put = new Put(Bytes.toBytes(rowKey));
            put.addColumn(Bytes.toBytes(columnFamilyName), Bytes.toBytes(qualifier), Bytes.toBytes(value));
            table.put(put);
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }

    /** Writes several columns of one row in a single Put; each Pair is (qualifier, value). */
    public static boolean putRow(String tableName, String rowKey, String columnFamilyName, List<Pair<String, String>> pairList) {
        try (Table table = connection.getTable(TableName.valueOf(tableName))) {
            Put put = new Put(Bytes.toBytes(rowKey));
            pairList.forEach(pair -> put.addColumn(Bytes.toBytes(columnFamilyName), Bytes.toBytes(pair.getFirst()), Bytes.toBytes(pair.getSecond())));
            table.put(put);
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }

    /** Reads a whole row, prints it and returns it (null on failure). */
    public static Result getRow(String tableName, String rowKey) {
        try (Table table = connection.getTable(TableName.valueOf(tableName))) {
            Result result = table.get(new Get(Bytes.toBytes(rowKey)));
            printResult(result);
            return result;
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }

    /** Prints every cell of a Result as family:qualifier => value. */
    public static void printResult(Result result) {
        List<Cell> cells = result.listCells();
        // listCells() returns null when the Result is empty (e.g. the row does not exist)
        if (cells == null) {
            return;
        }
        for (Cell cell : cells) {
            // Column family name
            String cf = Bytes.toString(cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength());
            // Column name
            String cn = Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength());
            // Value
            String value = Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
            System.out.println(cf + ":" + cn + " => " + value);
        }
    }

    /** Reads one cell as a string; null if the cell does not exist or the read fails. */
    public static String getCell(String tableName, String rowKey, String columnFamily, String qualifier) {
        try (Table table = connection.getTable(TableName.valueOf(tableName))) {
            Get get = new Get(Bytes.toBytes(rowKey));
            get.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes(qualifier));
            Result result = table.get(get);
            byte[] resultValue = result.getValue(Bytes.toBytes(columnFamily), Bytes.toBytes(qualifier));
            // Bytes.toString(null) returns null, so a missing cell yields null
            return Bytes.toString(resultValue);
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }

    /** Full-table scan; the caller must close the returned ResultScanner. */
    public static ResultScanner getScanner(String tableName) {
        try {
            Table table = connection.getTable(TableName.valueOf(tableName));
            Scan scan = new Scan();
            return table.getScanner(scan);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    /** Scan with a filter list; the caller must close the returned ResultScanner. */
    public static ResultScanner getScanner(String tableName, FilterList filterList) {
        try {
            Table table = connection.getTable(TableName.valueOf(tableName));
            Scan scan = new Scan();
            scan.setFilter(filterList);
            return table.getScanner(scan);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    /** Scan of the row range [startRowKey, endRowKey) with a filter list; the caller must close the returned ResultScanner. */
    public static ResultScanner getScanner(String tableName, String startRowKey, String endRowKey,
                                           FilterList filterList) {
        try {
            Table table = connection.getTable(TableName.valueOf(tableName));
            Scan scan = new Scan();
            scan.withStartRow(Bytes.toBytes(startRowKey));  // inclusive
            scan.withStopRow(Bytes.toBytes(endRowKey));     // exclusive
            scan.setFilter(filterList);
            return table.getScanner(scan);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    /** Deletes a whole row. */
    public static boolean deleteRow(String tableName, String rowKey) {
        try (Table table = connection.getTable(TableName.valueOf(tableName))) {
            table.delete(new Delete(Bytes.toBytes(rowKey)));
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }

    /** Deletes one column of a row; addColumn removes only the newest version (Delete.addColumns would remove all versions). */
    public static boolean deleteColumn(String tableName, String rowKey, String familyName,
                                       String qualifier) {
        try (Table table = connection.getTable(TableName.valueOf(tableName))) {
            Delete delete = new Delete(Bytes.toBytes(rowKey));
            delete.addColumn(Bytes.toBytes(familyName), Bytes.toBytes(qualifier));
            table.delete(delete);
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }
}

② Usage

(1) Drop a table

First, you can drop the table created from the shell above:

void deleteTest() {
    if (HbaseUtils.deleteTable("stardust")) {
        System.out.println("删除成功");
    }
}

(2) Create a table and insert data

void initStardust() {
    // Create the table
    HbaseUtils.createTable("stardust", "info");
    // Insert data: 9 randomly generated elements
    String[] kinds = {"button", "label", "input", "img", "div"};
    String kind;
    Random random = new Random(new Date().getTime());
    for (int i = 1; i <= 9; i++) {
        kind = kinds[random.nextInt(5)];
        HbaseUtils.putRow("stardust", "" + i, "info", "id", "" + i);
        HbaseUtils.putRow("stardust", "" + i, "info", "name", kind + random.nextInt(100));
        HbaseUtils.putRow("stardust", "" + i, "info", "kind", "" + kind);
        HbaseUtils.putRow("stardust", "" + i, "info", "belong", "0");

        HbaseUtils.putRow("stardust", "" + i, "info", "x", random.nextInt(500) + "px");
        HbaseUtils.putRow("stardust", "" + i, "info", "y", random.nextInt(500) + "px");
        HbaseUtils.putRow("stardust", "" + i, "info", "width", random.nextInt(500) + "px");
        HbaseUtils.putRow("stardust", "" + i, "info", "height", random.nextInt(500) + "px");

        int skill = random.nextInt(100);
        if (skill < 40) {
            HbaseUtils.putRow("stardust", "" + i, "info", "store-int", random.nextInt(100) + "");
        } else if (skill < 70) {
            HbaseUtils.putRow("stardust", "" + i, "info", "store-string", kind + " content :" + random.nextInt(100));
        } else if (skill < 85) {
            HbaseUtils.putRow("stardust", "" + i, "info", "modify", "(1, 2|x, y|x=x+y)");
        } else if (skill < 95) {
            HbaseUtils.putRow("stardust", "" + i, "info", "check", "(1|x|x=10)");
        } else {
            HbaseUtils.putRow("stardust", "" + i, "info", "clock", "0/30 * * * * ?");
        }
    }
}

(3) Print the whole table

void printTable(String tableName) {
    ResultScanner scanner = HbaseUtils.getScanner(tableName);
    if (scanner != null) {
        for (Result result : scanner) {
            HbaseUtils.printResult(result);
            System.out.println("---");
        }
        // Release the server-side scanner once iteration is finished
        scanner.close();
    }
}

(4) Filter elements whose x is at least 100px and whose y is at least 300px

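Note that the comparison here is a string comparison, not a numeric one (for example, "50px" sorts after "100px"),
so a regular expression is used instead of a simple greater-than / less-than comparison.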

void scanTest() {
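    // x: a 3-digit value from 100px to 999px; ^ and $ anchor the pattern because
    // RegexStringComparator uses find() and would otherwise also match inside a longer value
    SingleColumnValueFilter xFilter = new SingleColumnValueFilter(
            Bytes.toBytes("info"),
            Bytes.toBytes("x"),
            CompareOperator.EQUAL,
            new RegexStringComparator("^[1-9][0-9]{2}px$"));
    // Drop rows that have no info:x column at all (by default they would pass)
    xFilter.setFilterIfMissing(true);

    // y: a 3-digit value from 300px to 999px
    SingleColumnValueFilter yFilter = new SingleColumnValueFilter(
            Bytes.toBytes("info"),
            Bytes.toBytes("y"),
            CompareOperator.EQUAL,
            new RegexStringComparator("^[3-9][0-9]{2}px$"));
    yFilter.setFilterIfMissing(true);

    // MUST_PASS_ALL: both conditions must hold (logical AND)
    FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL, xFilter, yFilter);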
    ResultScanner scanner = HbaseUtils.getScanner("stardust", filterList);
    if (scanner != null) {
        for (Result result : scanner) {
            HbaseUtils.printResult(result);
            System.out.println("---");
        }
        scanner.close();
    }
}
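Why a plain greater-or-equal comparison does not work here can be checked without a cluster. This is a hypothetical sketch, not HBase code: `PxComparison` compares strings with `compareTo`, which for ASCII text gives the same byte-by-byte order BinaryComparator uses, and the anchored pattern is the one used for x above (it only covers 3-digit values, which is enough for the 0 to 499 range generated in initStardust).

```java
import java.util.regex.Pattern;

// Hypothetical check of string order vs numeric order for "...px" values (not HBase code)
class PxComparison {
    // Lexicographic comparison, like BinaryComparator on ASCII text
    static boolean lexicallyAtLeast(String value, String bound) {
        return value.compareTo(bound) >= 0;
    }

    // Anchored pattern for 100px to 999px
    private static final Pattern AT_LEAST_100 = Pattern.compile("^[1-9][0-9]{2}px$");

    static boolean regexAtLeast100(String value) {
        // find() as RegexStringComparator does; the anchors make it behave like a full match
        return AT_LEAST_100.matcher(value).find();
    }

    public static void main(String[] args) {
        System.out.println(lexicallyAtLeast("50px", "100px")); // true, although 50 < 100
        System.out.println(regexAtLeast100("50px"));            // false
        System.out.println(regexAtLeast100("350px"));           // true
    }
}
```

An alternative design is to store numbers so that byte order equals numeric order, for example zero-padded text (050px) or, for non-negative values, `Bytes.toBytes(long)`; a BinaryComparator with GREATER_OR_EQUAL then works directly.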

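The utility class can be used flexibly; filters in particular allow many tricks, which I won't demonstrate further here.
What matters most is understanding the utility class, and you can keep adding features to it (the one shown here is deliberately simple).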

4. Final words

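Operating HBase through the utility class is still cumbersome, especially when complex queries need many filters.
So the next post introduces Phoenix, another Apache project, which lets you operate HBase with SQL just like MySQL.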

The fish on the ice has long since called it a day (IceClean)
