What is the fastest way to bulk load data into HBase programmatically?

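I went through a process that is probably very similar to yours, trying to find an efficient way to load data from an MR job into HBase. What I found to work is using HFileOutputFormat as the OutputFormatClass of the MR job.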

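Below is the basis of the code I use to generate the job, along with the Mapper's map function that writes out the data. This approach was fast. We no longer use it, so I don't have numbers on hand, but it loaded around 2.5 million records in under a minute.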

Here is the (stripped down) function I wrote to generate the job for my MapReduce process that puts the data into HBase:

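    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    private Job createCubeJob(...) throws IOException {
        // Build and configure the job
        Job job = new Job(conf);
        job.setJobName(jobName);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);
        job.setMapperClass(HiveToHbaseMapper.class); // Custom Mapper
        job.setJarByClass(CubeBuilderDriver.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(HFileOutputFormat.class);

        // Input is the text output of an earlier Hive step; output is a directory of HFiles
        TextInputFormat.setInputPaths(job, hiveOutputDir);
        HFileOutputFormat.setOutputPath(job, cubeOutputPath);

        // Connection settings for the target HBase cluster
        Configuration hConf = HBaseConfiguration.create(conf);
        hConf.set("hbase.zookeeper.quorum", hbaseZookeeperQuorum);
        hConf.set("hbase.zookeeper.property.clientPort", hbaseZookeeperClientPort);

        // Sets the reducer, partitioner, and output classes to match the table's regions
        HTable hTable = new HTable(hConf, tableName);
        HFileOutputFormat.configureIncrementalLoad(job, hTable);
        return job;
    }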
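The job setup above passes the table to configureIncrementalLoad so that HBase can read the table's current region boundaries and partition the job output accordingly, with each output HFile falling inside a single region. The following is not HBase code. It is a dependency-free sketch (Java 9 or later) of that routing idea only, and the class name, method names, and sample start keys are all made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Standalone sketch: route row keys to regions by region start key. */
final class RegionRoutingSketch {
    // Region start keys, sorted the way HBase orders row keys:
    // lexicographically, treating each byte as unsigned.
    private final byte[][] startKeys;

    RegionRoutingSketch(String... keys) {
        startKeys = Arrays.stream(keys)
                .map(RegionRoutingSketch::bytes)
                .toArray(byte[][]::new);
        Arrays.sort(startKeys, Arrays::compareUnsigned);
    }

    /** Index of the region whose [startKey, nextStartKey) range contains rowKey. */
    int regionFor(String rowKey) {
        int pos = Arrays.binarySearch(startKeys, bytes(rowKey), Arrays::compareUnsigned);
        // Exact hit: the row key equals a start key. Otherwise binarySearch
        // returns -(insertionPoint) - 1, and the owner is the region before that point.
        return pos >= 0 ? pos : -pos - 2;
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The first region of an HBase table has an empty start key.
        RegionRoutingSketch table = new RegionRoutingSketch("", "g", "p");
        for (String rowKey : new String[] {"apple", "g", "zebra"}) {
            System.out.println(rowKey + " -> region " + table.regionFor(rowKey));
        }
    }
}
```

Once the job has written its HFiles, they still have to be handed over to the table. HBase ships a completebulkload tool (LoadIncrementalHFiles) for that step, which is not part of the code shown here.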

Here is the map function from my HiveToHbaseMapper class (slightly edited):

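    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.io.WritableComparable;

    public void map(WritableComparable key, Writable val, Context context)
            throws IOException, InterruptedException {
        try {
            Configuration config = context.getConfiguration();
            String[] strs = val.toString().split(Constants.HIVE_RECORD_COLUMN_SEPARATOR);
            String family = config.get(Constants.CUBEBUILDER_CONFIGURATION_FAMILY);
            String column = strs[COLUMN_INDEX];
            // Assumption: the value column holds text that parses as a double
            double value = Double.parseDouble(strs[VALUE_INDEX]);
            String sKey = generateKey(strs, config);
            byte[] bKey = Bytes.toBytes(sKey);

            // One Put per input record: row key -> family:column = value
            Put put = new Put(bKey);
            put.add(Bytes.toBytes(family), Bytes.toBytes(column),
                    (value <= 0) ? Bytes.toBytes(Double.MIN_VALUE) : Bytes.toBytes(value));

            ImmutableBytesWritable ibKey = new ImmutableBytesWritable(bKey);
            context.write(ibKey, put);
            context.getCounter(CubeBuilderContextCounters.CompletedMapExecutions).increment(1);
        } catch (Exception e) {
            // A failed record is only counted, so a single bad line does not fail the job
            context.getCounter(CubeBuilderContextCounters.FailedMapExecutions).increment(1);
        }
    }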
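To experiment with the per-record logic of the map function without a Hadoop or HBase setup, a standalone sketch along these lines may help. It mirrors only the split, clamp, and encode steps. The separator (Hive's default Ctrl-A text delimiter), the column positions, and the use of a plain column as the row key are assumptions for this example, since Constants and generateKey are not shown in the answer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Standalone sketch of the per-record work done in the map function. */
final class RecordEncodingSketch {
    // Assumption: Hive's default text field delimiter (Ctrl-A).
    static final String SEPARATOR = "\u0001";
    // Hypothetical column positions, chosen only for this example.
    static final int KEY_INDEX = 0;
    static final int COLUMN_INDEX = 1;
    static final int VALUE_INDEX = 2;

    /** Encodes a double as 8 big-endian bytes, the layout Bytes.toBytes(double) produces. */
    static byte[] encodeDouble(double d) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(d).array();
    }

    /** Returns {rowKey, qualifier, value} as byte arrays for one input line. */
    static byte[][] toCell(String line) {
        String[] fields = line.split(SEPARATOR);
        double value = Double.parseDouble(fields[VALUE_INDEX]);
        // Same clamping rule as the mapper: non-positive values become Double.MIN_VALUE.
        double stored = (value <= 0) ? Double.MIN_VALUE : value;
        return new byte[][] {
            fields[KEY_INDEX].getBytes(StandardCharsets.UTF_8),
            fields[COLUMN_INDEX].getBytes(StandardCharsets.UTF_8),
            encodeDouble(stored)
        };
    }

    public static void main(String[] args) {
        byte[][] cell = toCell("row-1" + SEPARATOR + "clicks" + SEPARATOR + "-3");
        double decoded = ByteBuffer.wrap(cell[2]).getDouble();
        System.out.println(new String(cell[0], StandardCharsets.UTF_8)
                + " -> " + (decoded == Double.MIN_VALUE ? "MIN_VALUE" : decoded));
    }
}
```

Keeping this logic in plain static methods also makes it easy to unit test separately from the MapReduce job.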

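I'm fairly sure this won't be a copy-and-paste solution for you. Obviously, the data I was working with here didn't need any custom processing (that was done in an MR job before this one). The main thing I want to offer is HFileOutputFormat. The rest is just an example of how I use it. :)

I hope it puts you on a solid path to a good solution.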
