
2021SC@SDUSC HBase Source Code Analysis (9): HFile Analysis (1)


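Contents
  • 2021SC@SDUSC HBase Source Code Analysis (9): HFile Analysis (1)
    • HFile composition
    • HFile physical data
    • Code in the HFile class
        • Introduction
        • HFile storage path
        • Format check
        • Getting the set of HFile paths
        • HFile block size
      • Write API
        • Creating a Writer
      • Client reads
        • Obtaining a reader:
      • Analysis of the code in the HFile class is complete
    • To be continued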

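HFile composition

An HFile consists of four parts: the scanned block section, the non-scanned block section, the load-on-open section, and the trailer.

  1. Scanned block section:

    As the name suggests, every block in this section is read when the HFile is scanned sequentially. It contains Leaf Index Blocks, Data Blocks, and Bloom Blocks. Data Blocks store the user's KeyValue data, Leaf Index Blocks store the leaf nodes of the index tree, and Bloom Blocks store Bloom filter data.

  2. Non-scanned block section:

    Blocks in this section are not read during a sequential scan. It mainly contains Meta Blocks and Intermediate Level Data Index Blocks.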

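  3. Load-on-open section:

    This section is loaded into memory when the HFile is opened, for example when a region server opens the region that owns the file. It contains the FileInfo, the Bloom filter block, the data block index, and the meta block index.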

  4. Trailer:

    This part records basic information about the HFile, together with the offsets and addressing information of the other sections.
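The four sections listed above can be written down as a small data structure. This is only an illustrative sketch: the class name, the enum, and the string labels are made up for this example and are not HBase types.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the HFile layout described above; not HBase code.
class HFileLayoutSketch {
    /** The four top-level sections, in on-disk order. */
    enum Section { SCANNED_BLOCK, NON_SCANNED_BLOCK, LOAD_ON_OPEN, TRAILER }

    /** Block kinds per section, as listed in the text. */
    static Map<Section, List<String>> layout() {
        Map<Section, List<String>> m = new EnumMap<>(Section.class);
        m.put(Section.SCANNED_BLOCK,
            List.of("Data Block", "Leaf Index Block", "Bloom Block"));
        m.put(Section.NON_SCANNED_BLOCK,
            List.of("Meta Block", "Intermediate Level Data Index Block"));
        m.put(Section.LOAD_ON_OPEN,
            List.of("FileInfo", "Bloom filter block", "Data block index", "Meta block index"));
        m.put(Section.TRAILER, List.of("Trailer"));
        return m;
    }

    /** Only the scanned block section is read by a sequential scan. */
    static boolean readDuringSequentialScan(Section s) {
        return s == Section.SCANNED_BLOCK;
    }

    public static void main(String[] args) {
        layout().forEach((section, blocks) -> System.out.println(section + " -> " + blocks));
    }
}
```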

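HFile physical data

Figure: physical structure of an HFile.

As the figure shows, an HFile is divided into many blocks of roughly equal size. The block size can be set per column family when the table is created, for example with BLOCKSIZE => '65536'; the default is 64 KB (65536 bytes). Large blocks favor sequential scans and small blocks favor random reads, so the value is a trade-off.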
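To see why blocks come out only roughly equal in size, consider a simplified model in which a block is closed as soon as its accumulated size reaches the configured block size. The sketch below is an assumption-laden illustration, not the HBase writer: the class and method names are invented, and real blocks also carry headers, encoding and compression.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of block cutting; not the HBase writer.
class BlockCutSketch {
    /**
     * Appends entries to the current block and closes the block once its
     * size reaches blockSize. Returns the resulting block sizes in bytes.
     */
    static List<Integer> cutBlocks(int[] entrySizes, int blockSize) {
        List<Integer> blocks = new ArrayList<>();
        int current = 0;
        for (int size : entrySizes) {
            current += size;
            if (current >= blockSize) {
                blocks.add(current); // block is full: close it
                current = 0;
            }
        }
        if (current > 0) {
            blocks.add(current); // last, partially filled block
        }
        return blocks;
    }

    public static void main(String[] args) {
        int[] entries = {30_000, 30_000, 30_000, 10_000, 70_000, 5_000};
        // prints [90000, 80000, 5000]
        System.out.println(cutBlocks(entries, 65_536));
    }
}
```

Because an entry is never split across blocks in this model, a block can overshoot the limit by up to one entry, which is why the block size acts as a target rather than an exact value.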

Code in the HFile class

Introduction

The Javadoc of the HFile class describes the file format:

File is made of data blocks followed by meta data blocks (if any), a fileinfo block, data block index, meta data block index, and a fixed size trailer which records the offsets at which file changes content type.

<data blocks><meta blocks><fileinfo><data index><meta index><trailer>

Each block has a bit of magic at its start. Block are comprised of key/values. In data blocks, they are both byte arrays. Metadata blocks are a String key and a byte array value. An empty file looks like this:

<fileinfo><trailer>

That is, there are not data nor meta blocks present.

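Some important constants in HFile. From top to bottom, the declarations below give:

  1. The maximum length of a key in an HFile
  2. The minimum format version HFile supports
  3. The maximum format version HFile supports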
public final static int MAXIMUM_KEY_LENGTH = Integer.MAX_VALUE;
public static final int MIN_FORMAT_VERSION = 2;
public static final int MAX_FORMAT_VERSION = 3;
HFile storage path

The same class also tells us that an HFile is stored under ROOT_DIR/TABLE_NAME/REGION_NAME/CF_NAME/HFILE, which is five path levels:

public final static int MIN_NUM_HFILE_PATH_LEVELS = 5;
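
As a minimal sketch of how the five-level layout can be used, the following example splits a path string and reads the table, region, column family and file name from its last four components. The class, the helper methods and the sample path are hypothetical; only the constant name and value come from HFile.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; only MIN_NUM_HFILE_PATH_LEVELS mirrors the HFile constant.
class HFilePathSketch {
    static final int MIN_NUM_HFILE_PATH_LEVELS = 5;

    /** Splits a slash-separated path into its non-empty components. */
    static List<String> components(String path) {
        return Arrays.stream(path.split("/"))
            .filter(s -> !s.isEmpty())
            .collect(Collectors.toList());
    }

    /** Returns {table, region, family, hfile} taken from the last four levels. */
    static String[] describe(String path) {
        List<String> parts = components(path);
        if (parts.size() < MIN_NUM_HFILE_PATH_LEVELS) {
            throw new IllegalArgumentException("Too few path levels: " + path);
        }
        int n = parts.size();
        return new String[] {
            parts.get(n - 4), parts.get(n - 3), parts.get(n - 2), parts.get(n - 1)
        };
    }

    public static void main(String[] args) {
        // Sample path invented for the example: ROOT_DIR/TABLE/REGION/CF/HFILE
        System.out.println(String.join(" | ", describe("/hbase/t1/r1/cf1/abc123")));
    }
}
```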
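Format check

The method in HFile that checks whether a file is in HFile format. It tries to read the fixed file trailer from the end of the file and returns false if parsing throws IllegalArgumentException: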

public static boolean isHFileFormat(final FileSystem fs, final FileStatus fileStatus)
    throws IOException {
  final Path path = fileStatus.getPath();
  final long size = fileStatus.getLen();
  try (FSDataInputStreamWrapper fsdis = new FSDataInputStreamWrapper(fs, path)) {
    boolean isHbaseChecksum = fsdis.shouldUseHbaseChecksum();
    assert !isHbaseChecksum; // Initially we must read with FS checksum.
    FixedFileTrailer.readFromStream(fsdis.getStream(isHbaseChecksum), size);
    return true;
  } catch (IllegalArgumentException e) {
    return false;
  }
}
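
The pattern used by isHFileFormat, "try to parse the trailer and turn a parse failure into false", can be sketched on its own. The trailer layout below is a deliberate simplification assumed for this example (a big-endian int version in the last 4 bytes); it is not the real FixedFileTrailer format, and all names other than the two version constants are hypothetical.

```java
import java.nio.ByteBuffer;

// Toy format check; the trailer layout here is an assumption, not HBase's.
class FormatCheckSketch {
    static final int MIN_FORMAT_VERSION = 2;
    static final int MAX_FORMAT_VERSION = 3;

    /** Toy trailer parser: reads a big-endian int from the last 4 bytes. */
    static int readVersion(byte[] file) {
        if (file.length < Integer.BYTES) {
            throw new IllegalArgumentException("File too short for a trailer");
        }
        int version = ByteBuffer
            .wrap(file, file.length - Integer.BYTES, Integer.BYTES)
            .getInt();
        if (version < MIN_FORMAT_VERSION || version > MAX_FORMAT_VERSION) {
            throw new IllegalArgumentException("Unsupported version " + version);
        }
        return version;
    }

    /** Same shape as isHFileFormat: parse, and map the exception to false. */
    static boolean looksLikeHFile(byte[] file) {
        try {
            readVersion(file);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(looksLikeHFile(new byte[] {42, 0, 0, 0, 3})); // true
        System.out.println(looksLikeHFile(new byte[] {1, 2}));           // false
    }
}
```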

The fields of the FileStatus class used in the code above, from which the path and length are obtained:

public class FileStatus implements Writable, Comparable {
    private Path path;
    private long length;
    private boolean isdir;
    private short block_replication;
    private long blocksize;
    private long modification_time;
    private long access_time;
    private FsPermission permission;
    private String owner;
    private String group;
    private Path symlink;
    // ...
 }
Getting the set of HFile paths

The method that collects the locations of the HFiles in a region:

public static List<Path> getStoreFiles(FileSystem fs, Path regionDir)
    throws IOException {
  List<Path> regionHFiles = new ArrayList<>();
  PathFilter dirFilter = new FSUtils.DirFilter(fs);
  FileStatus[] familyDirs = fs.listStatus(regionDir, dirFilter);
  for(FileStatus dir : familyDirs) {
    FileStatus[] files = fs.listStatus(dir.getPath());
    for (FileStatus file : files) {
      if (!file.isDirectory() &&
          (!file.getPath().toString().contains(HConstants.HREGION_OLDLOGDIR_NAME)) &&
          (!file.getPath().toString().contains(HConstants.RECOVERED_EDITS_DIR))) {
        regionHFiles.add(file.getPath());
      }
    }
  }
  return regionHFiles;
}

It returns a list of paths, one for each HFile found under the region's column family directories.
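
The same two-level traversal (family directories first, then the plain files inside each of them) can be tried on a local file system with java.nio.file. This is a sketch under stated assumptions: it does not use the Hadoop FileSystem API, it omits the name-based exclusions of the original method, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Local-filesystem analogue of the traversal; not the Hadoop API.
class StoreFileListingSketch {
    /** Lists regular files found directly inside each subdirectory of regionDir. */
    static List<Path> listStoreFiles(Path regionDir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> families =
                 Files.newDirectoryStream(regionDir, p -> Files.isDirectory(p))) {
            for (Path family : families) {
                try (DirectoryStream<Path> files = Files.newDirectoryStream(family)) {
                    for (Path file : files) {
                        if (!Files.isDirectory(file)) {
                            result.add(file);
                        }
                    }
                }
            }
        }
        return result;
    }

    /** Builds a small temporary region directory and counts the store files found. */
    static int countInDemoTree() {
        try {
            Path region = Files.createTempDirectory("region");
            Path cf1 = Files.createDirectories(region.resolve("cf1"));
            Path cf2 = Files.createDirectories(region.resolve("cf2"));
            Files.createFile(cf1.resolve("hfile-a"));
            Files.createFile(cf2.resolve("hfile-b"));
            Files.createDirectories(cf1.resolve("subdir")); // skipped: a directory
            Files.createFile(region.resolve("stray-file")); // skipped: not in a family dir
            return listStoreFiles(region).size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countInDemoTree()); // prints 2
    }
}
```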

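HFile block size

The block-related interface in the HFile class is CachingBlockReader. Its readBlock method does not set the block size. It reads the block at a given offset, takes the block's on-disk size as a parameter, and can optionally cache the result: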

  public interface CachingBlockReader {
    
    HFileBlock readBlock(long offset, long onDiskBlockSize,
        boolean cacheBlock, final boolean pread, final boolean isCompaction,
        final boolean updateCacheMetrics, BlockType expectedBlockType,
        DataBlockEncoding expectedDataBlockEncoding)
        throws IOException;
  }
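
The idea behind a caching block reader can be sketched with an in-memory byte array standing in for the file and a map standing in for the block cache. Everything in this example is hypothetical and heavily simplified: the real readBlock has more parameters, and the real block cache handles eviction, metrics and concurrency.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified caching reader; not the HBase implementation.
class CachingReadSketch {
    private final byte[] file;                               // stands in for the HFile on disk
    private final Map<Long, byte[]> cache = new HashMap<>(); // keyed by block offset
    int diskReads = 0;                                       // counts cache misses

    CachingReadSketch(byte[] file) {
        this.file = file;
    }

    /** Returns the block at offset, consulting the cache before "disk". */
    byte[] readBlock(long offset, int onDiskBlockSize, boolean cacheBlock) {
        byte[] cached = cache.get(offset);
        if (cached != null) {
            return cached;
        }
        byte[] block = Arrays.copyOfRange(file, (int) offset, (int) offset + onDiskBlockSize);
        diskReads++;
        if (cacheBlock) {
            cache.put(offset, block);
        }
        return block;
    }

    public static void main(String[] args) {
        CachingReadSketch reader = new CachingReadSketch(new byte[16]);
        reader.readBlock(0, 8, true);
        reader.readBlock(0, 8, true);  // served from the cache
        reader.readBlock(8, 8, false);
        reader.readBlock(8, 8, false); // read again: caching was not requested
        System.out.println(reader.diskReads); // prints 3
    }
}
```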
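Write API

Structure of Writer:

The inner interface Writer in the HFile class extends the Closeable, CellSink and ShipperListener interfaces and serves as the API for writing HFiles.

Source of the interface: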

public interface Writer extends Closeable, CellSink, ShipperListener {
    
  public static final byte [] MAX_MEMSTORE_TS_KEY = Bytes.toBytes("MAX_MEMSTORE_TS_KEY");
  void appendFileInfo(byte[] key, byte[] value) throws IOException;
  Path getPath();
  void addInlineBlockWriter(InlineBlockWriter bloomWriter);
  void appendMetaBlock(String bloomFilterMetaKey, Writable metaWriter);
  void addGeneralBloomFilter(BloomFilterWriter bfw);
  void addDeleteFamilyBloomFilter(BloomFilterWriter bfw) throws IOException;

  HFileContext getFileContext();
}
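Creating a Writer

The method first reads the format version from the configuration and then branches on it. Only version 3 can be written; version 2 and any other value are rejected with an exception: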

public static final WriterFactory getWriterFactory(Configuration conf,
    CacheConfig cacheConf) {
  int version = getFormatVersion(conf);
  switch (version) {
    case 2:
      throw new IllegalArgumentException("This should never happen. " +
        "Did you change hfile.format.version to read v2? This version of the software writes v3" +
        " hfiles only (but it can read v2 files without having to update hfile.format.version " +
        "in hbase-site.xml)");
    case 3:
      return new HFile.WriterFactory(conf, cacheConf);
    default:
      throw new IllegalArgumentException("Cannot create writer for HFile " +
          "format version " + version);
  }
}
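
The version switch can be exercised in isolation with a plain map in place of the Hadoop Configuration. In this sketch the key name hfile.format.version is taken from the error message above; the default of 3, the class name and the string return values are assumptions made for the example.

```java
import java.util.Map;

// Hypothetical stand-in for getWriterFactory; a Map replaces Configuration.
class WriterVersionSketch {
    static final String FORMAT_VERSION_KEY = "hfile.format.version";
    static final int MAX_FORMAT_VERSION = 3; // assumed default for this sketch

    /** Reads the format version from the configuration map. */
    static int formatVersion(Map<String, String> conf) {
        return Integer.parseInt(
            conf.getOrDefault(FORMAT_VERSION_KEY, String.valueOf(MAX_FORMAT_VERSION)));
    }

    /** Picks a writer by version; only version 3 is writable. */
    static String chooseWriter(Map<String, String> conf) {
        int version = formatVersion(conf);
        switch (version) {
            case 3:
                return "v3-writer";
            case 2:
                throw new IllegalArgumentException("v2 files can be read but not written");
            default:
                throw new IllegalArgumentException(
                    "Cannot create writer for HFile format version " + version);
        }
    }

    public static void main(String[] args) {
        System.out.println(chooseWriter(Map.of())); // prints v3-writer
    }
}
```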
Client reads

Structure of Reader:

The inner interface Reader in HFile extends Closeable and CachingBlockReader. Clients use it to open and iterate over an HFile.

public interface Reader extends Closeable, CachingBlockReader {

  String getName();

  CellComparator getComparator();

  HFileScanner getScanner(boolean cacheBlocks, final boolean pread, final boolean isCompaction);

  HFileBlock getMetaBlock(String metaBlockName, boolean cacheBlock) throws IOException;

  Optional<Cell> getLastKey();

  Optional<Cell> midKey() throws IOException;

  long length();

  long getEntries();

  Optional<Cell> getFirstKey();

  long indexSize();

  Optional<byte[]> getFirstRowKey();

  Optional<byte[]> getLastRowKey();

  FixedFileTrailer getTrailer();

  void setDataBlockIndexReader(HFileBlockIndex.CellbasedKeyBlockIndexReader reader);
  HFileBlockIndex.CellbasedKeyBlockIndexReader getDataBlockIndexReader();

  void setMetaBlockIndexReader(HFileBlockIndex.ByteArrayKeyBlockIndexReader reader);
  HFileBlockIndex.ByteArrayKeyBlockIndexReader getMetaBlockIndexReader();

  HFileScanner getScanner(boolean cacheBlocks, boolean pread);

  DataInput getGeneralBloomFilterMetadata() throws IOException;

  DataInput getDeleteBloomFilterMetadata() throws IOException;

  Path getPath();

  void close(boolean evictOnClose) throws IOException;

  DataBlockEncoding getDataBlockEncoding();

  boolean hasMVCCInfo();

  HFileContext getFileContext();

  boolean isPrimaryReplicaReader();

  DataBlockEncoding getEffectiveEncodingInCache(boolean isCompaction);

  @VisibleForTesting
  HFileBlock.FSReader getUncachedBlockReader();

  @VisibleForTesting
  boolean prefetchComplete();

  void unbufferStream();

  ReaderContext getContext();
  HFileInfo getHFileInfo();
  void setDataBlockEncoder(HFileDataBlockEncoder dataBlockEncoder);
}
Obtaining a reader:
public static Reader createReader(ReaderContext context, HFileInfo fileInfo,
    CacheConfig cacheConf, Configuration conf) throws IOException {
  try {
    if (context.getReaderType() == ReaderType.STREAM) {
      return new HFileStreamReader(context, fileInfo, cacheConf, conf);
    }
    FixedFileTrailer trailer = fileInfo.getTrailer();
    switch (trailer.getMajorVersion()) {
      case 2:
        LOG.debug("Opening HFile v2 with v3 reader");
        // Fall through. FindBugs: SF_SWITCH_FALLTHROUGH
      case 3:
        return new HFilePreadReader(context, fileInfo, cacheConf, conf);
      default:
        throw new IllegalArgumentException("Invalid HFile version " + trailer.getMajorVersion());
    }
  } catch (Throwable t) {
    IOUtils.closeQuietly(context.getInputStreamWrapper());
    throw new CorruptHFileException("Problem reading HFile Trailer from file "
        + context.getFilePath(), t);
  } finally {
    context.getInputStreamWrapper().unbuffer();
  }
}

Unlike creating a Writer, obtaining a reader does not require specifying a version, because the version is already recorded in the HFile's trailer.
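
The dispatch in createReader can be reduced to the following sketch: a stream reader is returned before the version is looked at, and otherwise the major version read from the trailer selects the reader, with version 2 falling through to the version 3 case. The class name, the enum and the string return values are hypothetical stand-ins for the real reader classes.

```java
// Hypothetical reduction of the createReader dispatch; not HBase code.
class ReaderChoiceSketch {
    enum ReaderType { PREAD, STREAM }

    /** Chooses a reader from the requested type and the trailer's major version. */
    static String chooseReader(ReaderType type, int trailerMajorVersion) {
        if (type == ReaderType.STREAM) {
            return "stream-reader"; // decided before the version is checked
        }
        switch (trailerMajorVersion) {
            case 2: // v2 files are opened with the v3 reader: fall through
            case 3:
                return "pread-reader";
            default:
                throw new IllegalArgumentException(
                    "Invalid HFile version " + trailerMajorVersion);
        }
    }

    public static void main(String[] args) {
        System.out.println(chooseReader(ReaderType.PREAD, 2));  // prints pread-reader
        System.out.println(chooseReader(ReaderType.STREAM, 3)); // prints stream-reader
    }
}
```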

Analysis of the code in the HFile class is complete

To be continued