●HBase | Code Analysis of HBase Region Splits

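A region split can be triggered in two ways: (1) manually by the user (see HRegionServer.splitRegion), or (2) by the background flush thread, which checks whether a region needs to be split after it has flushed that region's memstore (see MemStoreFlusher.flushRegion). The two paths differ little in implementation.
1. The analysis below uses a manual split as the example. A manual split starts in HRegionServer.splitRegion: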
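The flush-triggered path can be pictured as a small decision taken after each flush. The sketch below is not HBase code: it assumes the flusher asks checkSplit() for a split point and, if one comes back, requests a split in preference to a compaction. FlushTriggerSketch, Action and both parameters are hypothetical.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical model of the flush-triggered split check (not HBase code).
 * Assumption: after a flush, a split request takes priority over a compaction request.
 */
final class FlushTriggerSketch {

    enum Action { SPLIT, COMPACT, NONE }

    /**
     * @param flush      performs the flush; returns true if a compaction is wanted afterwards
     * @param checkSplit stands in for region.checkSplit(): a split point, or null for "no split"
     */
    static Action afterFlush(BooleanSupplier flush, Supplier<byte[]> checkSplit) {
        boolean shouldCompact = flush.getAsBoolean();
        boolean shouldSplit = checkSplit.get() != null;
        if (shouldSplit) {
            return Action.SPLIT;   // in HBase this corresponds to compactSplitThread.requestSplit(...)
        }
        if (shouldCompact) {
            return Action.COMPACT; // otherwise a compaction would be requested
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        System.out.println(afterFlush(() -> true, () -> null));             // COMPACT
        System.out.println(afterFlush(() -> true, () -> new byte[] {'m'})); // SPLIT
    }
}
```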

    @Override // implementation of a manual split
    public void splitRegion(HRegionInfo regionInfo, byte[] splitPoint)
        throws NotServingRegionException, IOException {
      checkOpen();
      HRegion region = getRegion(regionInfo.getRegionName());
      region.flushcache(); // flush the memstore to reduce the data held in memory
      region.forceSplit(splitPoint); // force a split (sets splitRequest and the explicit split point)
      // Hand the split to the compactSplitThread pool. The work is done in SplitRequest.run,
      // which creates a SplitTransaction to carry out the split.
      compactSplitThread.requestSplit(region, region.checkSplit());
    }


In compactSplitThread.requestSplit(region, region.checkSplit()), the call region.checkSplit() computes the region's split point. Here is the code:

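    public byte[] checkSplit() {
      // Can't split ROOT/META
      // (the check that returns null for ROOT/META regions is omitted from this excerpt)

      // By default the split check is implemented by IncreasingToUpperBoundRegionSplitPolicy.
      // A split is needed in two cases:
      // 1. splitRequest == true (a forced/manual split)
      // 2. some store in the region is larger than a threshold. The threshold depends on
      //    hbase.hregion.max.filesize and on the number of regions of the same table hosted by
      //    this RegionServer (see getSizeToCheck). Note: if any store file in the region is a
      //    reference file, the region cannot be split.
      if (!splitPolicy.shouldSplit()) {
        return null;
      }
      // Compute the split point. An explicit split point passed to forceSplit is used as is.
      // Otherwise, if there are reference store files no split point is returned; if not, the
      // largest store file is found via StoreFile.Reader and its midkey is read from the
      // HFileBlockIndex. TODO: look more closely at how the midkey is obtained
      byte[] ret = splitPolicy.getSplitPoint();
      return ret;
    }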
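The size rule behind case 2 can be modeled without HBase. This is a simplified sketch, not the policy class itself: it assumes the 0.94-style threshold min(maxFileSize, flushSize * n * n), where n is the number of regions of the same table on this RegionServer (other HBase versions use a different formula), and StoreStats is a hypothetical stand-in for HBase's Store.

```java
import java.util.List;

/**
 * Simplified model of IncreasingToUpperBoundRegionSplitPolicy's check (not HBase code).
 * Assumes the 0.94-style threshold min(maxFileSize, flushSize * n * n).
 */
final class SplitPolicySketch {

    /** Hypothetical stand-in for an HBase Store. */
    static final class StoreStats {
        final long sizeBytes;
        final boolean hasReferences;

        StoreStats(long sizeBytes, boolean hasReferences) {
            this.sizeBytes = sizeBytes;
            this.hasReferences = hasReferences;
        }
    }

    /** n = number of regions of the same table on this RegionServer. */
    static long sizeToCheck(int n, long maxFileSize, long flushSize) {
        if (n == 0) {
            return maxFileSize;
        }
        // grows with n squared, capped at hbase.hregion.max.filesize
        return Math.min(maxFileSize, flushSize * n * n);
    }

    static boolean shouldSplit(boolean splitRequest, List<StoreStats> stores,
                               int n, long maxFileSize, long flushSize) {
        if (splitRequest) {
            return true; // case 1: forced (manual) split, no size check
        }
        long threshold = sizeToCheck(n, maxFileSize, flushSize);
        for (StoreStats s : stores) {
            if (s.hasReferences) {
                return false; // a store that still holds reference files cannot be split yet
            }
            if (s.sizeBytes > threshold) {
                return true;  // case 2: some store exceeds the threshold
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        for (int n = 1; n <= 4; n++) {
            System.out.println(n + " region(s): " + sizeToCheck(n, 10240 * mb, 128 * mb) / mb + " MB");
        }
    }
}
```

With an assumed 128 MB flush size and a 10 GB cap, main prints thresholds of 128, 512, 1152 and 2048 MB for one to four regions: a new table splits early, and the threshold then climbs towards hbase.hregion.max.filesize as regions accumulate on the server.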

2. Next, the main part of SplitRequest.run:

    SplitTransaction st = new SplitTransaction(parent, midKey);
    // prepare() checks again for reference files and creates the two new region descriptors
    // (HRegionInfo) that represent the daughter regions produced by the split
    if (!st.prepare()) return;
    st.execute(this.server, this.server);
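The hand-off from requestSplit to SplitRequest.run is an ordinary executor submission. The sketch below models it with java.util.concurrent and is not HBase code: SplitHandOffSketch is hypothetical, prepare and execute stand in for SplitTransaction.prepare() and execute(...), and it assumes a null split point (checkSplit() found nothing to split) means no request is queued.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical model of requestSplit -> SplitRequest.run (not HBase code). */
final class SplitHandOffSketch {

    final List<String> events = Collections.synchronizedList(new ArrayList<>());
    private final ExecutorService splits = Executors.newSingleThreadExecutor();

    /**
     * midKey:  result of region.checkSplit(); null means "no split".
     * prepare: stands in for SplitTransaction.prepare().
     * execute: stands in for SplitTransaction.execute(...).
     */
    boolean requestSplit(String region, byte[] midKey, BooleanSupplier prepare, Runnable execute) {
        if (midKey == null) {
            events.add(region + ": no split point");
            return false;
        }
        splits.execute(() -> {            // like submitting a SplitRequest to the split pool
            if (!prepare.getAsBoolean()) { // SplitRequest.run returns early if prepare() fails
                events.add(region + ": prepare failed");
                return;
            }
            execute.run();
            events.add(region + ": split executed");
        });
        return true;
    }

    /** Stops accepting work and waits for queued splits to finish. */
    void shutdownAndWait() {
        splits.shutdown();
        try {
            splits.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The sketch uses a single worker thread, so queued splits run one at a time; the size of the real split pool is configurable.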
Now let's look at what SplitTransaction.execute does:

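    public PairOfSameType<HRegion> execute(final Server server,
        final RegionServerServices services)
        throws IOException {
      // Create an ephemeral znode in ZK (/hbase/region-in-transition/regionEncodedName) so that a
      // RegionServer crash during the split can be detected, create the .splits directory under the
      // parent region's HDFS directory, close the region being split, split its store files into
      // reference files, create the two daughter regions and offline the parent in .META.
      PairOfSameType<HRegion> regions = createDaughters(server, services);
      openDaughters(server, services, regions.getFirst(), regions.getSecond());
      // move the znode from SPLITTING to SPLIT so the master learns that the split has completed
      transitionZKNode(server, services, regions.getFirst(), regions.getSecond());
      return regions;
    }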
First, createDaughters:
    PairOfSameType<HRegion> createDaughters(final Server server,
        final RegionServerServices services) throws IOException {
      // (excerpt: preliminary checks and the definition of the local flag "testing" are omitted)
      this.fileSplitTimeout = testing ? this.fileSplitTimeout :
          server.getConfiguration().getLong("hbase.regionserver.fileSplitTimeout",
              this.fileSplitTimeout); // split timeout, 30 s by default
      if (server != null && server.getZooKeeper() != null) {
        try {
          // Create an ephemeral znode in ZK holding the state RS_ZK_REGION_SPLITTING,
          // which marks the start of the split
          createNodeSplitting(server.getZooKeeper(),
              this.parent.getRegionInfo(), server.getServerName());
        } catch (KeeperException e) {
          throw new IOException("Failed creating SPLITTING znode on " +
              this.parent.getRegionNameAsString(), e);
        }
      }
      // Create the temporary working directory for this split on HDFS:
      // /hbase/tableName/regionEncodedName/.splits
      createSplitDir(this.parent.getFilesystem(), this.splitdir);
      this.journal.add(JournalEntry.CREATE_SPLIT_DIR);
      List<StoreFile> hstoreFilesToSplit = null;
      Exception exceptionToThrow = null;
      try {
        // Close the parent region. Closing waits for in-progress flushes and compactions to
        // finish (synchronized on writestate), and if the memstore is larger than
        // hbase.hregion.preclose.flush.size (5 MB by default) it is pre-flushed first.
        // After this the region serves no reads or writes.
        hstoreFilesToSplit = this.parent.close(false);
      } catch (Exception e) {
        exceptionToThrow = e;
      }
      // ... (rethrowing of exceptionToThrow is omitted from this excerpt)
      if (!testing) {
        // remove the parent from the RegionServer's online regions
        services.removeFromOnlineRegions(this.parent.getRegionInfo().getEncodedName());
      }
      this.journal.add(JournalEntry.OFFLINED_PARENT);
      // Split the store files in parallel with a thread pool that has one thread per store file;
      // see StoreFileSplitter / splitStoreFile, which delegate to StoreFile.split
      splitStoreFiles(this.splitdir, hstoreFilesToSplit);
      this.journal.add(JournalEntry.STARTED_REGION_A_CREATION);
      // each daughter's read/write request counters start at half of the parent's
      HRegion a = createDaughterRegion(this.hri_a, this.parent.rsServices);
      this.journal.add(JournalEntry.STARTED_REGION_B_CREATION);
      HRegion b = createDaughterRegion(this.hri_b, this.parent.rsServices);
      if (!testing) {
        // Offline the parent in .META.: update the parent's row, set offline and split to true,
        // and add the splitA and splitB columns
        MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
            this.parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo());
      }
      // ... (remainder omitted from this excerpt; the method returns a and b as a PairOfSameType<HRegion>)
    }
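Each journal.add(...) call above records how far the transaction got, so that a failure can be undone step by step in reverse order. The sketch below illustrates only that journal-and-rollback pattern: the enum reuses the entry names from the excerpt (the real class has more entries than the excerpt shows), and the undo actions are descriptive strings, not HBase's rollback code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;

/** Illustration of the journal-and-rollback pattern (undo actions are descriptive only). */
final class SplitJournalSketch {

    enum JournalEntry {
        CREATE_SPLIT_DIR, OFFLINED_PARENT, STARTED_REGION_A_CREATION, STARTED_REGION_B_CREATION
    }

    private final List<JournalEntry> journal = new ArrayList<>();

    /** Record a step, as journal.add(...) does in the excerpt. */
    void add(JournalEntry entry) {
        journal.add(entry);
    }

    /** Undo recorded steps newest-first and return the undo actions in the order applied. */
    List<String> rollback() {
        List<String> undone = new ArrayList<>();
        ListIterator<JournalEntry> it = journal.listIterator(journal.size());
        while (it.hasPrevious()) {
            switch (it.previous()) {
                case STARTED_REGION_B_CREATION:
                    undone.add("delete daughter B directory");
                    break;
                case STARTED_REGION_A_CREATION:
                    undone.add("delete daughter A directory");
                    break;
                case OFFLINED_PARENT:
                    undone.add("add parent back to online regions");
                    break;
                case CREATE_SPLIT_DIR:
                    undone.add("delete .splits directory");
                    break;
                default:
                    throw new AssertionError("unhandled journal entry");
            }
        }
        journal.clear();
        return undone;
    }
}
```

Recording an entry before the step's work (as with STARTED_REGION_A_CREATION) means a half-finished step is still cleaned up on rollback.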
The StoreFile.split method:

    static Path split(final FileSystem fs, final Path splitDir, final StoreFile f,
        final byte[] splitRow, final Reference.Range range)
        throws IOException {
      // Check whether the split key falls within this store file's key range
      if (range == Reference.Range.bottom) {
        // check if smaller than first key
        KeyValue splitKey = KeyValue.createLastOnRow(splitRow);
        byte[] firstKey = f.createReader().getFirstKey();
        // If firstKey is null means storefile is empty.
        if (firstKey == null) return null;
        if (f.getReader().getComparator().compare(splitKey.getBuffer(),
            splitKey.getKeyOffset(), splitKey.getKeyLength(),
            firstKey, 0, firstKey.length) < 0) {
          return null;
        }
      } else {
        // check if larger than last key.
        KeyValue splitKey = KeyValue.createFirstOnRow(splitRow);
        byte[] lastKey = f.createReader().getLastKey();
        // If lastKey is null means storefile is empty.
        if (lastKey == null) return null;
        if (f.getReader().getComparator().compare(splitKey.getBuffer(),
            splitKey.getKeyOffset(), splitKey.getKeyLength(),
            lastKey, 0, lastKey.length) > 0) {
          return null;
        }
      }
      /* Write a reference-type store file. For example, if the parent region's encoded name is a,
         its column family cf contains a store file named hfile, and the daughters are b and c,
         the split writes two reference files under the parent's .splits directory:
           /hbase/tableName/a/cf/hfile                (original store file)
           /hbase/tableName/a/.splits/b/cf/hfile.a    (reference file for daughter b)
           /hbase/tableName/a/.splits/c/cf/hfile.a    (reference file for daughter c)
         When the daughter regions are created, the b and c directories are moved into place as
         /hbase/tableName/b and /hbase/tableName/c.
         A reference file holds only the split row and the range (top or bottom half) of the
         original store file. Each daughter gets one reference per parent store file, except when
         the store file lies entirely on the other side of the split row (the checks above). */
      Reference r = new Reference(splitRow, range);
      String parentRegionName = f.getPath().getParent().getParent().getName();
      Path p = new Path(splitDir, f.getPath().getName() + "." + parentRegionName);
      return r.write(fs, p);
    }
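The two range checks and the naming rule can be modeled at row granularity. In this sketch, which is not HBase code, a store file is reduced to its first and last row keys, and rows compare as unsigned bytes. Because createLastOnRow(splitRow) sorts after every key on splitRow and createFirstOnRow(splitRow) before every key on it, the checks reduce to row comparisons: the bottom daughter gets a reference unless the split row sorts before the file's first row, and the top daughter gets one unless it sorts after the file's last row. ReferenceSplitSketch and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Row-level model of the range checks and naming rule in StoreFile.split (not HBase code). */
final class ReferenceSplitSketch {

    enum Range { BOTTOM, TOP }

    /** Lexicographic comparison of unsigned bytes, the order HBase uses for row keys. */
    static int compareRows(byte[] x, byte[] y) {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int d = (x[i] & 0xff) - (y[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return x.length - y.length;
    }

    /** True if the given half of a store file spanning [firstRow, lastRow] needs a reference. */
    static boolean needsReference(byte[] firstRow, byte[] lastRow, byte[] splitRow, Range range) {
        if (firstRow == null || lastRow == null) {
            return false; // empty store file
        }
        if (range == Range.BOTTOM) {
            // bottom half = rows before splitRow; nothing to reference if the file starts after it
            return compareRows(splitRow, firstRow) >= 0;
        }
        // top half = rows from splitRow on; nothing to reference if the file ends before it
        return compareRows(splitRow, lastRow) <= 0;
    }

    /** Reference file name: store file name + "." + parent region's encoded name. */
    static String referenceName(String storeFileName, String parentEncodedName) {
        return storeFileName + "." + parentEncodedName;
    }

    static byte[] row(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

For a file spanning rows b..m, a split at g yields references for both halves, while a split at z yields only a bottom reference.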

Next, let's look at openDaughters:

    void openDaughters(final Server server,
        final RegionServerServices services, HRegion a, HRegion b)
        throws IOException {
      // open the two daughters in parallel
      DaughterOpener aOpener = new DaughterOpener(server, a);
      DaughterOpener bOpener = new DaughterOpener(server, b);
      aOpener.start();
      bOpener.start();
      // ... (omitted from this excerpt: waiting for both openers and rethrowing any open failure)
      if (services != null) {
        try {
          // Request a compaction for stores that still hold reference files; the compaction writes
          // the referenced data into the daughter's own store files and eventually removes the
          // reference files. Then put the daughter's HRegionInfo and location into .META.
          services.postOpenDeployTasks(b, server.getCatalogTracker(), true);
          // add the region to the RegionServer's online regions; it can now serve requests
          services.addToOnlineRegions(b);
          services.postOpenDeployTasks(a, server.getCatalogTracker(), true);
          services.addToOnlineRegions(a);
        } catch (KeeperException ke) {
          throw new IOException(ke);
        }
      }
    }
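DaughterOpener runs one thread per daughter. The sketch below shows the same start/join pattern with plain java.lang.Thread; it is not HBase code. It assumes, beyond the excerpt, that the caller joins both openers and aborts on a failure before deploying either daughter. OpenerThread and the open callback are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical model of opening two daughter regions in parallel (not HBase code). */
final class ParallelOpenSketch {

    /** One thread per daughter; remembers the failure, if any. */
    static final class OpenerThread extends Thread {
        private final String daughter;
        private final Consumer<String> open;
        private volatile Throwable failure;

        OpenerThread(String daughter, Consumer<String> open) {
            super("DaughterOpener-" + daughter);
            this.daughter = daughter;
            this.open = open;
        }

        @Override
        public void run() {
            try {
                open.accept(daughter);
            } catch (Throwable t) {
                failure = t;
            }
        }
    }

    /** Opens a and b concurrently and returns the order in which they would be put online. */
    static List<String> openDaughters(String a, String b, Consumer<String> open) {
        OpenerThread aOpener = new OpenerThread(a, open);
        OpenerThread bOpener = new OpenerThread(b, open);
        aOpener.start();
        bOpener.start();
        try {
            aOpener.join(); // wait for both opens before any .META. update or online registration
            bOpener.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while opening daughters", e);
        }
        for (OpenerThread o : new OpenerThread[] {aOpener, bOpener}) {
            if (o.failure != null) {
                throw new IllegalStateException("Failed " + o.getName(), o.failure);
            }
        }
        List<String> onlineOrder = new ArrayList<>();
        onlineOrder.add(b); // as in the excerpt: b is deployed and put online before a
        onlineOrder.add(a);
        return onlineOrder;
    }
}
```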


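Finally, a summary of the whole flow:
Check whether the region needs to be split. If it does, region.checkSplit() returns the midkey, and the split request (a SplitRequest) is submitted to the background CompactSplitThread pool. SplitRequest creates a SplitTransaction, which performs the split as follows:
* Create two new region descriptors (HRegionInfo) from the region and the midkey, representing the two daughter regions produced by the split.
* Create an ephemeral znode in ZooKeeper (/hbase/region-in-transition/regionEncodedName) holding the state RS_ZK_REGION_SPLITTING, which marks the start of the split; because the node is ephemeral, a RegionServer crash during the split can be detected. The master watches /hbase/region-in-transition, so it sees this change and does not move or otherwise operate on the region while it is splitting.
* Create the .splits directory under the region's HDFS path.
* Close the region. Closing waits for in-progress flushes and compactions to finish (synchronized on writestate), and if the memstore is larger than hbase.hregion.preclose.flush.size (5 MB by default) it is pre-flushed first. The region then stops serving reads and writes and is removed from the RegionServer's online regions.
* Split the store files in parallel with a thread pool that has one thread per store file (see StoreFileSplitter / splitStoreFile, which delegate to StoreFile.split). This produces reference-type store files. For example, if the parent region's encoded name is a, its column family cf contains a store file named hfile, and the daughters are b and c, the split writes:
  /hbase/tableName/a/cf/hfile                (original store file)
  /hbase/tableName/a/.splits/b/cf/hfile.a    (reference file for daughter b)
  /hbase/tableName/a/.splits/c/cf/hfile.a    (reference file for daughter c)
  When the daughter regions are created, these directories are moved into place as /hbase/tableName/b and /hbase/tableName/c. A reference file contains only the split row key and the range (top or bottom half) of the original store file, so creating it is cheap. Each daughter gets one reference file per parent store file, except when a store file lies entirely on the other side of the split row.
* Offline the parent region in .META.: update its row, set offline and split to true, and add the splitA and splitB columns.
* Open the two daughter regions in parallel. For stores that still hold reference files a compaction is requested and run by the CompactSplitThread background threads; the compaction writes the referenced data into the daughter region's own store files and finally removes the reference files. Each daughter's HRegionInfo and location are put into .META.
* Add the daughter regions to the RegionServer's online regions; they can now serve requests.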


Please credit the source when reposting: http://blog.csdn.net/odailidong/article/details/42217439