2021-11-03 Hadoop NameNode Startup

Note: this article is based on hadoop-3.3.0.

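1 NameNode Overview
The NameNode is one of the most important roles in the Hadoop architecture. It manages two tables: the namespace table (the mapping from file names to blocks, which is stored on disk and is critical) and the inodes table (the mapping from blocks to machines, which is rebuilt every time the NameNode comes up). Normally a cluster has exactly one active NameNode. Since Hadoop 2, two NameNodes can be configured for high availability, one active and one standby. Even so, once a cluster grows past a certain size the NameNode still becomes the bottleneck. Hadoop Federation was introduced to address this: a cluster may run several groups of NameNodes, each group managing its own part of the directory tree (its own namespace and block pool). The groups are isolated from one another but share the same DataNodes.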
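
The two tables described above can be pictured with plain Java collections. This is only a toy model under stated assumptions: the class ToyNameNodeMaps and its methods are hypothetical names invented for illustration and are not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the two NameNode tables; hypothetical, not Hadoop code. */
final class ToyNameNodeMaps {
    // Namespace table: file path -> ordered block IDs (the part that is persisted).
    private final Map<String, List<Long>> fileToBlocks = new HashMap<>();
    // Block table: block ID -> DataNodes holding a replica (rebuilt at startup).
    private final Map<Long, Set<String>> blockToDataNodes = new HashMap<>();

    /** Appends a block to a file in the namespace table. */
    void addBlock(String path, long blockId) {
        fileToBlocks.computeIfAbsent(path, p -> new ArrayList<>()).add(blockId);
    }

    /** Records that a DataNode reported holding a replica of the block. */
    void reportReplica(String dataNode, long blockId) {
        blockToDataNodes.computeIfAbsent(blockId, b -> new HashSet<>()).add(dataNode);
    }

    /** Resolves a path to the replica locations of each of its blocks, in block order. */
    List<Set<String>> locate(String path) {
        List<Set<String>> result = new ArrayList<>();
        for (long blockId : fileToBlocks.getOrDefault(path, List.of())) {
            result.add(blockToDataNodes.getOrDefault(blockId, Set.of()));
        }
        return result;
    }

    public static void main(String[] args) {
        ToyNameNodeMaps maps = new ToyNameNodeMaps();
        maps.addBlock("/logs/app.log", 1001L);
        maps.addBlock("/logs/app.log", 1002L);
        maps.reportReplica("dn-1", 1001L);
        maps.reportReplica("dn-2", 1001L);
        maps.reportReplica("dn-2", 1002L);
        // One set of DataNode names per block of the file.
        System.out.println(maps.locate("/logs/app.log"));
    }
}
```

Reading a file needs both lookups: the first table yields the block list, the second tells the client which machines to contact for each block.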

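Federation does scale the NameNode indirectly, but it requires the ViewFs scheme, which is not compatible with the hdfs scheme, so the scheme used by everything running on the cluster has to be replaced. The multiple-NameNode support added in Hadoop 3 is therefore very useful: in theory more than two NameNodes can be configured, and the official documentation recommends three and advises against exceeding five.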
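
The idea that each group of NameNodes owns part of the directory tree can be sketched as a longest-prefix lookup from a path to a nameservice. This is a simplified illustration, not the ViewFs implementation; ToyMountTable and the nameservice names ns1, ns2, and ns3 are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy mount table: directory prefix -> nameservice. Hypothetical, not ViewFs code. */
final class ToyMountTable {
    private final Map<String, String> mounts = new HashMap<>();

    /** Maps a directory prefix such as "/user" to a nameservice name. */
    void addLink(String prefix, String nameservice) {
        mounts.put(prefix, nameservice);
    }

    /** Returns the nameservice with the longest matching prefix, or null if none matches. */
    String resolve(String path) {
        String best = null;
        int bestLength = 0;
        for (Map.Entry<String, String> entry : mounts.entrySet()) {
            String prefix = entry.getKey();
            // Match only on path-component boundaries, so "/data" does not match "/database".
            boolean matches = path.equals(prefix) || path.startsWith(prefix + "/");
            if (matches && prefix.length() > bestLength) {
                best = entry.getValue();
                bestLength = prefix.length();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        ToyMountTable table = new ToyMountTable();
        table.addLink("/user", "ns1");
        table.addLink("/data", "ns2");
        table.addLink("/data/logs", "ns3");
        System.out.println(table.resolve("/user/alice/report.txt"));
        System.out.println(table.resolve("/data/logs/2021/11/03"));
    }
}
```

Because clients resolve paths through such a table rather than addressing one NameNode directly, existing hdfs:// paths have to be rewritten when a cluster moves to Federation.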

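One more point worth knowing: starting with hadoop 3.3.0 the community added a new feature, the Observer NameNode. It addresses excessive request load on the active NameNode: routing part of the read requests to an Observer can greatly reduce the load on the active NameNode.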

2 NameNode Startup

2.1 The main() Method

Start with NameNode's main method, which parses the arguments and starts the NameNode:

public static void main(String argv[]) throws Exception {
    // Check whether this is a help request; if so, print usage and exit
    if (DFSUtil.parseHelpArgument(argv, NameNode.USAGE, System.out, true)) {
      System.exit(0);
    }

    try {
      StringUtils.startupShutdownMessage(NameNode.class, argv, LOG);
      // Create the NameNode
      NameNode namenode = createNameNode(argv, null);
      // Keep the process alive: join() blocks on NameNodeRpcServer#join so RPC keeps being served
      if (namenode != null) {
        namenode.join();
      }
    } catch (Throwable e) {
      LOG.error("Failed to start namenode.", e);
      terminate(1, e);
    }
  }
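
The create-then-join pattern in main() can be reduced to a small sketch: the main thread starts a server and then blocks in join() until something stops it. ToyServer is a hypothetical stand-in and does not reproduce how NameNodeRpcServer implements join().

```java
import java.util.concurrent.CountDownLatch;

/** Toy server illustrating start-then-join; hypothetical, not Hadoop code. */
final class ToyServer {
    private final CountDownLatch stopped = new CountDownLatch(1);
    private volatile boolean running;

    void start() {
        running = true;
    }

    void stop() {
        running = false;
        stopped.countDown(); // releases any thread blocked in join()
    }

    /** Blocks the caller until stop() has been called. */
    void join() {
        try {
            stopped.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }

    boolean isRunning() {
        return running;
    }

    public static void main(String[] args) {
        ToyServer server = new ToyServer();
        server.start();
        // Stop from another thread so this demo terminates; a real daemon runs until shut down.
        new Thread(server::stop).start();
        server.join();
        System.out.println("running after join: " + server.isRunning());
    }
}
```

Without the join() call, main() would return immediately after construction; blocking there is what ties the lifetime of the process to the lifetime of the server.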

2.2 createNameNode

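This method creates a NameNode. Depending on the arguments passed in, it creates a different kind of NameNode; when no special option is given, it creates the default, regular NameNode.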

public static NameNode createNameNode(String argv[], Configuration conf)
      throws IOException {
    LOG.info("createNameNode " + Arrays.asList(argv));
    if (conf == null)
      conf = new HdfsConfiguration();
    // Parse out some generic args into Configuration.
    GenericOptionsParser hParser = new GenericOptionsParser(conf, argv);
    argv = hParser.getRemainingArgs();
    // Parse the rest, NN specific args.
    
    StartupOption startOpt = parseArguments(argv);
    if (startOpt == null) {
      printUsage(System.err);
      return null;
    }
    // Store startOpt in the configuration; it is used later when creating the NameNode
    setStartupOption(conf, startOpt);

    boolean aborted = false;
    // Dispatch on the startup option to create the matching kind of NameNode
    switch (startOpt) {
    case FORMAT:
      aborted = format(conf, startOpt.getForceFormat(),
          startOpt.getInteractiveFormat());
      terminate(aborted ? 1 : 0);
      return null; // avoid javac warning
    case GENCLUSTERID:
      String clusterID = NNStorage.newClusterID();
      LOG.info("Generated new cluster id: {}", clusterID);
      terminate(0);
      return null;
    case ROLLBACK:
      aborted = doRollback(conf, true);
      terminate(aborted ? 1 : 0);
      return null; // avoid warning
    case BOOTSTRAPSTANDBY:
      String[] toolArgs = Arrays.copyOfRange(argv, 1, argv.length);
      int rc = BootstrapStandby.run(toolArgs, conf);
      terminate(rc);
      return null; // avoid warning
    case INITIALIZESHAREDEDITS:
      aborted = initializeSharedEdits(conf,
          startOpt.getForceFormat(),
          startOpt.getInteractiveFormat());
      terminate(aborted ? 1 : 0);
      return null; // avoid warning
    case BACKUP:
    case CHECKPOINT:
      NamenodeRole role = startOpt.toNodeRole();
      DefaultMetricsSystem.initialize(role.toString().replace(" ", ""));
      return new BackupNode(conf, role);
    case RECOVER:
      NameNode.doRecovery(startOpt, conf);
      return null;
    case METADATAVERSION:
      printMetadataVersion(conf);
      terminate(0);
      return null; // avoid javac warning
    case UPGRADEONLY:
      DefaultMetricsSystem.initialize("NameNode");
      new NameNode(conf);
      terminate(0);
      return null;
    default:
      DefaultMetricsSystem.initialize("NameNode");
      return new NameNode(conf);
    }
  }

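This method creates different kinds of NameNode depending on the startup option. The default is REGULAR, which creates an ordinary NameNode. Several other options exist: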

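format: formats the NameNode, creating the NameNode's on-disk file structure. When started with the format option, the NameNode is started, formatted, and then shut down. If the storage directory already exists on the current file system, the user is prompted. The option takes two flags, nonInteractive and force. With nonInteractive, the user is not prompted when the NameNode directory already exists on the underlying file system, and the format fails. With force, the NameNode is formatted whether or not its directory exists, again without prompting. If nonInteractive and force are both given, force takes precedence and the directory is formatted.
For the other options, see the comments in the source.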
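
The interaction of the two format flags can be written as a small decision function. This is a sketch of the behaviour only, assuming (as the HDFS commands guide describes) that -nonInteractive aborts when the directory exists unless -force is also given; ToyFormatConfirm and its Decision enum are hypothetical names.

```java
/** Toy decision table for "namenode -format"; hypothetical, not Hadoop code. */
final class ToyFormatConfirm {
    enum Decision { FORMAT, ABORT, ASK_USER }

    /**
     * @param dirHasData     whether the NameNode storage directory already holds data
     * @param force          whether the force flag was given
     * @param nonInteractive whether the nonInteractive flag was given
     */
    static Decision decide(boolean dirHasData, boolean force, boolean nonInteractive) {
        if (!dirHasData) {
            return Decision.FORMAT;   // nothing to overwrite, so no confirmation is needed
        }
        if (force) {
            return Decision.FORMAT;   // force wins, even together with nonInteractive
        }
        if (nonInteractive) {
            return Decision.ABORT;    // do not prompt; refuse to overwrite existing data
        }
        return Decision.ASK_USER;     // default: ask for confirmation
    }

    public static void main(String[] args) {
        System.out.println(decide(true, false, false));
        System.out.println(decide(true, false, true));
        System.out.println(decide(true, true, true));
    }
}
```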

2.3 NameNode Constructor

protected NameNode(Configuration conf, NamenodeRole role)
      throws IOException {
    super(conf);
    this.tracer = new Tracer.Builder("NameNode").
        conf(TraceUtils.wrapHadoopConf(NAMENODE_HTRACE_PREFIX, conf)).
        build();
    this.tracerConfigurationManager =
        new TracerConfigurationManager(NAMENODE_HTRACE_PREFIX, conf);
    this.role = role;
    String nsId = getNameServiceId(conf);
    String namenodeId = HAUtil.getNameNodeId(conf, nsId);
    clientNamenodeAddress = NameNodeUtils.getClientNamenodeAddress(
        conf, nsId);

    if (clientNamenodeAddress != null) {
      LOG.info("Clients should use {} to access"
          + " this namenode/service.", clientNamenodeAddress);
    }
    // Whether HA is enabled
    this.haEnabled = HAUtil.isHAEnabled(conf, nsId);
    // Determine the initial HA state, e.g. active or standby
    state = createHAState(getStartupOption(conf));
    this.allowStaleStandbyReads = HAUtil.shouldAllowStandbyReads(conf);
    this.haContext = createHAContext();
    try {
      initializeGenericKeys(conf, nsId, namenodeId);
      // Initialize the NameNode
      initialize(getConf());
      state.prepareToEnterState(haContext);
      try {
        haContext.writeLock();
        // Start the services for this state, e.g. active or standby
        state.enterState(haContext);
      } finally {
        haContext.writeUnlock();
      }
    } catch (IOException e) {
      this.stopAtException(e);
      throw e;
    } catch (HadoopIllegalArgumentException e) {
      this.stopAtException(e);
      throw e;
    }
    notBecomeActiveInSafemode = conf.getBoolean(
        DFS_HA_NN_NOT_BECOME_ACTIVE_IN_SAFEMODE,
        DFS_HA_NN_NOT_BECOME_ACTIVE_IN_SAFEMODE_DEFAULT);
    this.started.set(true);
  }

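The constructor fills in some configuration, checks the NameNode's current state (for example whether HA is enabled), and then starts the services that belong to that state. The most important call is initialize(conf), which performs the NameNode initialization.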
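
The shape of the constructor (pick an initial state, initialize, then enter the state while holding a write lock) can be sketched as follows. This is a simplification under stated assumptions: ToyHaNode is hypothetical, and the rule "standby when HA is enabled, active otherwise" ignores special startup options such as upgrade or observer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy node illustrating state-dependent startup; hypothetical, not Hadoop code. */
final class ToyHaNode {
    enum ToyState { ACTIVE, STANDBY }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> events = new ArrayList<>();
    private final ToyState state;

    ToyHaNode(boolean haEnabled) {
        // Assumption for this sketch: with HA the node starts as standby, without HA as active.
        this.state = haEnabled ? ToyState.STANDBY : ToyState.ACTIVE;
        events.add("initialize");
        // Enter the state under the write lock so nothing observes half-started services.
        lock.writeLock().lock();
        try {
            events.add(state == ToyState.ACTIVE ? "startActiveServices" : "startStandbyServices");
        } finally {
            lock.writeLock().unlock();
        }
    }

    ToyState state() {
        return state;
    }

    List<String> events() {
        return List.copyOf(events);
    }

    public static void main(String[] args) {
        System.out.println(new ToyHaNode(true).events());
        System.out.println(new ToyHaNode(false).events());
    }
}
```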

2.4 initialize

protected void initialize(Configuration conf) throws IOException {
    if (conf.get(HADOOP_USER_GROUP_METRICS_PERCENTILES_INTERVALS) == null) {
      String intervals = conf.get(DFS_METRICS_PERCENTILES_INTERVALS_KEY);
      if (intervals != null) {
        conf.set(HADOOP_USER_GROUP_METRICS_PERCENTILES_INTERVALS,
          intervals);
      }
    }

    // Apply configuration and start some monitoring services
    UserGroupInformation.setConfiguration(conf);
    loginAsNameNodeUser(conf);

    NameNode.initMetrics(conf, this.getRole());
    StartupProgressMetrics.register(startupProgress);

    pauseMonitor = new JvmPauseMonitor();
    pauseMonitor.init(conf);
    pauseMonitor.start();
    metrics.getJvmMetrics().setPauseMonitor(pauseMonitor);

    if (conf.getBoolean(DFS_NAMENODE_GC_TIME_MONITOR_ENABLE,
        DFS_NAMENODE_GC_TIME_MONITOR_ENABLE_DEFAULT)) {
      long observationWindow = conf.getTimeDuration(
          DFS_NAMENODE_GC_TIME_MONITOR_OBSERVATION_WINDOW_MS,
          DFS_NAMENODE_GC_TIME_MONITOR_OBSERVATION_WINDOW_MS_DEFAULT,
          TimeUnit.MILLISECONDS);
      long sleepInterval = conf.getTimeDuration(
          DFS_NAMENODE_GC_TIME_MONITOR_SLEEP_INTERVAL_MS,
          DFS_NAMENODE_GC_TIME_MONITOR_SLEEP_INTERVAL_MS_DEFAULT,
          TimeUnit.MILLISECONDS);
      gcTimeMonitor = new Builder().observationWindowMs(observationWindow)
          .sleepIntervalMs(sleepInterval).build();
      gcTimeMonitor.start();
      metrics.getJvmMetrics().setGcTimeMonitor(gcTimeMonitor);
    }

    // Start a NameNodeHttpServer, listening on 0.0.0.0:9870
    if (NamenodeRole.NAMENODE == role) {
      startHttpServer(conf);
    }

    // Initialize FSNamesystem from the edits and fsimage at the configured locations; loads the fsimage
    loadNamesystem(conf);
    startAliasMapServerIfNecessary(conf);

    // Hadoop RPC server
    rpcServer = createRpcServer(conf);

    initReconfigurableBackoffKey();

    if (clientNamenodeAddress == null) {
      // This is expected for MiniDFSCluster. Set it now using 
      // the RPC server's bind address.
      clientNamenodeAddress = 
          NetUtils.getHostPortString(getNameNodeAddress());
      LOG.info("Clients are to use " + clientNamenodeAddress + " to access"
          + " this namenode/service.");
    }
    if (NamenodeRole.NAMENODE == role) {
      httpServer.setNameNodeAddress(getNameNodeAddress());
      httpServer.setFSImage(getFSImage());
      if (levelDBAliasMapServer != null) {
        httpServer.setAliasMap(levelDBAliasMapServer.getAliasMap());
      }
    }

    startCommonServices(conf);
    startMetricsLogger(conf);
  }
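
The order of work in initialize() can be summarized as a list of steps, some of which run only for the regular NAMENODE role. The sketch below only restates the listing above in compact form; ToyInitialize and the step labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy summary of the initialize() step order; hypothetical, not Hadoop code. */
final class ToyInitialize {
    enum Role { NAMENODE, BACKUP, CHECKPOINT }

    /** Returns the startup steps in the order the listing above performs them. */
    static List<String> steps(Role role) {
        List<String> steps = new ArrayList<>();
        steps.add("loginAndStartMonitors");
        if (role == Role.NAMENODE) {
            steps.add("startHttpServer");     // only the regular role starts the HTTP server here
        }
        steps.add("loadNamesystem");
        steps.add("createRpcServer");
        if (role == Role.NAMENODE) {
            steps.add("wireHttpServer");      // pass the RPC address and FSImage to the HTTP server
        }
        steps.add("startCommonServices");
        steps.add("startMetricsLogger");
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(steps(Role.NAMENODE));
        System.out.println(steps(Role.BACKUP));
    }
}
```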

3 Miscellaneous

This section covers a few of the more important fields in the NameNode class:

protected FSNamesystem namesystem; 
// The NameNode's role: NameNode, backup, or checkpoint
protected final NamenodeRole role;
// HA state; there are three implementations: active, standby, and backup
private volatile HAState state;

// Handles the NameNode's HTTP calls
protected NameNodeHttpServer httpServer;

// Handles the NameNode's RPC calls
private NameNodeRpcServer rpcServer;
