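Yarn Core Parameter Configuration

I have no environment to test this in, so the material below is carried over from the Atguigu (尚硅谷) Hadoop course; gaps are left to be filled in later.

Yarn configuration example and related parameters

Requirement: count the occurrences of each word in 1 GB of data. There are 3 servers, each with 4 GB of memory and a 4-core, 4-thread CPU.

1G / 128m = 8 MapTasks; 1 ReduceTask; 1 mrAppMaster. That is 10 tasks in total, so on average each node runs 10 tasks / 3 nodes ≈ 3 tasks (4, 3, 3).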
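The arithmetic above can be reproduced with a few lines of Java. This is only a rough sketch: the class and method names are made up for illustration, and it assumes one MapTask per 128 MB split with simple rounding up, ignoring any finer split-size rules Hadoop applies.

```java
import java.util.Arrays;

/** Back-of-the-envelope task count for the word-count requirement. */
class TaskPlan {
    /** Number of MapTasks, assuming one task per split and rounding up. */
    static int mapTasks(long inputMb, long splitMb) {
        return (int) ((inputMb + splitMb - 1) / splitMb);
    }

    /** Spread tasks over nodes as evenly as possible, e.g. 10 over 3 gives [4, 3, 3]. */
    static int[] spread(int tasks, int nodes) {
        int[] perNode = new int[nodes];
        Arrays.fill(perNode, tasks / nodes);
        for (int i = 0; i < tasks % nodes; i++) {
            perNode[i]++; // the first (tasks % nodes) nodes take one extra task
        }
        return perNode;
    }

    public static void main(String[] args) {
        int maps = mapTasks(1024, 128); // 1 GB input, 128 MB splits
        int total = maps + 1 + 1;       // plus 1 ReduceTask and 1 mrAppMaster
        System.out.println(maps + " MapTasks, " + total + " tasks in total");
        System.out.println(Arrays.toString(spread(total, 3)));
    }
}
```

Running it prints `8 MapTasks, 10 tasks in total` followed by `[4, 3, 3]`, matching the estimate above.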

Modify yarn-site.xml



<property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

<property>
    <description>Number of threads to handle scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.client.thread-count</name>
    <value>8</value>
</property>

<property>
    <description>Enable auto-detection of node capabilities such as memory and CPU.</description>
    <name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
    <value>false</value>
</property>

<property>
    <description>Flag to determine if logical processors (such as
    hyperthreads) should be counted as cores. Only applicable on Linux
    when yarn.nodemanager.resource.cpu-vcores is set to -1 and
    yarn.nodemanager.resource.detect-hardware-capabilities is true.
    </description>
    <name>yarn.nodemanager.resource.count-logical-processors-as-cores</name>
    <value>false</value>
</property>

<property>
    <description>Multiplier to determine how to convert physical cores to
    vcores. This value is used if yarn.nodemanager.resource.cpu-vcores
    is set to -1 (which implies auto-calculate vcores) and
    yarn.nodemanager.resource.detect-hardware-capabilities is set to true.
    The number of vcores will be calculated as number of CPUs * multiplier.
    </description>
    <name>yarn.nodemanager.resource.pcores-vcores-multiplier</name>
    <value>1.0</value>
</property>

<property>
    <description>Amount of physical memory, in MB, that can be allocated
    for containers. If set to -1 and
    yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
    automatically calculated (in case of Windows and Linux).
    In other cases, the default is 8192MB.
    </description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
</property>

<property>
    <description>Number of vcores that can be allocated
    for containers. This is used by the RM scheduler when allocating
    resources for containers. This is not used to limit the number of
    CPUs used by YARN containers. If it is set to -1 and
    yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
    automatically determined from the hardware in case of Windows and Linux.
    In other cases, number of vcores is 8 by default.
    </description>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
</property>



<property>
    <description>The minimum allocation for every container request at the
    RM in MBs. Memory requests lower than this will be set to the value of
    this property. Additionally, a node manager that is configured to have
    less memory than this value will be shut down by the resource manager.
    </description>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
</property>

<property>
    <description>The maximum allocation for every container request at the
    RM in MBs. Memory requests higher than this will throw an
    InvalidResourceRequestException.
    </description>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
</property>

<property>
    <description>The minimum allocation for every container request at the
    RM in terms of virtual CPU cores. Requests lower than this will be set to
    the value of this property. Additionally, a node manager that is configured
    to have fewer virtual cores than this value will be shut down by the
    resource manager.
    </description>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
</property>

<property>
    <description>The maximum allocation for every container request at the
    RM in terms of virtual CPU cores. Requests higher than this will throw an
    InvalidResourceRequestException.
    </description>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
</property>
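The minimum and maximum allocation settings above can be read as a simple rule: requests below the minimum are raised to it, requests above the maximum are rejected. The sketch below models only that rule as the property descriptions state it. The class is hypothetical, `IllegalArgumentException` stands in for YARN's `InvalidResourceRequestException`, and any further rounding a real scheduler may apply is not modelled.

```java
/** Simplified model of the scheduler's min/max container allocation checks. */
class AllocationLimits {
    static final int MIN_MB = 1024;  // yarn.scheduler.minimum-allocation-mb
    static final int MAX_MB = 2048;  // yarn.scheduler.maximum-allocation-mb
    static final int MIN_VCORES = 1; // yarn.scheduler.minimum-allocation-vcores
    static final int MAX_VCORES = 2; // yarn.scheduler.maximum-allocation-vcores

    /** Raise a request to the minimum; reject it if it exceeds the maximum. */
    static int normalize(int requested, int min, int max) {
        if (requested > max) {
            // Stand-in for YARN's InvalidResourceRequestException.
            throw new IllegalArgumentException(
                    "request " + requested + " exceeds maximum " + max);
        }
        return Math.max(requested, min);
    }

    public static void main(String[] args) {
        System.out.println(normalize(512, MIN_MB, MAX_MB));  // below the minimum, raised to 1024
        System.out.println(normalize(1536, MIN_MB, MAX_MB)); // within range, unchanged in this model
        System.out.println(normalize(2, MIN_VCORES, MAX_VCORES));
        try {
            normalize(4096, MIN_MB, MAX_MB);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a 4096 MB NodeManager and containers between 1024 MB and 2048 MB, each node can hold between two and four containers, which is consistent with the 4 / 3 / 3 task spread estimated earlier.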



<property>
    <description>Whether virtual memory limits will be enforced for
    containers.</description>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>

<property>
    <description>Ratio between virtual memory to physical memory when
    setting memory limits for containers. Container allocations are
    expressed in terms of physical memory, and virtual memory usage is
    allowed to exceed this allocation by this ratio.
    </description>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
</property>
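To make the ratio concrete, the virtual-memory ceiling of a container is its physical allocation multiplied by yarn.nodemanager.vmem-pmem-ratio. The snippet below is a small illustration with a hypothetical class name; truncating to whole megabytes is a choice made here for readability, not something taken from YARN.

```java
/** Virtual-memory ceiling implied by yarn.nodemanager.vmem-pmem-ratio. */
class VmemLimit {
    /** Physical allocation times ratio, truncated to whole MB for display. */
    static long vmemLimitMb(long containerMb, double ratio) {
        return (long) (containerMb * ratio);
    }

    public static void main(String[] args) {
        double ratio = 2.1; // value configured above
        System.out.println(vmemLimitMb(1024, ratio)); // 2150
        System.out.println(vmemLimitMb(2048, ratio)); // 4300
    }
}
```

Since yarn.nodemanager.vmem-check-enabled is set to false in this example, the ceiling is informational only and containers are not killed for exceeding it.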

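Capacity Scheduler multi-queue configuration

Requirement 1: the default queue gets 40% of total memory, with a maximum capacity of 60% of total resources; the hive queue gets 60% of total memory, with a maximum capacity of 80% of total resources.

Requirement 2: configure queue priority

Because the Capacity Scheduler is the default scheduler, there is no need to point Yarn at capacity-scheduler.xml explicitly.

Configure capacity-scheduler.xml as follows:

Modify the existing configuration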



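<property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,hive</value>
    <description>The queues at this level (root is the root queue).</description>
</property>

<property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>40</value>
</property>

<property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>60</value>
</property>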

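Add properties for the new queue

These also go in capacity-scheduler.xml. Copy the default queue's properties and adjust them, so in effect the same set of properties is written twice.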



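<property>
    <name>yarn.scheduler.capacity.root.hive.capacity</name>
    <value>60</value>
</property>

<property>
    <name>yarn.scheduler.capacity.root.hive.user-limit-factor</name>
    <value>1</value>
</property>

<property>
    <name>yarn.scheduler.capacity.root.hive.maximum-capacity</name>
    <value>80</value>
</property>

<property>
    <name>yarn.scheduler.capacity.root.hive.state</name>
    <value>RUNNING</value>
</property>

<property>
    <name>yarn.scheduler.capacity.root.hive.acl_submit_applications</name>
    <value>*</value>
</property>

<property>
    <name>yarn.scheduler.capacity.root.hive.acl_administer_queue</name>
    <value>*</value>
</property>

<property>
    <name>yarn.scheduler.capacity.root.hive.acl_application_max_priority</name>
    <value>*</value>
</property>

<property>
    <name>yarn.scheduler.capacity.root.hive.maximum-application-lifetime</name>
    <value>-1</value>
</property>

<property>
    <name>yarn.scheduler.capacity.root.hive.default-application-lifetime</name>
    <value>-1</value>
</property>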
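The percentages in the two queue definitions can be translated into absolute figures for this cluster. The sketch below assumes all three servers run a NodeManager with 4096 MB available for containers (3 x 4096 MB in total); the class name and layout are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Turns capacity percentages into MB for the example cluster. */
class QueueCapacity {
    /** The given percentage of a total, in MB. */
    static double share(double totalMb, double percent) {
        return totalMb * percent / 100.0;
    }

    public static void main(String[] args) {
        double clusterMb = 3 * 4096; // assumption: 3 NodeManagers x 4096 MB each

        // queue name -> {capacity %, maximum-capacity %}
        Map<String, double[]> queues = new LinkedHashMap<>();
        queues.put("default", new double[] {40, 60});
        queues.put("hive", new double[] {60, 80});

        double capacitySum = 0;
        for (Map.Entry<String, double[]> entry : queues.entrySet()) {
            double[] pct = entry.getValue();
            capacitySum += pct[0];
            System.out.printf("%s: guaranteed %.1f MB, ceiling %.1f MB%n",
                    entry.getKey(), share(clusterMb, pct[0]), share(clusterMb, pct[1]));
        }
        // The capacities of sibling queues are expected to add up to 100.
        System.out.println("capacity sum = " + capacitySum);
    }
}
```

The gap between a queue's capacity and its maximum-capacity is what lets it borrow idle resources from its sibling: default may grow from 40% up to 60%, hive from 60% up to 80%.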

Distribute the configuration file, then either restart Yarn or refresh the queues

yarn rmadmin -refreshQueues

Submitting a job to the hive queue

From the shell

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount -D mapreduce.job.queuename=hive /input /output

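From a jar

By default, jobs are submitted to the default queue. To submit to a different queue, declare it in the Driver:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
conf.set("mapreduce.job.queuename", "hive"); // submit to the hive queue instead of default
Job job = Job.getInstance(conf);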

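Job priority

The Capacity Scheduler supports job priorities: when resources are tight, higher-priority jobs obtain resources first. By default Yarn caps the priority of every job at 0, so this limit has to be raised before job priorities have any effect.

Modify yarn-site.xml and add the following parameter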


<property>
	<name>yarn.cluster.max-application-priority</name>
	<value>5</value>
</property>

Distribute the configuration and restart Yarn

Simulate a resource-constrained environment by submitting a job that computes pi

hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar pi 5 2000000

While it is running, submit a higher-priority job and observe that it jumps the queue
hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar pi -D mapreduce.job.priority=5 5 2000000
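The queue-jumping behaviour can be pictured with a toy model in which pending applications are ordered by priority and requested priorities are capped at yarn.cluster.max-application-priority. This is not YARN's scheduler code; the `App` class, the tie-breaking by submission order, and all names are assumptions made for illustration.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Toy model of priority-ordered pending applications. */
class PriorityDemo {
    static final int CLUSTER_MAX_PRIORITY = 5; // yarn.cluster.max-application-priority

    /** Hypothetical stand-in for a submitted application. */
    static final class App {
        final String name;
        final int priority;
        final long seq; // submission order, used as a tie-breaker

        App(String name, int requestedPriority, long seq) {
            this.name = name;
            this.priority = effectivePriority(requestedPriority);
            this.seq = seq;
        }
    }

    /** Requests above the cluster maximum are capped at the maximum. */
    static int effectivePriority(int requested) {
        return Math.min(requested, CLUSTER_MAX_PRIORITY);
    }

    /** Higher priority first; equal priorities keep submission order. */
    static PriorityQueue<App> newPendingQueue() {
        Comparator<App> byPriorityDesc =
                Comparator.comparingInt((App a) -> a.priority).reversed();
        return new PriorityQueue<>(byPriorityDesc.thenComparingLong(a -> a.seq));
    }

    public static void main(String[] args) {
        PriorityQueue<App> pending = newPendingQueue();
        pending.add(new App("pi-1", 0, 1));
        pending.add(new App("pi-2", 0, 2));
        pending.add(new App("pi-urgent", 5, 3)); // submitted last with priority 5
        while (!pending.isEmpty()) {
            System.out.println(pending.poll().name);
        }
    }
}
```

The job submitted last is served first (`pi-urgent`, then `pi-1`, then `pi-2`). With the default maximum of 0 every job would be capped to the same priority and the order would fall back to submission order.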

Changing the priority of a job

yarn application -appID <ApplicationID> -updatePriority <priority>
yarn application -appID application_1611133087930_0009 -updatePriority 5

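Fair Scheduler multi-queue configuration

Create two queues, test and atguigu (named after the group each user belongs to). The intended behaviour is as follows: if the user specifies a queue when submitting a job, the job runs in that queue; if no queue is specified, jobs submitted by user test run in root.group.test and jobs submitted by user atguigu run in root.group.atguigu.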
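The intended placement behaviour boils down to a two-step decision. The following sketch only mirrors the requirement as worded here; it is not the Fair Scheduler's own placement code, the class and method are hypothetical, and the `root.group.` prefix is copied literally from the text.

```java
/** Mirrors the queue-selection behaviour described in the requirement. */
class QueuePlacement {
    /**
     * An explicitly requested queue wins; otherwise fall back to a
     * per-user queue under root.group, as the requirement states.
     */
    static String resolveQueue(String requestedQueue, String user) {
        if (requestedQueue != null && !requestedQueue.isEmpty()) {
            return requestedQueue;
        }
        return "root.group." + user;
    }

    public static void main(String[] args) {
        System.out.println(resolveQueue("root.test", "atguigu")); // root.test
        System.out.println(resolveQueue(null, "test"));           // root.group.test
        System.out.println(resolveQueue("", "atguigu"));          // root.group.atguigu
    }
}
```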

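Configuring the Fair Scheduler involves two files: yarn-site.xml, and the Fair Scheduler queue allocation file fair-scheduler.xml (the file name can be customized).

Modify yarn-site.xml to select the Fair Scheduler and specify the location of its allocation file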


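<property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    <description>Use the Fair Scheduler.</description>
</property>

<property>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>/opt/module/hadoop-3.1.3/etc/hadoop/fair-scheduler.xml</value>
    <description>Location of the Fair Scheduler queue allocation file.</description>
</property>

<property>
    <name>yarn.scheduler.fair.preemption</name>
    <value>false</value>
    <description>Disable resource preemption between queues.</description>
</property>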

Configure fair-scheduler.xml



    
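<?xml version="1.0"?>
<allocations>
    <!-- Default maximum share of a queue's resources that Application Masters may use -->
    <queueMaxAMShareDefault>0.5</queueMaxAMShareDefault>
    <!-- Default maximum resources for a single queue -->
    <queueMaxResourcesDefault>4096mb,4vcores</queueMaxResourcesDefault>

    <queue name="test">
        <!-- Minimum resources for the queue -->
        <minResources>2048mb,2vcores</minResources>
        <!-- Maximum resources for the queue -->
        <maxResources>4096mb,4vcores</maxResources>
        <!-- Maximum number of applications running at the same time -->
        <maxRunningApps>4</maxRunningApps>
        <!-- Maximum share of the queue that Application Masters may use -->
        <maxAMShare>0.5</maxAMShare>
        <!-- Weight of the queue -->
        <weight>1.0</weight>
        <!-- Scheduling policy inside the queue -->
        <schedulingPolicy>fair</schedulingPolicy>
    </queue>

    <queue name="atguigu">
        <minResources>2048mb,2vcores</minResources>
        <maxResources>4096mb,4vcores</maxResources>
        <maxRunningApps>4</maxRunningApps>
        <maxAMShare>0.5</maxAMShare>
        <weight>1.0</weight>
        <schedulingPolicy>fair</schedulingPolicy>
    </queue>
</allocations>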
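As a rough illustration of how the per-queue values listed above interact, the sketch below assumes they are, in order, the queue's minimum resources (2048mb), maximum resources (4096mb), AM share (0.5) and weight (1.0), and that the cluster offers 3 x 4096 MB. It uses a deliberately simplified model: each queue gets a weight-proportional slice clamped to its bounds. The real Fair Scheduler computes shares iteratively and also accounts for demand, which is not modelled here.

```java
/** Simplified weight-based share calculation for one queue. */
class FairShareSketch {
    /** Weighted slice of the cluster, clamped to the queue's min and max. */
    static long shareMb(long clusterMb, double weight, double totalWeight,
                        long minMb, long maxMb) {
        long proportional = (long) (clusterMb * weight / totalWeight);
        return Math.max(minMb, Math.min(maxMb, proportional));
    }

    /** Memory that Application Masters may occupy within the queue's share. */
    static long amLimitMb(long queueShareMb, double maxAmShare) {
        return (long) (queueShareMb * maxAmShare);
    }

    public static void main(String[] args) {
        long clusterMb = 3 * 4096; // assumption: 3 NodeManagers x 4096 MB
        double totalWeight = 2.0;  // two queues with weight 1.0 each

        long share = shareMb(clusterMb, 1.0, totalWeight, 2048, 4096);
        System.out.println("queue share: " + share + " MB");                   // 4096
        System.out.println("AM limit:    " + amLimitMb(share, 0.5) + " MB");   // 2048
    }
}
```

In this model the weight-proportional slice (6144 MB) exceeds the queue's upper bound, so the 4096 MB ceiling is what actually limits each queue.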

Distribute the configuration and restart Yarn to test

hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar pi -Dmapreduce.job.queuename=root.test 1 1

If no queue is specified, the job is submitted to the queue that matches the user name
