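I have no environment to test this in, so these notes are carried over from the 尚硅谷 (atguigu) Hadoop course material; the gaps are left to be filled in later.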
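Yarn configuration example and related parameters
Requirement: count the occurrences of each word in 1 GB of data. There are 3 servers, each with 4 GB of memory and a 4-core, 4-thread CPU.
1 GB / 128 MB = 8 MapTasks, plus 1 ReduceTask and 1 MrAppMaster, for 10 tasks in total. Spread over 3 nodes, that is 10 / 3 ≈ 3 tasks per node (4, 3, 3).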
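The arithmetic above can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical, and it assumes the split size equals the 128 MB block size, ignoring the finer details of how Hadoop actually computes input splits.

```java
import java.util.Arrays;

final class TaskEstimate {
    // Number of MapTasks, assuming one task per split and split size == block size.
    static int mapTasks(long inputMb, long splitMb) {
        return (int) ((inputMb + splitMb - 1) / splitMb); // ceiling division
    }

    // Spread the tasks over the nodes as evenly as possible.
    static int[] spread(int tasks, int nodes) {
        int[] perNode = new int[nodes];
        for (int i = 0; i < nodes; i++) {
            perNode[i] = tasks / nodes + (i < tasks % nodes ? 1 : 0);
        }
        return perNode;
    }

    public static void main(String[] args) {
        int maps = mapTasks(1024, 128);   // 8
        int total = maps + 1 + 1;         // plus 1 ReduceTask and 1 MrAppMaster
        System.out.println(total + " tasks: " + Arrays.toString(spread(total, 3)));
    }
}
```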
Modify yarn-site.xml
<property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property>
    <description>Number of threads to handle scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.client.thread-count</name>
    <value>8</value>
</property>
<property>
    <description>Enable auto-detection of node capabilities such as memory and CPU.</description>
    <name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
    <value>false</value>
</property>
<property>
    <description>Flag to determine if logical processors (such as hyperthreads) should be counted as cores. Only applicable on Linux when yarn.nodemanager.resource.cpu-vcores is set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true.</description>
    <name>yarn.nodemanager.resource.count-logical-processors-as-cores</name>
    <value>false</value>
</property>
<property>
    <description>Multiplier to determine how to convert physical cores to vcores. This value is used if yarn.nodemanager.resource.cpu-vcores is set to -1 (which implies auto-calculate vcores) and yarn.nodemanager.resource.detect-hardware-capabilities is set to true. The number of vcores will be calculated as number of CPUs * multiplier.</description>
    <name>yarn.nodemanager.resource.pcores-vcores-multiplier</name>
    <value>1.0</value>
</property>
<property>
    <description>Amount of physical memory, in MB, that can be allocated for containers. If set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically calculated (in case of Windows and Linux). In other cases, the default is 8192MB.</description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
</property>
<property>
    <description>Number of vcores that can be allocated for containers. This is used by the RM scheduler when allocating resources for containers. This is not used to limit the number of CPUs used by YARN containers. If it is set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically determined from the hardware in case of Windows and Linux. In other cases, number of vcores is 8 by default.</description>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
</property>
<property>
    <description>The minimum allocation for every container request at the RM in MBs. Memory requests lower than this will be set to the value of this property. Additionally, a node manager that is configured to have less memory than this value will be shut down by the resource manager.</description>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
</property>
<property>
    <description>The maximum allocation for every container request at the RM in MBs. Memory requests higher than this will throw an InvalidResourceRequestException.</description>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
</property>
<property>
    <description>The minimum allocation for every container request at the RM in terms of virtual CPU cores. Requests lower than this will be set to the value of this property. Additionally, a node manager that is configured to have fewer virtual cores than this value will be shut down by the resource manager.</description>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
</property>
<property>
    <description>The maximum allocation for every container request at the RM in terms of virtual CPU cores. Requests higher than this will throw an InvalidResourceRequestException.</description>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
</property>
<property>
    <description>Whether virtual memory limits will be enforced for containers.</description>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
<property>
    <description>Ratio between virtual memory to physical memory when setting memory limits for containers. Container allocations are expressed in terms of physical memory, and virtual memory usage is allowed to exceed this allocation by this ratio.</description>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
</property>
Capacity Scheduler multi-queue configuration
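Requirement 1: the default queue gets 40% of total memory, with a maximum capacity of 60% of total resources; the hive queue gets 60% of total memory, with a maximum capacity of 80% of total resources.
Requirement 2: configure queue priorities.
The Capacity Scheduler is the default scheduler, so no extra setting is needed to point Yarn at capacity-scheduler.xml.
Configure capacity-scheduler.xml as follows: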
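Modify the existing configuration
<property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,hive</value>
    <description>The queues at this level (root is the root queue).</description>
</property>
<property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>40</value>
</property>
<property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>60</value>
</property>
Add properties for the new queue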
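These also go in capacity-scheduler.xml. The simplest way is to copy the default queue's properties and change them for hive, so each property is effectively written twice.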
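<property>
    <name>yarn.scheduler.capacity.root.hive.capacity</name>
    <value>60</value>
</property>
<property>
    <name>yarn.scheduler.capacity.root.hive.user-limit-factor</name>
    <value>1</value>
</property>
<property>
    <name>yarn.scheduler.capacity.root.hive.maximum-capacity</name>
    <value>80</value>
</property>
<property>
    <name>yarn.scheduler.capacity.root.hive.state</name>
    <value>RUNNING</value>
</property>
<property>
    <name>yarn.scheduler.capacity.root.hive.acl_submit_applications</name>
    <value>*</value>
</property>
<property>
    <name>yarn.scheduler.capacity.root.hive.acl_administer_queue</name>
    <value>*</value>
</property>
<property>
    <name>yarn.scheduler.capacity.root.hive.acl_application_max_priority</name>
    <value>*</value>
</property>
<property>
    <name>yarn.scheduler.capacity.root.hive.maximum-application-lifetime</name>
    <value>-1</value>
</property>
<property>
    <name>yarn.scheduler.capacity.root.hive.default-application-lifetime</name>
    <value>-1</value>
</property>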
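To make the percentages concrete, the following sketch converts them to megabytes for the cluster described earlier (3 nodes with yarn.nodemanager.resource.memory-mb set to 4096, so 12288 MB in total). The class and method names are hypothetical, and the real scheduler also rounds to allocation units, so treat the numbers as approximate.

```java
final class QueueShares {
    // Memory, in MB, corresponding to a percentage of the cluster total (rounded down).
    static long shareMb(long clusterMb, int percent) {
        return clusterMb * percent / 100;
    }

    public static void main(String[] args) {
        long clusterMb = 3 * 4096L; // 3 NodeManagers, 4096 MB each

        // Sibling queue capacities under root must add up to 100.
        int defaultCapacity = 40, hiveCapacity = 60;
        if (defaultCapacity + hiveCapacity != 100) {
            throw new IllegalStateException("queue capacities must sum to 100");
        }

        System.out.println("default: guaranteed " + shareMb(clusterMb, defaultCapacity)
                + " MB, max " + shareMb(clusterMb, 60) + " MB");
        System.out.println("hive:    guaranteed " + shareMb(clusterMb, hiveCapacity)
                + " MB, max " + shareMb(clusterMb, 80) + " MB");
    }
}
```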
Distribute the configuration files, then either restart Yarn or refresh the queues:
yarn rmadmin -refreshQueues
Submitting a job to the hive queue
From the shell
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount -D mapreduce.job.queuename=hive /input /output
From a packaged jar
Jobs are submitted to the default queue unless told otherwise. To submit to a different queue, declare it in the Driver:
Configuration conf = new Configuration();
conf.set("mapreduce.job.queuename","hive");
Job job = Job.getInstance(conf);
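Job priority
The Capacity Scheduler supports job priorities: when resources are tight, higher-priority jobs get resources first. By default Yarn limits every job's priority to 0, so this limit has to be raised before priorities can be used.
Edit yarn-site.xml and add the following parameter: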
<property>
    <name>yarn.cluster.max-application-priority</name>
    <value>5</value>
</property>
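As a rough mental model of what this setting does, the sketch below caps each requested priority at the cluster maximum and then orders pending jobs by effective priority, keeping submission order for ties. It is a simplified illustration, not the scheduler's actual code; the class, method, and job names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class PrioritySketch {
    // A requested priority above the cluster maximum is reduced to the maximum.
    static int effectivePriority(int requested, int clusterMax) {
        return Math.min(requested, clusterMax);
    }

    // Highest effective priority first; List.sort is stable, so ties keep submission order.
    static List<String> schedulingOrder(List<String> names, List<Integer> requested, int clusterMax) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            idx.add(i);
        }
        idx.sort(Comparator.comparingInt((Integer i) -> -effectivePriority(requested.get(i), clusterMax)));
        List<String> ordered = new ArrayList<>();
        for (int i : idx) {
            ordered.add(names.get(i));
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<String> jobs = Arrays.asList("pi-first", "pi-urgent");
        List<Integer> requested = Arrays.asList(0, 5);
        System.out.println(schedulingOrder(jobs, requested, 0)); // default limit: both jobs are priority 0
        System.out.println(schedulingOrder(jobs, requested, 5)); // limit raised to 5
    }
}
```

With the limit left at 0 the two jobs stay in submission order; with it raised to 5 the second job moves ahead.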
Distribute the configuration and restart Yarn.
To simulate a resource-constrained cluster, submit a job that computes pi:
hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar pi 5 2000000
While it is running, submit a second job with a higher priority and observe that it jumps the queue:
hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar pi -D mapreduce.job.priority=5 5 2000000
Changing the priority of a running job
yarn application -appID <ApplicationID> -updatePriority <priority>
yarn application -appID application_1611133087930_0009 -updatePriority 5
Fair Scheduler multi-queue configuration
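Create two queues, test and atguigu (named after the users' groups). The intended behavior is: if a user names a queue when submitting a job, the job runs in that queue; if no queue is named, jobs submitted by the test user run in root.group.test and jobs submitted by atguigu run in root.group.atguigu.
Configuring the Fair Scheduler involves two files: yarn-site.xml and the queue allocation file fair-scheduler.xml (the file name can be customized).
Edit yarn-site.xml to select the Fair Scheduler and point it at its allocation file: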
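<property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    <description>Use the Fair Scheduler.</description>
</property>
<property>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>/opt/module/hadoop-3.1.3/etc/hadoop/fair-scheduler.xml</value>
    <description>Location of the Fair Scheduler queue allocation file.</description>
</property>
<property>
    <name>yarn.scheduler.fair.preemption</name>
    <value>false</value>
    <description>Disable resource preemption between queues.</description>
</property>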
Configure fair-scheduler.xml
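<allocations>
    <queueMaxAMShareDefault>0.5</queueMaxAMShareDefault>
    <queueMaxResourcesDefault>4096mb,4vcores</queueMaxResourcesDefault>
    <queue name="test">
        <minResources>2048mb,2vcores</minResources>
        <maxResources>4096mb,4vcores</maxResources>
        <maxRunningApps>4</maxRunningApps>
        <maxAMShare>0.5</maxAMShare>
        <weight>1.0</weight>
        <schedulingPolicy>fair</schedulingPolicy>
    </queue>
    <queue name="atguigu">
        <minResources>2048mb,2vcores</minResources>
        <maxResources>4096mb,4vcores</maxResources>
        <maxRunningApps>4</maxRunningApps>
        <maxAMShare>0.5</maxAMShare>
        <weight>1.0</weight>
        <schedulingPolicy>fair</schedulingPolicy>
    </queue>
</allocations>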
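To get a feel for how weight, minimum, and maximum resources interact, here is a deliberately simplified sketch: it splits cluster memory in proportion to queue weight and then clamps the result between the queue's minimum and maximum. The real Fair Scheduler computes shares across the currently active queues and redistributes what clamping frees up, so this is an approximation under stated assumptions. The class and method names are hypothetical, and the 12288 MB total assumes the 3-node, 4096 MB cluster from the earlier example.

```java
final class FairShareSketch {
    // Weight-proportional share of cluster memory, clamped to [minMb, maxMb].
    static long share(long clusterMb, double weight, double totalWeight, long minMb, long maxMb) {
        long proportional = (long) (clusterMb * weight / totalWeight);
        return Math.min(Math.max(proportional, minMb), maxMb);
    }

    public static void main(String[] args) {
        long clusterMb = 3 * 4096L;   // assumed cluster total
        double totalWeight = 1.0 + 1.0; // two queues, each with weight 1.0

        // Both queues share the same limits: min 2048 MB, max 4096 MB.
        long perQueue = share(clusterMb, 1.0, totalWeight, 2048, 4096);
        System.out.println("each queue: " + perQueue + " MB");
    }
}
```

With equal weights the proportional split would be 6144 MB per queue, so in this model the 4096 MB maximum is the binding limit.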
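Distribute the configuration, restart Yarn, and test:
hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar pi -Dmapreduce.job.queuename=root.test 1 1
If no queue is specified, the job is submitted to the queue that matches the submitting user's name.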



