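An LLAP service runs continuously and does not release its resources once started. A cluster can host a single LLAP service or several. When you submit an LLAP service, you specify the queue it is submitted to. Each LLAP service has a unique name, and users specify which LLAP service their jobs go to when they submit them.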
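Generating the LLAP service package

Any user can run the LLAP service generator. Running it does not start anything: it only produces the programs and configuration needed to run LLAP, based on the arguments given.

    hive --service llap --name llap-demo --instances 1 --cache 128m --executors 3 --iothreads 2 --size 1024m --xmx 512m --queue default --loglevel INFO

Key parameters
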
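| Parameter | Description |
|---|---|
| service | `llap`. Invokes Hive's llap service; this value is fixed. |
| name | Name of the LLAP service. It must be unique (every LLAP service must use a different name). LLAP uses ZooKeeper for service discovery, so when this LLAP service starts it registers itself under the corresponding ZooKeeper directory. |
| instances | Number of containers. |
| cache | Size of the cache. |
| executors | Number of executor threads in one container. Each executor thread processes one task. |
| iothreads | Number of IO threads. IO threads are separate from executor threads: they read the data and convert it into the columnar format that the executor threads need. |
| size | Memory size of the container, that is, the size of the container requested from ResourceManager. |
| xmx | Heap size of the container. |
| queue | Queue this LLAP service is submitted to. |
| loglevel | Log level of the container. |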
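To make the sizing parameters concrete, here is a minimal Python sketch that derives two numbers from them: how many tasks can run at once, and how much memory the daemon containers request in total. It relies only on the descriptions in the table (one task per executor thread, `size` requested per container). The class name `LlapSizing` and its methods are hypothetical and not part of Hive.

```python
from dataclasses import dataclass


@dataclass
class LlapSizing:
    instances: int      # --instances: number of containers
    executors: int      # --executors: executor threads per container
    container_mb: int   # --size: memory requested per container, in MB

    def concurrent_tasks(self):
        # One executor thread processes one task at a time.
        return self.instances * self.executors

    def daemon_memory_mb(self):
        # Memory requested for the daemon containers only; the service
        # AppMaster container is not included in this figure.
        return self.instances * self.container_mb


# Values taken from the example command above.
demo = LlapSizing(instances=1, executors=3, container_mb=1024)
print(demo.concurrent_tasks(), demo.daemon_memory_mb())
```

With the example command, this gives three concurrent tasks backed by 1024 MB of daemon container memory.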

    usage: llap
     -a,--args                      java arguments to the llap instance
     -auxhive,--auxhive             whether to package the Hive aux jars (true by default)
     -b,--service-am-container-mb   The size of the service AppMaster container in MB
     -c,--cache                     cache size per instance
     -d,--directory                 Temp directory for jars etc.
     -e,--executors                 executor per instance
     -H,--help                      Print help information
     -h,--auxhbase                  whether to package the Hbase jars (true by default)
        --health-init-delay-secs    Delay in seconds after which health percentage is monitored (Default: 400)
        --health-percent            Percentage of running containers after which LLAP application is considered healthy (Default: 80)
        --health-time-window-secs   Time window in seconds (after initial delay) for which LLAP application is allowed to be in unhealthy state before being killed (Default: 300)
        --hiveconf                  Use value for given property. Overridden by explicit parameters
     -i,--instances                 Specify the number of instances to run this on
     -j,--auxjars                   additional jars to package (by default, JSON SerDe jar is packaged if available)
        --javaHome                  Path to the JRE/JDK. This should be installed at the same location on all cluster nodes ($JAVA_HOME, java.home by default)
     -l,--loglevel                  log levels for the llap instance
        --logger                    logger for llap instance ([RFA], query-routing, console
     -n,--name                      Cluster name for YARN registry
        --output

Generated files

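After the command runs, it generates a directory such as "llap-yarn-29Sep2021", suffixed with the current date. The directory contains three files:

- Yarnfile: the YARN Service definition file.
- run.sh: run this script to start the LLAP service.
- llap-29Sep2021.tar.gz: the jar packages used by the LLAP service.
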
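The date suffix looks like day, abbreviated English month name, and four-digit year. The Python sketch below reproduces the two names under that assumption, which is inferred from the single example "29Sep2021" and not checked against the Hive source. The function name `llap_artifact_names` is hypothetical.

```python
from datetime import date


def llap_artifact_names(day):
    # Assumed pattern: day, abbreviated month, year, e.g. 29Sep2021.
    # %b depends on the locale; Python's default "C" locale gives English names.
    suffix = day.strftime("%d%b%Y")
    return f"llap-yarn-{suffix}", f"llap-{suffix}.tar.gz"


directory, tarball = llap_artifact_names(date(2021, 9, 29))
print(directory, tarball)
```

This can be handy in wrapper scripts that need to locate the directory generated on a given day.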
The contents of Yarnfile are as follows:

    {
      "name": "llap-demo",
      "version": "1.0.0",
      "queue": "",
      "configuration": {
        "properties": {
          "yarn.service.rolling-log.include-pattern": ".*\.done",
          "yarn.component.placement.policy" : "4",
          "yarn.container.health.threshold.percent": "80",
          "yarn.container.health.threshold.window.secs": "300",
          "yarn.container.health.threshold.init.delay.secs": "400"
        }
      },
      "components": [
        {
          "name": "llap",
          "number_of_containers": 1,
          "launch_command": "$LLAP_DAEMON_BIN_HOME/llapDaemon.sh start &> $LLAP_DAEMON_TMP_DIR/shell.out",
          "artifact": {
            "id": ".yarn/package/LLAP/llap-29Sep2021.tar.gz",
            "type": "TARBALL"
          },
          "resource": {
            "cpus": 1,
            "memory": "1024"
          },
          "configuration": {
            "env": {
              "JAVA_HOME": "/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home",
              "LLAP_DAEMON_HOME": "$PWD/lib/",
              "LLAP_DAEMON_TMP_DIR": "$PWD/tmp/",
              "LLAP_DAEMON_BIN_HOME": "$PWD/lib/bin/",
              "LLAP_DAEMON_CONF_DIR": "$PWD/lib/conf/",
              "LLAP_DAEMON_LOG_DIR": "",
              "LLAP_DAEMON_LOGGER": "query-routing",
              "LLAP_DAEMON_LOG_LEVEL": "INFO",
              "LLAP_DAEMON_HEAPSIZE": "512",
              "LLAP_DAEMON_PID_DIR": "$PWD/lib/app/run/",
              "LLAP_DAEMON_LD_PATH": "/usr/local/hadoop/lib/native",
              "LLAP_DAEMON_OPTS": " -Dhttp.maxConnections=4 ",
              "APP_ROOT": "/app/install/",
              "APP_TMP_DIR": "/tmp/"
            }
          }
        }
      ],
      "kerberos_principal" : {
        "principal_name" : "",
        "keytab" : ""
      },
      "quicklinks": {
        "LLAP Daemon JMX Endpoint": "http://llap-0.${SERVICE_NAME}.${USER}.${DOMAIN}:15002/jmx"
      }
    }

run.sh

run.sh first stops the service, then destroys it, and then launches it again.

    #!/bin/bash -e
    baseDIR=$(dirname $0)
    yarn app -stop llap-demo
    yarn app -destroy llap-demo
    hdfs dfs -mkdir -p .yarn/package/LLAP
    hdfs dfs -copyFromLocal -f $baseDIR/llap-29Sep2021.tar.gz .yarn/package/LLAP
    yarn app -launch llap-demo $baseDIR/Yarnfile

llap-${CREATE_DATE}.tar.gz

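Extract llap-${CREATE_DATE}.tar.gz to see what it contains.

bin

Contains the commands that run the service:

- llap-daemon-env.sh
- llapDaemon.sh
- runLlapDaemon.sh

All parameters used to generate the service are stored in this file in JSON format.

conf

The generated configuration directory. llap-daemon-site.xml holds the LLAP parameters. The directory contains:

- core-site.xml
- hive-site.xml
- llap-udfs.lst
- hadoop-metrics2.properties
- llap-daemon-log4j2.properties
- tez-site.xml
- hdfs-site.xml
- llap-daemon-site.xml
- yarn-site.xml

llap-daemon-site.xml

As the values here show, the arguments passed on the command line were turned into LLAP service configuration properties.
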
| Property | Value | Final | Source |
|---|---|---|---|
| hive.llap.daemon.service.hosts | @llap-demo | false | CLI direct |
| hive.llap.io.memory.size | 134217728 | false | CLI direct |
| hive.llap.daemon.yarn.container.mb | 1024 | false | CLI direct |
| hive.llap.io.threadpool.size | 2 | false | CLI direct |
| hive.llap.daemon.num.executors | 3 | false | CLI direct |
| hive.llap.daemon.memory.per.instance.mb | 512 | false | CLI direct |

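The property values above line up with the command-line flags after a unit conversion: `--cache 128m` becomes a byte count, while `--size` and `--xmx` become megabytes. The following Python sketch reproduces that mapping. It is inferred from this one example only, not from the Hive source, and the names `to_bytes` and `flags_to_properties` are hypothetical.

```python
UNIT_BYTES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def to_bytes(size):
    """Convert '128m' to 134217728; a plain number is taken as bytes."""
    size = size.strip().lower()
    if size[-1] in UNIT_BYTES:
        return int(size[:-1]) * UNIT_BYTES[size[-1]]
    return int(size)


def flags_to_properties(name, cache, size, xmx, executors, iothreads):
    mb = 1024 ** 2
    return {
        "hive.llap.daemon.service.hosts": "@" + name,
        "hive.llap.io.memory.size": str(to_bytes(cache)),
        "hive.llap.daemon.yarn.container.mb": str(to_bytes(size) // mb),
        "hive.llap.io.threadpool.size": str(iothreads),
        "hive.llap.daemon.num.executors": str(executors),
        "hive.llap.daemon.memory.per.instance.mb": str(to_bytes(xmx) // mb),
    }


# Flags from the example command.
props = flags_to_properties("llap-demo", "128m", "1024m", "512m", 3, 2)
for key, value in props.items():
    print(key, value)
```

Comparing this output with the generated llap-daemon-site.xml is a quick way to confirm that the generator picked up the intended arguments.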
lib

The lib directory contains the jar packages needed to run LLAP.
Running the service

Run run.sh. A new application then appears in the ResourceManager.

| Port | Parameter | Description |
|---|---|---|
| 15002 | hive.llap.daemon.web.port | LLAP daemon web UI port. |
| 15003 | hive.llap.daemon.output.service.port | LLAP daemon output service port |
| 15004 | hive.llap.management.rpc.port | RPC port for LLAP daemon management service. |
| 15551 | hive.llap.daemon.yarn.shuffle.port | YARN shuffle port for LLAP-daemon-hosted shuffle. |
| 0 | hive.llap.daemon.rpc.port | The LLAP daemon RPC port. |
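Because most of these ports are fixed, it can be useful to check that they are free on a host before launching a daemon there. The sketch below is one way to do that in Python by trying to bind each port. The port numbers come from the table; the function names `port_is_free` and `busy_llap_ports` are hypothetical, and checking on `127.0.0.1` is an assumption about where the check runs.

```python
import socket

# Fixed ports from the table above (the RPC port, listed as 0, is left out).
LLAP_FIXED_PORTS = {
    15002: "hive.llap.daemon.web.port",
    15003: "hive.llap.daemon.output.service.port",
    15004: "hive.llap.management.rpc.port",
    15551: "hive.llap.daemon.yarn.shuffle.port",
}


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def busy_llap_ports(host="127.0.0.1"):
    """Return the fixed LLAP ports that are already in use on this host."""
    return {
        port: name
        for port, name in LLAP_FIXED_PORTS.items()
        if not port_is_free(port, host)
    }


print(busy_llap_ports() or "all fixed LLAP ports are free")
```

An empty result means none of the four fixed ports is taken; any entry names the property whose port is already occupied.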
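As the ZooKeeper output below shows, each LLAP service has its own directory under the current user's node in /llap-unsecure. Its workers directory holds two kinds of znodes, slot and worker, and each container has one of each. The slot znode contains a UUID. The worker znode contains information about the LLAP container, including "registry.unique.id":"34850c09-d8b1-415b-8572-139456d476fc", which matches the content of the slot znode.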
[zk: localhost:2181(CONNECTED) 6] get /llap-unsecure/user-houzhizhen/llap-demo/workers/slot-0000000000
34850c09-d8b1-415b-8572-139456d476fc
[zk: localhost:2181(CONNECTED) 7] get /llap-unsecure/user-houzhizhen/llap-demo/workers/worker-0000000026
{"type":"JSONServiceRecord","external":[{"api":"services","addressType":"uri","protocolType":"webui","addresses":[{"uri":"http://localhost:15002"}]}],"internal":[{"api":"llap","addressType":"host/port","protocolType":"hadoop/IPC","addresses":[{"host":"localhost","port":"46480"}]},{"api":"llapmng","addressType":"host/port","protocolType":"hadoop/IPC","addresses":[{"host":"localhost","port":"15004"}]},{"api":"shuffle","addressType":"host/port","protocolType":"tcp","addresses":[{"host":"localhost","port":"15551"}]},{"api":"llapoutputformat","addressType":"host/port","protocolType":"hadoop/IPC","addresses":[{"host":"localhost","port":"15003"}]}],"hive.llap.daemon.container.id":"container_1632897605333_0007_01_000002","hive.llap.daemon.yarn.container.mb":"2048","hive.llap.auto.auth":"false","hive.llap.io.allocator.mmap":"false","hive.llap.io.use.lrfu":"true","hive.llap.io.memory.size":"134217728","hive.llap.management.rpc.port":"15004","hive.llap.allow.permanent.fns":"true","hive.llap.daemon.rpc.port":"46480","hive.llap.daemon.web.ssl":"false","hive.llap.auto.max.input.size":"10737418240","hive.llap.io.lrfu.lambda":"1.0E-6","hive.llap.daemon.nm.address":"localhost:38742","llap.daemon.metrics.sessionid":"40fc27da-f0d3-458b-9059-d46c8dc32132","hive.llap.auto.enforce.vectorized":"true","hive.llap.daemon.service.refresh.interval.sec":"60s","hive.llap.io.orc.time.counters":"true","hive.llap.auto.max.output.size":"1073741824","hive.llap.io.allocator.direct":"true","registry.unique.id":"34850c09-d8b1-415b-8572-139456d476fc","hive.llap.daemon.web.port":"15002","hive.llap.object.cache.enabled":"true","hive.llap.execution.mode":"all","hive.llap.daemon.yarn.shuffle.port":"15551","hive.llap.daemon.output.service.port":"15003","hive.llap.daemon.download.permanent.fns":"false","hive.llap.io.memory.mode":"cache","hive.llap.daemon.task.scheduler.wait.queue.size":"10","hive.llap.daemon.memory.per.instance.mb":"1024","hive.llap.auto.enforce.tree":"true","hive.llap.io.threadpool.size":"2",
"hive.llap.daemon.service.hosts":"@llap-demo","hive.llap.auto.enforce.stats":"true","hive.llap.auto.allow.uber":"false","hive.llap.daemon.num.executors":"1"}
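The worker record is plain JSON, so it is easy to pull the endpoints out of it and confirm that it belongs to a given slot. The Python sketch below does this on an abbreviated copy of the record above (only a few fields are kept). The function name `summarize_worker` is hypothetical, and how the JSON is fetched from ZooKeeper is left out.

```python
import json


def summarize_worker(record_json, slot_uuid):
    """Return (endpoints by api name, whether the record matches the slot UUID)."""
    record = json.loads(record_json)
    endpoints = {}
    for group in ("external", "internal"):
        for entry in record.get(group, []):
            address = entry["addresses"][0]
            # webui entries carry a "uri"; IPC and tcp entries carry host and port.
            if "uri" in address:
                endpoints[entry["api"]] = address["uri"]
            else:
                endpoints[entry["api"]] = f'{address["host"]}:{address["port"]}'
    return endpoints, record.get("registry.unique.id") == slot_uuid


# Abbreviated copy of the worker znode content shown above.
WORKER = """{"type":"JSONServiceRecord",
 "external":[{"api":"services","addressType":"uri","protocolType":"webui",
              "addresses":[{"uri":"http://localhost:15002"}]}],
 "internal":[{"api":"llap","addressType":"host/port","protocolType":"hadoop/IPC",
              "addresses":[{"host":"localhost","port":"46480"}]},
             {"api":"shuffle","addressType":"host/port","protocolType":"tcp",
              "addresses":[{"host":"localhost","port":"15551"}]}],
 "registry.unique.id":"34850c09-d8b1-415b-8572-139456d476fc"}"""
SLOT = "34850c09-d8b1-415b-8572-139456d476fc"

endpoints, matches = summarize_worker(WORKER, SLOT)
print(endpoints, matches)
```

Note that the `llap` RPC endpoint in the record uses port 46480, not a fixed port, which is consistent with `hive.llap.daemon.rpc.port` being listed as 0 in the port table.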
Hive test

Add the following configuration to hive-site.xml. Note that hive.llap.daemon.service.hosts must be "@" + ${LLAP_SERVICE_NAME}.

| Property | Value |
|---|---|
| hive.execution.engine | tez |
| hive.llap.execution.mode | all |
| hive.execution.mode | llap |
| hive.llap.daemon.service.hosts | @llap-demo |
| hive.zookeeper.quorum | zk_ip:zk_port |
| hive.llap.daemon.memory.per.instance.mb | 2048 |
| hive.llap.daemon.num.executors | 2 |
| hive.server2.tez.default.queues | root.default |
| hive.server2.tez.initialize.default.sessions | true |
| hive.server2.tez.sessions.per.default.queue | 2 |

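For readers who script their configuration, here is a minimal Python sketch that renders a few of the settings above as Hadoop-style `<property>` elements and applies the "@" + service name rule automatically. Only a subset of the properties is included to keep it short, and the function name `hive_site_properties` is hypothetical.

```python
import xml.etree.ElementTree as ET


def hive_site_properties(service_name, zk_quorum):
    """Build a <configuration> element with the LLAP client settings."""
    settings = {
        "hive.execution.engine": "tez",
        "hive.execution.mode": "llap",
        "hive.llap.execution.mode": "all",
        # The service hosts value must be "@" followed by the LLAP service name.
        "hive.llap.daemon.service.hosts": "@" + service_name,
        "hive.zookeeper.quorum": zk_quorum,
    }
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


root = hive_site_properties("llap-demo", "zk_ip:zk_port")
print(ET.tostring(root, encoding="unicode"))
```

The printed elements can be pasted into hive-site.xml after replacing `zk_ip:zk_port` with the real ZooKeeper quorum.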
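LLAP test

We test with TPC-DS, running query1.sql three times.

    use tpcds_bin_partitioned_orc_2;
    source query1.sql;
    source query1.sql;
    source query1.sql;

The first run takes 7.47 seconds, the second 4.31 seconds, and the third 4.05 seconds. The later runs are faster because, after the first run, LLAP has cached some of the raw data in off-heap memory.
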
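For comparison, the same query is run three times with LLAP disabled:

    set hive.execution.mode=tez;
    set hive.llap.execution.mode=none;
    use tpcds_bin_partitioned_orc_2;
    source query1.sql;
    source query1.sql;
    source query1.sql;

To keep the comparison fair, the LLAP service is killed before this test so that its resources are released.

The first, second, and third runs each take about 11 seconds.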
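The timings can be turned into speedup factors with a few lines of Python. The numbers are the ones reported above; the figure of 11 seconds for the run without LLAP is approximate, so the resulting ratios are approximate too.

```python
llap_runs = [7.47, 4.31, 4.05]  # seconds, the three runs with LLAP
tez_run = 11.0                  # seconds, approximate, each run without LLAP


def speedup(baseline, measured):
    """How many times faster the measured run is than the baseline."""
    return baseline / measured


cold = speedup(tez_run, llap_runs[0])        # first LLAP run, cache still empty
warm = speedup(tez_run, min(llap_runs[1:]))  # best run once data is cached
print(f"cold cache: {cold:.2f}x, warm cache: {warm:.2f}x")
```

Even the first LLAP run, before any data is cached, is faster than the run without LLAP, and the cached runs widen the gap further.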
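- There is no way to specify the number of CPU vcores for a container. The executors parameter only controls how many compute threads are started inside the container once it is running; it does not control how much CPU is requested from ResourceManager. The CPU requested from ResourceManager is controlled by the following section of the generated Yarnfile:

      "resource": {
        "cpus": 1,
        "memory": "1024"
      },
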
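- Two LLAP services cannot be started on the same server, presumably because, with the default settings, the fixed daemon ports listed in the port table (such as 15002 and 15551) would conflict.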
- User-defined jar packages.
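For the first limitation, one possible workaround is to edit the `cpus` value in the generated Yarnfile before running run.sh. The Python sketch below shows the edit on an abbreviated Yarnfile held in memory. This is an untested assumption, not a documented Hive feature: whether YARN actually honors the vcore request depends on the scheduler configuration. The function name `set_component_cpus` is hypothetical.

```python
import json


def set_component_cpus(yarnfile, component, cpus):
    """Return a copy of the Yarnfile dict with resource.cpus changed."""
    patched = json.loads(json.dumps(yarnfile))  # deep copy through JSON
    for comp in patched["components"]:
        if comp["name"] == component:
            comp["resource"]["cpus"] = cpus
            return patched
    raise KeyError(f"no component named {component!r}")


# Abbreviated Yarnfile with only the fields this sketch touches.
sample = {
    "name": "llap-demo",
    "components": [
        {
            "name": "llap",
            "number_of_containers": 1,
            "resource": {"cpus": 1, "memory": "1024"},
        }
    ],
}

# Request as many vcores as there are executor threads (3 in the example).
patched = set_component_cpus(sample, "llap", 3)
print(json.dumps(patched["components"][0]["resource"]))
```

To apply this to a real file, load the generated Yarnfile with `json.load`, patch it, and write it back with `json.dump` before launching the service.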



