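- 1. Case requirements
- 2. Requirements analysis
- 3. Environment preparation
- 4. Experiment steps
- 4.1 Preparation
- 4.2 Create flume1-logger-flume.conf
- 4.3 Create flume2-netcat-flume.conf
- 4.4 Create flume3-flume-logger.conf
- 4.5 Run the configuration files
- 4.6 Append content to group.log
- 4.7 On YGS02, send data to port 44444
- 4.8 Check the data on YGS04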
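1. Case requirements
- Flume-1 on hadoop02 (YGS02) monitors the file /software/spath/flume-1.7.0/mydatas/group.log.
- Flume-2 on hadoop03 (YGS03) monitors the data stream on a port (44444 in this experiment).
- Flume-1 and Flume-2 send their data to Flume-3 on hadoop04 (YGS04), and Flume-3 prints the final data to the console.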
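3. Environment preparation
- Start the virtual machines.
- Upload Flume 1.7.0 with a remote file-transfer tool, then extract, install, and deploy it on the three servers.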
1) Flume website: http://flume.apache.org/
2) User guide: http://flume.apache.org/FlumeUserGuide.html
3) Downloads: http://archive.apache.org/dist/flume/
- Extract apache-flume-1.7.0-bin.tar.gz into the /software/spath directory:

    [root@YGS02 ~]# cd /software/spath/
    [root@YGS02 spath]# ll /software/spackage/
    total 257720
    -rw-r--r--. 1 root root  55711670 Oct 10 16:15 apache-flume-1.7.0-bin.tar.gz
    -rw-r--r--. 1 root root   9506321 Mar 24  2021 apache-maven-3.6.3-bin.tar.gz
    -rw-r--r--. 1 root root 197657687 Mar 18  2021 hadoop-2.7.2.tar.gz
    -rw-r--r--. 1 root root   1022881 Mar 14  2021 nginx-1.15.3.tar.gz
    [root@YGS02 spath]# tar -zxvf /software/spackage/apache-flume-1.7.0-bin.tar.gz -C ./
- Rename apache-flume-1.7.0-bin to flume-1.7.0:

    [root@YGS02 spath]# mv ./apache-flume-1.7.0-bin ./flume-1.7.0
    [root@YGS02 spath]# ll
    total 0
    drwxr-xr-x.  6 root root 235 Sep 27  2020 apache-ant-1.10.9
    drwxr-xr-x.  7 root root 114 Mar 24  2021 apache-maven_3.6.3
    drwxr-xr-x.  7 root root 187 Oct 10 16:29 flume-1.7.0
    drwxr-xr-x. 12 root root 185 Mar 23  2021 hadoop_2.7.2
    drwxr-xr-x.  8 root root 255 Mar 16  2021 jdk1.8.0_171
    drwxrwxr-x.  6 2000 2000  79 Sep 15 04:21 scala-2.12.15
    [root@YGS02 spath]#
- Copy flume-env.sh.template in flume-1.7.0/conf to flume-env.sh, then set JAVA_HOME in flume-env.sh:

    [root@YGS02 spath]# cd flume-1.7.0/conf/
    [root@YGS02 conf]# ll
    total 16
    -rw-r--r--. 1 root root 1661 Sep 26  2016 flume-conf.properties.template
    -rw-r--r--. 1 root root 1455 Sep 26  2016 flume-env.ps1.template
    -rw-r--r--. 1 root root 1565 Sep 26  2016 flume-env.sh.template
    -rw-r--r--. 1 root root 3107 Sep 26  2016 log4j.properties
    [root@YGS02 conf]# cp flume-env.sh.template flume-env.sh
    [root@YGS02 conf]# ll
    total 20
    -rw-r--r--. 1 root root 1661 Sep 26  2016 flume-conf.properties.template
    -rw-r--r--. 1 root root 1455 Sep 26  2016 flume-env.ps1.template
    -rw-r--r--. 1 root root 1565 Oct 10 16:32 flume-env.sh
    -rw-r--r--. 1 root root 1565 Sep 26  2016 flume-env.sh.template
    -rw-r--r--. 1 root root 3107 Sep 26  2016 log4j.properties
    [root@YGS02 conf]# vim flume-env.sh

    # Find this line, uncomment it, and point it at your own JDK installation
    # export JAVA_HOME=/usr/lib/jvm/java-6-sun
    export JAVA_HOME=/software/spath/jdk1.8.0_171
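The JAVA_HOME edit can also be made without opening an editor. The following is a minimal sketch, assuming the file still contains a commented or uncommented `export JAVA_HOME=...` line like the one in the template; the function name `set_java_home` is made up for illustration, and the demo runs against a throwaway file rather than the real flume-env.sh.

```shell
#!/bin/bash
# Hypothetical helper: rewrite every commented or uncommented
# 'export JAVA_HOME=' line in a flume-env.sh style file.
set_java_home() {
    local file="$1" java_home="$2" tmp
    tmp=$(mktemp) || return 1
    # Write the edited text to a temp file, then copy it back so the
    # original file keeps its permissions.
    sed "s|^#* *export JAVA_HOME=.*|export JAVA_HOME=${java_home}|" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demo on a throwaway file that mimics the template line.
demo=$(mktemp)
echo '# export JAVA_HOME=/usr/lib/jvm/java-6-sun' > "$demo"
set_java_home "$demo" /software/spath/jdk1.8.0_171
cat "$demo"
```

On the real installation the call would be `set_java_home /software/spath/flume-1.7.0/conf/flume-env.sh /software/spath/jdk1.8.0_171`.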
- Distribute Flume to YGS03 and YGS04:

    [root@YGS02 spath]# ll
    total 0
    drwxr-xr-x.  6 root root 235 Sep 27  2020 apache-ant-1.10.9
    drwxr-xr-x.  7 root root 114 Mar 24  2021 apache-maven_3.6.3
    drwxr-xr-x.  7 root root 187 Oct 10 16:29 flume-1.7.0
    drwxr-xr-x. 12 root root 185 Mar 23  2021 hadoop_2.7.2
    drwxr-xr-x.  8 root root 255 Mar 16  2021 jdk1.8.0_171
    drwxrwxr-x.  6 2000 2000  79 Sep 15 04:21 scala-2.12.15
    [root@YGS02 spath]# xsync ./flume-1.7.0/
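`xsync` is not a standard Linux command; on clusters like this one it is usually a small user-written wrapper around `rsync`. The sketch below is an assumption about what such a wrapper does, not the script actually used here. The function name `xsync_plan`, the `root@` login, and the host list are hypothetical. It only prints one `rsync` command per target host, so the plan can be reviewed before anything is copied.

```shell
#!/bin/bash
# Hypothetical stand-in for an xsync helper: print the rsync command that
# would copy a directory to the same parent path on each target host.
xsync_plan() {
    local src="$1"; shift
    local parent name host
    # Resolve the absolute parent directory and the directory name.
    parent=$(cd "$(dirname "$src")" && pwd) || return 1
    name=$(basename "$src")
    for host in "$@"; do
        echo "rsync -av ${parent}/${name} root@${host}:${parent}/"
    done
}

# Demo with a throwaway directory instead of the real Flume install.
demo=$(mktemp -d)
mkdir "$demo/flume-1.7.0"
xsync_plan "$demo/flume-1.7.0/" YGS03 YGS04
```

If the printed commands look right, they can be run by hand or piped to `sh`.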
- Start the Hadoop cluster.
Start HDFS first:

    [root@YGS02 spath]# ll
    total 0
    drwxr-xr-x.  6 root root 235 Sep 27  2020 apache-ant-1.10.9
    drwxr-xr-x.  7 root root 114 Mar 24  2021 apache-maven_3.6.3
    drwxr-xr-x.  7 root root 187 Oct 10 16:29 flume-1.7.0
    drwxr-xr-x. 12 root root 185 Mar 23  2021 hadoop_2.7.2
    drwxr-xr-x.  8 root root 255 Mar 16  2021 jdk1.8.0_171
    drwxrwxr-x.  6 2000 2000  79 Sep 15 04:21 scala-2.12.15
    [root@YGS02 spath]# cd hadoop_2.7.2/
    [root@YGS02 hadoop_2.7.2]# sbin/start-dfs.sh
    Starting namenodes on [YGS02]
    YGS02: starting namenode, logging to /software/spath/hadoop_2.7.2/logs/hadoop-root-namenode-YGS02.out
    YGS03: starting datanode, logging to /software/spath/hadoop_2.7.2/logs/hadoop-root-datanode-YGS03.out
    YGS04: starting datanode, logging to /software/spath/hadoop_2.7.2/logs/hadoop-root-datanode-YGS04.out
    YGS02: starting datanode, logging to /software/spath/hadoop_2.7.2/logs/hadoop-root-datanode-YGS02.out
    Starting secondary namenodes [YGS04]
    YGS04: starting secondarynamenode, logging to /software/spath/hadoop_2.7.2/logs/hadoop-root-secondarynamenode-YGS04.out
    [root@YGS02 hadoop_2.7.2]#
Then start YARN on YGS03:

    [root@YGS03 spath]# cd hadoop_2.7.2/
    [root@YGS03 hadoop_2.7.2]# sbin/start-yarn.sh
    starting yarn daemons
    starting resourcemanager, logging to /software/spath/hadoop_2.7.2/logs/yarn-root-resourcemanager-YGS03.out
    YGS02: starting nodemanager, logging to /software/spath/hadoop_2.7.2/logs/yarn-root-nodemanager-YGS02.out
    YGS04: starting nodemanager, logging to /software/spath/hadoop_2.7.2/logs/yarn-root-nodemanager-YGS04.out
    YGS03: starting nodemanager, logging to /software/spath/hadoop_2.7.2/logs/yarn-root-nodemanager-YGS03.out
    [root@YGS03 hadoop_2.7.2]#
4. Experiment steps
4.1 Preparation
Create a group3 folder under /software/spath/flume-1.7.0/job on YGS02, YGS03, and YGS04.
Run the following commands on each of the three virtual machines:

    [root@YGS02 spath]# cd /software/spath/flume-1.7.0/job
    [root@YGS02 job]# mkdir ./group3

    [root@YGS03 spath]# cd /software/spath/flume-1.7.0/job
    [root@YGS03 job]# mkdir ./group3

    [root@YGS04 spath]# cd /software/spath/flume-1.7.0/job
    [root@YGS04 job]# mkdir ./group3

4.2 Create flume1-logger-flume.conf
Configure the source to monitor the group.log file and the sink to forward the data to the next-tier Flume.
Edit the configuration file on YGS02:

    [root@YGS02 group3]# vim flume1-logger-flume.conf

Write the following content:

    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1

    # Describe/configure the source
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /software/spath/flume-1.7.0/mydatas/group.log
    a1.sources.r1.shell = /bin/bash -c

    # Describe the sink
    a1.sinks.k1.type = avro
    a1.sinks.k1.hostname = YGS04
    a1.sinks.k1.port = 4141

    # Describe the channel
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100

    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

4.3 Create flume2-netcat-flume.conf
Configure the source to monitor the data stream on port 44444 and the sink to forward the data to the next-tier Flume.
Edit the configuration file on YGS03:

    [root@YGS03 group3]# vim flume2-netcat-flume.conf

Write the following content:

    # Name the components on this agent
    a2.sources = r1
    a2.sinks = k1
    a2.channels = c1

    # Describe/configure the source
    a2.sources.r1.type = netcat
    a2.sources.r1.bind = YGS03
    a2.sources.r1.port = 44444

    # Describe the sink
    a2.sinks.k1.type = avro
    a2.sinks.k1.hostname = YGS04
    a2.sinks.k1.port = 4141

    # Use a channel which buffers events in memory
    a2.channels.c1.type = memory
    a2.channels.c1.capacity = 1000
    a2.channels.c1.transactionCapacity = 100

    # Bind the source and sink to the channel
    a2.sources.r1.channels = c1
    a2.sinks.k1.channel = c1

4.4 Create flume3-flume-logger.conf
Configure the source to receive the data streams sent by flume1 and flume2, and the sink to print the merged stream to the console.
Edit the configuration file on YGS04:

    [root@YGS04 group3]# vim flume3-flume-logger.conf
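Write the a3 agent configuration into it: an avro source listening on YGS04 port 4141 (the address both upstream avro sinks point to), a memory channel, and a logger sink that prints events to the console.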
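The content of this file is not reproduced in this section, so the following is only a sketch inferred from the other two configurations and from the `--name a3` start command. The bind address, port, and channel sizes mirror flume1 and flume2; the logger sink is assumed because Flume-3 is meant to print to the console. It is written as a shell heredoc so it can be pasted into the group3 directory on YGS04.

```shell
# Write a plausible a3 agent configuration (inferred, not the author's original file).
cat > flume3-flume-logger.conf <<'EOF'
# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c1

# Describe/configure the source: receive avro events from Flume-1 and Flume-2
a3.sources.r1.type = avro
a3.sources.r1.bind = YGS04
a3.sources.r1.port = 4141

# Describe the sink: log each event so it shows up on the console
a3.sinks.k1.type = logger

# Describe the channel
a3.channels.c1.type = memory
a3.channels.c1.capacity = 1000
a3.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c1
a3.sinks.k1.channel = c1
EOF
```

A single avro source is enough here: both upstream agents connect to the same host and port, and their events are merged in channel c1.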
4.5 Run the configuration files
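Start the agents in this order, so that the avro source on YGS04 is already listening when the upstream avro sinks try to connect: flume3-flume-logger.conf, flume2-netcat-flume.conf, flume1-logger-flume.conf.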
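The value passed to `--name` in each start command has to match the agent prefix used inside the corresponding configuration file (`a1`, `a2`, or `a3`). With a mismatched name the agent typically starts without any sources, channels, or sinks. A small pre-flight check can catch this; the function name `check_agent_name` is hypothetical, and the demo uses a throwaway file rather than the real configs.

```shell
#!/bin/bash
# Hypothetical pre-flight check: succeed only if the config file declares
# sources for the given agent name (a line starting with "<name>.sources").
check_agent_name() {
    local name="$1" conf="$2"
    grep -q "^${name}\.sources[ =]" "$conf"
}

# Demo with a throwaway config file that defines agent a1 only.
demo=$(mktemp)
echo 'a1.sources = r1' > "$demo"
if check_agent_name a1 "$demo"; then
    echo "a1: defined in this file"
fi
if ! check_agent_name a2 "$demo"; then
    echo "a2: not defined in this file"
fi
```

Against the real files, a call such as `check_agent_name a1 job/group3/flume1-logger-flume.conf` should succeed before the agent is started.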
    [root@YGS04 flume-1.7.0]# bin/flume-ng agent --conf conf/ --name a3 --conf-file job/group3/flume3-flume-logger.conf -Dflume.root.logger=INFO,console
    [root@YGS03 flume-1.7.0]# bin/flume-ng agent --conf conf/ --name a2 --conf-file job/group3/flume2-netcat-flume.conf
    [root@YGS02 flume-1.7.0]# bin/flume-ng agent --conf conf/ --name a1 --conf-file job/group3/flume1-logger-flume.conf

4.6 Append content to group.log
On YGS02, where Flume-1 tails the file, append content to group.log in the /software/spath/flume-1.7.0/mydatas directory:

    [root@YGS02 mydatas]# echo 'Hello YGS 187701020038' >> group.log

4.7 On YGS02, send data to port 44444
The netcat source of Flume-2 is bound to YGS03, so connect to that host:

    [root@YGS02 flume-1.7.0]# telnet YGS03 44444

4.8 Check the data on YGS04



