- 1 Basics
  - 1.1 Overview
  - 1.2 Architecture
    - 1.2.1 Agent
    - 1.2.2 Source
    - 1.2.3 Channel
    - 1.2.4 Sink
    - 1.2.5 Event
  - 1.3 Transactions
  - 1.4 Topologies
    - 1.4.1 Simple chain
    - 1.4.2 Replication and multiplexing
    - 1.4.3 Load balancing and failover
    - 1.4.4 Aggregation
- 2 Download and Installation
- 3 Examples
  - 3.1 Official example
  - 3.2 Replication and multiplexing
  - 3.3 Load balancing and failover
  - 3.4 Aggregation
  - 3.5 Custom interceptor
    - 3.5.1 Coding
    - 3.5.2 Configuration
    - 3.5.3 Testing
  - 3.6 Custom source
    - 3.6.1 Coding
    - 3.6.2 Configuration
    - 3.6.3 Testing
  - 3.7 Custom sink
    - 3.7.1 Coding
    - 3.7.2 Configuration
    - 3.7.3 Testing
# 1 Basics

## 1.1 Overview

Flume is a distributed, highly reliable, and highly available service for efficiently collecting, aggregating, and moving large volumes of log data. It has a simple, flexible architecture based on streaming data flows, with tunable reliability mechanisms and many failover and recovery mechanisms, making it robust and fault-tolerant. It uses a simple extensible data model that allows for online analytic applications.

## 1.2 Architecture

### 1.2.1 Agent
An Agent is a JVM process that moves data from a source to a destination in the form of events.

An Agent consists of three main components: Source, Channel, and Sink.
How the Agent works internally:

ChannelSelector:

The ChannelSelector decides which Channel(s) an Event is sent to. There are two types: Replicating and Multiplexing.

A ReplicatingSelector sends every Event to all configured Channels, while a Multiplexing selector routes different Events to different Channels according to the configured rules.
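As an illustration, the two selector types are declared on a source like this (agent, source, and channel names are placeholders; the multiplexing keys mirror the custom-interceptor example later in the document):

```properties
# Replicating (the default): every event is copied to all listed channels
a1.sources.r1.selector.type = replicating

# Multiplexing: route on a header value
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = type
a1.sources.r1.selector.mapping.a = c1
a1.sources.r1.selector.default = c2
```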
SinkProcessor:

There are three types of SinkProcessor: DefaultSinkProcessor, LoadBalancingSinkProcessor, and FailoverSinkProcessor.

DefaultSinkProcessor drives a single Sink, while LoadBalancingSinkProcessor and FailoverSinkProcessor drive a Sink Group. LoadBalancingSinkProcessor provides load balancing; FailoverSinkProcessor provides failure recovery (high availability / failover).

### 1.2.2 Source
The Source is the component responsible for receiving data into the Flume Agent. It can handle log data of many types and formats, including avro, thrift, exec, jms, spooling directory, netcat, taildir, sequence generator, syslog, http, and legacy.
- avro: receives data from another Flume agent
- exec: runs a shell command; can be used to collect local data
- netcat: listens on a port for data

### 1.2.3 Channel
The Channel is a buffer between the Source and the Sink, which allows the Source and Sink to operate at different rates. Channels are thread-safe and can handle writes from several Sources and reads from several Sinks concurrently.
Flume ships with two Channel types: Memory Channel and File Channel.
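A minimal configuration sketch of the two channel types (the File Channel directories are placeholder paths; `checkpointDir` and `dataDirs` are the standard property names):

```properties
# Memory Channel: fast, but events are lost on crash or restart
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# File Channel: durable, events are persisted to disk
a1.channels.c2.type = file
a1.channels.c2.checkpointDir = /opt/flume/checkpoint
a1.channels.c2.dataDirs = /opt/flume/data
```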
- Memory Channel: an in-memory queue, suitable when data loss does not matter. If data loss is a concern, the Memory Channel should not be used, because a process crash, machine failure, or restart will lose the data it holds.
- File Channel: writes all events to disk, so no data is lost if the process exits or the machine goes down.

### 1.2.4 Sink
The Sink continuously polls the Channel for events, removes them in batches, and writes each batch to a storage or indexing system, or forwards it to another Flume Agent.

Sink destinations include hdfs, logger, avro, thrift, ipc, file, HBase, Solr, and custom sinks.

### 1.2.5 Event
The Event is Flume's basic unit of data transfer; data travels from source to destination in the form of Events. An Event consists of a Header and a Body: the Header stores the event's attributes as key-value pairs, and the Body stores the payload as a byte array.

## 1.3 Transactions

## 1.4 Topologies

### 1.4.1 Simple chain
This mode chains multiple Flume agents in sequence, from the initial source through to the sink that delivers data to the final storage system. Avoid chaining too many agents: a long chain reduces throughput, and if any one agent in the chain goes down, the whole pipeline is affected.

### 1.4.2 Replication and multiplexing
Flume supports fanning an event flow out to one or more destinations. In this mode the same data can be replicated to multiple channels, or different data can be routed to different channels, and each sink can deliver to a different destination.

### 1.4.3 Load balancing and failover
Flume can logically group multiple sinks into one sink group; combined with the appropriate SinkProcessor, a sink group provides load balancing or failure recovery.

### 1.4.4 Aggregation
This is the most common and most practical mode. A typical web application is spread across hundreds, sometimes thousands or tens of thousands of servers, and the logs they produce are hard to handle. This Flume topology solves the problem well: each server runs an agent that collects its logs and forwards them to a central collecting agent, which then uploads them to HDFS, Hive, HBase, and so on for analysis.

# 2 Download and Installation
Prerequisite: a Java runtime environment, Java 1.8 or later.

Download from the official site.

Extract the archive.

Add the environment variables:

```shell
export FLUME_HOME=/usr/local/big_data/apache-flume-1.9.0-bin
export PATH=$PATH:$FLUME_HOME/bin
```

Delete guava-11.0.2.jar from the lib directory, because it conflicts with Hadoop's guava version:

```shell
rm $FLUME_HOME/lib/guava-11.0.2.jar
```
# 3 Examples

## 3.1 Official example

Use Flume to listen on a port, collect the data arriving on that port, and print it to the console.
```shell
# Install netcat, a lightweight networking tool
yum install nc -y
# Check whether the port is already in use
netstat -nlp | grep 44444
# Create a job directory under the Flume root
mkdir job
cd job
vim flume-netcat-logger.conf
```
```properties
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Configure the sink
a1.sinks.k1.type = logger

# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```
```shell
# Start Flume
bin/flume-ng agent -c conf/ -n a1 -f job/flume-netcat-logger.conf -Dflume.root.logger=INFO,console
# Start netcat; anything typed here will be received by Flume
nc localhost 44444
```

## 3.2 Replication and multiplexing
Flume-1 monitors a file for changes and passes every change to Flume-2 and Flume-3. Flume-2 stores the data in HDFS, and Flume-3 writes it to the local filesystem.
Flume-1:flume-file-flume.conf
```properties
# Name the components on this agent
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# Replicate the data flow to all channels
a1.sources.r1.selector.type = replicating

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F $HIVE_HOME/logs/hive.log
a1.sources.r1.shell = /bin/bash -c

# Describe the sinks
# An avro sink is a data sender
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop102
a1.sinks.k1.port = 4141
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop102
a1.sinks.k2.port = 4142

# Describe the channels
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

# Bind the source and sinks to the channels
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
```
Flume-2: flume-flume-hdfs.conf
```properties
# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
# An avro source is a data-receiving service
a2.sources.r1.type = avro
a2.sources.r1.bind = hadoop102
a2.sources.r1.port = 4141

# Describe the sink
a2.sinks.k1.type = hdfs
a2.sinks.k1.hdfs.path = hdfs://mycluster/flume2/%Y%m%d/%H
# Prefix for uploaded files
a2.sinks.k1.hdfs.filePrefix = flume2-
# Whether to roll directories based on time
a2.sinks.k1.hdfs.round = true
# How many time units before creating a new directory
a2.sinks.k1.hdfs.roundValue = 1
# The time unit for rounding
a2.sinks.k1.hdfs.roundUnit = hour
# Whether to use the local timestamp
a2.sinks.k1.hdfs.useLocalTimeStamp = true
# Number of events to accumulate before flushing to HDFS
a2.sinks.k1.hdfs.batchSize = 100
# File type; compression is supported
a2.sinks.k1.hdfs.fileType = DataStream
# How often (in seconds) to roll to a new file
a2.sinks.k1.hdfs.rollInterval = 30
# Roll files at roughly 128 MB
a2.sinks.k1.hdfs.rollSize = 134217700
# Do not roll files based on event count
a2.sinks.k1.hdfs.rollCount = 0

# Describe the channel
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1
```
Flume-3:flume-flume-dir.conf
```properties
# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c2

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop102
a3.sources.r1.port = 4142

# Describe the sink
a3.sinks.k1.type = file_roll
# The output directory must already exist; Flume will not create it
a3.sinks.k1.sink.directory = /tmp/data/flume3

# Describe the channel
a3.channels.c2.type = memory
a3.channels.c2.capacity = 1000
a3.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c2
a3.sinks.k1.channel = c2
```
```shell
# Start the agents, downstream first
flume-ng agent -c $FLUME_HOME/conf/ -n a3 -f $FLUME_HOME/job/group1/flume-flume-dir.conf
flume-ng agent -c $FLUME_HOME/conf/ -n a2 -f $FLUME_HOME/job/group1/flume-flume-hdfs.conf
flume-ng agent -c $FLUME_HOME/conf/ -n a1 -f $FLUME_HOME/job/group1/flume-file-flume.conf
```

## 3.3 Load balancing and failover
Flume1 listens on a port; the sinks in its sink group feed Flume2 and Flume3. Using the FailoverSinkProcessor, this provides failover.
Flume1:flume-netcat-flume.conf
```properties
# Name the components on this agent
a1.sources = r1
a1.channels = c1
a1.sinkgroups = g1
a1.sinks = k1 k2

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Load balancing
# a1.sinkgroups.g1.processor.type = load_balance
# a1.sinkgroups.g1.processor.backoff = true
# The default selector is round-robin; random picks sinks at random
# a1.sinkgroups.g1.processor.selector = random

# Failover
a1.sinkgroups.g1.processor.type = failover
# Sink priorities (a higher value means higher priority)
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
a1.sinkgroups.g1.processor.maxpenalty = 10000

# Describe the sinks
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop102
a1.sinks.k1.port = 4141
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop102
a1.sinks.k2.port = 4142

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sinks to the channel
a1.sources.r1.channels = c1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
```
Flume2:flume-flume-console1.conf
```properties
# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
a2.sources.r1.type = avro
a2.sources.r1.bind = hadoop102
a2.sources.r1.port = 4141

# Describe the sink
a2.sinks.k1.type = logger

# Describe the channel
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1
```
Flume3:flume-flume-console2.conf
```properties
# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c2

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop102
a3.sources.r1.port = 4142

# Describe the sink
a3.sinks.k1.type = logger

# Describe the channel
a3.channels.c2.type = memory
a3.channels.c2.capacity = 1000
a3.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c2
a3.sinks.k1.channel = c2
```
```shell
# Start the agents, downstream first
flume-ng agent -c $FLUME_HOME/conf/ -n a3 -f $FLUME_HOME/job/group2/flume-flume-console2.conf -Dflume.root.logger=INFO,console
flume-ng agent -c $FLUME_HOME/conf/ -n a2 -f $FLUME_HOME/job/group2/flume-flume-console1.conf -Dflume.root.logger=INFO,console
flume-ng agent -c $FLUME_HOME/conf/ -n a1 -f $FLUME_HOME/job/group2/flume-netcat-flume.conf
```

## 3.4 Aggregation
Flume-1 on hadoop102 monitors the file /tmp/data/group.log, and Flume-2 on hadoop103 monitors a network port. Both send their data to Flume-3 on hadoop104, which prints the combined stream to the console.
Flume-1:flume1-logger-flume.conf
```properties
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /tmp/data/group.log
a1.sources.r1.shell = /bin/bash -c

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop104
a1.sinks.k1.port = 4141

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```
Flume-2:flume2-netcat-flume.conf
```properties
# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
a2.sources.r1.type = netcat
a2.sources.r1.bind = hadoop103
a2.sources.r1.port = 44444

# Describe the sink
a2.sinks.k1.type = avro
a2.sinks.k1.hostname = hadoop104
a2.sinks.k1.port = 4141

# Use a channel which buffers events in memory
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1
```
Flume-3:flume3-flume-logger.conf
```properties
# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c1

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop104
a3.sources.r1.port = 4141

# Describe the sink
a3.sinks.k1.type = logger

# Describe the channel
a3.channels.c1.type = memory
a3.channels.c1.capacity = 1000
a3.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c1
a3.sinks.k1.channel = c1
```
```shell
# Start the agents, downstream first
flume-ng agent -c $FLUME_HOME/conf/ -n a3 -f $FLUME_HOME/job/group3/flume3-flume-logger.conf -Dflume.root.logger=INFO,console
flume-ng agent -c $FLUME_HOME/conf/ -n a2 -f $FLUME_HOME/job/group3/flume2-netcat-flume.conf
flume-ng agent -c $FLUME_HOME/conf/ -n a1 -f $FLUME_HOME/job/group3/flume1-logger-flume.conf
# Test
echo 'hello' >> /tmp/data/group.log
nc hadoop103 44444
```

## 3.5 Custom interceptor
Flume-1 on hadoop102 listens on port 44444. Events whose body contains 'a' are sent to Flume-2 on hadoop103, which prints them to the console; events whose body contains 'null' are filtered out; all other events go to Flume-3 on hadoop104, which also prints to the console.

### 3.5.1 Coding
Add the dependency:
```xml
<dependency>
    <groupId>org.apache.flume</groupId>
    <artifactId>flume-ng-core</artifactId>
    <version>1.9.0</version>
</dependency>
```
Write the interceptor:
```java
package com.guoli.flume.interceptor;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TypeInterceptor implements Interceptor {

    // Holds the events that have passed through the interceptor
    private List<Event> addHeaderEvents;

    @Override
    public void initialize() {
        addHeaderEvents = new ArrayList<>();
    }

    @Override
    public Event intercept(Event event) {
        // 1. Get the event's headers
        Map<String, String> headers = event.getHeaders();
        // 2. Get the event's body
        String body = new String(event.getBody());
        // 3. Drop the event if the body contains "null"
        if (body.contains("null")) {
            return null;
        }
        // 4. Decide which header to add based on whether the body contains "a"
        if (body.contains("a")) {
            headers.put("type", "a");
        }
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        // 1. Clear the list
        addHeaderEvents.clear();
        // 2. Run each event through the single-event intercept(), skipping
        //    the ones it drops (a null return would otherwise put null
        //    entries into the returned list)
        for (Event event : events) {
            Event intercepted = intercept(event);
            if (intercepted != null) {
                addHeaderEvents.add(intercepted);
            }
        }
        // 3. Return the surviving events
        return addHeaderEvents;
    }

    @Override
    public void close() {
    }

    public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() {
            return new TypeInterceptor();
        }

        @Override
        public void configure(Context context) {
        }
    }
}
```
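The routing rule implemented above can be checked without a Flume runtime. The following is a hypothetical plain-Java restatement of the per-event decision (class and method names are invented for illustration; c1 and c2 correspond to the channels the multiplexing selector maps in the agent configuration below):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical restatement of the interceptor's rule: bodies containing
// "null" are dropped; bodies containing "a" are tagged so the multiplexing
// selector routes them to channel c1; everything else goes to the default
// channel c2.
public class TypeRuleSketch {

    static String channelFor(String body) {
        if (body.contains("null")) return null;   // dropped: reaches no channel
        return body.contains("a") ? "c1" : "c2";  // c2 is the selector's default
    }

    // Mimics intercept(List<Event>): dropped events never reach the output list
    static List<String> surviving(List<String> bodies) {
        List<String> kept = new ArrayList<>();
        for (String body : bodies) {
            if (channelFor(body) != null) {
                kept.add(body);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(channelFor("banana"));   // contains 'a'
        System.out.println(channelFor("hello"));    // no 'a', no "null"
        System.out.println(surviving(List.of("abc", "null row", "xyz")));
    }
}
```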
After packaging, place the jar in the $FLUME_HOME/lib directory.

### 3.5.2 Configuration
flume-1:flume1-netcat-flume.conf
```properties
# Name the components on this agent
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Interceptor configuration
# Declare the interceptors (there can be several)
a1.sources.r1.interceptors = i1
# Fully qualified class name of interceptor i1's builder
a1.sources.r1.interceptors.i1.type = com.guoli.flume.interceptor.TypeInterceptor$Builder
a1.sources.r1.selector.type = multiplexing
# Header key used by the selector
a1.sources.r1.selector.header = type
# Events whose header value is "a" go to channel c1
a1.sources.r1.selector.mapping.a = c1
# Default channel for all other events
a1.sources.r1.selector.default = c2

# Describe the sinks
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop103
a1.sinks.k1.port = 4141
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop104
a1.sinks.k2.port = 4242

# Use channels which buffer events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

# Bind the source and sinks to the channels
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
```
flume-2:flume2-flume-logger.conf
```properties
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.bind = hadoop103
a1.sources.r1.port = 4141
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sinks.k1.channel = c1
a1.sources.r1.channels = c1
```
flume-3:flume3-flume-logger.conf
```properties
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.bind = hadoop104
a1.sources.r1.port = 4242
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sinks.k1.channel = c1
a1.sources.r1.channels = c1
```

### 3.5.3 Testing
```shell
# Start on hadoop104
flume-ng agent -n a1 -c $FLUME_HOME/conf -f $FLUME_HOME/job/group4/flume3-flume-logger.conf -Dflume.root.logger=INFO,console
# Start on hadoop103
flume-ng agent -n a1 -c $FLUME_HOME/conf -f $FLUME_HOME/job/group4/flume2-flume-logger.conf -Dflume.root.logger=INFO,console
# Start on hadoop102
flume-ng agent -n a1 -c $FLUME_HOME/conf -f $FLUME_HOME/job/group4/flume1-netcat-flume.conf
# On hadoop102
nc localhost 44444
```

## 3.6 Custom source
Official documentation
Write a custom source that emits the digits 0-4, each with a prefix set in the configuration file, and print the output to the console.

### 3.6.1 Coding
Add the dependency:
```xml
<dependency>
    <groupId>org.apache.flume</groupId>
    <artifactId>flume-ng-core</artifactId>
    <version>1.9.0</version>
</dependency>
```
Write the custom source:
```java
package com.guoli.flume.source;

import org.apache.flume.Context;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.SimpleEvent;
import org.apache.flume.source.AbstractSource;

import java.util.HashMap;
import java.util.Map;

public class MySource extends AbstractSource implements Configurable, PollableSource {

    private long delay;
    private String prefix;

    @Override
    public Status process() {
        try {
            // Event headers (empty here)
            Map<String, String> headerMap = new HashMap<>();
            // Build and send five events
            for (int i = 0; i < 5; i++) {
                SimpleEvent event = new SimpleEvent();
                event.setHeaders(headerMap);
                event.setBody((prefix + i).getBytes());
                // Hand the event to the channel processor
                getChannelProcessor().processEvent(event);
                Thread.sleep(delay);
            }
        } catch (Exception e) {
            e.printStackTrace();
            return Status.BACKOFF;
        }
        return Status.READY;
    }

    @Override
    public long getBackOffSleepIncrement() {
        return 0;
    }

    @Override
    public long getMaxBackOffSleepInterval() {
        return 0;
    }

    @Override
    public void configure(Context context) {
        // Defaults keep the source usable when the properties are omitted
        delay = context.getLong("delay", 1000L);
        // Note: the key must match the property name in the agent configuration
        prefix = context.getString("prefix", "Hello!");
    }
}
```
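As a quick sanity check of what one `process()` pass emits, here is a hypothetical plain-Java restatement of the loop body (no Flume classes involved; the class name is invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Mirrors the event bodies MySource hands to the channel processor in a
// single process() call: the configured prefix followed by the digits 0-4.
public class SourceBatchSketch {
    static List<String> bodies(String prefix) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            out.add(prefix + i);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bodies("hello"));
    }
}
```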
After packaging, place the jar in the $FLUME_HOME/lib directory.

### 3.6.2 Configuration
```properties
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = com.guoli.flume.source.MySource
a1.sources.r1.delay = 1000
a1.sources.r1.prefix = hello

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

### 3.6.3 Testing
```shell
flume-ng agent -n a1 -c $FLUME_HOME/conf/ -f $FLUME_HOME/job/flume-my-source.conf -Dflume.root.logger=INFO,console
```

## 3.7 Custom sink
Use Flume to receive data and, in the Sink, add a prefix and a suffix to each record before printing it to the console. The prefix and suffix are configurable in the Flume job configuration file.

### 3.7.1 Coding
Add the dependency:
```xml
<dependency>
    <groupId>org.apache.flume</groupId>
    <artifactId>flume-ng-core</artifactId>
    <version>1.9.0</version>
</dependency>
```
Write the custom sink:
```java
package com.guoli.flume.sink;

import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MySink extends AbstractSink implements Configurable {

    private static final Logger LOG = LoggerFactory.getLogger(MySink.class);

    private String prefix;
    private String suffix;

    @Override
    public Status process() {
        // Status to return to the sink runner
        Status status;
        // The channel this sink is bound to
        Channel channel = getChannel();
        // Get a transaction from the channel
        Transaction transaction = channel.getTransaction();
        Event event;
        // Start the transaction
        transaction.begin();
        try {
            // Keep polling the channel until an event arrives
            do {
                event = channel.take();
            } while (event == null);
            // Process the event (print it with the configured prefix/suffix)
            LOG.info(prefix + new String(event.getBody()) + suffix);
            // Commit the transaction
            transaction.commit();
            status = Status.READY;
        } catch (Exception e) {
            // Roll the transaction back on any failure
            transaction.rollback();
            status = Status.BACKOFF;
        } finally {
            // Close the transaction
            transaction.close();
        }
        return status;
    }

    @Override
    public void configure(Context context) {
        // Read from the job configuration; prefix has a default
        prefix = context.getString("prefix", "hello:");
        // suffix has no default
        suffix = context.getString("suffix");
    }
}
```
After packaging, place the jar in the $FLUME_HOME/lib directory.

### 3.7.2 Configuration
```properties
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = com.guoli.flume.sink.MySink
a1.sinks.k1.prefix = hello:
a1.sinks.k1.suffix = :world

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

### 3.7.3 Testing
```shell
flume-ng agent -n a1 -c $FLUME_HOME/conf/ -f $FLUME_HOME/job/flume-my-sink.conf -Dflume.root.logger=INFO,console
```