Flink 的AggregateFunction是一个基于中间计算结果状态进行增量计算的函数。由于是迭代计算方式,所以,在窗口处理过程中,不用缓存整个窗口的数据,所以效率执行比较高。
@PublicEvolving public interface AggregateFunctionextends Function, Serializable { ............................... }
自定义聚合函数需要实现AggregateFunction接口类,它有四个接口实现方法:
a.创建一个新的累加器,启动一个新的聚合,负责迭代状态的初始化
ACC createAccumulator();
b.对于数据的每条数据,和迭代数据的聚合的具体实现
ACC add(IN value, ACC accumulator);
c.合并两个累加器,返回一个具有合并状态的累加器
ACC merge(ACC a, ACC b);
d.从累加器获取聚合的结果
OUT getResult(ACC accumulator);
3.自定义聚合函数MyCountAggregate
-
package com.hadoop.ljs.flink110.aggreagate; import org.apache.flink.api.common.functions.AggregateFunction; public class MyCountAggregate implements AggregateFunction
{ @Override public Long createAccumulator() { return 0L; } @Override public Long add(ProductViewData value, Long accumulator) { return accumulator+1; } @Override public Long getResult(Long accumulator) { return accumulator; } @Override public Long merge(Long a, Long b) { return a+b; } } 4.自定义窗口函数
-
package com.hadoop.ljs.flink110.aggreagate; import org.apache.flink.streaming.api.functions.windowing.WindowFunction; import org.apache.flink.streaming.api.windowing.windows.TimeWindow; import org.apache.flink.util.Collector; public class MyCountWindowFunction2 implements WindowFunction
{ @Override public void apply(String productId, TimeWindow window, Iterable input, Collector out) throws Exception { out.collect("----------------窗口时间:"+window.getEnd()); out.collect("商品ID: "+productId+" 浏览量: "+input.iterator().next()); } 5.主函数,代码如下:
-
import org.apache.flink.api.common.functions.FilterFunction; import org.apache.flink.api.common.functions.MapFunction; import org.apache.flink.api.java.functions.KeySelector; import org.apache.flink.streaming.api.TimeCharacteristic; import org.apache.flink.streaming.api.datastream.DataStream; import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor; import org.apache.flink.streaming.api.windowing.time.Time; public class AggregateFunctionMain2 { public static int windowSize=6000; public static int windowSlider=3000; public static void main(String[] args) throws Exception { StreamExecutionEnvironment senv = StreamExecutionEnvironment.getExecutionEnvironment(); senv.setParallelism(1); senv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); //从文件读取数据,也可以从socket读取数据 DataStreamsourceData = senv.readTextFile("D:\projectData\ProductViewData2.txt"); DataStream productViewData = sourceData.map(new MapFunction () { @Override public ProductViewData map(String value) throws Exception { String[] record = value.split(","); return new ProductViewData(record[0], record[1], Long.valueOf(record[2]), Long.valueOf(record[3])); } }).assignTimestampsAndWatermarks(new AscendingTimestampExtractor (){ @Override public long extractAscendingTimestamp(ProductViewData element) { return element.timestamp; } }); DataStream productViewCount = productViewData.filter(new FilterFunction () { @Override public boolean filter(ProductViewData value) throws Exception { if(value.operationType==1){ return true; } return false; } }).keyBy(new KeySelector () { @Override public String getKey(ProductViewData value) throws Exception { return value.productId; } //时间窗口 6秒 滑动间隔3秒 }).timeWindow(Time.milliseconds(windowSize), Time.milliseconds(windowSlider)) .aggregate(new MyCountAggregate(), new MyCountWindowFunction2()); //聚合结果输出 productViewCount.print(); senv.execute("AggregateFunctionMain2"); } } 这里自定义聚合函数演示完毕,感谢关注!!!



