之前扒源码看到过Flink的窗口有很多种:
package org.apache.flink.streaming.api.windowing.assigners; @PublicEvolving public abstract class WindowAssignerimplements Serializable { private static final long serialVersionUID = 1L; public abstract Collection assignWindows( T element, long timestamp, WindowAssignerContext context); public abstract Trigger getDefaultTrigger(StreamExecutionEnvironment env); public abstract TypeSerializer getWindowSerializer(ExecutionConfig executionConfig); public abstract boolean isEventTime(); public abstract static class WindowAssignerContext { public abstract long getCurrentProcessingTime(); } }
窗口算子需要传的参数WindowAssigner有如下实现类:
源码:
package org.apache.flink.streaming.api.datastream; public class KeyedStreamextends DataStream { // ------------------------------------------------------------------------ // Windowing // ------------------------------------------------------------------------ @Deprecated public WindowedStream timeWindow(Time size) { if (environment.getStreamTimeCharacteristic() == TimeCharacteristic.ProcessingTime) { return window(TumblingProcessingTimeWindows.of(size)); } else { return window(TumblingEventTimeWindows.of(size)); } } @Deprecated public WindowedStream timeWindow(Time size, Time slide) { if (environment.getStreamTimeCharacteristic() == TimeCharacteristic.ProcessingTime) { return window(SlidingProcessingTimeWindows.of(size, slide)); } else { return window(SlidingEventTimeWindows.of(size, slide)); } } public WindowedStream countWindow(long size) { return window(GlobalWindows.create()).trigger(PurgingTrigger.of(CountTrigger.of(size))); } public WindowedStream countWindow(long size, long slide) { return window(GlobalWindows.create()) .evictor(CountEvictor.of(size)) .trigger(CountTrigger.of(slide)); } @PublicEvolving public WindowedStream window( WindowAssigner super T, W> assigner) { return new WindowedStream<>(this, assigner); } }
与Window有关的算子有已经过时的2种timeWindow时间窗口、countWindow计数窗口,以及一个window窗口分配器算子。
过时的时间窗口算子以处理时间为例子。事件时间的情况与之类似。之后尽量使用新API,但不会深究老API与老的DataSet批处理。
滚动时间窗口package com.zhiyong.flinkStream;
import com.zhiyong.flinkStudy.FlinkWordCountDemo2FlatMapFunction;
import com.zhiyong.flinkStudy.WordCountSource1ps;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.table.planner.expressions.In;
import org.apache.flink.util.Collector;
public class WindowDemo {
public static void main(String[] args) throws Exception{
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
//过时方法,必须设置,才能正常使用算子,否则报错
DataStreamSource data = env.addSource(new WordCountSource1ps());
SingleOutputStreamOperator> result1 = data.flatMap(new FlatMapFunction1())
.keyBy(new KeySelector, Object>() {
@Override
public Object getKey(Tuple2 value) throws Exception {
return value.f0;
}
})
.timeWindow(Time.seconds(30))
//过时的滚动时间窗口
.sum(1);
result1.print("过时的滚动时间窗口");
env.execute("streaming");
}
private static class FlatMapFunction1 implements FlatMapFunction>{
@Override
public void flatMap(String value, Collector> out) throws Exception {
for(String cell:value.split("\s+")){
out.collect(Tuple2.of(cell,1));
}
}
}
}
由于自定义数据源不带时间戳,不使用上述过时方法设置env会报错:
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed. at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144) at org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:137) at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616) at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:258) at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) at org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1389) at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93) at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68) at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92) at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) at org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$1.onComplete(AkkaFutureUtils.java:47) at akka.dispatch.OnComplete.internal(Future.scala:300) at akka.dispatch.OnComplete.internal(Future.scala:297) at akka.dispatch.japi$CallbackBridge.apply(Future.scala:224) at akka.dispatch.japi$CallbackBridge.apply(Future.scala:221) at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60) at org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$DirectExecutionContext.execute(AkkaFutureUtils.java:65) at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68) at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284) at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284) at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284) at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:621) at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:24) at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:23) at scala.concurrent.Future.$anonfun$andThen$1(Future.scala:532) at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29) at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29) at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60) at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:63) at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:100) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12) at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81) at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:100) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48) at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1067) at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1703) at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:172) Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138) at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82) at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:252) at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:242) at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:233) at org.apache.flink.runtime.scheduler.Schedulerbase.updateTaskExecutionState(Schedulerbase.java:684) at org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79) at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:444) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:316) at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:314) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217) at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20) at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) at akka.actor.Actor.aroundReceive(Actor.scala:537) at akka.actor.Actor.aroundReceive$(Actor.scala:535) at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580) at akka.actor.ActorCell.invoke(ActorCell.scala:548) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270) at akka.dispatch.Mailbox.run(Mailbox.scala:231) at akka.dispatch.Mailbox.exec(Mailbox.scala:243) ... 4 more Caused by: java.lang.RuntimeException: Record has Long.MIN_VALUE timestamp (= no timestamp marker). Is the time characteristic set to 'ProcessingTime', or did you forget to call 'DataStream.assignTimestampsAndWatermarks(...)'? at org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows.assignWindows(TumblingEventTimeWindows.java:83) at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processElement(WindowOperator.java:293) at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:233) at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134) at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105) at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65) at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:496) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203) at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:809) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:761) at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958) at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937) at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) at java.lang.Thread.run(Thread.java:750) Process finished with exit code 1
为了使用该过时的时间窗口算子,使用了该过时方法。设置后可以成功运行:
过时的滚动时间窗口> (zhiyong13,1) 过时的滚动时间窗口> (zhiyong6,1) 过时的滚动时间窗口> (zhiyong0,1) 过时的滚动时间窗口> (zhiyong1,1) 过时的滚动时间窗口> (zhiyong18,1) 过时的滚动时间窗口> (zhiyong11,1) 过时的滚动时间窗口> (zhiyong8,1) 过时的滚动时间窗口> (zhiyong4,2) 过时的滚动时间窗口> (zhiyong17,3) 过时的滚动时间窗口> (zhiyong9,2) 过时的滚动时间窗口> (zhiyong19,2) 过时的滚动时间窗口> (zhiyong15,1) 过时的滚动时间窗口> (zhiyong5,3) 过时的滚动时间窗口> (zhiyong16,3) 过时的滚动时间窗口> (zhiyong14,3) Process finished with exit code 130滑动时间窗口
SingleOutputStreamOperator> result2 = data.flatMap(new FlatMapFunction1()) .keyBy(new KeySelector , Object>() { @Override public Object getKey(Tuple2 value) throws Exception { return value.f0; } }) .timeWindow(Time.seconds(10),Time.seconds(5)) //过时的滑动时间窗口 .sum(1); result2.print("过时的滑动时间窗口");
不设置env同样会报错。设置后可以正常运行:
过时的滑动时间窗口> (zhiyong14,1) 过时的滑动时间窗口> (zhiyong14,1) 过时的滑动时间窗口> (zhiyong11,1) 过时的滑动时间窗口> (zhiyong4,1) 过时的滑动时间窗口> (zhiyong15,1) 过时的滑动时间窗口> (zhiyong3,1) 过时的滑动时间窗口> (zhiyong6,1) 过时的滑动时间窗口> (zhiyong6,1) 过时的滑动时间窗口> (zhiyong7,1) 过时的滑动时间窗口> (zhiyong11,1) 过时的滑动时间窗口> (zhiyong3,2) 过时的滑动时间窗口> (zhiyong16,1) 过时的滑动时间窗口> (zhiyong15,3) 过时的滑动时间窗口> (zhiyong4,1) 过时的滑动时间窗口> (zhiyong16,2) 过时的滑动时间窗口> (zhiyong3,1) 过时的滑动时间窗口> (zhiyong7,2) 过时的滑动时间窗口> (zhiyong1,1) 过时的滑动时间窗口> (zhiyong15,2) 过时的滑动时间窗口> (zhiyong11,1) 过时的滑动时间窗口> (zhiyong4,1) Process finished with exit code 130
大约每次打印5条数据,结合自定义数据源约是每秒1条,滑动窗口显然功能正确。
计数窗口算子 滚动计数窗口 env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
//过时方法,必须设置,才能正常使用算子,否则报错
SingleOutputStreamOperator> result3 = data.flatMap(new FlatMapFunction1())
.keyBy(new KeySelector, Object>() {
@Override
public Object getKey(Tuple2 value) throws Exception {
return value.f0;
}
})
.countWindow(5)
.sum(1);
result3.print("滚动计数窗口");
不设置env同样会报错。设置后可以正常运行:
滚动计数窗口> (zhiyong8,5) 滚动计数窗口> (zhiyong12,5) 滚动计数窗口> (zhiyong4,5) 滚动计数窗口> (zhiyong11,5) 滚动计数窗口> (zhiyong6,5) 滚动计数窗口> (zhiyong19,5) Process finished with exit code 130
每次都是凑齐5个才会打印。
滑动计数窗口SingleOutputStreamOperator> result4 = data.flatMap(new FlatMapFunction1()) .keyBy(new KeySelector , Object>() { @Override public Object getKey(Tuple2 value) throws Exception { return value.f0; } }) .countWindow(5,3) .sum(1); result4.print("滑动计数窗口");
同样是使用过时方法设置env,可以看到:
滑动计数窗口> (zhiyong0,3) 滑动计数窗口> (zhiyong8,3) 滑动计数窗口> (zhiyong17,3) 滑动计数窗口> (zhiyong7,3) 滑动计数窗口> (zhiyong9,3) 滑动计数窗口> (zhiyong12,3) 滑动计数窗口> (zhiyong18,3) 滑动计数窗口> (zhiyong1,3) 滑动计数窗口> (zhiyong10,3) 滑动计数窗口> (zhiyong4,3) 滑动计数窗口> (zhiyong16,3) 滑动计数窗口> (zhiyong19,3) 滑动计数窗口> (zhiyong9,5) 滑动计数窗口> (zhiyong8,5) 滑动计数窗口> (zhiyong17,5) 滑动计数窗口> (zhiyong1,5) 滑动计数窗口> (zhiyong19,5) 滑动计数窗口> (zhiyong9,5) 滑动计数窗口> (zhiyong5,3) 滑动计数窗口> (zhiyong3,3) 滑动计数窗口> (zhiyong11,3) 滑动计数窗口> (zhiyong12,5) 滑动计数窗口> (zhiyong17,5) 滑动计数窗口> (zhiyong18,5) 滑动计数窗口> (zhiyong2,3) 滑动计数窗口> (zhiyong1,5) 滑动计数窗口> (zhiyong10,5) 滑动计数窗口> (zhiyong4,5) 滑动计数窗口> (zhiyong12,5) 滑动计数窗口> (zhiyong7,5) 滑动计数窗口> (zhiyong8,5) 滑动计数窗口> (zhiyong19,5) 滑动计数窗口> (zhiyong11,5) 滑动计数窗口> (zhiyong1,5) 滑动计数窗口> (zhiyong17,5) 滑动计数窗口> (zhiyong9,5) 滑动计数窗口> (zhiyong0,5) 滑动计数窗口> (zhiyong3,5) 滑动计数窗口> (zhiyong14,3) 滑动计数窗口> (zhiyong2,5) 滑动计数窗口> (zhiyong19,5) 滑动计数窗口> (zhiyong4,5) 滑动计数窗口> (zhiyong5,5) 滑动计数窗口> (zhiyong2,5) 滑动计数窗口> (zhiyong8,5) 滑动计数窗口> (zhiyong6,3) 滑动计数窗口> (zhiyong15,3) 滑动计数窗口> (zhiyong11,5) 滑动计数窗口> (zhiyong13,3) 滑动计数窗口> (zhiyong16,5) 滑动计数窗口> (zhiyong19,5) 滑动计数窗口> (zhiyong10,5) 滑动计数窗口> (zhiyong0,5) 滑动计数窗口> (zhiyong2,5) 滑动计数窗口> (zhiyong16,5) 滑动计数窗口> (zhiyong4,5) 滑动计数窗口> (zhiyong15,5) 滑动计数窗口> (zhiyong13,5) 滑动计数窗口> (zhiyong7,5) 滑动计数窗口> (zhiyong1,5) 滑动计数窗口> (zhiyong19,5) 滑动计数窗口> (zhiyong10,5) 滑动计数窗口> (zhiyong3,5) 滑动计数窗口> (zhiyong11,5) 滑动计数窗口> (zhiyong6,5) 滑动计数窗口> (zhiyong8,5) 滑动计数窗口> (zhiyong16,5) 滑动计数窗口> (zhiyong0,5) 滑动计数窗口> (zhiyong14,5) 滑动计数窗口> (zhiyong12,5) Process finished with exit code 130
正好20个word,正好20个count到结果为3的数据。其余均是20。显然,最开始累计出现3次就开始滑动,之后每来3条滑动1次,但是窗口大小只有5个,所以之后的count值永远是5个。
窗口分配器的窗口算子 会话窗口 静态时间间隔的处理时间会话窗口先构建数据源:
package com.zhiyong.flinkStream; import org.apache.flink.streaming.api.functions.source.SourceFunction; import java.util.ArrayList; import java.util.Random; public class FlinkRandom10sSource implements SourceFunction{ boolean needRun=true; @Override public void run(SourceContext ctx) throws Exception { while (needRun){ ArrayList result = new ArrayList<>(); for (int i = 0; i < 10; i++) { result.add("zhiyong" + i); } int index=new Random().nextInt(10); ctx.collect(result.get(index)); int delayTime = new Random().nextInt(10)*1000; System.out.println("产生数据“" + result.get(index)); System.out.println("延时"+ delayTime + "ms"); Thread.sleep(delayTime); } } @Override public void cancel() { needRun=false; } }
测试会话窗口:
package com.zhiyong.flinkStream;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;
public class SessionWindowDemo {
public static void main(String[] args) throws Exception{
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
DataStreamSource data = env.addSource(new FlinkRandom10sSource());
SingleOutputStreamOperator> result1 = data.flatMap(new FlatMapFunction1())
.keyBy(new KeySelector, Object>() {
@Override
public Object getKey(Tuple2 value) throws Exception {
return value.f0;
}
})
.window(ProcessingTimeSessionWindows.withGap(Time.seconds(5)))
.sum(1);
result1.print("静态时间间隔的处理时间会话窗口");
env.execute("会话窗口");
}
private static class FlatMapFunction1 implements FlatMapFunction>{
@Override
public void flatMap(String value, Collector> out) throws Exception {
for (String cell:value.split("\s+")){
out.collect(Tuple2.of(cell,1));
}
}
}
}
结果:
产生数据“zhiyong4 延时6000ms 静态时间间隔的处理时间会话窗口> (zhiyong4,1) 产生数据“zhiyong0 延时4000ms 产生数据“zhiyong0 延时9000ms 静态时间间隔的处理时间会话窗口> (zhiyong0,2) 产生数据“zhiyong4 延时7000ms 静态时间间隔的处理时间会话窗口> (zhiyong4,1) 产生数据“zhiyong3 延时3000ms 产生数据“zhiyong8 延时0ms 产生数据“zhiyong1 延时4000ms 静态时间间隔的处理时间会话窗口> (zhiyong3,1) 产生数据“zhiyong6 延时2000ms 静态时间间隔的处理时间会话窗口> (zhiyong8,1) 静态时间间隔的处理时间会话窗口> (zhiyong1,1) 产生数据“zhiyong5 延时5000ms 静态时间间隔的处理时间会话窗口> (zhiyong6,1) 产生数据“zhiyong0 延时0ms 产生数据“zhiyong9 延时6000ms 静态时间间隔的处理时间会话窗口> (zhiyong5,1) 静态时间间隔的处理时间会话窗口> (zhiyong0,1) 静态时间间隔的处理时间会话窗口> (zhiyong9,1) 产生数据“zhiyong7 延时6000ms 静态时间间隔的处理时间会话窗口> (zhiyong7,1) 产生数据“zhiyong5 延时8000ms 静态时间间隔的处理时间会话窗口> (zhiyong5,1) 产生数据“zhiyong4 延时7000ms 静态时间间隔的处理时间会话窗口> (zhiyong4,1) 产生数据“zhiyong3 延时2000ms 产生数据“zhiyong8 延时9000ms 静态时间间隔的处理时间会话窗口> (zhiyong3,1) 静态时间间隔的处理时间会话窗口> (zhiyong8,1) 产生数据“zhiyong0 延时4000ms 产生数据“zhiyong6 延时0ms 产生数据“zhiyong6 延时3000ms 静态时间间隔的处理时间会话窗口> (zhiyong0,1) 产生数据“zhiyong0 延时4000ms 静态时间间隔的处理时间会话窗口> (zhiyong6,2) 产生数据“zhiyong2 延时9000ms 静态时间间隔的处理时间会话窗口> (zhiyong0,1) 静态时间间隔的处理时间会话窗口> (zhiyong2,1) 产生数据“zhiyong5 延时6000ms 静态时间间隔的处理时间会话窗口> (zhiyong5,1) 产生数据“zhiyong7 延时3000ms 产生数据“zhiyong3 延时8000ms 静态时间间隔的处理时间会话窗口> (zhiyong7,1) 静态时间间隔的处理时间会话窗口> (zhiyong3,1) 产生数据“zhiyong2 延时4000ms Process finished with exit code 130动态时间间隔的处理时间会话窗口
SingleOutputStreamOperator> result2 = data.flatMap(new FlatMapFunction1()) .keyBy(new KeySelector , Object>() { @Override public Object getKey(Tuple2 value) throws Exception { return value.f0; } }) .window(ProcessingTimeSessionWindows.withDynamicGap(new SessionWindowTimeGapExtractor >() { @Override public long extract(Tuple2 element) { return (Long.parseLong(element.f0.toString().split("g")[1])+5)*1000; } })) .sum(1); result2.print("动态时间间隔的处理时间会话窗口");
执行后:
产生数据“zhiyong2 延时4000ms 产生数据“zhiyong7 延时4000ms 动态时间间隔的处理时间会话窗口> (zhiyong2,1) 产生数据“zhiyong7 延时0ms 产生数据“zhiyong9 延时8000ms 产生数据“zhiyong9 延时1000ms 产生数据“zhiyong9 延时3000ms 产生数据“zhiyong6 延时2000ms 动态时间间隔的处理时间会话窗口> (zhiyong7,2) 产生数据“zhiyong1 延时0ms 产生数据“zhiyong6 延时2000ms 产生数据“zhiyong7 延时8000ms 动态时间间隔的处理时间会话窗口> (zhiyong1,1) 动态时间间隔的处理时间会话窗口> (zhiyong9,3) 产生数据“zhiyong8 延时9000ms 动态时间间隔的处理时间会话窗口> (zhiyong6,2) 动态时间间隔的处理时间会话窗口> (zhiyong7,1) 产生数据“zhiyong2 延时0ms 产生数据“zhiyong7 延时9000ms 动态时间间隔的处理时间会话窗口> (zhiyong8,1) 动态时间间隔的处理时间会话窗口> (zhiyong2,1) 产生数据“zhiyong0 延时8000ms 动态时间间隔的处理时间会话窗口> (zhiyong7,1) 动态时间间隔的处理时间会话窗口> (zhiyong0,1) 产生数据“zhiyong2 延时9000ms 动态时间间隔的处理时间会话窗口> (zhiyong2,1) 产生数据“zhiyong7 延时3000ms 产生数据“zhiyong9 延时6000ms 产生数据“zhiyong3 延时3000ms 产生数据“zhiyong7 延时0ms 产生数据“zhiyong8 延时6000ms 动态时间间隔的处理时间会话窗口> (zhiyong7,1) 动态时间间隔的处理时间会话窗口> (zhiyong9,1) 动态时间间隔的处理时间会话窗口> (zhiyong3,1) 产生数据“zhiyong7 延时4000ms 产生数据“zhiyong9 延时1000ms 产生数据“zhiyong7 延时0ms 产生数据“zhiyong4 延时3000ms 动态时间间隔的处理时间会话窗口> (zhiyong8,1) 产生数据“zhiyong8 延时6000ms 产生数据“zhiyong4 延时1000ms 动态时间间隔的处理时间会话窗口> (zhiyong4,1) 产生数据“zhiyong9 延时8000ms 动态时间间隔的处理时间会话窗口> (zhiyong7,3) 动态时间间隔的处理时间会话窗口> (zhiyong8,1) 产生数据“zhiyong5 延时9000ms 动态时间间隔的处理时间会话窗口> (zhiyong4,1) Process finished with exit code 130
可以看出这种情况时间间隔可以动态变化,同样是超时便关闭窗口。
静态时间间隔的事件时间会话窗口.window(EventTimeSessionWindows.withGap(Time.seconds(5)))
算子层面区别不大。
动态时间间隔的事件时间会话窗口.window(EventTimeSessionWindows.withDynamicGap(new SessionWindowTimeGapExtractor>() { @Override public long extract(Tuple2 element) { return (Long.parseLong(element.f0.toString().split("g")[1])+5)*1000; } }))
算子层面区别不大。
事件时间的这2种情况与处理时间类似,不再赘述。
时间窗口.timeWindow(Time.seconds(30))//这种是滚动窗口 .timeWindow(Time.seconds(30),Time.seconds(30))//这种是滑动窗口
这种算子已经过时。但是时间窗口使用频率偏偏最高。新API可以设置偏移量来消除时区的影响。比如东八区。。。
滚动事件时间窗口SingleOutputStreamOperator滚动处理时间窗口> result5 = data.flatMap(new FlatMapFunction1()) .keyBy(new KeySelector , Object>() { @Override public Object getKey(Tuple2 value) throws Exception { return value.f0; } }) .window(TumblingEventTimeWindows.of(Time.seconds(10))) //.window(TumblingEventTimeWindows.of(Time.seconds(10)),Time.hours(-8))//东八区 .sum(1); result5.print("滚动事件时间窗口");
SingleOutputStreamOperator滑动事件时间窗口> result6 = data.flatMap(new FlatMapFunction1()) .keyBy(new KeySelector , Object>() { @Override public Object getKey(Tuple2 value) throws Exception { return value.f0; } }) .window(TumblingProcessingTimeWindows.of(Time.seconds(10))) //.window(TumblingProcessingTimeWindows.of(Time.seconds(10)),Time.hours(-8))//东八区 .sum(1); result6.print("滚动处理时间窗口");
SingleOutputStreamOperator滑动处理时间窗口> result7 = data.flatMap(new FlatMapFunction1()) .keyBy(new KeySelector , Object>() { @Override public Object getKey(Tuple2 value) throws Exception { return value.f0; } }) .window(SlidingEventTimeWindows.of(Time.seconds(5), Time.seconds(3))) //.window(SlidingEventTimeWindows.of(Time.seconds(5),Time.seconds(3),Time.hours(-8)))//东八区 .sum(1); result7.print("滑动事件时间窗口");
SingleOutputStreamOperator> result8 = data.flatMap(new FlatMapFunction1()) .keyBy(new KeySelector , Object>() { @Override public Object getKey(Tuple2 value) throws Exception { return value.f0; } }) .window(SlidingProcessingTimeWindows.of(Time.seconds(5), Time.seconds(3))) //.window(SlidingProcessingTimeWindows.of(Time.seconds(5),Time.seconds(3),Time.hours(-8)))//东八区 .sum(1); result8.print("滑动处理时间窗口");



