栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 前沿技术 > 大数据 > 大数据系统

Flink窗口转换算子

Flink窗口转换算子

窗口转换算子预览

之前扒源码看到过Flink的窗口有很多种:

package org.apache.flink.streaming.api.windowing.assigners;

@PublicEvolving
public abstract class WindowAssigner implements Serializable {
    private static final long serialVersionUID = 1L;

    
    public abstract Collection assignWindows(
            T element, long timestamp, WindowAssignerContext context);

    
    public abstract Trigger getDefaultTrigger(StreamExecutionEnvironment env);

    
    public abstract TypeSerializer getWindowSerializer(ExecutionConfig executionConfig);

    
    public abstract boolean isEventTime();

    
    public abstract static class WindowAssignerContext {

        
        public abstract long getCurrentProcessingTime();
    }
}

窗口算子需要传的参数WindowAssigner有如下实现类:

源码:

package org.apache.flink.streaming.api.datastream;
public class KeyedStream extends DataStream {
        // ------------------------------------------------------------------------
    //  Windowing
    // ------------------------------------------------------------------------

    
    @Deprecated
    public WindowedStream timeWindow(Time size) {
        if (environment.getStreamTimeCharacteristic() == TimeCharacteristic.ProcessingTime) {
            return window(TumblingProcessingTimeWindows.of(size));
        } else {
            return window(TumblingEventTimeWindows.of(size));
        }
    }

    
    @Deprecated
    public WindowedStream timeWindow(Time size, Time slide) {
        if (environment.getStreamTimeCharacteristic() == TimeCharacteristic.ProcessingTime) {
            return window(SlidingProcessingTimeWindows.of(size, slide));
        } else {
            return window(SlidingEventTimeWindows.of(size, slide));
        }
    }

    
    public WindowedStream countWindow(long size) {
        return window(GlobalWindows.create()).trigger(PurgingTrigger.of(CountTrigger.of(size)));
    }

    
    public WindowedStream countWindow(long size, long slide) {
        return window(GlobalWindows.create())
                .evictor(CountEvictor.of(size))
                .trigger(CountTrigger.of(slide));
    }

    
    @PublicEvolving
    public  WindowedStream window(
            WindowAssigner assigner) {
        return new WindowedStream<>(this, assigner);
    }
}

与Window有关的算子有已经过时的2种timeWindow时间窗口、countWindow计数窗口,以及一个window窗口分配器算子。

过时的时间窗口算子

以处理时间为例子。事件时间的情况与之类似。之后尽量使用新API,但不会深究老API与老的DataSet批处理。

滚动时间窗口
package com.zhiyong.flinkStream;

import com.zhiyong.flinkStudy.FlinkWordCountDemo2FlatMapFunction;
import com.zhiyong.flinkStudy.WordCountSource1ps;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.table.planner.expressions.In;
import org.apache.flink.util.Collector;


public class WindowDemo {
    public static void main(String[] args) throws Exception{
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.setParallelism(1);

        env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
        //过时方法,必须设置,才能正常使用算子,否则报错

        DataStreamSource data = env.addSource(new WordCountSource1ps());

        SingleOutputStreamOperator> result1 = data.flatMap(new FlatMapFunction1())
                .keyBy(new KeySelector, Object>() {
                    @Override
                    public Object getKey(Tuple2 value) throws Exception {
                        return value.f0;
                    }
                })
                .timeWindow(Time.seconds(30))
                //过时的滚动时间窗口
                .sum(1);

        result1.print("过时的滚动时间窗口");

        env.execute("streaming");

    }

    private static class FlatMapFunction1 implements FlatMapFunction>{

        @Override
        public void flatMap(String value, Collector> out) throws Exception {
            for(String cell:value.split("\s+")){
                out.collect(Tuple2.of(cell,1));
            }
        }
    }

}

由于自定义数据源不带时间戳,不使用上述过时方法设置env会报错:

Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
	at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
	at org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:137)
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
	at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:258)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
	at org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1389)
	at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93)
	at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
	at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
	at org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$1.onComplete(AkkaFutureUtils.java:47)
	at akka.dispatch.OnComplete.internal(Future.scala:300)
	at akka.dispatch.OnComplete.internal(Future.scala:297)
	at akka.dispatch.japi$CallbackBridge.apply(Future.scala:224)
	at akka.dispatch.japi$CallbackBridge.apply(Future.scala:221)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
	at org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$DirectExecutionContext.execute(AkkaFutureUtils.java:65)
	at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
	at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284)
	at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284)
	at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
	at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:621)
	at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:24)
	at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:23)
	at scala.concurrent.Future.$anonfun$andThen$1(Future.scala:532)
	at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29)
	at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
	at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:63)
	at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:100)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
	at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
	at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:100)
	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1067)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1703)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:172)
Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
	at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138)
	at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82)
	at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:252)
	at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:242)
	at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:233)
	at org.apache.flink.runtime.scheduler.Schedulerbase.updateTaskExecutionState(Schedulerbase.java:684)
	at org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79)
	at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:444)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:316)
	at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:314)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)
	at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
	at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
	at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
	at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
	at akka.actor.Actor.aroundReceive(Actor.scala:537)
	at akka.actor.Actor.aroundReceive$(Actor.scala:535)
	at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
	at akka.actor.ActorCell.invoke(ActorCell.scala:548)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
	at akka.dispatch.Mailbox.run(Mailbox.scala:231)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
	... 4 more
Caused by: java.lang.RuntimeException: Record has Long.MIN_VALUE timestamp (= no timestamp marker). Is the time characteristic set to 'ProcessingTime', or did you forget to call 'DataStream.assignTimestampsAndWatermarks(...)'?
	at org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows.assignWindows(TumblingEventTimeWindows.java:83)
	at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processElement(WindowOperator.java:293)
	at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:233)
	at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
	at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:496)
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:809)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:761)
	at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
	at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
	at java.lang.Thread.run(Thread.java:750)

Process finished with exit code 1

为了使用该过时的时间窗口算子,使用了该过时方法。设置后可以成功运行:

过时的滚动时间窗口> (zhiyong13,1)
过时的滚动时间窗口> (zhiyong6,1)
过时的滚动时间窗口> (zhiyong0,1)
过时的滚动时间窗口> (zhiyong1,1)
过时的滚动时间窗口> (zhiyong18,1)
过时的滚动时间窗口> (zhiyong11,1)
过时的滚动时间窗口> (zhiyong8,1)
过时的滚动时间窗口> (zhiyong4,2)
过时的滚动时间窗口> (zhiyong17,3)
过时的滚动时间窗口> (zhiyong9,2)
过时的滚动时间窗口> (zhiyong19,2)
过时的滚动时间窗口> (zhiyong15,1)
过时的滚动时间窗口> (zhiyong5,3)
过时的滚动时间窗口> (zhiyong16,3)
过时的滚动时间窗口> (zhiyong14,3)

Process finished with exit code 130
滑动时间窗口
        SingleOutputStreamOperator> result2 = data.flatMap(new FlatMapFunction1())
                .keyBy(new KeySelector, Object>() {
                    @Override
                    public Object getKey(Tuple2 value) throws Exception {
                        return value.f0;
                    }
                })
                .timeWindow(Time.seconds(10),Time.seconds(5))
                //过时的滑动时间窗口
                .sum(1);

        result2.print("过时的滑动时间窗口");

不设置env同样会报错。设置后可以正常运行:

过时的滑动时间窗口> (zhiyong14,1)
过时的滑动时间窗口> (zhiyong14,1)
过时的滑动时间窗口> (zhiyong11,1)
过时的滑动时间窗口> (zhiyong4,1)
过时的滑动时间窗口> (zhiyong15,1)
过时的滑动时间窗口> (zhiyong3,1)
过时的滑动时间窗口> (zhiyong6,1)
过时的滑动时间窗口> (zhiyong6,1)
过时的滑动时间窗口> (zhiyong7,1)
过时的滑动时间窗口> (zhiyong11,1)
过时的滑动时间窗口> (zhiyong3,2)
过时的滑动时间窗口> (zhiyong16,1)
过时的滑动时间窗口> (zhiyong15,3)
过时的滑动时间窗口> (zhiyong4,1)
过时的滑动时间窗口> (zhiyong16,2)
过时的滑动时间窗口> (zhiyong3,1)
过时的滑动时间窗口> (zhiyong7,2)
过时的滑动时间窗口> (zhiyong1,1)
过时的滑动时间窗口> (zhiyong15,2)
过时的滑动时间窗口> (zhiyong11,1)
过时的滑动时间窗口> (zhiyong4,1)

Process finished with exit code 130

大约每次打印5条数据,结合自定义数据源约是每秒1条,滑动窗口显然功能正确。

计数窗口算子 滚动计数窗口
        env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
        //过时方法,必须设置,才能正常使用算子,否则报错

        SingleOutputStreamOperator> result3 = data.flatMap(new FlatMapFunction1())
                .keyBy(new KeySelector, Object>() {
                    @Override
                    public Object getKey(Tuple2 value) throws Exception {
                        return value.f0;
                    }
                })
                .countWindow(5)
                .sum(1);

        result3.print("滚动计数窗口");

不设置env同样会报错。设置后可以正常运行:

滚动计数窗口> (zhiyong8,5)
滚动计数窗口> (zhiyong12,5)
滚动计数窗口> (zhiyong4,5)
滚动计数窗口> (zhiyong11,5)
滚动计数窗口> (zhiyong6,5)
滚动计数窗口> (zhiyong19,5)

Process finished with exit code 130

每次都是凑齐5个才会打印。

滑动计数窗口
        SingleOutputStreamOperator> result4 = data.flatMap(new FlatMapFunction1())
                .keyBy(new KeySelector, Object>() {
                    @Override
                    public Object getKey(Tuple2 value) throws Exception {
                        return value.f0;
                    }
                })
                .countWindow(5,3)
                .sum(1);

        result4.print("滑动计数窗口");

同样是使用过时方法设置env,可以看到:

滑动计数窗口> (zhiyong0,3)
滑动计数窗口> (zhiyong8,3)
滑动计数窗口> (zhiyong17,3)
滑动计数窗口> (zhiyong7,3)
滑动计数窗口> (zhiyong9,3)
滑动计数窗口> (zhiyong12,3)
滑动计数窗口> (zhiyong18,3)
滑动计数窗口> (zhiyong1,3)
滑动计数窗口> (zhiyong10,3)
滑动计数窗口> (zhiyong4,3)
滑动计数窗口> (zhiyong16,3)
滑动计数窗口> (zhiyong19,3)
滑动计数窗口> (zhiyong9,5)
滑动计数窗口> (zhiyong8,5)
滑动计数窗口> (zhiyong17,5)
滑动计数窗口> (zhiyong1,5)
滑动计数窗口> (zhiyong19,5)
滑动计数窗口> (zhiyong9,5)
滑动计数窗口> (zhiyong5,3)
滑动计数窗口> (zhiyong3,3)
滑动计数窗口> (zhiyong11,3)
滑动计数窗口> (zhiyong12,5)
滑动计数窗口> (zhiyong17,5)
滑动计数窗口> (zhiyong18,5)
滑动计数窗口> (zhiyong2,3)
滑动计数窗口> (zhiyong1,5)
滑动计数窗口> (zhiyong10,5)
滑动计数窗口> (zhiyong4,5)
滑动计数窗口> (zhiyong12,5)
滑动计数窗口> (zhiyong7,5)
滑动计数窗口> (zhiyong8,5)
滑动计数窗口> (zhiyong19,5)
滑动计数窗口> (zhiyong11,5)
滑动计数窗口> (zhiyong1,5)
滑动计数窗口> (zhiyong17,5)
滑动计数窗口> (zhiyong9,5)
滑动计数窗口> (zhiyong0,5)
滑动计数窗口> (zhiyong3,5)
滑动计数窗口> (zhiyong14,3)
滑动计数窗口> (zhiyong2,5)
滑动计数窗口> (zhiyong19,5)
滑动计数窗口> (zhiyong4,5)
滑动计数窗口> (zhiyong5,5)
滑动计数窗口> (zhiyong2,5)
滑动计数窗口> (zhiyong8,5)
滑动计数窗口> (zhiyong6,3)
滑动计数窗口> (zhiyong15,3)
滑动计数窗口> (zhiyong11,5)
滑动计数窗口> (zhiyong13,3)
滑动计数窗口> (zhiyong16,5)
滑动计数窗口> (zhiyong19,5)
滑动计数窗口> (zhiyong10,5)
滑动计数窗口> (zhiyong0,5)
滑动计数窗口> (zhiyong2,5)
滑动计数窗口> (zhiyong16,5)
滑动计数窗口> (zhiyong4,5)
滑动计数窗口> (zhiyong15,5)
滑动计数窗口> (zhiyong13,5)
滑动计数窗口> (zhiyong7,5)
滑动计数窗口> (zhiyong1,5)
滑动计数窗口> (zhiyong19,5)
滑动计数窗口> (zhiyong10,5)
滑动计数窗口> (zhiyong3,5)
滑动计数窗口> (zhiyong11,5)
滑动计数窗口> (zhiyong6,5)
滑动计数窗口> (zhiyong8,5)
滑动计数窗口> (zhiyong16,5)
滑动计数窗口> (zhiyong0,5)
滑动计数窗口> (zhiyong14,5)
滑动计数窗口> (zhiyong12,5)

Process finished with exit code 130

正好20个word,正好20个count到结果为3的数据。其余均是20。显然,最开始累计出现3次就开始滑动,之后每来3条滑动1次,但是窗口大小只有5个,所以之后的count值永远是5个。

窗口分配器的窗口算子 会话窗口 静态时间间隔的处理时间会话窗口

先构建数据源:

package com.zhiyong.flinkStream;

import org.apache.flink.streaming.api.functions.source.SourceFunction;

import java.util.ArrayList;
import java.util.Random;


public class FlinkRandom10sSource implements SourceFunction {
    boolean needRun=true;
    @Override
    public void run(SourceContext ctx) throws Exception {
        while (needRun){
            ArrayList result = new ArrayList<>();
            for (int i = 0; i < 10; i++) {
                result.add("zhiyong" + i);
            }
            int index=new Random().nextInt(10);
            ctx.collect(result.get(index));
            int delayTime = new Random().nextInt(10)*1000;
            System.out.println("产生数据“" + result.get(index));
            System.out.println("延时"+ delayTime + "ms");
            Thread.sleep(delayTime);
        }
    }

    @Override
    public void cancel() {
        needRun=false;
    }
}

测试会话窗口:

package com.zhiyong.flinkStream;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;


public class SessionWindowDemo {
    public static void main(String[] args) throws Exception{
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        DataStreamSource data = env.addSource(new FlinkRandom10sSource());

        SingleOutputStreamOperator> result1 = data.flatMap(new FlatMapFunction1())
                .keyBy(new KeySelector, Object>() {
                    @Override
                    public Object getKey(Tuple2 value) throws Exception {
                        return value.f0;
                    }
                })
                .window(ProcessingTimeSessionWindows.withGap(Time.seconds(5)))
                .sum(1);

        result1.print("静态时间间隔的处理时间会话窗口");

        env.execute("会话窗口");
    }

    private static class FlatMapFunction1 implements FlatMapFunction>{

        @Override
        public void flatMap(String value, Collector> out) throws Exception {
            for (String cell:value.split("\s+")){
                out.collect(Tuple2.of(cell,1));
            }
        }
    }
}

结果:

产生数据“zhiyong4
延时6000ms
静态时间间隔的处理时间会话窗口> (zhiyong4,1)
产生数据“zhiyong0
延时4000ms
产生数据“zhiyong0
延时9000ms
静态时间间隔的处理时间会话窗口> (zhiyong0,2)
产生数据“zhiyong4
延时7000ms
静态时间间隔的处理时间会话窗口> (zhiyong4,1)
产生数据“zhiyong3
延时3000ms
产生数据“zhiyong8
延时0ms
产生数据“zhiyong1
延时4000ms
静态时间间隔的处理时间会话窗口> (zhiyong3,1)
产生数据“zhiyong6
延时2000ms
静态时间间隔的处理时间会话窗口> (zhiyong8,1)
静态时间间隔的处理时间会话窗口> (zhiyong1,1)
产生数据“zhiyong5
延时5000ms
静态时间间隔的处理时间会话窗口> (zhiyong6,1)
产生数据“zhiyong0
延时0ms
产生数据“zhiyong9
延时6000ms
静态时间间隔的处理时间会话窗口> (zhiyong5,1)
静态时间间隔的处理时间会话窗口> (zhiyong0,1)
静态时间间隔的处理时间会话窗口> (zhiyong9,1)
产生数据“zhiyong7
延时6000ms
静态时间间隔的处理时间会话窗口> (zhiyong7,1)
产生数据“zhiyong5
延时8000ms
静态时间间隔的处理时间会话窗口> (zhiyong5,1)
产生数据“zhiyong4
延时7000ms
静态时间间隔的处理时间会话窗口> (zhiyong4,1)
产生数据“zhiyong3
延时2000ms
产生数据“zhiyong8
延时9000ms
静态时间间隔的处理时间会话窗口> (zhiyong3,1)
静态时间间隔的处理时间会话窗口> (zhiyong8,1)
产生数据“zhiyong0
延时4000ms
产生数据“zhiyong6
延时0ms
产生数据“zhiyong6
延时3000ms
静态时间间隔的处理时间会话窗口> (zhiyong0,1)
产生数据“zhiyong0
延时4000ms
静态时间间隔的处理时间会话窗口> (zhiyong6,2)
产生数据“zhiyong2
延时9000ms
静态时间间隔的处理时间会话窗口> (zhiyong0,1)
静态时间间隔的处理时间会话窗口> (zhiyong2,1)
产生数据“zhiyong5
延时6000ms
静态时间间隔的处理时间会话窗口> (zhiyong5,1)
产生数据“zhiyong7
延时3000ms
产生数据“zhiyong3
延时8000ms
静态时间间隔的处理时间会话窗口> (zhiyong7,1)
静态时间间隔的处理时间会话窗口> (zhiyong3,1)
产生数据“zhiyong2
延时4000ms

Process finished with exit code 130
动态时间间隔的处理时间会话窗口
        SingleOutputStreamOperator> result2 = data.flatMap(new FlatMapFunction1())
                .keyBy(new KeySelector, Object>() {
                    @Override
                    public Object getKey(Tuple2 value) throws Exception {
                        return value.f0;
                    }
                })
                .window(ProcessingTimeSessionWindows.withDynamicGap(new SessionWindowTimeGapExtractor>() {
                    @Override
                    public long extract(Tuple2 element) {
                        return (Long.parseLong(element.f0.toString().split("g")[1])+5)*1000;
                    }
                }))
                .sum(1);

        result2.print("动态时间间隔的处理时间会话窗口");

执行后:

产生数据“zhiyong2
延时4000ms
产生数据“zhiyong7
延时4000ms
动态时间间隔的处理时间会话窗口> (zhiyong2,1)
产生数据“zhiyong7
延时0ms
产生数据“zhiyong9
延时8000ms
产生数据“zhiyong9
延时1000ms
产生数据“zhiyong9
延时3000ms
产生数据“zhiyong6
延时2000ms
动态时间间隔的处理时间会话窗口> (zhiyong7,2)
产生数据“zhiyong1
延时0ms
产生数据“zhiyong6
延时2000ms
产生数据“zhiyong7
延时8000ms
动态时间间隔的处理时间会话窗口> (zhiyong1,1)
动态时间间隔的处理时间会话窗口> (zhiyong9,3)
产生数据“zhiyong8
延时9000ms
动态时间间隔的处理时间会话窗口> (zhiyong6,2)
动态时间间隔的处理时间会话窗口> (zhiyong7,1)
产生数据“zhiyong2
延时0ms
产生数据“zhiyong7
延时9000ms
动态时间间隔的处理时间会话窗口> (zhiyong8,1)
动态时间间隔的处理时间会话窗口> (zhiyong2,1)
产生数据“zhiyong0
延时8000ms
动态时间间隔的处理时间会话窗口> (zhiyong7,1)
动态时间间隔的处理时间会话窗口> (zhiyong0,1)
产生数据“zhiyong2
延时9000ms
动态时间间隔的处理时间会话窗口> (zhiyong2,1)
产生数据“zhiyong7
延时3000ms
产生数据“zhiyong9
延时6000ms
产生数据“zhiyong3
延时3000ms
产生数据“zhiyong7
延时0ms
产生数据“zhiyong8
延时6000ms
动态时间间隔的处理时间会话窗口> (zhiyong7,1)
动态时间间隔的处理时间会话窗口> (zhiyong9,1)
动态时间间隔的处理时间会话窗口> (zhiyong3,1)
产生数据“zhiyong7
延时4000ms
产生数据“zhiyong9
延时1000ms
产生数据“zhiyong7
延时0ms
产生数据“zhiyong4
延时3000ms
动态时间间隔的处理时间会话窗口> (zhiyong8,1)
产生数据“zhiyong8
延时6000ms
产生数据“zhiyong4
延时1000ms
动态时间间隔的处理时间会话窗口> (zhiyong4,1)
产生数据“zhiyong9
延时8000ms
动态时间间隔的处理时间会话窗口> (zhiyong7,3)
动态时间间隔的处理时间会话窗口> (zhiyong8,1)
产生数据“zhiyong5
延时9000ms
动态时间间隔的处理时间会话窗口> (zhiyong4,1)

Process finished with exit code 130

可以看出这种情况时间间隔可以动态变化,同样是超时便关闭窗口。

静态时间间隔的事件时间会话窗口
.window(EventTimeSessionWindows.withGap(Time.seconds(5)))

算子层面区别不大。

动态时间间隔的事件时间会话窗口
.window(EventTimeSessionWindows.withDynamicGap(new SessionWindowTimeGapExtractor>() {
                    @Override
                    public long extract(Tuple2 element) {
                        return (Long.parseLong(element.f0.toString().split("g")[1])+5)*1000;
                    }
                }))

算子层面区别不大。

事件时间的这2种情况与处理时间类似,不再赘述。

时间窗口
.timeWindow(Time.seconds(30))//这种是滚动窗口
.timeWindow(Time.seconds(30),Time.seconds(30))//这种是滑动窗口

这种算子已经过时。但是时间窗口使用频率偏偏最高。新API可以设置偏移量来消除时区的影响。比如东八区。。。

滚动事件时间窗口
        SingleOutputStreamOperator> result5 = data.flatMap(new FlatMapFunction1())
                .keyBy(new KeySelector, Object>() {
                    @Override
                    public Object getKey(Tuple2 value) throws Exception {
                        return value.f0;
                    }
                })
                .window(TumblingEventTimeWindows.of(Time.seconds(10)))
                //.window(TumblingEventTimeWindows.of(Time.seconds(10)),Time.hours(-8))//东八区
                .sum(1);
        result5.print("滚动事件时间窗口");
滚动处理时间窗口
        SingleOutputStreamOperator> result6 = data.flatMap(new FlatMapFunction1())
                .keyBy(new KeySelector, Object>() {
                    @Override
                    public Object getKey(Tuple2 value) throws Exception {
                        return value.f0;
                    }
                })
                .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
                //.window(TumblingProcessingTimeWindows.of(Time.seconds(10)),Time.hours(-8))//东八区
                .sum(1);
        result6.print("滚动处理时间窗口");
滑动事件时间窗口
        SingleOutputStreamOperator> result7 = data.flatMap(new FlatMapFunction1())
                .keyBy(new KeySelector, Object>() {
                    @Override
                    public Object getKey(Tuple2 value) throws Exception {
                        return value.f0;
                    }
                })
                .window(SlidingEventTimeWindows.of(Time.seconds(5), Time.seconds(3)))
                //.window(SlidingEventTimeWindows.of(Time.seconds(5),Time.seconds(3),Time.hours(-8)))//东八区
                .sum(1);
        result7.print("滑动事件时间窗口");
滑动处理时间窗口
        SingleOutputStreamOperator> result8 = data.flatMap(new FlatMapFunction1())
                .keyBy(new KeySelector, Object>() {
                    @Override
                    public Object getKey(Tuple2 value) throws Exception {
                        return value.f0;
                    }
                })
                .window(SlidingProcessingTimeWindows.of(Time.seconds(5), Time.seconds(3)))
                //.window(SlidingProcessingTimeWindows.of(Time.seconds(5),Time.seconds(3),Time.hours(-8)))//东八区
                .sum(1);
        result8.print("滑动处理时间窗口");
转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/780412.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号