流处理简介
一、flink是什么?二、批处理vs流处理三、流处理演变四、环境搭建
流处理简介 一、flink是什么?框架&分布式处理引擎,对无界、有界数据流进行状态计算。
https://flink.apache.org/
- 有状态的流式处理(不准确)
lambda架构(复杂)
flink
高吞吐、低延迟结果准确精确一次的状态一致性保证与众多常用存储系统链接高可用,动态扩展 四、环境搭建
maven
4.0.0 com.shinho FlinkStudy1.0-SNAPSHOT org.apache.flink flink-clients_2.121.13.0 org.apache.flink flink-java1.13.0 org.apache.flink flink-streaming-java_2.121.13.0 provided org.slf4j slf4j-api1.7.30 org.slf4j slf4j-log4j121.7.30 test org.apache.logging.log4j log4j-to-slf4j2.13.3 maven-compiler-plugin 2.3.2 1.8 1.8
经典dataset api wordcount(已经半弃用)。
package com.shinho.wc;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.AggregateOperator;
import org.apache.flink.api.java.operators.DataSource;
import org.apache.flink.api.java.operators.FlatMapOperator;
import org.apache.flink.api.java.operators.UnsortedGrouping;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;
public class BatchWordCount {
public static void main(String[] args) throws Exception {
//1创建执行环境
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
//2.读取数据源
DataSource dataSource = env.readTextFile("input/words.txt");
//3.每行数据进行分词,二元组
FlatMapOperator> wordAndOne = dataSource.flatMap((String line, Collector> out) -> {
String[] words = line.split(" ");
//每个单词转换二元组
for (String word : words) {
out.collect(Tuple2.of(word, 1L));
}
}).returns(Types.TUPLE(Types.STRING, Types.LONG));
//4按word分组
UnsortedGrouping> wordAndoneGroup = wordAndOne.groupBy(0);
//5 每一组聚合叠加
AggregateOperator> sum = wordAndOneGroup.sum(1);
//6.结果输出
sum.print();
}
}



