Learning any technology starts with a "Hello World", and learning Flink is no exception. Compared with Hadoop, the hello-world program in Flink is more concise to write and more flexible. Below we implement the classic word count on Flink in two ways.
Environment preparation: import the basic Maven dependencies:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>1.10.1</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.12</artifactId>
    <version>1.10.1</version>
</dependency>
Approach 1: implement word count with the batch API
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.AggregateOperator;
import org.apache.flink.api.java.operators.DataSource;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;
public class WordCount1 {
    public static void main(String[] args) throws Exception {
        // Create the batch execution environment
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Read the input data from a file
        String inputPath = "F:\\hello.txt";
        DataSource<String> dataSource = env.readTextFile(inputPath);
        // Split each line on spaces, group by the word (field 0), then sum the counts (field 1)
        AggregateOperator<Tuple2<String, Integer>> sum = dataSource.flatMap(new MyFlatMapper())
                .groupBy(0)
                .sum(1);
        sum.print();
    }

    public static class MyFlatMapper implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
            String[] words = value.split(" ");
            for (String word : words) {
                out.collect(new Tuple2<>(word, 1));
            }
        }
    }
}
hello.txt is a plain text file containing words separated by spaces.
Run the main method and check the console output: the job prints the total number of occurrences of each word.
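For readers who want to see what groupBy(0).sum(1) computes without setting up Flink or an input file, the same aggregation can be sketched in plain Java (the class name WordCountSketch and the sample input are illustrative, not part of the Flink program above):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class WordCountSketch {
    // Split each line on spaces and count occurrences per word,
    // mirroring what flatMap + groupBy(0) + sum(1) does in the batch job
    public static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {"hello flink", "hello world"};
        System.out.println(count(lines)); // {hello=2, flink=1, world=1}
    }
}
```

Like the batch job, this only reports the final count per word once all the input has been read.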
Approach 2: implement word count with the streaming API
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;
public class WordCount2 {
    public static void main(String[] args) throws Exception {
        // Create the streaming execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        String inputPath = "F:\\hello.txt";
        DataStreamSource<String> dataStreamSource = env.readTextFile(inputPath);
        // Split on spaces, key by the word (field 0), then keep a running sum of the counts (field 1)
        SingleOutputStreamOperator<Tuple2<String, Integer>> sum = dataStreamSource
                .flatMap(new MyFlatMapper())
                .keyBy(0)
                .sum(1);
        sum.print().setParallelism(1);
        // Unlike the batch job, a streaming job must be started explicitly
        env.execute();
    }

    public static class MyFlatMapper implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
            String[] words = value.split(" ");
            for (String word : words) {
                out.collect(new Tuple2<>(word, 1));
            }
        }
    }
}
Run this code and watch the console output: it also arrives at the occurrence count of each word.
Comparing the console output of the two approaches reveals a difference: the batch version prints only the final count for each word, while the streaming version emits an updated count every time it reads a word, so the same word appears multiple times with a running count as the records flow through (each output line also carries a prefix number identifying the subtask that produced it). This is exactly the difference between Flink's batch and stream processing.
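The incremental behavior of the streaming job can be sketched in plain Java (no Flink involved; the class name StreamingCountSketch and the sample words are illustrative): each incoming word immediately produces a record with that word's running count.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StreamingCountSketch {
    // Emits one "(word,count)" record per incoming word,
    // mimicking how keyBy(0).sum(1) outputs an updated count per record
    public static List<String> runningCounts(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String word : words) {
            int updated = counts.merge(word, 1, Integer::sum);
            out.add("(" + word + "," + updated + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        // Words arrive one at a time, as in a stream
        for (String record : runningCounts(new String[]{"hello", "flink", "hello"})) {
            System.out.println(record); // (hello,1) then (flink,1) then (hello,2)
        }
    }
}
```

Note how "hello" is printed twice, first with count 1 and then with count 2, whereas the batch sketch above would print it once with its final count of 2.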



