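I. Download and install IDEA
II. Set up a local Hadoop environment (Windows 10)
III. Install Maven
IV. Create the project and module
  1) Create a Maven project
  2) Create the flink module
V. Configure the IDEA environment (Scala)
  1) Download and install the Scala plugin
  2) Add the Scala plugin to the module or the global environment
  3) Create a Scala project
  4) DataStream API configuration
    1. Maven configuration
    2. Example
  5) Table API & SQL configuration
    1. Maven configuration
    2. Example
  6) HiveCatalog
    1. Maven configuration
    2. Guava conflict between Hadoop and Hive
    3. Example
  7) Download Flink and start a local cluster (Windows)
  8) Complete configuration
    1. Maven configuration
    2. log4j2.xml configuration
    3. hive-site.xml configuration
VI. Configure the IDEA environment (Java)
  1) Maven configuration
  2) log4j2.xml configuration
  3) hive-site.xml configuration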
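I. Download and install IDEA
See my earlier article: https://liugp.blog.csdn.net/article/details/123058589
II. Set up a local Hadoop environment (Windows 10)
See my earlier article: "Big Data Hadoop: Deploying a Hadoop + Hive Environment (Windows 10)"
III. Install Maven
See my earlier article: "Java: Maven Explained"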
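IV. Create the project and module
1) Create a Maven project
The project name is highlighted in red because I had already created a project with this name.
Delete the auto-generated src directory. The code will be managed through modules from now on, because a project usually contains many modules.
2) Create the flink module
Check the directory structure and create any directories that are missing.
Set the directory properties (mark the source and resource directories).
Because I had already created the project, a new project named bigdata-test2023 is created here for the demonstration.
V. Configure the IDEA environment (Scala)
1) Download and install the Scala plugin
File -> Settings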
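IntelliJ IDEA cannot be used for Scala development out of the box, but it can once the Scala plugin is installed. I had already installed it; if you have not, click Install on the plugin page.
2) Add the Scala plugin to the module or the global environment
Once the Scala plugin has been added, Scala projects can be created.
3) Create a Scala project
Create an Object class.
[Tip] A Scala class is only compiled and cannot be run directly; the main method has to live in an object.
4) DataStream API configuration
1. Maven configuration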
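Add the following to pom.xml in the flink module directory:
[Tip] The Scala version in the artifact names (2.12) must match the Scala version configured for the plugin above.
<dependencies>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-scala_2.12</artifactId>
    <version>1.14.3</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-scala_2.12</artifactId>
    <version>1.14.3</version>
    <scope>provided</scope>
  </dependency>
  <!-- needed to run the job from the IDE; listed in the complete configuration in 8) -->
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_2.12</artifactId>
    <version>1.14.3</version>
  </dependency>
</dependencies>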
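[Problem] When a Maven project is run from IDEA, dependencies with provided scope are not loaded, so the application fails at startup.
[Cause] With Run Application, IDEA leaves provided-scope dependencies off the classpath by default. The provided scope means the runtime environment is expected to supply these jars, so this is default behavior rather than an IDEA bug.
[Solution] In IDEA, open Run -> Edit Configurations and enable the option that adds dependencies with provided scope to the classpath.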
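One quick way to confirm whether the provided-scope jars actually reached the run classpath is to probe for a class by name. The following is a minimal sketch in plain Scala; ClasspathProbe and isOnClasspath are names made up for illustration, and the Flink class name is the one the DataStream example imports.

```scala
import scala.util.Try

object ClasspathProbe {
  /** Returns true when the named class can be located on the current classpath. */
  def isOnClasspath(className: String): Boolean =
    // initialize = false: only locate the class, do not run its static initializers
    Try(Class.forName(className, false, getClass.getClassLoader)).isSuccess

  def main(args: Array[String]): Unit = {
    val flinkClass = "org.apache.flink.streaming.api.scala.StreamExecutionEnvironment"
    println(s"$flinkClass on classpath: ${isOnClasspath(flinkClass)}")
  }
}
```

If this prints false when run with the same run configuration as the Flink job, the provided-scope option is most likely still disabled.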
2. Example
(Example from the official documentation)
package com

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

object WindowWordCount {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // read text lines from the socket server listening on port 9999
    val text = env.socketTextStream("localhost", 9999)

    // split into lowercase words and count each word per 5-second tumbling window
    val counts = text.flatMap { _.toLowerCase.split("\\W+") filter { _.nonEmpty } }
      .map { (_, 1) }
      .keyBy(_._1)
      .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
      .sum(1)

    counts.print()
    env.execute("Window Stream WordCount")
  }
}
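To see what the job should print for the lines that arrive in one window, the tokenizing and counting steps can be reproduced without Flink. This is a minimal sketch in plain Scala; wordCounts is a hypothetical helper, and it treats the lines passed in as the contents of a single 5-second window.

```scala
object WordCountSketch {
  /** Counts words in the given lines using the same tokenization as the Flink job. */
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\W+").filter(_.nonEmpty))
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit =
    wordCounts(Seq("Hello Flink", "hello world")).foreach(println)
}
```

The Flink job emits the same (word, count) pairs, but once per window and per key rather than as a single map.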
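Start a service listening on port 9999 from the command line:
$ nc -lk 9999
Then run the program and type some text into the nc session to test it.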
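On Windows, nc is often not installed. As a stand-in, a few lines of plain Scala can serve text on the port the job reads from. This is a sketch under assumptions: LineServer and serveOnce are invented names, and unlike nc -lk it accepts only a single connection.

```scala
import java.io.PrintWriter
import java.net.ServerSocket

object LineServer {
  /** Accepts one client on the server socket and sends it each line, newline-terminated. */
  def serveOnce(server: ServerSocket, lines: Iterator[String]): Unit = {
    val client = server.accept()
    val out = new PrintWriter(client.getOutputStream)
    try lines.foreach { line =>
      out.print(line + "\n") // explicit "\n" so the delimiter is the same on every platform
      out.flush()
    } finally {
      out.close()
      client.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val server = new ServerSocket(9999)
    // forward every line typed on stdin to the connected Flink job
    try serveOnce(server, scala.io.Source.stdin.getLines())
    finally server.close()
  }
}
```

Start LineServer first, then the Flink job, and type lines into the LineServer console.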
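5) Table API & SQL configuration
1. Maven configuration
<dependencies>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner_2.12</artifactId>
    <version>1.14.3</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-scala_2.12</artifactId>
    <version>1.14.3</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-common</artifactId>
    <version>1.14.3</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
The example uses the filesystem connector, which needs no extra Maven dependency. Connectors such as Kafka and Elasticsearch do need their own dependencies; see the official documentation for details on the other connectors. The example also uses the csv format, and that does require a dependency:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-csv</artifactId>
  <version>1.14.3</version>
</dependency>
2. Example
Source code: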
package com

import org.apache.flink.table.api._

object TableSQL {
  def main(args: Array[String]): Unit = {
    val settings = EnvironmentSettings.inStreamingMode()
    val tableEnv = TableEnvironment.create(settings)

    // schema shared by the source and sink tables: three string columns
    val schema = Schema.newBuilder()
      .column("a", DataTypes.STRING())
      .column("b", DataTypes.STRING())
      .column("c", DataTypes.STRING())
      .build()

    // source table: pipe-delimited CSV files under flink/data/source
    tableEnv.createTemporaryTable("CsvSourceTable", TableDescriptor.forConnector("filesystem")
      .schema(schema)
      .option("path", "flink/data/source")
      .format(FormatDescriptor.forFormat("csv")
        .option("field-delimiter", "|")
        .build())
      .build())

    // sink table: pipe-delimited CSV files written under flink/data/
    tableEnv.createTemporaryTable("CsvSinkTable", TableDescriptor.forConnector("filesystem")
      .schema(schema)
      .option("path", "flink/data/")
      .format(FormatDescriptor.forFormat("csv")
        .option("field-delimiter", "|")
        .build())
      .build())

    // build a query
    val sourceTable = tableEnv.sqlQuery("SELECT * FROM CsvSourceTable LIMIT 2")
    // write the query result to the downstream sink table
    sourceTable.executeInsert("CsvSinkTable")
  }
}
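The source table expects pipe-delimited text with three string columns under flink/data/source. A minimal sketch for producing a test file in that layout follows; the file name input.csv and the sample rows are invented for illustration, and only the directory and the delimiter come from the example above.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}

object SampleCsvWriter {
  /** Joins the three column values with the '|' delimiter configured on the tables. */
  def toLine(a: String, b: String, c: String): String = Seq(a, b, c).mkString("|")

  /** Writes the rows to dir/input.csv (hypothetical file name), creating dir if needed. */
  def writeSample(dir: Path, rows: Seq[(String, String, String)]): Path = {
    Files.createDirectories(dir)
    val content = rows.map { case (a, b, c) => toLine(a, b, c) }.mkString("", "\n", "\n")
    Files.write(dir.resolve("input.csv"), content.getBytes(StandardCharsets.UTF_8))
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(("1", "alice", "click"), ("2", "bob", "view"), ("3", "carol", "buy"))
    val file = writeSample(Paths.get("flink/data/source"), rows)
    println(s"wrote $file")
  }
}
```

Run it once from the project root before starting TableSQL so that the source directory is not empty.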
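6) HiveCatalog
1. Maven configuration
Basic configuration:
<dependencies>
  <!-- use the same Scala suffix (2.12) as the other Flink artifacts -->
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-hive_2.12</artifactId>
    <version>1.14.3</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-api-java-bridge_2.12</artifactId>
    <version>1.14.3</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>3.1.2</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
[Tip] When a dependency has provided scope, the run configuration of the class being executed in IDEA must be set to add provided-scope dependencies to the classpath.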
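Log4j2 configuration (log4j2.xml)
hive-site.xml configuration
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;serverTimezone=Asia/Shanghai</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>user name for connecting to mysql server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
    <description>password for connecting to mysql server</description>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
    <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>localhost</value>
    <description>Bind host on which to run the HiveServer2 Thrift service.</description>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10001</value>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>true</value>
  </property>
</configuration>
[Tip] The metastore and hiveserver2 services must both be running. If you are not sure how to start them, see my earlier article: "Big Data Hadoop: Deploying a Hadoop + Hive Environment (Windows 10)"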
$ hive --service metastore
$ hive --service hiveserver2
2. Guava conflict between Hadoop and Hive
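[Problem] The Guava versions used by Hadoop and by hive-exec-3.1.2 conflict, which makes the Flink job fail at startup.
[Solution] Delete guava-19.0.jar from the %HIVE_HOME%\lib directory, then copy %HADOOP_HOME%\share\hadoop\common\lib\guava-27.0-jre.jar into %HIVE_HOME%\lib.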
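When diagnosing a conflict like this, it helps to know which jar a class is really loaded from. The sketch below is plain Scala with invented names (JarLocator, locationOf); com.google.common.base.Preconditions is used only as a well-known Guava class, and the result depends on the classpath the snippet is run with.

```scala
import scala.util.Try

object JarLocator {
  /** Returns the jar or directory a class was loaded from, if the JVM exposes it. */
  def locationOf(className: String): Option[String] =
    for {
      clazz  <- Try(Class.forName(className, false, getClass.getClassLoader)).toOption
      domain <- Option(clazz.getProtectionDomain)
      source <- Option(domain.getCodeSource) // null for JDK bootstrap classes
      url    <- Option(source.getLocation)
    } yield url.toString

  def main(args: Array[String]): Unit = {
    // the jar that provides this class is the Guava copy that wins on the classpath
    val guavaClass = "com.google.common.base.Preconditions"
    println(locationOf(guavaClass).getOrElse(s"$guavaClass not found on the classpath"))
  }
}
```

Running it with the same run configuration as the Flink job shows whether Guava comes from the Hadoop jars, the Hive jars, or somewhere else.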
3. Example
package com

import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}
import org.apache.flink.table.catalog.hive.HiveCatalog

object HiveCatalogTest {
  def main(args: Array[String]): Unit = {
    val settings = EnvironmentSettings.inStreamingMode()
    val tableEnv = TableEnvironment.create(settings)

    val name = "myhive"
    val defaultDatabase = "default"
    // directory that contains hive-site.xml
    val hiveConfDir = "flink/data/"
    val hive = new HiveCatalog(name, defaultDatabase, hiveConfDir)

    // register the catalog; the registration disappears when the session ends
    tableEnv.registerCatalog("myhive", hive)
    // list the catalogs
    tableEnv.executeSql("show catalogs").print()
    // switch to the myhive catalog
    tableEnv.useCatalog("myhive")
    // create a database; it is persisted in Hive and survives the session
    tableEnv.executeSql("CREATE DATABASE IF NOT EXISTS mydatabase")
    // list the databases
    tableEnv.executeSql("show databases").print()
    // switch to the new database
    tableEnv.useDatabase("mydatabase")
    // create the table
    tableEnv.executeSql(
      """CREATE TABLE IF NOT EXISTS user_behavior (
        |  user_id BIGINT,
        |  item_id BIGINT,
        |  category_id BIGINT,
        |  behavior STRING,
        |  ts TIMESTAMP(3)
        |) WITH (
        |  'connector' = 'kafka',
        |  'topic' = 'user_behavior',
        |  'properties.bootstrap.servers' = 'hadoop-node1:9092',
        |  'properties.group.id' = 'testGroup',
        |  'format' = 'json',
        |  'json.fail-on-missing-field' = 'false',
        |  'json.ignore-parse-errors' = 'true'
        |)""".stripMargin)
    tableEnv.executeSql("show tables").print()
  }
}
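Connecting with the Hive client shows that the database and table created by the program above still exist.
The check confirms that everything works. Remember that whenever a connector is used during development, the matching Maven dependency has to be added as well.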
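If the HiveCatalog example cannot connect, a first thing to rule out is that the two Hive services are not listening. The sketch below is plain Scala with invented names (PortCheck, isReachable); the ports are the ones set in hive-site.xml above, and the 2-second timeout is an arbitrary choice.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

object PortCheck {
  /** Returns true when a TCP connection to host:port succeeds within timeoutMs. */
  def isReachable(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
    val socket = new Socket()
    try Try(socket.connect(new InetSocketAddress(host, port), timeoutMs)).isSuccess
    finally socket.close()
  }

  def main(args: Array[String]): Unit = {
    // ports taken from hive-site.xml: metastore 9083, HiveServer2 10001
    Seq("metastore" -> 9083, "hiveserver2" -> 10001).foreach { case (name, port) =>
      println(s"$name on localhost:$port reachable: ${isReachable("localhost", port)}")
    }
  }
}
```

A reachable port only shows that something is listening there; it does not prove the service is healthy.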
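7) Download Flink and start a local cluster (Windows)
Download page: https://flink.apache.org/downloads.html
flink-1.14.3: https://dlcdn.apache.org/flink/flink-1.14.3/flink-1.14.3-bin-scala_2.12.tgz
[Tip] Newer releases no longer ship start-cluster.bat and flink.bat, but both can be copied over from an older release. Download the older release below:
flink-1.9.1: https://archive.apache.org/dist/flink/flink-1.9.1/flink-1.9.1-bin-scala_2.11.tgz
Only the following two files need to be copied from flink-1.9.1 into the bin directory of the new release.
The download is slow, so both files are reproduced here.
flink.bat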
::###############################################################################
::  Licensed to the Apache Software Foundation (ASF) under one
::  or more contributor license agreements.  See the NOTICE file
::  distributed with this work for additional information
::  regarding copyright ownership.  The ASF licenses this file
::  to you under the Apache License, Version 2.0 (the
::  "License"); you may not use this file except in compliance
::  with the License.  You may obtain a copy of the License at
::
::      http://www.apache.org/licenses/LICENSE-2.0
::
::  Unless required by applicable law or agreed to in writing, software
::  distributed under the License is distributed on an "AS IS" BASIS,
::  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
::  See the License for the specific language governing permissions and
::  limitations under the License.
::###############################################################################
@echo off
setlocal
SET bin=%~dp0
SET FLINK_HOME=%bin%..
SET FLINK_LIB_DIR=%FLINK_HOME%\lib
SET FLINK_PLUGINS_DIR=%FLINK_HOME%\plugins
SET JVM_ARGS=-Xmx512m
SET FLINK_JM_CLASSPATH=%FLINK_LIB_DIR%\*
java %JVM_ARGS% -cp "%FLINK_JM_CLASSPATH%"; org.apache.flink.client.cli.CliFrontend %*
endlocal
start-cluster.bat
::###############################################################################
:: Licensed to the Apache Software Foundation (ASF) under one
:: or more contributor license agreements. See the NOTICE file
:: distributed with this work for additional information
:: regarding copyright ownership. The ASF licenses this file
:: to you under the Apache License, Version 2.0 (the
:: "License"); you may not use this file except in compliance
:: with the License. You may obtain a copy of the License at
::
:: http://www.apache.org/licenses/LICENSE-2.0
::
:: Unless required by applicable law or agreed to in writing, software
:: distributed under the License is distributed on an "AS IS" BASIS,
:: WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
:: See the License for the specific language governing permissions and
:: limitations under the License.
::###############################################################################
@echo off
setlocal EnableDelayedExpansion
SET bin=%~dp0
SET FLINK_HOME=%bin%..
SET FLINK_LIB_DIR=%FLINK_HOME%\lib
SET FLINK_PLUGINS_DIR=%FLINK_HOME%\plugins
SET FLINK_CONF_DIR=%FLINK_HOME%\conf
SET FLINK_LOG_DIR=%FLINK_HOME%\log
SET JVM_ARGS=-Xms1024m -Xmx1024m
SET FLINK_CLASSPATH=%FLINK_LIB_DIR%\*
SET logname_jm=flink-%username%-jobmanager.log
SET logname_tm=flink-%username%-taskmanager.log
SET log_jm=%FLINK_LOG_DIR%\%logname_jm%
SET log_tm=%FLINK_LOG_DIR%\%logname_tm%
SET outname_jm=flink-%username%-jobmanager.out
SET outname_tm=flink-%username%-taskmanager.out
SET out_jm=%FLINK_LOG_DIR%\%outname_jm%
SET out_tm=%FLINK_LOG_DIR%\%outname_tm%
SET log_setting_jm=-Dlog.file="%log_jm%" -Dlogback.configurationFile=file:"%FLINK_CONF_DIR%/logback.xml" -Dlog4j.configuration=file:"%FLINK_CONF_DIR%/log4j.properties"
SET log_setting_tm=-Dlog.file="%log_tm%" -Dlogback.configurationFile=file:"%FLINK_CONF_DIR%/logback.xml" -Dlog4j.configuration=file:"%FLINK_CONF_DIR%/log4j.properties"
:: Log rotation (quick and dirty)
CD "%FLINK_LOG_DIR%"
for /l %%x in (5, -1, 1) do (
SET /A y = %%x+1
RENAME "%logname_jm%.%%x" "%logname_jm%.!y!" 2> nul
RENAME "%logname_tm%.%%x" "%logname_tm%.!y!" 2> nul
RENAME "%outname_jm%.%%x" "%outname_jm%.!y!" 2> nul
RENAME "%outname_tm%.%%x" "%outname_tm%.!y!" 2> nul
)
RENAME "%logname_jm%" "%logname_jm%.0" 2> nul
RENAME "%logname_tm%" "%logname_tm%.0" 2> nul
RENAME "%outname_jm%" "%outname_jm%.0" 2> nul
RENAME "%outname_tm%" "%outname_tm%.0" 2> nul
DEL "%logname_jm%.6" 2> nul
DEL "%logname_tm%.6" 2> nul
DEL "%outname_jm%.6" 2> nul
DEL "%outname_tm%.6" 2> nul
for %%X in (java.exe) do (set FOUND=%%~$PATH:X)
if not defined FOUND (
echo java.exe was not found in PATH variable
goto :eof
)
echo Starting a local cluster with one JobManager process and one TaskManager process.
echo You can terminate the processes via CTRL-C in the spawned shell windows.
echo Web interface by default on http://localhost:8081/.
start java %JVM_ARGS% %log_setting_jm% -cp "%FLINK_CLASSPATH%"; org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint --configDir "%FLINK_CONF_DIR%" > "%out_jm%" 2>&1
start java %JVM_ARGS% %log_setting_tm% -cp "%FLINK_CLASSPATH%"; org.apache.flink.runtime.taskexecutor.TaskManagerRunner --configDir "%FLINK_CONF_DIR%" > "%out_tm%" 2>&1
endlocal
Starting the Flink cluster is simple: just double-click start-cluster.bat.
Verify it with the SQL client:
$ SELECT 'Hello World';
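[Error] NoResourceAvailableException: Could not acquire the minimum required resources
[Solution] The configured resources are too small to run the job. Increase them by changing the following settings in conf/flink-conf.yaml:
jobmanager.memory.process.size: 3200m
taskmanager.memory.process.size: 2728m
taskmanager.memory.flink.size: 2280m
Even with these larger values it was still not enough on my machine, which has limited resources. If your machine is more powerful, try larger values and verify again.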
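8) Complete configuration
1. Maven configuration
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <parent>
    <artifactId>bigdata-test2023</artifactId>
    <groupId>com.bigdata.test2023</groupId>
    <version>1.0-SNAPSHOT</version>
  </parent>
  <modelVersion>4.0.0</modelVersion>
  <artifactId>flink</artifactId>
  <dependencies>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-scala_2.12</artifactId>
      <version>1.14.3</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-scala_2.12</artifactId>
      <version>1.14.3</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-clients_2.12</artifactId>
      <version>1.14.3</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-table-planner_2.12</artifactId>
      <version>1.14.3</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-table-common</artifactId>
      <version>1.14.3</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-csv</artifactId>
      <version>1.14.3</version>
    </dependency>
    <!-- use the same Scala suffix (2.12) as the other Flink artifacts -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-connector-hive_2.12</artifactId>
      <version>1.14.3</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-table-api-java-bridge_2.12</artifactId>
      <version>1.14.3</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>3.1.2</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>3.3.1</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>3.3.1</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-common</artifactId>
      <version>3.3.1</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
      <version>3.3.1</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>
2. log4j2.xml configuration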
3. hive-site.xml configuration
The file is identical to the hive-site.xml listed under 6) HiveCatalog.
VI. Configure the IDEA environment (Java)
1) Maven configuration
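<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <parent>
    <artifactId>bigdata-test2023</artifactId>
    <groupId>com.bigdata.test2023</groupId>
    <version>1.0-SNAPSHOT</version>
  </parent>
  <modelVersion>4.0.0</modelVersion>
  <artifactId>flink</artifactId>
  <dependencies>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-java</artifactId>
      <version>1.14.3</version>
      <scope>provided</scope>
    </dependency>
    <!-- in Flink 1.14 this artifact still carries a Scala suffix -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-java_2.12</artifactId>
      <version>1.14.3</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-clients_2.12</artifactId>
      <version>1.14.3</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-table-planner_2.12</artifactId>
      <version>1.14.3</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-table-common</artifactId>
      <version>1.14.3</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-csv</artifactId>
      <version>1.14.3</version>
    </dependency>
    <!-- use the same Scala suffix (2.12) as the other Flink artifacts -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-connector-hive_2.12</artifactId>
      <version>1.14.3</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-table-api-java-bridge_2.12</artifactId>
      <version>1.14.3</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>3.1.2</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>3.3.1</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>3.3.1</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-common</artifactId>
      <version>3.3.1</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
      <version>3.3.1</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>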
2) log4j2.xml configuration
[Tip] log4j2.xml and hive-site.xml are the same for Java and Scala, so the files from the Scala setup can be reused as they are.
3) hive-site.xml configuration
The file is identical to the hive-site.xml listed under 6) HiveCatalog.
More big data content is on the way, so stay tuned.



