大数据Hadoop之——搭建本地flink开发环境详解（window10）

文章目录

一、下载安装IDEA二、搭建本地hadoop环境（window10）三、安装Maven四、新建项目和模块

1）新建maven项目2）新建flink模块五、配置IDEA环境（scala）

1）下载安装scala插件2）配置scala插件到模块或者全局环境3）创建scala项目4）DataStream API配置

1、Maven配置2、示例演示 5）Table API & SQL配置

1、Maven配置2、示例演示 6）HiveCatalog

1、Maven配置2、Hadoop与Hive Guava冲突问题3、示例演示 7）下载flink并本地启动集群（window）8）完成版配置

1、maven配置2、log4j2.xml配置3、hive-site.xml配置六、配置IDEA环境（java）

1）maven配置2）log4j2.xml配置3）hive-site.xml配置

一、下载安装IDEA

可以参考我之前的文章：https://liugp.blog.csdn.net/article/details/123058589

二、搭建本地hadoop环境（window10）

可以看我之前的文章：大数据Hadoop之——部署hadoop+hive环境（window10环境）

三、安装Maven

可以看我之前的文章：Java-Maven详解

四、新建项目和模块 1）新建maven项目

因为之前我创建过了，所以会标红

把自动生成的src删掉，以后是通过模块来管理项目，因为一个项目一般会包含很多模块。

2）新建flink模块

目录结构，新建没有的目录

设置目录属性

因为之前创建过项目，所以这里创建一个新项目来演示：bigdata-test2023

五、配置IDEA环境（scala） 1）下载安装scala插件

File-》Settings

intellij IDEA本来是不能开发Scala程序的,但是通过配置是可以的，我之前已经装过了，没装过的小伙伴，点击这里安装即可。

2）配置scala插件到模块或者全局环境

添加完scala插件之后就可以创建scala项目了

3）创建scala项目

创建Object类

【温馨提示】类只会被编译，不能直接被执行。

4）DataStream API配置 1、Maven配置

在flink模块目录下pom.xml配置如下内容：

【温馨提示】这里的scala版本要与上面插件版本一致


	org.apache.flink
	flink-scala_2.12
	1.14.3
	provided




	org.apache.flink
	flink-streaming-scala_2.12
	1.14.3
	provided



	org.apache.flink
	flink-streaming-scala_2.12
	1.14.3
	provided

【问题】IDEA 在使用Maven项目时，未加载 provided 范围的依赖包，导致启动时报错
【原因】就是 Run Application时，IDEA未加载 provided 范围的依赖包，导致启动时报错，这是IDEA的bug
【解决】在IDEA中设置

2、示例演示

（官网示例）

package com
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

object WindowWordCount {
  def main(args: Array[String]) {

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val text = env.socketTextStream("localhost", 9999)

    val counts = text.flatMap { _.toLowerCase.split("\W+") filter { _.nonEmpty } }
      .map { (_, 1) }
      .keyBy(_._1)
      .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
      .sum(1)

    counts.print()

    env.execute("Window Stream WordCount")
  }
}

在命令行起一个9999端口的服务

$ nc -lk 9999

运行测试

5）Table API & SQL配置 1、Maven配置


	org.apache.flink
	flink-table-planner_2.12
	1.14.3
	provided


	org.apache.flink
	flink-streaming-scala_2.12
	1.14.3
	provided


	org.apache.flink
	flink-table-common
	1.14.3
	provided

2、示例演示

这里使用filesystem，不需要引用相应得maven配置，像kafka，ES等连接器是需要引入相应的maven配置，但是这里使用到了format csv，所以得引入相应得配置，配置如下：

更多连接器的介绍，你看官方文档


    org.apache.flink
    flink-csv
    1.14.3

源码

package com

import org.apache.flink.table.api._

object TableSQL {
  def main(args: Array[String]): Unit = {
    val settings = EnvironmentSettings.inStreamingMode()
    val tableEnv = TableEnvironment.create(settings)

    // create an output Table
    val schema = Schema.newBuilder()
      .column("a", DataTypes.STRING())
      .column("b", DataTypes.STRING())
      .column("c", DataTypes.STRING())
      .build()

    tableEnv.createTemporaryTable("CsvSourceTable", TableDescriptor.forConnector("filesystem")
      .schema(schema)
      .option("path", "flink/data/source")
      .format(FormatDescriptor.forFormat("csv")
        .option("field-delimiter", "|")
        .build())
      .build())

    tableEnv.createTemporaryTable("CsvSinkTable", TableDescriptor.forConnector("filesystem")
      .schema(schema)
      .option("path", "flink/data/")
      .format(FormatDescriptor.forFormat("csv")
        .option("field-delimiter", "|")
        .build())
      .build())

    // 创建一个查询语句
    val sourceTable = tableEnv.sqlQuery("SELECt * FROM CsvSourceTable limit 2")

    // 将查询到的数据转到下游存储
    sourceTable.executeInsert("CsvSinkTable")
  }
}

6）HiveCatalog 1、Maven配置

基础配置


  org.apache.flink
  flink-connector-hive_2.11
  1.14.3
  provided



  org.apache.flink
  flink-table-api-java-bridge_2.11
  1.14.3
  provided




    org.apache.hive
    hive-exec
    3.1.2
    provided

【温馨提示】在IDEA中scope设置provided的时候，必须对应的运行文件设置加载provided的依赖到classpath

Log4j2 配置（log4j2.xml）

配置hive-site.xml





    
    
        javax.jdo.option.ConnectionURL
        jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&useSSL=false&serverTimezone=Asia/Shanghai
    

    
    
        javax.jdo.option.ConnectionDriverName
        com.mysql.jdbc.Driver
        MySQL JDBC driver class
    

    
    
        javax.jdo.option.ConnectionUserName
        root
        user name for connecting to mysql server
    

    
    
        javax.jdo.option.ConnectionPassword
        123456
        password for connecting to mysql server
    

    
        hive.metastore.uris
        thrift://localhost:9083
        IP address (or fully-qualified domain name) and port of the metastore host
    

    
    
        hive.server2.thrift.bind.host
        localhost
        Bind host on which to run the HiveServer2 Thrift service.
    

    
    
        hive.server2.thrift.port
        10001
    

    
        hive.metastore.schema.verification
        true

【温馨提示】必须启动metastore和hiveserver2服务，不清楚的小伙拍可以参考我之前的文章：大数据Hadoop之——部署hadoop+hive环境（window10环境）

$ hive --service metastore
$ hive --service hiveserver2

2、Hadoop与Hive Guava冲突问题

【问题】Hadoop和hive-exec-3.1.2的Guava的版本冲突导致Flink任务启动异常
【解决】删掉%HIVE_HOME%lib目录下的guava-19.0.jar，再把%HADOOP_HOME%sharehadoopcommonlibguava-27.0-jre.jar复制到%HIVE_HOME%lib目录下。

3、示例演示

package com

import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}
import org.apache.flink.table.catalog.hive.HiveCatalog

object HiveCatalogTest {
  def main(args: Array[String]): Unit = {
    val settings = EnvironmentSettings.inStreamingMode()
    val tableEnv = TableEnvironment.create(settings)
    val name            = "myhive"
    val defaultDatabase = "default"
    val hiveConfDir     = "flink/data/"
    val hive = new HiveCatalog(name, defaultDatabase, hiveConfDir)
    // 注册catalog，会话结束自动消失
    tableEnv.registerCatalog("myhive", hive)
    // 显示有多少个catalog
    tableEnv.executeSql("show catalogs").print()
    // 切换到myhive 的catalog
    tableEnv.useCatalog("myhive")
    // 创建库，已经持久化到hive了，会话结束依然存在
    tableEnv.executeSql("CREATE DATAbase IF NOT EXISTS mydatabase")
    // 显示有多少个database
    tableEnv.executeSql("show databases").print()
    // 切换数据库
    tableEnv.useDatabase("mydatabase")
    // 切换表
    tableEnv.executeSql("CREATE TABLE IF NOT EXISTS user_behavior (n  user_id BIGINT,n  item_id BIGINT,n  category_id BIGINT,n  behavior STRING,n  ts TIMESTAMP(3)n) WITH (n 'connector' = 'kafka',n 'topic' = 'user_behavior',n 'properties.bootstrap.servers' = 'hadoop-node1:9092',n 'properties.group.id' = 'testGroup',n 'format' = 'json',n 'json.fail-on-missing-field' = 'false',n 'json.ignore-parse-errors' = 'true'n)")
    tableEnv.executeSql("show tables").print()

  }
}

看下面通过hive客户端连接查看上面程序创建的库和表，依然是存在的

从上面验证显示，一切ok，记得开发的时候引入连接器的时候需要引入对应的maven配置

7）下载flink并本地启动集群（window）

下载地址：https://flink.apache.org/downloads.html

flink-1.14.3：https://dlcdn.apache.org/flink/flink-1.14.3/flink-1.14.3-bin-scala_2.12.tgz
【温馨提示】在新版中start-cluster.cmd和flink.cmd已经找不到了，但是可以从以前的版本中复制过来。下载下面的老版本
flink-1.9.1：https://archive.apache.org/dist/flink/flink-1.9.1/flink-1.9.1-bin-scala_2.11.tgz

其实主要从flink-1.9.1中copy以下两个文件到新版本中

下载比较慢，所以我这里还是提供一下这两个文件

flink.cmd

::###############################################################################
::  Licensed to the Apache Software Foundation (ASF) under one
::  or more contributor license agreements.  See the NOTICE file
::  distributed with this work for additional information
::  regarding copyright ownership.  The ASF licenses this file
::  to you under the Apache License, Version 2.0 (the
::  "License"); you may not use this file except in compliance
::  with the License.  You may obtain a copy of the License at
::
::      http://www.apache.org/licenses/LICENSE-2.0
::
::  Unless required by applicable law or agreed to in writing, software
::  distributed under the License is distributed on an "AS IS" BASIS,
::  WITHOUT WARRANTIES OR ConDITIONS OF ANY KIND, either express or implied.
::  See the License for the specific language governing permissions and
:: limitations under the License.
::###############################################################################

@echo off
setlocal

SET bin=%~dp0
SET Flink_HOME=%bin%..
SET Flink_LIB_DIR=%Flink_HOME%lib
SET Flink_PLUGINS_DIR=%Flink_HOME%plugins

SET JVM_ARGS=-Xmx512m

SET Flink_JM_CLASSPATH=%Flink_LIB_DIR%*

java %JVM_ARGS% -cp "%Flink_JM_CLASSPATH%"; org.apache.flink.client.cli.CliFrontend %*

endlocal

start-cluster.bat

::###############################################################################
::  Licensed to the Apache Software Foundation (ASF) under one
::  or more contributor license agreements.  See the NOTICE file
::  distributed with this work for additional information
::  regarding copyright ownership.  The ASF licenses this file
::  to you under the Apache License, Version 2.0 (the
::  "License"); you may not use this file except in compliance
::  with the License.  You may obtain a copy of the License at
::
::      http://www.apache.org/licenses/LICENSE-2.0
::
::  Unless required by applicable law or agreed to in writing, software
::  distributed under the License is distributed on an "AS IS" BASIS,
::  WITHOUT WARRANTIES OR ConDITIONS OF ANY KIND, either express or implied.
::  See the License for the specific language governing permissions and
:: limitations under the License.
::###############################################################################

@echo off
setlocal EnableDelayedExpansion

SET bin=%~dp0
SET Flink_HOME=%bin%..
SET Flink_LIB_DIR=%Flink_HOME%lib
SET Flink_PLUGINS_DIR=%Flink_HOME%plugins
SET Flink_CONF_DIR=%Flink_HOME%conf
SET Flink_LOG_DIR=%Flink_HOME%log

SET JVM_ARGS=-Xms1024m -Xmx1024m

SET Flink_CLASSPATH=%Flink_LIB_DIR%*

SET logname_jm=flink-%username%-jobmanager.log
SET logname_tm=flink-%username%-taskmanager.log
SET log_jm=%Flink_LOG_DIR%%logname_jm%
SET log_tm=%Flink_LOG_DIR%%logname_tm%
SET outname_jm=flink-%username%-jobmanager.out
SET outname_tm=flink-%username%-taskmanager.out
SET out_jm=%Flink_LOG_DIR%%outname_jm%
SET out_tm=%Flink_LOG_DIR%%outname_tm%

SET log_setting_jm=-Dlog.file="%log_jm%" -Dlogback.configurationFile=file:"%Flink_CONF_DIR%/logback.xml" -Dlog4j.configuration=file:"%Flink_CONF_DIR%/log4j.properties"
SET log_setting_tm=-Dlog.file="%log_tm%" -Dlogback.configurationFile=file:"%Flink_CONF_DIR%/logback.xml" -Dlog4j.configuration=file:"%Flink_CONF_DIR%/log4j.properties"

:: Log rotation (quick and dirty)
CD "%Flink_LOG_DIR%"
for /l %%x in (5, -1, 1) do ( 
SET /A y = %%x+1 
RENAME "%logname_jm%.%%x" "%logname_jm%.!y!" 2> nul
RENAME "%logname_tm%.%%x" "%logname_tm%.!y!" 2> nul
RENAME "%outname_jm%.%%x" "%outname_jm%.!y!"  2> nul
RENAME "%outname_tm%.%%x" "%outname_tm%.!y!"  2> nul
)
RENAME "%logname_jm%" "%logname_jm%.0"  2> nul
RENAME "%logname_tm%" "%logname_tm%.0"  2> nul
RENAME "%outname_jm%" "%outname_jm%.0"  2> nul
RENAME "%outname_tm%" "%outname_tm%.0"  2> nul
DEL "%logname_jm%.6"  2> nul
DEL "%logname_tm%.6"  2> nul
DEL "%outname_jm%.6"  2> nul
DEL "%outname_tm%.6"  2> nul

for %%X in (java.exe) do (set FOUND=%%~$PATH:X)
if not defined FOUND (
    echo java.exe was not found in PATH variable
    goto :eof
)

echo Starting a local cluster with one JobManager process and one TaskManager process.

echo You can terminate the processes via CTRL-C in the spawned shell windows.

echo Web interface by default on http://localhost:8081/.

start java %JVM_ARGS% %log_setting_jm% -cp "%Flink_CLASSPATH%"; org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint --configDir "%Flink_CONF_DIR%" > "%out_jm%" 2>&1
start java %JVM_ARGS% %log_setting_tm% -cp "%Flink_CLASSPATH%"; org.apache.flink.runtime.taskexecutor.TaskManagerRunner --configDir "%Flink_CONF_DIR%" > "%out_tm%" 2>&1

endlocal

启动flink集群很简单，只要双击start-cluster.bat

通过sql客户端验证一下

$ SELECT 'Hello World';

【错误】NoResourceAvailableException: Could not acquire the minimum required resources
【解决】是因为资源太小，不足以跑任务，扩大配置，修改如下配置：

jobmanager.memory.process.size: 3200m

taskmanager.memory.process.size: 2728m

taskmanager.memory.flink.size: 2280m

但是我这里调大了还是太小了，自己电脑配置有限，如果有小伙伴的配置高，可以再调大验证一下。

8）完成版配置 1、maven配置



    
        bigdata-test2023
        com.bigdata.test2023
        1.0-SNAPSHOT
    
    4.0.0

    flink

    
    
        
            org.apache.flink
            flink-scala_2.12
            1.14.3
            provided
        

        
            org.apache.flink
            flink-streaming-scala_2.12
            1.14.3
            provided
        

        
            org.apache.flink
            flink-clients_2.12
            1.14.3
        
        

        
        
            org.apache.flink
            flink-table-planner_2.12
            1.14.3
            provided
        
        
        
        
            org.apache.flink
            flink-table-common
            1.14.3
            provided
        

        
            org.apache.flink
            flink-csv
            1.14.3
        
        

        
        
        
            org.apache.flink
            flink-connector-hive_2.11
            1.14.3
            provided
        

        
            org.apache.flink
            flink-table-api-java-bridge_2.11
            1.14.3
            provided
        

        
        
            org.apache.hive
            hive-exec
            3.1.2
            provided
        

        


        
        
            org.apache.hadoop
            hadoop-mapreduce-client-core
            3.3.1
            provided
        
        
            org.apache.hadoop
            hadoop-common
            3.3.1
            provided
        
        
            org.apache.hadoop
            hadoop-mapreduce-client-common
            3.3.1
            provided
        
        
            org.apache.hadoop
            hadoop-mapreduce-client-jobclient
            3.3.1
            provided

2、log4j2.xml配置

3、hive-site.xml配置





    
    
        javax.jdo.option.ConnectionURL
        jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&useSSL=false&serverTimezone=Asia/Shanghai
    

    
    
        javax.jdo.option.ConnectionDriverName
        com.mysql.jdbc.Driver
        MySQL JDBC driver class
    

    
    
        javax.jdo.option.ConnectionUserName
        root
        user name for connecting to mysql server
    

    
    
        javax.jdo.option.ConnectionPassword
        123456
        password for connecting to mysql server
    

    
        hive.metastore.uris
        thrift://localhost:9083
        IP address (or fully-qualified domain name) and port of the metastore host
    

    
    
        hive.server2.thrift.bind.host
        localhost
        Bind host on which to run the HiveServer2 Thrift service.
    

    
    
        hive.server2.thrift.port
        10001
    

    
        hive.metastore.schema.verification
        true

六、配置IDEA环境（java） 1）maven配置



    
        bigdata-test2023
        com.bigdata.test2023
        1.0-SNAPSHOT
    
    4.0.0

    flink

    
    
        
            org.apache.flink
            flink-java
            1.14.3
            provided
        

        
            org.apache.flink
            flink-streaming-java
            1.14.3
            provided
        

        
            org.apache.flink
            flink-clients_2.12
            1.14.3
        
        

        
        
            org.apache.flink
            flink-table-planner_2.12
            1.14.3
            provided
        
        
        
        
            org.apache.flink
            flink-table-common
            1.14.3
            provided
        

        
            org.apache.flink
            flink-csv
            1.14.3
        
        

        
        
        
            org.apache.flink
            flink-connector-hive_2.11
            1.14.3
            provided
        

        
            org.apache.flink
            flink-table-api-java-bridge_2.11
            1.14.3
            provided
        

        
        
            org.apache.hive
            hive-exec
            3.1.2
            provided
        

        


        
        
            org.apache.hadoop
            hadoop-mapreduce-client-core
            3.3.1
            provided
        
        
            org.apache.hadoop
            hadoop-common
            3.3.1
            provided
        
        
            org.apache.hadoop
            hadoop-mapreduce-client-common
            3.3.1
            provided
        
        
            org.apache.hadoop
            hadoop-mapreduce-client-jobclient
            3.3.1
            provided

【温馨提示】其实log4j2.xml和hive-site.xml不区分java和scala的，为了方便这里还是再复制一份。

2）log4j2.xml配置

3）hive-site.xml配置





    
    
        javax.jdo.option.ConnectionURL
        jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&useSSL=false&serverTimezone=Asia/Shanghai
    

    
    
        javax.jdo.option.ConnectionDriverName
        com.mysql.jdbc.Driver
        MySQL JDBC driver class
    

    
    
        javax.jdo.option.ConnectionUserName
        root
        user name for connecting to mysql server
    

    
    
        javax.jdo.option.ConnectionPassword
        123456
        password for connecting to mysql server
    

    
        hive.metastore.uris
        thrift://localhost:9083
        IP address (or fully-qualified domain name) and port of the metastore host
    

    
    
        hive.server2.thrift.bind.host
        localhost
        Bind host on which to run the HiveServer2 Thrift service.
    

    
    
        hive.server2.thrift.port
        10001
    

    
        hive.metastore.schema.verification
        true

关于更多大数据的内容，请耐心等待~

大数据Hadoop之——搭建本地flink开发环境详解（window10）

大数据系统相关栏目本月热门文章