- 1. JDK installation
- 2. Scala installation
- 1. Download
- 2. Extract
- 3. Configure system environment variables
- 4. Verify the installation
- 3. Maven installation
- 1. Download
- 2. Extract
- 3. Configure system environment variables
- 4. Verify the installation
- 5. Edit the settings.xml configuration file under its conf directory
- 4. Hadoop installation
- 1. Download
- 2. Extract
- 3. Configure ssh
- 4. Edit the configuration files
- 1. Edit hadoop-env.sh: change the JAVA_HOME setting from the default ${JAVA_HOME} to the actual JDK directory
- 2. Edit core-site.xml: create /home/hadoop/app/tmp first if it does not exist; the first property sets the HDFS NameNode address (hostname:port), the second sets where Hadoop stores the files it generates at runtime
- 3. Edit hdfs-site.xml: Hadoop's underlying storage configuration; it sets the directory for the namenode's HDFS namespace metadata, the directory where a datanode physically stores its data blocks, the number of replicas HDFS keeps (1 here for pseudo-distributed; a cluster defaults to 3), and so on
- 4. Edit the slaves file: list the hostnames of the worker nodes
- 5. Format the namenode
- 5. Configure system environment variables
- 6. Start
- 7. Verify the installation
- 8. Set up YARN
- 1. Edit mapred-site.xml
- 2. Edit yarn-site.xml
- 3. Start YARN
- 9. Verify the Hadoop installation
- 10. Verify the YARN installation
- 5. Zookeeper installation
- 6. HBase installation
- 1. Download HBase into /home/hadoop/software
- 2. Extract
- 3. Configure system environment variables
- 4. Edit the configuration files
- 1. Edit hbase-env.sh
- 2. Edit hbase-site.xml
- 3. Edit the regionservers file
- 5. Start
- 6. Verify it started
- 7. Spark installation
- 1. Download the source
- 2. Build
- 2. Extract
- 3. Configure system environment variables
- 4. Verify the installation
- 8. IDEA + Maven + Spark Streaming
- 8.1 Add the corresponding dependencies to pom.xml
1. JDK installation
Omitted
2. Scala installation
1. Download
Scala official site -> Download -> "Or are you looking for previous releases of Scala?" -> scala-2.11.8.tgz -> save it into /home/hadoop/software
2. Extract
tar -zxvf scala-2.11.8.tgz -C ~/app/
3. Configure system environment variables
vi ~/.bash_profile
Add
export SCALA_HOME=/home/hadoop/app/scala-2.11.8
export PATH=$SCALA_HOME/bin:$PATH
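As a side note, these .bash_profile additions can be made idempotent so re-running the setup never duplicates lines; a sketch (PROFILE is a stand-in for ~/.bash_profile so it is safe to try anywhere):

```shell
# Add each export line only if it is not already present in the file.
PROFILE=$(mktemp)
add_export() {
  grep -qxF "$1" "$PROFILE" || echo "$1" >> "$PROFILE"
}
add_export 'export SCALA_HOME=/home/hadoop/app/scala-2.11.8'
add_export 'export PATH=$SCALA_HOME/bin:$PATH'
add_export 'export SCALA_HOME=/home/hadoop/app/scala-2.11.8'  # repeated: not added again
wc -l < "$PROFILE"  # prints 2
```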
Apply the changes
source ~/.bash_profile
4. Verify the installation
[hadoop@hadoop000 scala-2.11.8]$ scala
Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144).
Type in expressions for evaluation. Or try :help.

scala> 1+1
res0: Int = 2

3. Maven installation
1. Download
Maven official site -> Download -> Previous Releases -> archives -> 3.3.9 -> binaries ->
download apache-maven-3.3.9-bin.tar.gz, or copy the URL of apache-maven-3.3.9-bin.zip and wget it into /home/hadoop/software
2. Extract
tar -zxvf apache-maven-3.3.9-bin.tar.gz -C ~/app/
3. Configure system environment variables
vi ~/.bash_profile
Add
export MAVEN_HOME=/home/hadoop/app/apache-maven-3.3.9
export PATH=$MAVEN_HOME/bin:$PATH
Apply the changes
source ~/.bash_profile
4. Verify the installation
[hadoop@hadoop000 scala-2.11.8]$ mvn -v
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T00:41:47+08:00)
Maven home: /home/hadoop/app/apache-maven-3.3.9
Java version: 1.8.0_144, vendor: Oracle Corporation
Java home: /home/hadoop/app/jdk1.8.0_144/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-358.el6.x86_64", arch: "amd64", family: "unix"

5. Edit the settings.xml configuration file under its conf directory
Under /home/hadoop, mkdir a new maven_repos folder
Set localRepository in settings.xml to /home/hadoop/maven_repos/
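The edit amounts to pointing localRepository at the new folder; a minimal sketch of the relevant part of conf/settings.xml:

```xml
<settings>
  <!-- use /home/hadoop/maven_repos as the local artifact cache -->
  <localRepository>/home/hadoop/maven_repos</localRepository>
</settings>
```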
4. Hadoop installation
1. Download
From http://archive.apache.org/dist/, download hadoop-2.6.0-cdh5.7.0.tar.gz into /home/hadoop/software
2. Extract
tar -zxvf hadoop-2.6.0-cdh5.7.0.tar.gz -C ~/app/
3. Configure ssh
In the /home/hadoop directory, run
ssh-keygen -t rsa
ll -a now shows a .ssh folder containing two files, id_rsa and id_rsa.pub
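The whole ssh setup (generate the key pair, then install the public key as authorized_keys with the cp command below) can be tried safely against a scratch directory; the chmod lines are an extra precaution not in the original notes, since sshd ignores an authorized_keys file with loose permissions:

```shell
# SSH_DIR stands in for ~/.ssh, so this sketch is safe to run anywhere.
SSH_DIR=$(mktemp -d)
ssh-keygen -t rsa -N "" -f "$SSH_DIR/id_rsa" -q       # empty passphrase
cp "$SSH_DIR/id_rsa.pub" "$SSH_DIR/authorized_keys"   # install the public key
# sshd refuses keys when these permissions are too open (assumption: a common
# reason passwordless login keeps asking for a password):
chmod 700 "$SSH_DIR"
chmod 600 "$SSH_DIR/authorized_keys"
```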
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
4. Edit the configuration files
In the /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop folder:
1. Edit hadoop-env.sh: change the JAVA_HOME setting from the default ${JAVA_HOME} to the actual JDK directory:
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_144
2. Edit core-site.xml: create /home/hadoop/app/tmp first if it does not exist. The first property sets the address of the HDFS NameNode (hostname:port); the second sets where Hadoop stores the files it generates at runtime:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop000:8020</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/app/tmp</value>
</property>
3. Edit hdfs-site.xml: Hadoop's underlying storage configuration file. It can set the directory where the namenode keeps the HDFS namespace metadata, the directory where a datanode physically stores its data blocks, the number of replicas HDFS keeps (1 here since this is pseudo-distributed; a cluster defaults to 3), and so on:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
4. Edit the slaves file: list the hostnames of the worker nodes:
hadoop000
5. Format the namenode
cd into /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin and run
./hdfs namenode -format
5. Configure system environment variables
vi ~/.bash_profile
Add
export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
export PATH=$HADOOP_HOME/bin:$PATH
Apply the changes
source ~/.bash_profile
6. Start
Go into the /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/sbin folder and run
./start-dfs.sh
Startup failed:
[hadoop@hadoop000 sbin]$ ./start-dfs.sh
21/06/14 21:30:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop000]
The authenticity of host 'hadoop000 (192.168.121.131)' can't be established.
RSA key fingerprint is de:44:de:ee:0e:02:9c:2b:73:99:94:2c:af:4a:8a:ad.
Are you sure you want to continue connecting (yes/no)? y
Please type 'yes' or 'no': yes
hadoop000: Warning: Permanently added 'hadoop000,192.168.121.131' (RSA) to the list of known hosts.
hadoop@hadoop000's password:
hadoop000: Agent admitted failure to sign using the key.
hadoop000: starting namenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-namenode-hadoop000.out
hadoop@hadoop000's password:
hadoop000: Agent admitted failure to sign using the key.
hadoop000: Connection closed by UNKNOWN
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
RSA key fingerprint is de:44:de:ee:0e:02:9c:2b:73:99:94:2c:af:4a:8a:ad.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (RSA) to the list of known hosts.
hadoop@0.0.0.0's password:
0.0.0.0: Agent admitted failure to sign using the key.
hadoop@0.0.0.0's password:
0.0.0.0: Permission denied, please try again.
hadoop@0.0.0.0's password:
0.0.0.0: Permission denied, please try again.
0.0.0.0: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
21/06/14 21:33:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop000 sbin]$ jps
11140 Jps
10874 NameNode
[hadoop@hadoop000 sbin]$ ssh hadoop000
Agent admitted failure to sign using the key.
hadoop@hadoop000's password:
Last login: Sun Apr 4 17:14:46 2021 from 192.168.107.2
So ssh still asks for a password: passwordless login was not configured successfully
Go into the .ssh folder and run
[hadoop@hadoop000 .ssh]$ ssh-add
Identity added: /home/hadoop/.ssh/id_rsa (/home/hadoop/.ssh/id_rsa)
[hadoop@hadoop000 .ssh]$ ssh hadoop000
Last login: Mon Jun 14 21:45:37 2021 from hadoop000
Success:
[hadoop@hadoop000 sbin]$ ./start-dfs.sh
21/06/14 21:56:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop000]
hadoop000: starting namenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-namenode-hadoop000.out
hadoop000: starting datanode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-datanode-hadoop000.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-secondarynamenode-hadoop000.out
21/06/14 21:56:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop000 sbin]$ jps
12065 NameNode
12197 DataNode
12373 SecondaryNameNode
12520 Jps
I also hit the case where, after ./start-dfs.sh, jps showed no DataNode process. The likely cause: I had run hadoop namenode -format several times. Formatting clears the namenode's data but not the datanode's, leaving the namenode and datanode with mismatched clusterIDs.
When hadoop namenode -format runs, it writes a current/VERSION file in the namenode data folder (the path configured as dfs.name.dir) recording a clusterID; the clusterID in the datanode's current/VERSION file is still the one saved by the first format.
So I edited the clusterID in the VERSION file under /home/hadoop/app/tmp/dfs/name/current to match the clusterID in the VERSION file under /home/hadoop/app/tmp/dfs/data/current.
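That fix can be sketched with stand-in files (NAME_VERSION plays .../tmp/dfs/name/current/VERSION, DATA_VERSION plays .../tmp/dfs/data/current/VERSION, and the CID values are made up):

```shell
NAME_VERSION=$(mktemp)
DATA_VERSION=$(mktemp)
echo 'clusterID=CID-after-reformat' > "$NAME_VERSION"
echo 'clusterID=CID-first-format' > "$DATA_VERSION"
# Copy the datanode's clusterID into the namenode's VERSION file:
DATA_CID=$(grep '^clusterID=' "$DATA_VERSION" | cut -d= -f2)
sed -i "s/^clusterID=.*/clusterID=$DATA_CID/" "$NAME_VERSION"
grep '^clusterID=' "$NAME_VERSION"  # prints clusterID=CID-first-format
```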
Reference link
7. Verify the installation
[hadoop@hadoop000 sbin]$ jps
12065 NameNode
12197 DataNode
12373 SecondaryNameNode
12520 Jps
In the VM's browser, visit http://hadoop000:50070/
It shows one live node
8. Set up YARN
In the /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop directory:
1. Edit mapred-site.xml
cp mapred-site.xml.template mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
2. Edit yarn-site.xml
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
3. Start YARN
In the /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/sbin directory, run
./start-yarn.sh
Two new processes appear, NodeManager and ResourceManager:
[hadoop@hadoop000 sbin]$ ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-resourcemanager-hadoop000.out
hadoop000: starting nodemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-nodemanager-hadoop000.out
[hadoop@hadoop000 sbin]$ jps
4292 ResourceManager
3960 SecondaryNameNode
3757 DataNode
4398 NodeManager
3647 NameNode
4447 Jps
Visit http://hadoop000:8088/cluster
You have to click Active Nodes for the node list to appear
9. Verify the Hadoop installation
hadoop fs -ls / lists the root directory of the HDFS filesystem
[hadoop@hadoop000 sbin]$ hadoop fs -ls /
21/06/20 18:26:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop000 sbin]$ hadoop fs -mkdir /data
21/06/20 18:31:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop000 sbin]$ hadoop fs -ls /
21/06/20 18:31:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2021-06-20 18:31 /data
[hadoop@hadoop000 sbin]$ hadoop fs -ls /data
21/06/20 18:33:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop000 sbin]$ hadoop fs -put mr-jobhistory-daemon.sh /data
21/06/20 18:34:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop000 sbin]$ hadoop fs -ls /data
21/06/20 18:34:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   1 hadoop supergroup       4080 2021-06-20 18:34 /data/mr-jobhistory-daemon.sh
[hadoop@hadoop000 sbin]$ hadoop fs -text /data/mr-jobhistory-daemon.sh
21/06/20 18:34:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
# ...

10. Verify the YARN installation
Go into the /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/mapreduce directory and run
hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi 2 3
The result is:
[hadoop@hadoop000 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi 2 3
Number of Maps = 2
Samples per Map = 3
21/06/20 18:40:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
21/06/20 18:40:10 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
21/06/20 18:40:12 INFO input.FileInputFormat: Total input paths to process : 2
21/06/20 18:40:12 INFO mapreduce.JobSubmitter: number of splits:2
21/06/20 18:40:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1624184518582_0001
21/06/20 18:40:14 INFO impl.YarnClientImpl: Submitted application application_1624184518582_0001
21/06/20 18:40:14 INFO mapreduce.Job: The url to track the job: http://hadoop000:8088/proxy/application_1624184518582_0001/
21/06/20 18:40:14 INFO mapreduce.Job: Running job: job_1624184518582_0001
21/06/20 18:40:32 INFO mapreduce.Job: Job job_1624184518582_0001 running in uber mode : false
21/06/20 18:40:32 INFO mapreduce.Job: map 0% reduce 0%
21/06/20 18:40:44 INFO mapreduce.Job: map 50% reduce 0%
21/06/20 18:40:45 INFO mapreduce.Job: map 100% reduce 0%
21/06/20 18:40:54 INFO mapreduce.Job: map 100% reduce 100%
21/06/20 18:40:56 INFO mapreduce.Job: Job job_1624184518582_0001 completed successfully
21/06/20 18:40:56 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=50
		FILE: Number of bytes written=335406
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=532
		HDFS: Number of bytes written=215
		HDFS: Number of read operations=11
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Job Counters
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=21358
		Total time spent by all reduces in occupied slots (ms)=7753
		Total time spent by all map tasks (ms)=21358
		Total time spent by all reduce tasks (ms)=7753
		Total vcore-seconds taken by all map tasks=21358
		Total vcore-seconds taken by all reduce tasks=7753
		Total megabyte-seconds taken by all map tasks=21870592
		Total megabyte-seconds taken by all reduce tasks=7939072
	Map-Reduce framework
		Map input records=2
		Map output records=4
		Map output bytes=36
		Map output materialized bytes=56
		Input split bytes=296
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=56
		Reduce input records=4
		Reduce output records=0
		Spilled Records=8
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=1000
		CPU time spent (ms)=8100
		Physical memory (bytes) snapshot=741052416
		Virtual memory (bytes) snapshot=8293314560
		Total committed heap usage (bytes)=740294656
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=236
	File Output Format Counters
		Bytes Written=97
Job Finished in 45.464 seconds
Estimated value of Pi is 4.00000000000000000000

5. Zookeeper installation
Omitted
6. HBase installation
1. Download HBase into /home/hadoop/software: hbase-1.2.0-cdh5.7.0.tar.gz
2. Extract
tar -zxvf hbase-1.2.0-cdh5.7.0.tar.gz -C ~/app/
3. Configure system environment variables
vi ~/.bash_profile
Add
export Hbase_HOME=/home/hadoop/app/hbase-1.2.0-cdh5.7.0
export PATH=$Hbase_HOME/bin:$PATH
Apply the changes
source ~/.bash_profile
Check that it took effect:
[hadoop@hadoop000 software]$ echo $Hbase_HOME
/home/hadoop/app/hbase-1.2.0-cdh5.7.0

4. Edit the configuration files
Go into the /home/hadoop/app/hbase-1.2.0-cdh5.7.0/conf directory:
1. Edit hbase-env.sh
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_144
export HBASE_MANAGES_ZK=false
2. Edit hbase-site.xml
- hbase.rootdir
  The shared directory where region servers persist HBase data. By default HBase writes to /tmp, so without changing this setting the data is lost on restart. It must be consistent with Hadoop's core-site.xml.
- hbase.cluster.distributed
  HBase's run mode: false means standalone, true means distributed. If false, HBase and Zookeeper run in the same JVM.
- hbase.zookeeper.quorum
  The Zookeeper ensemble URL; separate multiple hosts with commas (,)
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://hadoop000:8020/hbase</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>hadoop000:2181</value>
</property>
3. Edit the regionservers file
This file plays the same role as Hadoop's slaves file: it tells the HBase cluster which hosts are worker nodes.
hadoop000
5. Start
Start zookeeper first
zkServer.sh start
The QuorumPeerMain process appears:
[hadoop@hadoop000 conf]$ jps
4292 ResourceManager
7829 QuorumPeerMain
3960 SecondaryNameNode
7852 Jps
3757 DataNode
4398 NodeManager
3647 NameNode
Then start hbase
Go into the /home/hadoop/app/hbase-1.2.0-cdh5.7.0/bin directory and run
./start-hbase.sh
The HMaster and HRegionServer processes appear:
[hadoop@hadoop000 bin]$ ./start-hbase.sh
starting master, logging to /home/hadoop/app/hbase-1.2.0-cdh5.7.0/logs/hbase-hadoop-master-hadoop000.out
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
hadoop000: starting regionserver, logging to /home/hadoop/app/hbase-1.2.0-cdh5.7.0/bin/../logs/hbase-hadoop-regionserver-hadoop000.out
hadoop000: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
hadoop000: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
[hadoop@hadoop000 bin]$ jps
8643 Jps
4292 ResourceManager
7829 QuorumPeerMain
8263 HRegionServer
3960 SecondaryNameNode
8105 HMaster
3757 DataNode
4398 NodeManager
3647 NameNode

6. Verify it started
Besides checking those two processes, you can verify by visiting http://hadoop000:60010
Run the shell to test it
In /home/hadoop/app/hbase-1.2.0-cdh5.7.0/bin, run
./hbase shell
[hadoop@hadoop000 bin]$ ./hbase shell
2021-06-20 21:08:39,715 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2021-06-20 21:08:41,324 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/app/hbase-1.2.0-cdh5.7.0/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 1.2.0-cdh5.7.0, rUnknown, Wed Mar 23 11:46:29 PDT 2016

hbase(main):001:0> version
1.2.0-cdh5.7.0, rUnknown, Wed Mar 23 11:46:29 PDT 2016
hbase(main):002:0> status
1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load
hbase(main):003:0> list
TABLE
0 row(s) in 0.0400 seconds
=> []
hbase(main):001:0> create 'member','info','address'
0 row(s) in 1.7180 seconds
=> Hbase::Table - member
hbase(main):002:0> list
TABLE
member
1 row(s) in 0.0280 seconds
=> ["member"]
hbase(main):003:0> describe 'member'
Table member is ENABLED
member
COLUMN FAMILIES DESCRIPTION
{NAME => 'address', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.1090 seconds
A problem I ran into:
hbase shell's list command showed an empty [], but creating a table failed with ERROR: Table already exists
References:
https://blog.csdn.net/huashao0602/article/details/77050929
https://www.jianshu.com/p/e1767d57f972?utm_campaign=maleskine&utm_content=note&utm_medium=seo_notes&utm_source=recommendation
The cause: the table had been created before, and after HBase force-deleted it, Zookeeper still kept the table's metadata.
(1) Enter the zookeeper client mode with the ./hbase zkcli command
I hit an error:
2021-06-20 21:31:25,649 INFO [main-SendThread(hadoop000:2181)] zookeeper.ClientCnxn: Session establishment complete on server hadoop000/192.168.121.131:2181, sessionid = 0x17a297cb8020008, negotiated timeout = 30000
JLine support is enabled
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[ERROR] Terminal initialization failed; falling back to unsupported
java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
	at jline.TerminalFactory.create(TerminalFactory.java:101)
	at jline.TerminalFactory.get(TerminalFactory.java:159)
	at jline.console.ConsoleReader.<init>(ConsoleReader.java:227)
	at jline.console.ConsoleReader.<init>(ConsoleReader.java:219)
	at jline.console.ConsoleReader.<init>(ConsoleReader.java:207)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:311)
	at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperMainServer.main(ZooKeeperMainServer.java:108)
JLine support is disabled
Fix:
vim the bin/hbase file and adjust it as in the code below:
elif [ "$COMMAND" = "zkcli" ] ; then
  CLASS="org.apache.hadoop.hbase.zookeeper.ZooKeeperMainServer"
  CLASSPATH=`echo $CLASSPATH | sed 's/jruby-cloudera-1.0.0.jar//g'`
Then restart the pseudo-distributed HBase environment; entering the zk client no longer reports the error
(2) ls /hbase/table shows the existing table info
[zk: hadoop000:2181(CONNECTED) 0] ls /hbase/table
[hbase:meta, hbase:namespace, imooc_course_search_clickcount, member, course_clickcount, imooc_course_clickcount, course_search_clickcount]
(3) rmr /hbase/table/<table name> deletes the zombie table
(4) Restart HBase (I skipped this step)
7. Spark installation
1. Download the source
Spark official site -> Download -> archived releases -> spark-2.2.0.tgz, but it has to be the source release: choose the last option
If you download a prebuilt package it may clash with production in all sorts of ways; downloading the source lets you build a Spark that matches the Hadoop version used in production
2. Build
For how to build, see Documentation -> Latest Release -> navigation bar More -> Building Spark
I did not do this step myself; I used an already-built package. The process is as follows.
In the source folder, extract the source
tar -xzvf spark-2.1.0.tgz
cd spark-2.1.0
Note that the requirements here differ between versions: there are Maven, Java, and other version requirements; see the official docs for details. Take 3.2.0 as the example
Set Maven's memory
export MAVEN_OPTS="-Xss64m -Xmx2g -XX:ReservedCodeCacheSize=1g"
cd into the build directory and you can see the bundled Maven
We do not use it; install your own Maven instead
Go back to the source root directory spark-2.1.0
The mvn build command specifies the Hadoop version, enables YARN, and adds Hive and JDBC support (-Phive -Phive-thriftserver); you can read the pom.xml in this directory to understand the options
To build against Hadoop 2.x, enable the hadoop-2.7 profile:
./build/mvn -Phadoop-2.7 -Pyarn -Dhadoop.version=2.8.5 -Phive -Phive-thriftserver -DskipTests clean package
This command may take an hour or two, because the first run downloads a lot of artifacts
-D overrides a parameter from the outside, as with hadoop.version in the command above
-P selects which internal profile (by its id) to activate from the outside; if you need Hive, add the hive-thriftserver id via -P
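For illustration only (the version numbers here are made up, not copied from Spark's actual pom.xml), -P and -D work against a profile/property pair of this shape:

```xml
<properties>
  <!-- default value; override from the command line with -Dhadoop.version=... -->
  <hadoop.version>2.6.5</hadoop.version>
</properties>
<profiles>
  <!-- activate from the command line with -Phadoop-2.7 -->
  <profile>
    <id>hadoop-2.7</id>
    <properties>
      <hadoop.version>2.7.4</hadoop.version>
    </properties>
  </profile>
</profiles>
```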
echo $HADOOP_HOME
shows the Hadoop version, e.g. 2.6.0-cdh5.7.0
You may hit a case where the HDFS and YARN versions differ. In production the HDFS version can be old and unable to support newer Spark features, yet nobody dares upgrade the whole Hadoop cluster, because such an upgrade has too many unpredictable factors; so HDFS alone gets bumped a version, which leaves HDFS and YARN on different versions. By default yarn.version follows hadoop.version, but the two can be specified separately with -D.
To build for another supported Scala version (such as 2.13), change the major Scala version first (e.g. 2.13):
./dev/change-scala-version.sh 2.13
Then enable the matching profile (e.g. 2.13):
# For Maven
./build/mvn -Pscala-2.13 compile
A plain Maven build does not leave an archive we can deploy, so we can build a runnable distribution package instead
Recommended: build by creating a runnable distribution package
These are the two build paths: the mvn build above, and the make-distribution build described next.
To create a Spark distribution like those on the "Spark downloads" page, laid out so it is runnable, use ./dev/make-distribution.sh in the project root. It accepts Maven profile settings and so on just like a direct Maven build. Example:
./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes
This builds the Spark distribution together with the Python pip and R packages. For usage details, run ./dev/make-distribution.sh --help
When building this way you must specify a name; I suggest using the Hadoop version, e.g. 2.6.0-cdh5.7.0. --tgz packs everything into a tgz; the rest is just the Maven build flags appended, e.g. -Phadoop-2.7 -Pyarn -Dhadoop.version=2.8.5 -Phive -Phive-thriftserver
Compare with the earlier command:
./build/mvn -Phadoop-2.7 -Pyarn -Dhadoop.version=2.8.5 -Phive -Phive-thriftserver -DskipTests clean package
When building with Maven alone, export MAVEN_OPTS must be set; with make-distribution it is not needed, because that setting is already baked into the make-distribution.sh script, and -DskipTests clean package is in there too.
The packaged file is named spark-$VERSION-bin-$NAME.tgz:
$NAME is what was set with --name and $VERSION is the Spark version, so if --name is the Hadoop version number, the packaged file shows both the Spark and Hadoop versions at once, e.g. spark-2.1.0-bin-2.6.0-cdh5.7.0.tgz. This package can then be copied straight to the target machines for deployment.
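The naming rule can be checked with a quick sketch (values taken from the example above):

```shell
# How the distribution file name is assembled from the Spark version
# and the value given to --name:
VERSION=2.1.0            # Spark version
NAME=2.6.0-cdh5.7.0      # passed via --name
echo "spark-$VERSION-bin-$NAME.tgz"  # prints spark-2.1.0-bin-2.6.0-cdh5.7.0.tgz
```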
Problems you may run into (error screenshot omitted):
Fix: switch the Maven repository
Add the repository section to pom.xml (the highlighted part in the original screenshot, just below part 1)
The second thing to watch for: running out of memory and the like; see the official docs for specifics
If the exception you see during the build is not obvious enough to understand, append -X to the build command to get more detailed build output.
2. Extract
tar -zxvf spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz -C ~/app/
3. Configure system environment variables
pwd gives the spark location
/home/hadoop/app/spark-2.2.0-bin-2.6.0-cdh5.7.0
vi ~/.bash_profile
export SPARK_HOME=/home/hadoop/app/spark-2.2.0-bin-2.6.0-cdh5.7.0
export PATH=$SPARK_HOME/bin:$PATH
source ~/.bash_profile
4. Verify the installation
Run Spark with master local for local testing, or with yarn to run on a YARN cluster
In the /home/hadoop/app/spark-2.2.0-bin-2.6.0-cdh5.7.0/bin directory, run
./spark-shell --master local[2]
8. IDEA + Maven + Spark Streaming
8.1 Add the corresponding dependencies to pom.xml
When using CDH artifacts, the repository has to be added manually in pom.xml; check that the url is valid, for example by opening it in a browser
Dependencies needed: scala, kafka, spark, hadoop, hbase
Spark official site https://spark.apache.org/, Documentation -> Latest Release, Programming Guides -> Spark Streaming
The _2.12 suffix is the Scala version; if you use Scala 2.10, write 2.10 here
Integrating Kafka with Spark Streaming requires an additional dependency
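As a hedged sketch (the artifact versions below are assumptions matching the Spark 2.2.0 / Scala 2.11 build used earlier in these notes; adjust the _2.xx suffix and version numbers to your own build), the dependencies look like:

```xml
<!-- Spark Streaming itself; the _2.11 suffix is the Scala version -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.11</artifactId>
  <version>2.2.0</version>
</dependency>
<!-- additional artifact needed for the Kafka integration -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
  <version>2.2.0</version>
</dependency>
```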



