文章目录
1、解压 Sqoop 安装包2、配置 sqoop-env.sh3、配置系统环境变量4、sqoop 连接 MySQL5、sqoop 连接 MySQL 导入到 Hive
5.1 准备5.2 MySQL创建模拟数据表(数据源)5.3 Hive 创建数据表(接收)5.3 MySQL 导出到 Hive
准备环境:
Hadoop 完全分布式集群环境Sqoop 安装包:http://archive.apache.org/dist/sqoop/
1、解压 Sqoop 安装包
上传本地安装包:
解压后并重命名:
返回顶部
2、配置 sqoop-env.sh
复制 sqoop-env-template.sh 模板,并将模板重命名为 sqoop-env.sh,在 sqoop-env.sh 中添加 Hdoop、Hbase、Hive、Hbase的安装路径:
#Set path to where bin/hadoop is available export HADOOP_COMMON_HOME=/usr/local/src/hadoop #Set path to where hadoop-*-core.jar is available export HADOOP_MAPRED_HOME=/usr/local/src/hadoop #set the path to where bin/hbase is available export Hbase_HOME=/usr/local/src/hbase #Set the path to where bin/hive is available export HIVE_HOME=/usr/local/src/hive #Set the path for where zookeper config dir is export ZOOCFGDIR=/usr/local/src/zookeeper/conf
返回顶部
3、配置系统环境变量
配置 Linux 系统环境变量(/etc/profile),添加 Sqoop 组件的路径:
# set sqoop environment export SQOOP_HOME=/usr/local/src/sqoop export PATH=$PATH:$SQOOP_HOME/bin export CLASSPATH=$CLASSPATH:$SQOOP_HOME/lib
保存退出后 source 使其生效!
返回顶部
4、sqoop 连接 MySQL
为了使 Sqoop 能够连接 MySQL 数据库,需要将 /../download/mysql-connector-java-5.1.46.jar 文件放入 sqoop 的 lib 目录中:
[root@server download]# cp mysql-connector-java-5.1.46.jar /usr/local/src/sqoop/lib/
执行 Sqoop 前需要先启动 Hadoop 集群,在 server 节点执行 start-all.sh 命令启动 Hadoop 集群,查看进程:
[root@server conf]# jps 5392 HMaster 1762 ResourceManager 4754 QuorumPeerMain 10226 Jps 1603 SecondaryNameNode 1412 NameNode [root@agent1 src]# jps 1235 DataNode 1863 HRegionServer 1306 NodeManager 1563 QuorumPeerMain 2687 Jps [root@agent2 src]# jps 1344 NodeManager 1570 QuorumPeerMain 1238 DataNode 2678 Jps 1868 HRegionServer
测试 Sqoop 是否能够正常连接 MySQL 数据库:
[root@server download]# cp mysql-connector-java-5.1.46.jar /usr/local/src/sqoop/lib/ [root@server download]# sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username root -P Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation. Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 22/02/25 19:35:42 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 Enter password: 22/02/25 19:35:45 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. Fri Feb 25 19:35:46 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification. information_schema hive mysql performance_schema sys
能 够 查 看 到 MySQL 数 据 库 中 的 information_schema 、 hive 、 mysql 、performance_schema、sys 等数据库,说明 Sqoop 可以正常连接 MySQL。
返回顶部
5、sqoop 连接 MySQL 导入到 Hive 5.1 准备
为了使 Sqoop 能够连接 Hive,需要将 hive 组件 /usr/local/src/hive/lib 目录下的 hive-common-2.3.9.jar 也放入 Sqoop 安装路径的 lib 目录中:
[root@server ~] cp /usr/local/src/hive/lib/hive-common-2.3.9.jar /usr/local/src/sqoop/lib/
5.2 MySQL创建模拟数据表(数据源)
MySQL中创建 test 数据库,在 test 中创建 student 表,在 student 表中插入了 3 条数据:
[root@server lib]# mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or g.
Your MySQL connection id is 15
Server version: 5.7.22 MySQL Community Server (GPL)
Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.
mysql> create database test;
Query OK, 1 row affected (0.00 sec)
mysql> use test;
Database changed
mysql> create table student(number char(9) primary key, name varchar(10));
Query OK, 0 rows affected (0.25 sec)
mysql> insert into student values('01','zhangsan');
Query OK, 1 row affected (0.00 sec)
mysql> insert into student values('02','lisi');
Query OK, 1 row affected (0.00 sec)
mysql> insert into student values('03','wangwu');
Query OK, 1 row affected (0.00 sec)
mysql> select * from student;
+--------+----------+
| number | name |
+--------+----------+
| 01 | zhangsan |
| 02 | lisi |
| 03 | wangwu |
+--------+----------+
3 rows in set (0.00 sec)
5.3 Hive 创建数据表(接收)
在 Hive 中创建 test 数据库和 student 数据表:
hive> create database test;
.................
OK
Time taken: 6.211 seconds
hive> use test;
OK
Time taken: 0.018 seconds
hive> create table student(number STRING, name STRING)
> row format delimited
> fields terminated by "|"
> stored as textfile;
OK
Time taken: 9.939 seconds
5.3 MySQL 导出到 Hive
需要说明该命令的以下几个参数:
–connect:MySQL 数据库连接 URL。–username & –password:MySQL 数据库的用户名和密码。–table:导出的数据表名。–fields-terminated-by:Hive 中字段分隔符。–delete-target-dir:删除导出目的目录。–num-mappers:Hadoop 执行 Sqoop 导入导出启动的 map 任务数。–hive-import --hive-database:导出到 Hive 的数据库名。–hive-table:导出到 Hive 的表名。
[hadoop@master ~]$ sqoop import --connect jdbc:mysql://server:3306/test --username root --password 123456 --table student --fields-terminated-by '|' --delete-target-dir --num-mappers 1 --hive-import --hive-database test --hive-table student .......... 22/02/25 20:02:10 INFO hive.Hiveimport: OK 22/02/25 20:02:10 INFO hive.Hiveimport: Time taken: 10.6 seconds 22/02/25 20:02:13 INFO hive.Hiveimport: Loading data to table test.student 22/02/25 20:02:14 INFO hive.Hiveimport: OK 22/02/25 20:02:14 INFO hive.Hiveimport: Time taken: 3.846 seconds 22/02/25 20:02:15 INFO hive.Hiveimport: Hive import complete. 22/02/25 20:02:15 INFO hive.Hiveimport: Export directory is contains the _SUCCESS file only, removing the directory.
查看Hive表中的数据:
hive> show tables; OK student Time taken: 0.409 seconds, Fetched: 1 row(s) hive> select * from student; OK 01 zhangsan 02 lisi 03 wangwu Time taken: 1.371 seconds, Fetched: 3 row(s)
返回顶部



