Syncing MySQL tables to HDFS and Hive with Sqoop

– Importing MySQL data into Hive with Sqoop

Error encountered:
Data source rejected establishment of connection, message from server: "Too many connections"

Problem analysis:
Check MySQL's current maximum number of connections. Log in with mysql -uroot -p and enter the password when prompted.

Then run: select VARIABLE_VALUE from information_schema.GLOBAL_VARIABLES where VARIABLE_NAME='MAX_CONNECTIONS';

This query itself fails:
ERROR 3167 (HY000): The 'INFORMATION_SCHEMA.GLOBAL_VARIABLES' feature is disabled; see the documentation for 'show_compatibility_56'

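Workaround: turn on show_compatibility_56, after which the query on information_schema.GLOBAL_VARIABLES works: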

mysql> show variables like '%show_compatibility_56%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| show_compatibility_56 | OFF   |
+-----------------------+-------+
1 row in set (0.01 sec)

mysql> set global show_compatibility_56=on;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like '%show_compatibility_56%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| show_compatibility_56 | ON    |
+-----------------------+-------+
1 row in set (0.00 sec)

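Fixing the connection limit:

To fix the problem permanently, edit the my.cnf configuration file. Using vi: run vi /etc/my.cnf and press "i" to enter insert mode. (If the file is elsewhere, find / -name my.cnf locates it.)
Under "[mysqld]", add "max_connections=3600", press Esc to return to command mode, then type ":wq" and press Enter to save and quit.
Run service mysql restart to restart the MySQL service. The restart may take a while.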
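
The same my.cnf edit can be scripted instead of typed in vi. The following is a minimal sketch, assuming the file has a section header line that reads exactly [mysqld]; the function name set_max_connections is made up for illustration, and the demo works on a scratch file rather than the live /etc/my.cnf.

```shell
# Hypothetical helper: set max_connections under [mysqld] in a my.cnf-style file.
# Assumes the section header is exactly "[mysqld]" on its own line; if no such
# section exists, the file is left unchanged.
set_max_connections() {
  local cnf="$1" value="$2" tmp
  tmp="$(mktemp)"
  awk -v val="$value" '
    /^\[/ {                                   # any section header
      in_mysqld = ($0 == "[mysqld]")
      print
      if (in_mysqld) print "max_connections=" val
      next
    }
    in_mysqld && /^[ \t]*max_connections[ \t]*=/ { next }   # drop the old value
    { print }
  ' "$cnf" > "$tmp" && cat "$tmp" > "$cnf"
  rm -f "$tmp"
}

# Demo on a scratch file, not the live /etc/my.cnf.
demo_cnf="$(mktemp)"
printf '[client]\nport=3306\n[mysqld]\nmax_connections=151\ndatadir=/var/lib/mysql\n' > "$demo_cnf"
set_max_connections "$demo_cnf" 3600
cat "$demo_cnf"
```

Against the real file this would be called as set_max_connections /etc/my.cnf 3600 (with sufficient privileges), followed by the service restart described above.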

– Step 1: import the data into HDFS

bin/sqoop import \
--connect jdbc:mysql://192.168.100.26:3306/test \
--username root \
--password zd3123 \
--query 'select name,age from user_tmp_d where $CONDITIONS LIMIT 10' \
--target-dir /test/user_tmp_d_sannpy_ \
--delete-target-dir \
--num-mappers 1 \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--fields-terminated-by '\t' \
--driver com.mysql.jdbc.Driver

– Step 2: create a table in Hive

drop table if exists ods.ods_user_tmp ;
CREATE TABLE IF NOT EXISTS ods.ods_user_tmp (
  `name` string ,
  `age` int
  )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ;

– Step 3: load the data from HDFS into the Hive table

load data inpath '/test/user_tmp_d_sannpy_' into table ods.ods_user_tmp ;

**********************************************MySQL to HDFS***************************************************************

1. Create a Hive table ods.ods_db_info with the same structure as the db_info table in MySQL:

bin/sqoop create-hive-table \
--connect jdbc:mysql://192.168.100.26:3306/test \
--username root \
--password zd3123 \
--table db_info \
--hive-table ods.ods_db_info \
--driver com.mysql.jdbc.Driver

Error:
ERROR Could not register mbeans java.security.AccessControlException: access denied ("javax.management.MBeanTrustPermission" "register")
Solution:
  1. Copy hive-site.xml to ${SQOOP_HOME}/conf
  2. Edit $JAVA_HOME/jre/lib/security/java.policy (use find / -name java.policy to locate it)
     and add the following inside the grant { } block:
permission javax.management.MBeanTrustPermission "register";

2. Import MySQL data into HDFS

bin/sqoop import \
--connect jdbc:mysql://192.168.100.26:3306/test \
--username root \
--password zd3123 \
--table db_info \
-m 1 \
--driver com.mysql.jdbc.Driver

By default the data is stored in HDFS at /user/root/db_info/part-m-00000
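
The default location follows a simple pattern: the importing user's HDFS home directory, then the table name, then one part file per map task. The sketch below only prints the expected paths; it assumes no --target-dir is given and uncompressed output, and the helper name default_part_files is hypothetical.

```shell
# default_part_files USER TABLE NUM_MAPPERS
# Prints the part files a map-only import is expected to write by default.
default_part_files() {
  local user="$1" table="$2" n="$3" i=0
  while [ "$i" -lt "$n" ]; do
    printf '/user/%s/%s/part-m-%05d\n' "$user" "$table" "$i"
    i=$(( i + 1 ))
  done
}

default_part_files root db_info 1
```

With -m 1 this prints the single path mentioned above; with a larger mapper count the numbering continues with part-m-00001, part-m-00002, and so on.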

3. Import MySQL data into HDFS (with a specified delimiter and target path)

bin/sqoop import \
--connect jdbc:mysql://192.168.100.26:3306/test \
--username root \
--password zd3123 \
--table db_info \
--target-dir /test/db_info \
--fields-terminated-by ',' \
--num-mappers 1 \
--driver com.mysql.jdbc.Driver

4. Import MySQL data into HDFS (with a where condition)
Filter rows with a where condition and select specific columns (equivalent to select id, db_type from db_info where id='1')

bin/sqoop import \
--connect jdbc:mysql://192.168.100.26:3306/test \
--username root \
--password zd3123 \
--columns "id,db_type" \
--where "id='1'" \
--table db_info \
--target-dir /test/db_info2 \
--num-mappers 1 \
--driver com.mysql.jdbc.Driver

5. Import MySQL data into HDFS (custom query SQL)

bin/sqoop import \
--connect jdbc:mysql://192.168.100.26:3306/test \
--username root \
--password zd3123 \
--target-dir /test/db_info3 \
--query 'select id , db_type from db_info where $CONDITIONS' \
--split-by id \
--fields-terminated-by '\t' \
--num-mappers 1 \
--driver com.mysql.jdbc.Driver

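Notes:
. --split-by splits differently depending on the column type. For an int column, Sqoop takes the maximum and minimum values of the split-by column and divides that range into as many regions as num-mappers specifies.
For example, if select max(split_by), min(split_by) from ... returns 1000 and 1, and num-mappers (-m) is 2, the range is divided into two regions, (1-500) and (501-1000),
and two SQL statements are generated for the two maps: select XXX from table where split_by >= 1 and split_by < 501, and select XXX from table where split_by >= 501 and split_by <= 1000.
Each map then imports the rows returned by its own SQL statement.

When the split-by column is not an int, this range-based splitting runs into problems.
The workaround found so far is to set -m to 1 and leave split-by unset, so that only one map runs; the drawback is that the data is no longer imported in parallel.
(Note: when -m is greater than 1, a split-by column must be specified; for --table imports Sqoop falls back to the primary key if the table has one.)
. Even with an int split-by column, if the values do not increase evenly and continuously, rows are distributed unevenly across maps: some maps may be very busy while others have almost nothing to process.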
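
The range arithmetic in the example above can be sketched in a few lines of shell. This is a simplified illustration of even integer splitting, not Sqoop's actual implementation; split_ranges is a hypothetical helper, and the column name id is taken from the commands in this post. Ranges are contiguous, so no value between the minimum and maximum is skipped.

```shell
# split_ranges COLUMN MIN MAX NUM_MAPPERS
# Prints one WHERE condition per mapper for an integer split column.
split_ranges() {
  local col="$1" min="$2" max="$3" n="$4"
  local size=$(( ($max - $min) / $n ))   # base width of each range
  local rem=$(( ($max - $min) % $n ))    # remainder, spread over the first ranges
  local lo="$min" hi extra i=0
  while [ "$i" -lt "$n" ]; do
    extra=0
    if [ "$i" -lt "$rem" ]; then extra=1; fi
    hi=$(( $lo + $size + $extra ))
    if [ "$i" -eq $(( $n - 1 )) ] || [ "$hi" -ge "$max" ]; then
      echo "$col >= $lo AND $col <= $max"   # last range includes the maximum
      return 0
    fi
    echo "$col >= $lo AND $col < $hi"
    lo="$hi"
    i=$(( $i + 1 ))
  done
}

split_ranges id 1 1000 2
```

For min 1, max 1000 and two mappers this prints id >= 1 AND id < 501 and id >= 501 AND id <= 1000. The ranges are equal in width, not in row count: if most rows have small id values, the first mapper still ends up with most of the work, which is the skew described in the last note.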

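Requirements for Sqoop free-form query imports (--query) and the role of $CONDITIONS:
    1. The target location must be specified with --target-dir
    2. The query must contain the $CONDITIONS placeholder
    3. Optionally use --split-by to split the import (the result is divided into multiple smaller files)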
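
To make the role of $CONDITIONS concrete: Sqoop replaces the placeholder with a condition of its own before running the query. The sketch below mimics that substitution in plain shell. It is an illustration only; fill_conditions is a hypothetical helper, and the substituted conditions (an always-false (1 = 0) for a metadata-only probe, and one range per mapper) are assumptions modelled on what Sqoop typically logs, not output captured from a real run.

```shell
# fill_conditions QUERY CONDITION
# Hypothetical helper that replaces the $CONDITIONS placeholder in a query.
fill_conditions() {
  local query="$1" cond="$2" head tail
  case "$query" in
    *'$CONDITIONS'*) ;;
    *) echo "query must contain \$CONDITIONS" >&2; return 1 ;;
  esac
  head=${query%%\$CONDITIONS*}   # text before the placeholder
  tail=${query#*\$CONDITIONS}    # text after the placeholder
  printf '%s%s%s\n' "$head" "$cond" "$tail"
}

# Single quotes keep the shell from expanding $CONDITIONS itself.
query='select id, db_type from db_info where $CONDITIONS'

fill_conditions "$query" '(1 = 0)'                           # metadata probe, returns no rows
fill_conditions "$query" '( id >= 1 ) AND ( id < 501 )'      # first mapper
fill_conditions "$query" '( id >= 501 ) AND ( id <= 1000 )'  # second mapper
```

This is also why the --query value is written in single quotes in the commands in this post. If double quotes are used, the placeholder has to be escaped as \$CONDITIONS so that the shell passes it through to Sqoop unchanged.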

**********************************************MySQL directly to Hive***************************************************************

1. Import directly into Hive with Sqoop

bin/sqoop import \
--connect jdbc:mysql://192.168.100.26:3306/test \
--username root \
--password zd3123 \
--table db_info \
--fields-terminated-by '\t' \
--delete-target-dir \
--num-mappers 1 \
--hive-import \
--hive-database ods \
--hive-table ods_db_info_1 \
--driver com.mysql.jdbc.Driver

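Error:
Ensure that you have called .close() on any active streaming result sets before attempting more queries.
Fix: add --driver com.mysql.jdbc.Driver
The import runs in two steps:
Step 1 imports the data into HDFS.
Step 2 moves the data from HDFS into the Hive warehouse. The default temporary directory used in step 1 is /user/<user>/<table name>.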

**********************************************MySQL directly to HBase (plain import)***************************************************************

bin/sqoop import \
--connect jdbc:mysql://192.168.100.26:3306/test \
--username root \
--password zd3123 \
--table db_info \
--hbase-table db_info \
--column-family cf \
--hbase-row-key keyid
