Basic Sqoop Usage and an Introduction to Its Parameters

1. Verify that Sqoop is configured correctly

Upload mysql-connector-java-5.1.48.jar to the /opt/software/ directory.

Change into /opt/software/ and copy the JDBC driver into Sqoop's lib directory.
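The copy step can be sketched as a small shell helper. The function name install_jdbc_driver and the Sqoop location /opt/module/sqoop in the example call are assumptions for illustration; the text above only says that the jar goes into Sqoop's lib directory.

```shell
# Hypothetical helper (not part of Sqoop): copy a JDBC driver jar into a Sqoop lib directory.
install_jdbc_driver() {
  jar=$1
  lib_dir=$2
  # Fail early with a readable message instead of a bare cp error.
  [ -f "$jar" ] || { echo "driver jar not found: $jar" >&2; return 1; }
  [ -d "$lib_dir" ] || { echo "Sqoop lib directory not found: $lib_dir" >&2; return 1; }
  cp "$jar" "$lib_dir"/
}

# Example call (not executed here); /opt/module/sqoop is an assumed install location:
# install_jdbc_driver /opt/software/mysql-connector-java-5.1.48.jar /opt/module/sqoop/lib
```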

A single command is enough to check that Sqoop is configured correctly:

bin/sqoop help

Some warnings appear, followed by the output of the help command:

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

2. Test whether Sqoop can connect to the database

List the databases:

bin/sqoop list-databases --connect jdbc:mysql://hadoop102:3306/ --username root --password 123456

List the tables in a database:

bin/sqoop list-tables --connect jdbc:mysql://hadoop102:3306/gmall --username root --password 123456
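Both commands above pass the password in plain text on the command line. Sqoop also accepts -P (prompt on the console) and --password-file. The sketch below prepares such a file; the helper name write_password_file and the path $HOME/.sqoop.pwd are assumptions, and on a cluster whose default filesystem is HDFS a local file usually has to be addressed with a file:// prefix.

```shell
# Hypothetical helper: store a password for use with Sqoop's --password-file option.
write_password_file() {
  file=$1
  password=$2
  # printf '%s' writes no trailing newline; Sqoop reads the whole file,
  # so a newline would become part of the password.
  ( umask 077 && printf '%s' "$password" > "$file" ) || return 1
  # Read-only for the owner.
  chmod 400 "$file"
}

# Example (not executed here); the path is an assumption:
# write_password_file "$HOME/.sqoop.pwd" 123456
# bin/sqoop list-databases --connect jdbc:mysql://hadoop102:3306/ --username root --password-file "file://$HOME/.sqoop.pwd"
```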

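3. Parameter overview

sqoop import \	The trailing backslash (\) continues the command on the next line; it needs a space before it and nothing after it
--connect jdbc:mysql://hadoop102:3306/gmall \	JDBC connection URL
--username root \	Database user
--password 123456 \	Database password
--target-dir /sqooptest \	Target directory on HDFS
--delete-target-dir \	Delete the target directory first if it already exists
--query "select * from user_info where id > 10 and id < 30 and \$CONDITIONS" \	SQL query to import; it must contain the $CONDITIONS placeholder
--num-mappers 1 \	Number of map tasks
--fields-terminated-by '\t'	Field delimiter in the stored data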
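A space after a continuation backslash is easy to miss, because the shell then treats the following line as a separate command. As a rough sketch, a script can be checked for that mistake with grep; the function name find_broken_continuations and the script name in the example call are made up for this example.

```shell
# Hypothetical lint helper: print (with line numbers) every line in which a
# backslash is followed by trailing whitespace, i.e. a broken line continuation.
find_broken_continuations() {
  # \\ matches one literal backslash; [[:space:]][[:space:]]*$ matches one or
  # more whitespace characters up to the end of the line.
  grep -n '\\[[:space:]][[:space:]]*$' "$1"
}

# Example (not executed here); import.sh is an assumed script name:
# find_broken_continuations import.sh
```

grep exits with status 1 when it finds nothing, so a clean script produces no output and a non-zero exit status.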

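The following command fails with a missing-parameter error, because a free-form query imported in parallel requires --split-by:

bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/gmall \
--username root \
--password 123456 \
--query "select id,login_name from user_info" \
--target-dir /sqooptest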

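Sqoop wants to split the import across four map tasks (the default), but for a free-form query it cannot work out the splits by itself (in MapReduce the InputFormat normally determines the number of map tasks). To make the query splittable, add a WHERE clause containing the placeholder $CONDITIONS and name the split column with --split-by. Sqoop then determines the range of id and replaces the placeholder with a different id range for each map task. Because the query is wrapped in double quotes, write the placeholder as \$CONDITIONS so that the shell does not expand it as a variable.

bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/gmall \
--username root \
--password 123456 \
--query "select id,login_name from user_info where \$CONDITIONS" \
--target-dir /sqooptest \
--split-by id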
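The splitting described above can be sketched in a few lines of shell. This is only an approximation of what Sqoop does for an integer split column: the function name split_conditions and the id range 1 to 100 are assumptions, and the real boundaries and predicate text that Sqoop generates may differ.

```shell
# Hypothetical illustration: divide the range [min, max] of a split column
# into one range predicate per map task.
split_conditions() {
  col=$1; min=$2; max=$3; mappers=$4
  span=$(( max - min ))
  i=0
  while [ "$i" -lt "$mappers" ]; do
    lo=$(( min + span * i / mappers ))
    hi=$(( min + span * (i + 1) / mappers ))
    if [ "$i" -eq $(( mappers - 1 )) ]; then
      # The last split includes the upper bound.
      echo "$col >= $lo AND $col <= $hi"
    else
      echo "$col >= $lo AND $col < $hi"
    fi
    i=$(( i + 1 ))
  done
}

# Four map tasks over an assumed id range of 1..100:
split_conditions id 1 100 4
# id >= 1 AND id < 25
# id >= 25 AND id < 50
# id >= 50 AND id < 75
# id >= 75 AND id <= 100
```

Each printed predicate stands for what one map task would receive in place of $CONDITIONS.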

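Note that the target directory must not exist when the MapReduce job runs, otherwise the job fails. Adding --delete-target-dir solves this by deleting the directory first:

bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/gmall \
--username root \
--password 123456 \
--query "select id,login_name from user_info where \$CONDITIONS" \
--target-dir /sqooptest \
--delete-target-dir \
--split-by id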
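For comparison, the same cleanup can be done by hand before an import with the HDFS shell. This is a sketch rather than what Sqoop runs internally; the function name clear_target_dir is made up, and it assumes a Hadoop client (the hdfs command) is on the PATH.

```shell
# Hypothetical helper: remove an HDFS directory if it exists, before an import.
clear_target_dir() {
  dir=$1
  # "hdfs dfs -test -e" exits with status 0 when the path exists.
  if hdfs dfs -test -e "$dir"; then
    hdfs dfs -rm -r "$dir"
  fi
}

# Example (not executed here; needs a running HDFS):
# clear_target_dir /sqooptest
```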

You can put your own conditions after WHERE in the SQL statement, but they must be joined to $CONDITIONS with AND:

bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/gmall \
--username root \
--password 123456 \
--query "select id,login_name from user_info where id <40 and \$CONDITIONS" \
--target-dir /sqooptest \
--delete-target-dir \
--split-by id
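To see why the AND is needed, it helps to look at the query after the placeholder has been filled in for one map task. The sketch below does the substitution with sed; render_query and the sample range are assumptions, and the exact predicate text Sqoop generates may differ.

```shell
# Hypothetical illustration: replace the literal token $CONDITIONS in a query
# template with the range predicate of one map task.
render_query() {
  template=$1
  predicate=$2
  # [$] matches a literal dollar sign. The predicate must not contain
  # '|' or '&', which are special in this sed expression.
  printf '%s\n' "$template" | sed 's|[$]CONDITIONS|('"$predicate"')|'
}

# Single quotes keep $CONDITIONS literal in the template.
render_query 'select id,login_name from user_info where id <40 and $CONDITIONS' 'id >= 1 AND id < 10'
# prints: select id,login_name from user_info where id <40 and (id >= 1 AND id < 10)
```

Without the AND, the filter and the range predicate would run together into invalid SQL.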

Hive stores NULL as the string "\N" in its underlying files, whereas MySQL stores NULL as a real NULL. To keep the data consistent on both sides:

When exporting data, use the two parameters --input-null-string and --input-null-non-string.

When importing data, use --null-string and --null-non-string.

bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/gmall \
--username root \
--password 123456 \
--query "select id,login_name,passwd from user_info where id <40 and \$CONDITIONS" \
--target-dir /sqooptest \
--delete-target-dir \
--split-by id \
--null-string '\\N' \
--null-non-string '\\N'
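One way to check the effect of the null options is to count the \N markers in an imported file after copying it to the local disk. This is a minimal sketch; count_null_markers and the sample file name are assumptions.

```shell
# Hypothetical helper: count fields that hold exactly the Hive NULL marker \N.
count_null_markers() {
  file=$1
  delim=$2
  # Inside the awk program, "\\N" is the two-character string backslash + N.
  awk -F"$delim" '{ for (i = 1; i <= NF; i++) if ($i == "\\N") n++ } END { print n + 0 }' "$file"
}

# Example (not executed here); part-m-00000 is an assumed local copy of an output file:
# count_null_markers part-m-00000 ,
```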

The default field delimiter is a comma (,); the following command changes it to a tab:

bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/gmall \
--username root \
--password 123456 \
--query "select id,login_name,passwd from user_info where id <40 and \$CONDITIONS" \
--target-dir /sqooptest \
--delete-target-dir \
--split-by id \
--null-string '\\N' \
--null-non-string '\\N' \
--fields-terminated-by '\t'
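A quick sanity check for the delimiter is to confirm that every line of an imported file splits into the expected number of tab-separated fields. The sketch below assumes a local copy of an output file; check_field_count and the file name are made up for illustration.

```shell
# Hypothetical helper: report lines whose tab-separated field count differs
# from the expected number; the exit status is non-zero if any line is off.
check_field_count() {
  file=$1
  expected=$2
  awk -F'\t' -v n="$expected" '
    NF != n { printf "line %d has %d fields\n", NR, NF; bad = 1 }
    END { exit bad + 0 }
  ' "$file"
}

# Example (not executed here); the query above selects three columns:
# check_field_count part-m-00000 3
```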
