
DataX: Importing Data from MySQL into HDFS



1. Environment Preparation
  • Linux
  • JDK (1.8 or later; 1.8 recommended)
  • Python (Python 2.6.x recommended)

Installing Python with yum: https://www.cnblogs.com/kaishirenshi/p/11858655.html

# CentOS 7
# switch to the Aliyun yum mirror first
yum -y install epel-release
yum repolist
yum -y install python36
  • Download DataX: https://github.com/alibaba/DataX

2. Hands-on Walkthrough

2.1 Installation

1) Upload the downloaded datax.tar.gz to /opt/software on hadoop102
2) Extract datax.tar.gz into /opt/module

[atguigu@hadoop102 software]$ tar -zxvf datax.tar.gz -C /opt/module/

3) Run the self-check job

[atguigu@hadoop102 software]$ cd /opt/module/datax/bin/
[atguigu@hadoop102 bin]$ python datax.py /opt/module/datax/job/job.json
2.2 View the Official Template

python /opt/module/datax/bin/datax.py -r mysqlreader -w hdfswriter

{
	"job": {
		"content": [{
			"reader": {
				"name": "mysqlreader",
				"parameter": {
					"column": [],
					"connection": [{
						"jdbcUrl": [],
						"table": []
					}],
					"password": "",
					"username": "",
					"where": ""
				}
			},
			"writer": {
				"name": "hdfswriter",
				"parameter": {
					"column": [],
					"compress": "",
					"defaultFS": "",
					"fieldDelimiter": "",
					"fileName": "",
					"fileType": "",
					"path": "",
					"writeMode": ""
				}
			}
		}],
		"setting": {
			"speed": {
				"channel": ""
			}
		}
	}
}
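The blank template above can also be filled in programmatically. The following is a minimal sketch, not part of DataX itself: the helper name and signature are our own, and all HDFS column types are simplified to "string" for brevity.

```python
import json

def mysql2hdfs_job(jdbc_url, table, columns, user, password,
                   default_fs, path, file_name):
    """Fill the blank mysqlreader -> hdfswriter template with concrete values.
    (Hypothetical helper; DataX itself only consumes the resulting JSON.)"""
    return {
        "job": {
            "content": [{
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": columns,
                        "connection": [{"jdbcUrl": [jdbc_url], "table": [table]}],
                        "username": user,
                        "password": password,
                    },
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        # Types simplified to "string" here; adjust per column.
                        "column": [{"name": c, "type": "string"} for c in columns],
                        "defaultFS": default_fs,
                        "fieldDelimiter": "\t",
                        "fileName": file_name,
                        "fileType": "text",
                        "path": path,
                        "writeMode": "append",
                    },
                },
            }],
            "setting": {"speed": {"channel": "1"}},
        }
    }

job = mysql2hdfs_job("jdbc:mysql://hadoop102:3306/datax", "student",
                     ["id", "name"], "root", "000000",
                     "hdfs://hadoop102:9000", "/", "student.txt")
print(json.dumps(job, indent=2))
```

Dumping the dict with json.dumps yields a job file equivalent to the hand-written one in section 2.4.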


2.3 Prepare the Data

1) Create the student table

mysql> create database datax;
mysql> use datax;
mysql> create table student(id int,name varchar(20));

2) Insert data

mysql> insert into student values(1001,'zhangsan'),(1002,'lisi'),(1003,'wangwu');
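With a tab field delimiter and fileType "text", each of these rows becomes one delimited line in the HDFS output file. A quick sketch of the expected file content:

```python
# The three rows inserted above, as (id, name) tuples.
rows = [(1001, "zhangsan"), (1002, "lisi"), (1003, "wangwu")]

# One tab-delimited line per record, as hdfswriter produces for text files.
lines = ["\t".join(str(field) for field in row) for row in rows]
print("\n".join(lines))
```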
2.4 Write the Job Configuration File
vim /opt/module/datax/job/mysql2hdfs.json
{
	"job": {
		"content": [{
			"reader": {
				"name": "mysqlreader",
				"parameter": {
					"column": [
						"id",
						"name"
					],
					"connection": [{
						"jdbcUrl": [
							"jdbc:mysql://hadoop102:3306/datax"
						],
						"table": [
							"student"
						]
					}],
					"username": "root",
					"password": "000000"
				}
			},
			"writer": {
				"name": "hdfswriter",
				"parameter": {
					"column": [{
							"name": "id",
							"type": "int"
						},
						{
							"name": "name",
							"type": "string"
						}
					],
					"defaultFS": "hdfs://hadoop102:9000",
					"fieldDelimiter": "t",
					"fileName": "student.txt",
					"fileType": "text",
					"path": "/",
					"writeMode": "append"
				}
			}
		}],
		"setting": {
			"speed": {
				"channel": "1"
			}
		}
	}
}
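Before submitting the job it can help to sanity-check the file, since DataX rejects jobs with empty required parameters. A small sketch, with the required-key list taken from the template in section 2.2 (the validator function is our own, not a DataX API):

```python
import json

def validate_mysql2hdfs(job):
    """Return a list of problems found in a mysqlreader -> hdfswriter job dict."""
    problems = []
    content = job["job"]["content"][0]
    reader = content["reader"]["parameter"]
    writer = content["writer"]["parameter"]
    if not reader["connection"][0]["jdbcUrl"]:
        problems.append("mysqlreader: jdbcUrl is empty")
    if not reader["column"]:
        problems.append("mysqlreader: column list is empty")
    # Keys the hdfswriter template expects to be filled in.
    for key in ("defaultFS", "fieldDelimiter", "fileName",
                "fileType", "path", "writeMode"):
        if not writer.get(key):
            problems.append("hdfswriter: '%s' is empty" % key)
    return problems

# Usage against the real file:
#   with open("/opt/module/datax/job/mysql2hdfs.json") as f:
#       print(validate_mysql2hdfs(json.load(f)) or "job file looks OK")

# Demonstration: a job with an empty writer section fails the check.
bad = {"job": {"content": [{
    "reader": {"parameter": {"column": ["id"],
                             "connection": [{"jdbcUrl": ["jdbc:mysql://..."],
                                             "table": ["student"]}]}},
    "writer": {"parameter": {}},
}]}}
print(validate_mysql2hdfs(bad))
```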

Remember to change the MySQL and HDFS addresses to your own. The HDFS port can be found in your Hadoop configuration (fs.defaultFS in core-site.xml).

2.5 Run the Job

From the DataX installation directory (/opt/module/datax):

python bin/datax.py job/mysql2hdfs.json

Note: at runtime, HdfsWriter appends a random suffix to the configured fileName, so each writer thread writes to its own actual file.
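The suffixing idea can be illustrated as follows. This is only a sketch of the concept: the exact suffix format is an internal detail of HdfsWriter, and the function here is hypothetical.

```python
import uuid

def suffixed(file_name):
    """Illustrative only: append a random suffix so that concurrent writer
    threads never collide on the same HDFS file."""
    return "%s__%s" % (file_name, uuid.uuid4().hex)

print(suffixed("student.txt"))  # 'student.txt__' followed by random hex
```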

Reprinted from www.mshxw.com
Original article: https://www.mshxw.com/it/355008.html