栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 前沿技术 > 大数据 > 大数据系统

第二章 基于dolphinscheduler的Datax组件进行数据迁移之Oracle到hive

第二章 基于dolphinscheduler的Datax组件进行数据迁移之Oracle到hive

1、源表
create table T_YYBZB_TGH_BANKINFO
(
id        NUMBER,
bank_id   NUMBER,
bank_name varchar(200)
);
  • 同步源系统数据,每天一个分区表
2、创建hive目标表

(1)编写建表脚本

create table ods.ods_t_yybzb_tgh_bankinfo_di
(
  id        int,
  bank_id   int,
  bank_name string
)
partitioned by (`pt` string)
row format delimited fields terminated by ',';

(2)配置SQL组件

  • sql类型:非查询

(3)部署上线

(4)调试运行

  • 进入建表画布

  • 选择节点单独运行

(5)查看运行结果

  • 运行日志
[INFO] 2021-12-13 14:56:14.122  - [taskAppId=TASK-17-110-215]:[558] - Sql Params are replaced sql , parameters:
[INFO] 2021-12-13 14:56:14.124  - [taskAppId=TASK-17-110-215]:[52] - can't find udf function resource
[INFO] 2021-12-13 14:56:15.933  - [taskAppId=TASK-17-110-215]:[458] - prepare statement replace sql : org.apache.hive.jdbc.HivePreparedStatement@659999e2 
3、初始化导入数据
  • 将历史数据导入表中

(1)编写初始化SQL脚本

alter table ods.ods_t_yybzb_tgh_bankinfo_di drop if exists partition(pt=${pt})

alter table ods.ods_t_yybzb_tgh_bankinfo_di add if not exists  partition (pt=${pt})

(2)配置SQL组件

  • sql类型:非查询
  • 自定义参数:pt

(3)编写Datax的Json配置脚本

{
        "job": {
                "content": [{
                        "reader": {
                                "name": "oraclereader",
                                "parameter": {
                                        "username": "SCOTT",
                                        "password": "123456",
                                        "where": "",
                                        "connection": [{
                                                "querySql": ["SELECT id,bank_id,bank_name FROM T_YYBZB_TGH_BANKINFO"],
                                                "jdbcUrl": [
                                                        "jdbc:oracle:thin:@10.6.13.66:1521:orcl"
                                                ]
                                        }]
                                }
                        },
                        "writer": {
                                "name": "hdfswriter",
                                "parameter": {
                                        "defaultFS": "hdfs://192.168.6.102:8020",
                                        "fileType":"text",
                                                                                "path":"/user/hive/warehouse/ods.db/ods_t_yybzb_tgh_bankinfo_di/pt=20211212",
                                                                                "fileName":"ods_t_yybzb_tgh_bankinfo_di",
                                                                                "column":[
                                                                                        {"name":"id","type":"int"},
																						{"name":"bank_id","type":"int"},
																						{"name":"bank_name","type":"string"}
                                                                                ],
                                        "writeMode": "append",
                                        "fieldDelimiter": "u0001",
                                        "encoding": "utf-8"
                                }
                        }
                }],
                "setting": {
                        "speed": {
                                "channel": "1"
                        }
                }
        }
}
  • 注意事项:先用Datax测试脚本是否正确
python /opt/module/datax/bin/datax.py source.json
  • 运行结果

(4)配置DataX组件初始导入数据

  • 参数替换
{
        "job": {
                "content": [{
                        "reader": {
                                "name": "oraclereader",
                                "parameter": {
                                        "username": "${username}",
                                        "password": "${passward}",
                                        "where": "",
                                        "connection": [{
                                                "querySql": ["SELECt id,bank_id,bank_name FROM T_YYBZB_TGH_BANKINFO"],
                                                "jdbcUrl": [
                                                        "jdbc:oracle:thin:@${oracle_ip}:${oracle_port}:${sid}"
                                                ]
                                        }]
                                }
                        },
                        "writer": {
                                "name": "hdfswriter",
                                "parameter": {
                                        "defaultFS": "hdfs://${hdfs_ip}:${hdfs_port}",
                                        "fileType":"text",
										"path":"/user/hive/warehouse/ods.db/ods_t_yybzb_tgh_bankinfo_di/pt=${pt}",
										"fileName":"ods_t_yybzb_tgh_bankinfo_di",
										"column":[
												{"name":"id","type":"int"},
												{"name":"bank_id","type":"int"},
												{"name":"bank_name","type":"string"}
										],
                                        "writeMode": "append",
                                        "fieldDelimiter": "u0001",
                                        "encoding": "utf-8"
                                }
                        }
                }],
                "setting": {
                        "speed": {
                                "channel": "1"
                        }
                }
        }
}

  • 配置JSON组件

(5)调试运行

  • 上线部署

  • 进入画布

  • 调试运行:运行任务链的第一个节点

(6)查看运行结果

  • 在任务实例查看

4、设置定时调度
  • 点击定时任务

  • 设置每天6点执行

5、常见问题 5.1、Oracle启动出错

(1)问题现象

ERROR:
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Linux-x86_64 Error: 2: No such file or directory

(2)问题原因

  • 分析后发现监听器已经打开,原来是因为服务未启动导致。

(3)解决方案

  • 执行以下命令
sqlplus / as sysdba
startup
exit 
  • 之后启动Oracle
5.2、Oracle插入数据后查询不到数据
  • 问题原因:没有commit事务;
  • 解决方案:insert,update,create之后commit;
5.3、Oracl通过DataX同步到Hive全部显示为NULL
  • 待解决
转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/663373.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号