The table has no index and holds roughly 200 million rows. Importing it with sqoop was far too slow: after a long wait the map stage was still at 0%.
So I decided to divide and conquer.
1. Export from MySQL with SELECT ... INTO OUTFILE
select * from t_swmx_sl into outfile 't_swmx_sl.csv' fields terminated by '^A' lines terminated by '\n';
- MySQL's fields terminated by '^A' is equivalent to Hive's '\001'.
- Hive's default field delimiter is the ASCII control character \001, i.e. tables are created with fields terminated by '\001'.
- To test this by hand, open a file in vi and press Ctrl+V then Ctrl+A to insert the \001 control character. Following the same pattern, \002 is entered as Ctrl+V then Ctrl+B.
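Assuming a POSIX shell is available, the delimiter can also be produced and checked without vi: printf expands the octal escape \001, and cat -v renders it visibly as ^A (the sample fields below are made up):

```shell
# Write one made-up sample row whose fields are separated by \001 (Ctrl-A);
# printf expands the octal escape \001 into the real control character.
printf 'alice\00110.0.0.1\001aa:bb:cc:dd:ee:ff\n' > /tmp/sample_001.txt

# cat -v shows non-printing characters, so \001 appears as ^A.
cat -v /tmp/sample_001.txt   # → alice^A10.0.0.1^Aaa:bb:cc:dd:ee:ff
```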
A common error: MySQL reports
Query execution was interrupted, maximum statement execution time exceeded
Check the database's maximum statement execution time (the value is in milliseconds; here it defaulted to 10 s):
SELECT @@global.max_execution_time;
Raise the limit, e.g. to 30 s:
SET GLOBAL max_execution_time = 30000;
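If raising the limit globally is undesirable, MySQL 5.7.8+ also accepts a per-statement optimizer hint; a sketch (the 30-minute value here is an assumption, pick whatever the export actually needs):

```sql
-- The hint applies only to this statement; the value is in milliseconds.
SELECT /*+ MAX_EXECUTION_TIME(1800000) */ *
FROM t_swmx_sl
INTO OUTFILE 't_swmx_sl.csv'
FIELDS TERMINATED BY '^A' LINES TERMINATED BY '\n';
```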
Elapsed: 11:38 → 11:45, about 7 minutes.
2. Upload to HDFS
hadoop fs -put t_swmx_sl.csv /
Elapsed: 70 s
- hadoop put is much faster than an ordinary FTP upload
3. Create the Hive table
When browsing the uploaded file, this message appeared:
Failed to retrieve data from /webhdfs/v1/t_swmx_sl.csv?op=GET_BLOCK_LOCATIONS:
This is a browser-compatibility issue.
CREATE TABLE `t_swmx_sl` (
  `srun_username` string COMMENT '',
  `srun_ip` string COMMENT '',
  `srun_mac` string COMMENT '',
  `srun_uptime` TIMESTAMP COMMENT '',
  `srun_downtime` TIMESTAMP COMMENT ''
) COMMENT ''
row format delimited fields terminated by '\001' lines terminated by '\n';
4. Load into Hive
load data [local] inpath 'path to the data' [overwrite] into table student [partition (partcol1=val1,…)];
- load data: load data into a table
- local: load from the local filesystem; without it, load from HDFS
- inpath: the path of the data to load
- overwrite: replace the table's existing data; without it, the data is appended
- into table: which table to load into
- student: the target table name
- partition: load into the specified partition
load data inpath '/t_swmx_sl.csv' into table t_swmx_sl;
Or skip step 2 and load directly from the local filesystem:
load data local inpath '/home/hadoop/data' overwrite into table t_swmx_sl;
5. Create the partitioned Hive table
CREATE TABLE `t_swmx_sl_p` (
  `srun_username` string COMMENT '',
  `srun_ip` string COMMENT '',
  `srun_mac` string COMMENT '',
  `srun_uptime` TIMESTAMP COMMENT '',
  `srun_downtime` TIMESTAMP COMMENT ''
) COMMENT ''
PARTITIONED BY (`up_mouth` string)
row format delimited fields terminated by '\001' lines terminated by '\n';
6. Move the data into the partitioned table with Hive dynamic partitioning
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table t_swmx_sl_p PARTITION (up_mouth)
select srun_username, srun_ip, srun_mac, srun_uptime, srun_downtime,
       date_format(srun_uptime, 'yyyyMM') as up_mouth
from t_swmx_sl;
set hive.exec.dynamic.partition.mode=strict;
set hive.exec.dynamic.partition=false;
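The dynamic-partition insert derives each row's up_mouth value from its uptime via date_format(srun_uptime, 'yyyyMM'). The same derivation in Python, handy for checking which partition a given timestamp lands in (a standalone sketch, not part of the Hive job):

```python
from datetime import datetime

def partition_key(ts: datetime) -> str:
    """Mirror Hive's date_format(srun_uptime, 'yyyyMM')."""
    return ts.strftime("%Y%m")

# A session that started 2021-10-05 11:38 lands in partition up_mouth=202110.
print(partition_key(datetime(2021, 10, 5, 11, 38)))  # → 202110
```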
Elapsed: 428 s
Then swap the names of t_swmx_sl_p and t_swmx_sl.
At this point t_swmx_sl holds all the data, already partitioned by month.
From here on, each new month can be imported incrementally with sqoop straight into its partition:
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true --connect jdbc:mysql://xnode0:3306/dbname --username uname --password 'wc' --fields-terminated-by '\001' --delete-target-dir --num-mappers 1 --target-dir /user/root/sqoop/t_swmx_sl --hive-import --hive-overwrite --hive-database default --hive-partition-key up_mouth --hive-partition-value '202110' --hive-table t_swmx_sl --query 'SELECT srun_username, srun_ip, srun_mac, srun_uptime, srun_downtime from t_swmx_sl where date_format(srun_uptime,'"'%Y%m'"')=202110 and $CONDITIONS'
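Since only the month changes between runs, the long command line can be generated from a template. build_sqoop_cmd below is a hypothetical helper, not part of the original run; it only prints the month-dependent pieces (connection and delimiter options elided) for a given yyyyMM:

```shell
# Hypothetical helper: print the month-dependent pieces of the sqoop import
# for a given yyyyMM partition (connection and delimiter options elided).
build_sqoop_cmd() {
  month="$1"
  printf "sqoop import --hive-partition-key up_mouth --hive-partition-value '%s' --query \"SELECT ... where date_format(srun_uptime,'%%Y%%m')=%s and \$CONDITIONS\"\n" "$month" "$month"
}

build_sqoop_cmd 202111
```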



