1. Edit app/hadoop/etc/hadoop/mapred-site.xml and add:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
2. Edit app/hadoop/etc/hadoop/yarn-site.xml and add:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
3. Start YARN:
start-yarn.sh
4. Stop YARN:
stop-yarn.sh
To reduce the chance of the ResourceManager web UI on the default port 8088 being found and abused by cryptomining bots, move it to a non-default port. Edit app/hadoop/etc/hadoop/yarn-site.xml and add:
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>${yarn.resourcemanager.hostname}:9527</value>
</property>
1. Meaning of yarn.app.mapreduce.am.env, mapreduce.map.env, and mapreduce.reduce.env
yarn.app.mapreduce.am.env: environment variables for the MapReduce ApplicationMaster process (e.g. HADOOP_MAPRED_HOME).
mapreduce.map.env: environment variables for Map Tasks.
mapreduce.reduce.env: environment variables for Reduce Tasks.
2. Run a wordcount job
Create a txt file under the data/ directory:
touch wordcount.txt
Edit wordcount.txt with vi and add some content.
Create the /wordcount/input directory on HDFS and put wordcount.txt into it:
hdfs dfs -mkdir -p /wordcount/input
hdfs dfs -put wordcount.txt /wordcount/input
Run the following command; the first run will fail, because the ApplicationMaster and tasks do not yet know where the MapReduce framework lives:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /wordcount/input /wordcount/output
Edit app/hadoop/etc/hadoop/mapred-site.xml and add:
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
Run it again; it succeeds.
3. SQL questions
(1) emp table: for each department (DEPTNO), find the employee(s) with the highest total pay (SAL + COMM), returning department, name, and total pay.
select deptno, ename, total_sal
from (select deptno, ename,
             sal + nvl(comm, 0) as total_sal,
             rank() over (partition by deptno
                          order by sal + nvl(comm, 0) desc) as rn
      from emp) t
where rn = 1
(The running MAX in the original query was redundant: RANK() = 1 already keeps exactly the rows tied for the highest total pay.)
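The ranking approach can be checked locally with Python's sqlite3 (SQLite ≥ 3.25 for window functions). This is a minimal sketch: NVL is swapped for SQLite's COALESCE, and the sample rows are invented for illustration.

```python
import sqlite3

# Tiny stand-in for the emp table; rows are invented for illustration.
rows = [
    (10, "CLARK", 2450, None), (10, "KING", 5000, None),
    (20, "SMITH", 800, None), (20, "FORD", 3000, None),
    (30, "ALLEN", 1600, 300), (30, "BLAKE", 2850, None),
]
conn = sqlite3.connect(":memory:")
conn.execute("create table emp (deptno int, ename text, sal int, comm int)")
conn.executemany("insert into emp values (?, ?, ?, ?)", rows)

# RANK() = 1 keeps every row tied for the department's highest total pay.
top_pay = conn.execute("""
    select deptno, ename, total_sal
    from (select deptno, ename,
                 sal + coalesce(comm, 0) as total_sal,
                 rank() over (partition by deptno
                              order by sal + coalesce(comm, 0) desc) as rn
          from emp) t
    where rn = 1
    order by deptno
""").fetchall()
print(top_pay)  # [(10, 'KING', 5000), (20, 'FORD', 3000), (30, 'BLAKE', 2850)]
```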
(2) emp table: find employees whose salary (SAL) is above their department's average salary, returning department, name, salary, and the department average.
Requirement: implement it two ways, with a self-join and with a window function.
Self-join version (the average must be computed per department, not over the whole table):
select t1.deptno, t1.ename, t1.sal, t2.avg_sal
from emp t1
join (select deptno, avg(sal) as avg_sal from emp group by deptno) t2
  on t1.deptno = t2.deptno
where t1.sal > t2.avg_sal
Window-function version (no ORDER BY inside OVER — adding one would turn AVG into a running average):
select deptno, ename, sal, avg_sal
from (select deptno, ename, sal,
             avg(sal) over (partition by deptno) as avg_sal
      from emp) t
where sal > avg_sal
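Both forms can be cross-checked against each other with Python's sqlite3 (window functions need SQLite ≥ 3.25; the sample rows are invented):

```python
import sqlite3

rows = [
    (10, "CLARK", 2450), (10, "KING", 5000),
    (20, "SMITH", 800), (20, "FORD", 3000),
    (30, "ALLEN", 1600), (30, "BLAKE", 2850),
]
conn = sqlite3.connect(":memory:")
conn.execute("create table emp (deptno int, ename text, sal int)")
conn.executemany("insert into emp values (?, ?, ?)", rows)

# Self-join: compute each department's average once, then filter on it.
self_join = conn.execute("""
    select t1.deptno, t1.ename, t1.sal, t2.avg_sal
    from emp t1
    join (select deptno, avg(sal) as avg_sal from emp group by deptno) t2
      on t1.deptno = t2.deptno
    where t1.sal > t2.avg_sal
    order by t1.deptno
""").fetchall()

# Window function: no ORDER BY inside OVER, otherwise AVG becomes a
# running average instead of the full department average.
windowed = conn.execute("""
    select deptno, ename, sal, avg_sal
    from (select deptno, ename, sal,
                 avg(sal) over (partition by deptno) as avg_sal
          from emp) t
    where sal > avg_sal
    order by deptno
""").fetchall()

print(self_join == windowed)  # True
```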
(3) emp table: within each department, rank by salary (SAL) descending; for each row, what are the salaries one rank above and one rank below?
select t1.deptno, t1.sal, t1.rk,
       t2.sal as prev_sal, t3.sal as next_sal
from (select deptno, sal,
             dense_rank() over (partition by deptno order by sal desc) as rk
      from emp) t1
left join (select deptno, sal,
                  dense_rank() over (partition by deptno order by sal desc) as rk
           from emp) t2
  on t1.deptno = t2.deptno and t2.rk = t1.rk - 1
left join (select deptno, sal,
                  dense_rank() over (partition by deptno order by sal desc) as rk
           from emp) t3
  on t1.deptno = t3.deptno and t3.rk = t1.rk + 1
order by t1.deptno, t1.rk
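A shorter alternative (not what the exercise's self-join asks for, but common in practice) uses LAG/LEAD. Note the difference: LAG/LEAD look at the neighboring row, so with tied salaries they can return the same salary rather than the next distinct rank. A sqlite3 sketch with invented rows:

```python
import sqlite3

rows = [(10, 2450), (10, 5000), (20, 800), (20, 3000)]
conn = sqlite3.connect(":memory:")
conn.execute("create table emp (deptno int, sal int)")
conn.executemany("insert into emp values (?, ?)", rows)

# LAG = salary one row above (higher), LEAD = one row below (lower),
# within the department, ordered by salary descending.
neighbors = conn.execute("""
    select deptno, sal,
           lag(sal)  over (partition by deptno order by sal desc) as prev_sal,
           lead(sal) over (partition by deptno order by sal desc) as next_sal
    from emp
    order by deptno, sal desc
""").fetchall()
print(neighbors)
# [(10, 5000, None, 2450), (10, 2450, 5000, None),
#  (20, 3000, None, 800), (20, 800, 3000, None)]
```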
(4) ROWS BETWEEN
Syntax:
(ROWS | RANGE) BETWEEN (UNBOUNDED | [num]) PRECEDING AND ([num] PRECEDING | CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN CURRENT ROW AND (CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN [num] FOLLOWING AND (UNBOUNDED | [num]) FOLLOWING
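The frame syntax above can be exercised directly; a sqlite3 sketch (invented values) computing a running total and a 3-row sliding sum:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (v int)")
conn.executemany("insert into t values (?)", [(1,), (2,), (3,), (4,)])

frames = conn.execute("""
    select v,
           -- everything from the first row up to the current one
           sum(v) over (order by v
                        rows between unbounded preceding and current row)
                  as running_total,
           -- a sliding window of at most 3 rows centered on the current one
           sum(v) over (order by v
                        rows between 1 preceding and 1 following)
                  as moving_sum
    from t
    order by v
""").fetchall()
print(frames)  # [(1, 1, 3), (2, 3, 6), (3, 6, 9), (4, 10, 7)]
```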
4. How to avoid having too many small files
Merge small files: either apply a merge strategy on the client side before uploading (each small file costs the NameNode a metadata entry), or use Hadoop's CombineFileInputFormat so that many small files are packed into a single input split.
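The client-side merge strategy can be sketched locally in Python. This is a toy illustration, not a Hadoop API: in practice tools such as hdfs dfs -getmerge, Hadoop Archives, or CombineFileInputFormat do the real work, and the 128 MB cap here is only an assumption mirroring the default HDFS block size.

```python
import os
import tempfile

def merge_small_files(paths, merged_path, max_bytes=128 * 1024 * 1024):
    """Concatenate many small files into one before uploading, so the
    NameNode tracks a single large file instead of one metadata entry
    per tiny file. Stops once max_bytes would be exceeded."""
    written = 0
    with open(merged_path, "wb") as out:
        for p in paths:
            with open(p, "rb") as f:
                data = f.read()
            if written + len(data) > max_bytes:
                break  # a fuller version would start a new merged file here
            out.write(data)
            written += len(data)
    return written

# Toy demo: three tiny "part" files merged into one upload candidate.
tmp = tempfile.mkdtemp()
parts = []
for i in range(3):
    p = os.path.join(tmp, "part-%d.txt" % i)
    with open(p, "w") as f:
        f.write("line %d\n" % i)
    parts.append(p)

merged = os.path.join(tmp, "merged.txt")
total = merge_small_files(parts, merged)
print(total)  # 21 bytes: three 7-byte lines
```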



