1. Local testing
(1) First configure the HADOOP_HOME environment variable and the Windows runtime dependencies (e.g. winutils.exe)
(2) Run the program in Eclipse/IDEA
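A minimal pre-flight sketch for a local run on Windows (the class name LocalEnvCheck is hypothetical; it only verifies that HADOOP_HOME is visible to the JVM and that bin\winutils.exe exists):

import java.io.File;

public class LocalEnvCheck {
    public static void main(String[] args) {
        // Hadoop on Windows needs %HADOOP_HOME%\bin\winutils.exe at runtime
        String hadoopHome = System.getenv("HADOOP_HOME");
        if (hadoopHome == null || !new File(hadoopHome, "bin/winutils.exe").isFile()) {
            throw new IllegalStateException("HADOOP_HOME is not set or bin/winutils.exe is missing");
        }
        System.out.println("HADOOP_HOME looks usable: " + hadoopHome);
    }
}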
2. Testing on the cluster
(1) Package the program into a jar with Maven; the packaging plugin dependencies that need to be added are sketched after the command below
(2) Build the program into a jar, then copy it to the Hadoop cluster
Execute the command (arguments: jar name, fully qualified driver class name, input path, output path):
hadoop jar mapreduce200105-1.0-SNAPSHOT.jar com.wordcount.WcDriver /test.txt /output1
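A minimal sketch of the packaging plugin section for pom.xml, assuming the main class is com.wordcount.WcDriver and that a jar-with-dependencies is wanted (plugin versions are left to the project defaults):

<build>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
                <archive>
                    <manifest>
                        <mainClass>com.wordcount.WcDriver</mainClass>
                    </manifest>
                </archive>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>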
3. Submitting a job to the cluster from Windows
(1) Add the necessary configuration
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WcDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration configuration = new Configuration();
        // Point the client at the HDFS NameNode and the YARN ResourceManager
        configuration.set("fs.defaultFS", "hdfs://hadoop101:8020");
        configuration.set("mapreduce.framework.name", "yarn");
        // Required when submitting from a Windows client to a Linux cluster
        configuration.set("mapreduce.app-submission.cross-platform", "true");
        configuration.set("yarn.resourcemanager.hostname", "hadoop102");
        configuration.set("mapred.job.queue.name", "hive");

        // // Enable map-side output compression
        // configuration.setBoolean("mapreduce.map.output.compress", true);
        // // Set the map-side output compression codec
        // configuration.setClass("mapreduce.map.output.compress.codec", SnappyCodec.class, CompressionCodec.class);

        // 1. Get a Job instance
        Job job = Job.getInstance(configuration);

        // 2. Set the jar: when submitting from the IDE, point at the jar built by Maven
        // job.setJarByClass(WcDriver.class);
        job.setJar("D:\\mywork\\IDEAproject\\mapreduce\\target\\mapreduce200105-1.0-SNAPSHOT.jar");

        // 3. Set the Mapper and Reducer
        job.setMapperClass(WcMapper.class);
        job.setReducerClass(WcReducer.class);

        // 4. Set the map and reduce output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // 5. Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // // Enable reduce-side output compression
        // FileOutputFormat.setCompressOutput(job, true);
        // // Set the output compression codec
        // FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);

        // 6. Submit the job and wait for completion
        boolean b = job.waitForCompletion(true);
        System.exit(b ? 0 : 1);
    }
}
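Note that when the job is submitted straight from the IDE there is no jar on the classpath for setJarByClass to find, which is why the driver above points setJar at the jar built by Maven; the cross-platform property is likewise what lets a Windows client submit to a Linux YARN cluster.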
(2) Edit the run configuration (the program arguments are the HDFS input and output paths, i.e. args[0] and args[1] above)
(3) Submit the job and view the results
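For example, assuming the output path /output1 used earlier, the result file can be inspected with:
hadoop fs -cat /output1/part-r-00000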



