How to Compile and Run Your Own WordCount with Hadoop MapReduce

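Last time we set up a pseudo-distributed Hadoop environment and ran WordCount, one of the examples that ships with Hadoop, and it worked well. Most of the time, though, we have to write, compile, package, and run our own programs, so this post walks through that whole cycle once with a self-compiled WordCount.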

Writing the program

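Edit WordCount.java in Eclipse or NetBeans. The advantage of an IDE is that it makes it easy to select the dependency jars and compiles the code for us, so we only need to take the class files out of the workspace and package them, or export a jar directly from the IDE. This saves us from typing a long list of dependency jars on the command line.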

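1. Create a new Java Project named WordCount, create a package called test in it, then create WordCount.java in that package and write the code. The structure is as follows: [screenshot of the project structure]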

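2. At this point the workspace/WordCount/bin/test directory automatically contains the three compiled class files: WordCount.class, WordCount$TokenizerMapper.class, and WordCount$IntSumReducer.class.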

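3. Package the class files. In the bin directory (the parent of test, so that the test/ package path is preserved inside the jar), run: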

$ jar cvf WordCount.jar test/

This packages the class files into WordCount.jar.

4. Start HDFS:

$ cd /usr/local/hadoop
$ ./sbin/start-dfs.sh
$ jps    # check that NameNode, DataNode, etc. have started

5. Put a text file or an XML file into the input folder on HDFS, as described in the previous post. For example:

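$ hadoop fs -put /usr/local/hadoop/etc/hadoop

WordCount.java:

package test;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();  // Text extends BinaryComparable, so it can serve as a key

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            // value is one line of input; split it into whitespace-separated tokens
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);  // emit (word, 1)
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);  // emit (word, total count)
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        // The two remaining arguments are the HDFS input and output paths
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        // Job.getInstance replaces the deprecated Job(Configuration conf, String jobName) constructor
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);   // set the Mapper class for the job
        job.setCombinerClass(IntSumReducer.class);   // set the Combiner class for the job
        job.setReducerClass(IntSumReducer.class);    // set the Reducer class for the job
        job.setOutputKeyClass(Text.class);           // set the output key type
        job.setOutputValueClass(IntWritable.class);  // set the output value type
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}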
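
To see what the job computes without starting a cluster, the same map, group-by-key, and reduce steps can be imitated in plain Java. This is only a rough sketch for building intuition: the class name LocalWordCount and its methods are hypothetical and not part of Hadoop, everything runs in a single process, and real Hadoop features such as input splits, the combiner, and HDFS I/O are left out.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical single-process imitation of the WordCount data flow (no Hadoop needed).
public class LocalWordCount {

    // "Map" step: emit a (word, 1) pair for every whitespace-separated token,
    // mirroring what TokenizerMapper.map does for one line of input.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // "Shuffle" step: collect all values that share a key, with keys in sorted order.
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // "Reduce" step: sum the values of one key, as IntSumReducer.reduce does.
    public static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Run the three steps over all input lines and return word -> count.
    public static Map<String, Integer> count(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(pairs).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello hadoop", "hello world");
        // Prints one "word<TAB>count" line per word: hadoop 1, hello 2, world 1
        count(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

In the real job, the map step runs in parallel over input splits and the framework performs the grouping between map and reduce; the sketch only shows which values end up under which key.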
