hadoop实现grep示例分享

发布于 2016-07-11 13:13:34 | 447 次阅读 | 评论: 0 | 来源: 网友投递

Hadoop分布式系统

一个分布式系统基础架构，由Apache基金会所开发。用户可以在不了解分布式底层细节的情况下，开发分布式程序。充分利用集群的威力高速运算和存储。

这篇文章主要介绍了hadoop实现grep示例,可从文档中提取包含某些字符串的行,需要的朋友可以参考下

hadoop做的一个简单grep程序，可从文档中提取包含某些字符串的行


/*
 * 一个简单grep程序，可从文档中提取包含莫些字符串的行
 */

public class grep extends Configured implements Tool{

public static class grepMap extends Mapper<LongWritable, Text, Text,NullWritable>{

  public void map(LongWritable line,Text value,Context context) throws IOException, InterruptedException{
   //通过Configuration获取参数
   String str = context.getConfiguration().get("grep");
   if(value.toString().contains(str)){
    context.write(value, NullWritable.get());
   }
  }
}
@Override
public int run(String[] args) throws Exception {

  if(args.length!=3){
   System.out.println("ERROR");
   System.exit(1);
  }

  Configuration configuration = getConf();
  //传递参数
  configuration.set("grep", args[2]);
  Job job = new Job(configuration,"grep");

  job.setJarByClass(grep.class);
  job.setMapperClass(grepMap.class);
  job.setNumReduceTasks(0);

  job.setMapOutputKeyClass(Text.class);
  job.setOutputValueClass(NullWritable.class);

  Path in = new Path(args[0]);
  Path out = new Path(args[1]);
  FileSystem fileSystem = out.getFileSystem(configuration);
  if(fileSystem.exists(out))
   fileSystem.delete(out, true);

  FileInputFormat.addInputPath(job, in);
  FileOutputFormat.setOutputPath(job, out);

  System.exit(job.waitForCompletion(true)?0:1);
  return 0;
}

最新网友评论 共有(0)条评论发布评论返回顶部

Hadoop分布式系统

后端技术

前端技术

数据库

热门框架

常用IDE

其他