栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 软件开发 > 后端开发 > Java

Java实现AC自动机全文检索示例

Java 更新时间: 发布时间: IT归档 最新发布 模块sitemap 名妆网 法律咨询 聚返吧 英语巴士网 伯小乐 网商动力

Java实现AC自动机全文检索示例

第一步,构建Trie树,定义Node类型:


interface Node {

  char value();

  boolean exists();

  boolean isRoot();

  Node parent();

  Node childOf(char c);

  Node fail();

  void setFail(Node node);

  void setExists(boolean exists);

  void add(Node child);

  List children();
}

第二步,实现两种Node,如果词汇全是可打印的ASCII字符,就采用AsciiNode,否则(比如包含汉字),使用基于hash表的MapNode;这两种Node均集成自AbstractNode:


abstract class AbstractNode implements Node {

  private static final char EMPTY = '';
  private final char value;
  private final Node parent;
  private boolean exists;
  private Node fail;

  public AbstractNode(Node parent, char value) {
    this.parent = parent;
    this.value = value;
    this.exists = false;
    this.fail = null;
  }

  public AbstractNode() {
    this(null, EMPTY);
  }


  private static String tab(int n) {
    StringBuilder builder = new StringBuilder();
    for (int i = 0; i < n; i++) {
      builder.append('t');
    }
    return builder.toString();
  }

  private static String toString(Node node, int depth) {
    StringBuilder builder = new StringBuilder();
    String tab = tab(depth);
    Node fail = node.fail();
    Node parent = node.parent();
    builder
 .append(tab)
 .append('<')
 .append(node.value())
 .append(" exists="")
 .append(node.exists())
 .append('"')
 .append(" parent="")
 .append(parent == null ? "null" : parent.value())
 .append('"')
 .append(" fail="")
 .append(fail == null ? "null" : fail.value())
 .append("">rn");
    for (Node child : node.children())
      builder.append(toString(child, depth + 1));
    builder
 .append(tab)
 .append("rn");

    return builder.toString();
  }

  @Override
  public char value() {
    return value;
  }


  @Override
  public boolean exists() {
    return exists;
  }

  @Override
  public boolean isRoot() {
    return value == EMPTY;
  }

  @Override
  public Node parent() {
    return parent;
  }

  @Override
  public Node fail() {
    return fail;
  }

  @Override
  public void setFail(Node node) {
    this.fail = node;
  }

  @Override
  public void setExists(boolean exists) {
    this.exists = exists;
  }

  @Override
  public String toString() {
    return toString(this, 0);
  }
}

/////////////////////////////////////////////////////////////////////////////////////////////


final class AsciiNode extends AbstractNode implements Node {


  private static final char FROM = 32;
  private static final char TO = 126;
  private final Node[] children;


  public AsciiNode() {
    super();
    this.children = new Node[TO - FROM + 1];
  }

  public AsciiNode(Node parent, char value) {
    super(parent, value);
    this.children = new Node[TO - FROM + 1];
  }

  @Override
  public Node childOf(char c) {
    if (c >= FROM && c <= TO)
      return children[(int) c - FROM];
    else return null;
  }

  @Override
  public void add(Node child) {
    int index = (int) child.value();
    children[index - FROM] = child;
  }

  @Override
  public List children() {
    List nodes = new ArrayList();
    for (Node child : children)
      if (child != null)
 nodes.add(child);
    return nodes;
  }
}


//////////////////////////////////////////////////////////////////////////////////////////////


final class MapNode extends AbstractNode implements Node {

  private final Map children;

  public MapNode() {
    super();
    this.children = new HashMap();
  }

  public MapNode(Node parent, char value) {
    super(parent, value);
    this.children = new HashMap();
  }

  @Override
  public Node childOf(char c) {
    return children.get(c);
  }

  @Override
  public void add(Node child) {
    children.put(child.value(), child);
  }

  @Override
  public List children() {
    List nodes = new ArrayList();
    nodes.addAll(children.values());
    return nodes;
  }
}

第三步,

首先定义一个Node构造器:


public interface NodeMaker {

  Node make(Node parent, char value);

  Node makeRoot();
}

然后构建AC自动机,实现创建及查找方法:


public final class WordTable {

  private final Node root;


  private WordTable(Collection words, NodeMaker maker) {
    Node root = buildTrie(words, maker);
    setFailNode(root);
    this.root = root;
  }

  public static WordTable compile(Collection words) {
    if (words == null || words.isEmpty())
      throw new IllegalArgumentException();
    final NodeMaker maker;
    if (isAscii(words))
      maker = new NodeMaker() {
 @Override
 public Node make(Node parent, char value) {
   return new AsciiNode(parent, value);
 }

 @Override
 public Node makeRoot() {
   return new AsciiNode();
 }
      };
    else maker = new NodeMaker() {
      @Override
      public Node make(Node parent, char value) {
 return new MapNode(parent, value);
      }

      @Override
      public Node makeRoot() {
 return new MapNode();
      }
    };
    return new WordTable(words, maker);
  }

  private static boolean isAscii(Collection words) {
    for (CharSequence word : words) {
      int len = word.length();
      for (int i = 0; i < len; i++) {
 int c = (int) word.charAt(i);
 if (c < 32 || c > 126)
   return false;
      }
    }
    return true;
  }

  private static Node buildTrie(Collection sequences, NodeMaker maker) {
    Node root = maker.makeRoot();
    for (CharSequence sequence : sequences) {
      int len = sequence.length();
      Node current = root;
      for (int i = 0; i < len; i++) {
 char c = sequence.charAt(i);
 Node node = current.childOf(c);
 if (node == null) {
   node = maker.make(current, c);
   current.add(node);
 }
 current = node;
 if (i == len - 1)
   node.setExists(true);
      }
    }
    return root;
  }

  private static void setFailNode(final Node root) {
    root.setFail(null);
    Queue queue = new linkedList();
    queue.add(root);
    while (!queue.isEmpty()) {
      Node parent = queue.poll();
      Node temp;
      for (Node child : parent.children()) {
 if (parent.isRoot())
   child.setFail(root);
 else {
   temp = parent.fail();
   while (temp != null) {
     Node node = temp.childOf(child.value());
     if (node != null) {
child.setFail(node);
break;
     }
     temp = temp.fail();
   }
   if (temp == null)
     child.setFail(root);
 }
 queue.add(child);
      }
    }
  }


  public boolean findAnyIn(CharSequence cs) {
    int len = cs.length();
    Node node = root;
    for (int i = 0; i < len; i++) {
      Node next = node.childOf(cs.charAt(i));
      if (next == null) {
 next = node.fail();
 if (next == null) {
   node = root;
   continue;
 }
      }
      if (next.exists())
 return true;
    }

    return false;
  }

  public List search(CharSequence cs) {
    if (cs == null || cs.length() == 0)
      return Collections.emptyList();
    List result = new ArrayList();
    int len = cs.length();
    Node node = root;
    for (int i = 0; i < len; i++) {
      Node next = node.childOf(cs.charAt(i));
      if (next == null) {
 next = node.fail();
 if (next == null) {
   node = root;
   continue;
 }
      }
      if (next.exists()) {
 MatchInfo info = new MatchInfo(i, next);
 result.add(info);
 node = root;
 continue;
      }
      node = next;
    }
    return result;
  }

  @Override
  public String toString() {
    return root.toString();
  }
}

定义一个保存查找结果的实体:


public final class MatchInfo {

  private final int index;
  private final String word;

  public MatchInfo(int index, String word) {
    this.index = index;
    this.word = word;
  }

  public MatchInfo(int index, Node node) {
    StringBuilder builder = new StringBuilder();
    while (node != null) {
      if (!node.isRoot())
 builder.append(node.value());
      node = node.parent();
    }
    String word = builder.reverse().toString();
    this.index = index + 1 - word.length();
    this.word = word;
  }


  public int getIndex() {
    return index;
  }

  public String getWord() {
    return word;
  }

  @Override
  public String toString() {
    return index + ":" + word;
  }
}

第四步,调用Demo:

public static void main(String[] args) {
    List list = Arrays.asList("say", "her", "he", "she", "shr", "alone");
    WordTable table = WordTable.compile(list);
    System.out.println(table);
    System.out.println(table.search("1shesaynothingabouthislivinghimalone"));
  }

以下是输出结果:

< exists="false" parent="null" fail="null">
 
 
  
  
 
 
  
  
  
  
 
 
 
 
  
  
 
 
 
 
  
  
   
   
  
  
 
 


[1:she, 4:say, 31:alone]

以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持考高分网。

转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/147808.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号