生成正则表达式的所有有效值

由于正则表达式是由有限状态机定义的，所以我想知道是否有某种东西可以在此类机器上自动推理，并且是否适合将其用于这项工作……并提供了clojure.core.logic

因此，我查看了正则表达式语法的定义（不幸的是，它缺少{}量词，但它们应该很容易添加到我的代码中）使其适应了Java转义，并得出了以下110行长的clojure程序：

(ns regexp-unfolder.core  (:require [instaparse.core :as insta])  (:require [clojure.core.logic :as l])  (:require [clojure.set :refer [union difference]])  (:gen-class :methods [#^{:static true} [unfold [String] clojure.lang.LazySeq]]))(def parse-regexp (insta/parser   "re = union | simple-re?  union = re '|' simple-re  simple-re = concat | base-re  concat = simple-re base-re  base-re = elementary-re | star | plus  star = elementary-re '*'  plus = elementary-re '+'  elementary-re = group | char | '$' | any | set  any = '.'  group = '(' re ')'  set = positive-set | negative-set  positive-set = '['  set-items ']'  negative-set = '[^' set-items ']'  set-items = set-item*  set-item = range | char  range = char '-' char  char = #'[^\\\-\[\]]|\.'" ))(def printables (set (map char (range 32 127))))(declare fns handle-first)(defn handle-tree [q qto [ type & nodes]]  (if (nil? nodes)    [[q [""] qto]]    ((fns type handle-first) q qto nodes)))(defn star [q qto node &]  (cons [q [""] qto]         (handle-tree q q (first node))))(defn plus [q qto node &]   (concat (handle-tree q qto (first node))          (handle-tree qto qto (first node))))(defn any-char [q qto & _] [[q (vec printables) qto]] )(defn char-range [[c1 _ c2]]  (let [extract-char (comp int first seq second)]    (set (map char (range (extract-char c1) (inc (extract-char c2)))))))(defn items [nodes]  (union (mapcat    (fn [[_ [type & ns]]]      (if (= type :char)        #{(first ns)}     (char-range ns)))    (rest (second nodes)))))(defn handle-set [q qto node &] [[q (vec (items node)) qto]])(defn handle-negset [q qto node &] [[q (vec (difference printables (items node))) qto]])(defn handle-range [q qto & nodes] [[q (vec (char-range nodes)) qto]])(defn handle-char [q qto node &] [[q (vec node) qto]] )(defn handle-concat [q qto nodes]   (let [syms (for [x  (rest nodes)] (gensym q))]    (mapcat handle-tree  (cons q syms) (concat syms [qto] ) nodes)  ))(defn handle-first [q qto [node & _]] (handle-tree q qto node))(def fns {:concat handle-concat, :star star, :plus plus, :any any-char, :positive-set handle-set, :negative-set handle-negset, :char handle-char})(l/defne transition-membero  [state trans newstate otransition]  ([_ _ _ [state trans-set newstate]]     (l/membero trans trans-set)))(defn transitiono [state trans newstate transitions]  (l/conde   [(l/fresh [f]   (l/firsto transitions f)  (transition-membero state trans newstate f))]   [(l/fresh [r]  (l/resto transitions r)  (transitiono state trans newstate r))])  )(declare transitions);; Recognize a regexp finite state machine enpred in triplets [state, transition, next-state], adapted from a snippet made by Peteris Erins(defn recognizeo  ([input]     (recognizeo 'q0 input))  ([q input]     (l/matche [input] ; start pattern matching on the input        (['("")](l/== q 'ok)) ; accept the empty string if we are in an accepting state        ([[i . nput]](l/fresh [qto]       (transitiono q i qto transitions) ; assert it must be what we transition to qto from q with input symbol i       (recognizeo qto nput)))))) ; recognize the remainder(defn -unfold [regex]   (def transitions     (handle-tree 'q0 'ok (parse-regexp regex)))  (map (partial apply str) (l/run* [q] (recognizeo q))))

用core.logic编写，使其适应正则表达式匹配器应该相当容易

我将可打印字符的范围限制在32到126个ascii之间，否则处理regexp这样的正则表达式会很麻烦

[^c]

，但是您可以很轻松地扩展它……而且，我还没有实现联合，可选模式和
w， s等字符类的转义符

到目前为止，这是我在clojure中编写的最大的事情，但是基本知识似乎还不错。

regexp-unfolder.core=> (-unfold "ba[rz]")("bar" "baz")regexp-unfolder.core=> (-unfold "[a-z3-7]")("a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z" "3" "4" "5" "6" "7")regexp-unfolder.core=> (-unfold "[a-z3-7][01]")("a0" "a1" "b0" "b1" "c0" "c1" "d0" "d1" "e0" "e1" "f0" "f1" "g0" "g1" "h0" "h1" "i0" "i1" "j0" "j1" "k0" "k1" "l0" "l1" "m0" "m1" "n0" "n1" "o0" "o1" "p0" "p1" "q0" "q1" "r0" "r1" "s0" "s1" "t0" "t1" "u0" "u1" "v0" "v1" "w0" "w1" "x0" "x1" "y0" "y1" "z0" "z1" "30" "31" "40" "41" "50" "51" "60" "70" "61" "71")regexp-unfolder.core=> (-unfold "[^A-z]")(" " "@" "!" """ "#" "$" "%" "&" "'" "(" ")" "*" "+" "," "-" "." "/" "0" "1" "2" "3" "4" "5" "6" "7" "8" "9" ":" ";" "{" "<" "|" "=" "}" ">" "~" "?")regexp-unfolder.core=> (take 20 (-unfold "[abc]*"))("" "a" "b" "c" "aa" "ab" "ac" "ba" "ca" "aaa" "bb" "cb" "aab" "bc" "cc" "aac" "aba" "aca" "baa" "caa")regexp-unfolder.core=> (take 20 (-unfold "a+b+"))("ab" "aab" "abb" "abbb" "aaab" "abbbb" "aabb" "abbbbb" "abbbbbb" "aabbb" "abbbbbbb" "abbbbbbbb" "aaaab" "aabbbb" "aaabb" "abbbbbbbbb" "abbbbbbbbbb" "aabbbbb" "abbbbbbbbbbb" "abbbbbbbbbbbb")

自从我开始这种方式以来，我还实现了无限的输出:)

如果有人感兴趣，我在这里上传了

显然，这是一个如何

unfold

从普通的旧Java 调用的示例：

import static regexp_unfolder.core.unfold;public class UnfolderExample{    public static void main(String[] args){        @SuppressWarnings("unchecked")        Iterable<String> strings = unfold("a+b+");        for (String s : strings){ System.out.println(s);        }    }}

生成正则表达式的所有有效值

面试问答相关栏目本月热门文章