Learning and Using Regular Expressions in Java

Regular expressions @author: kayleigh

package com.shujia.java.test;


public class RegularDemo1 {
    public static void main(String[] args) {
        String s = "023w2341232";
        System.out.println(checkQQ(s));

    }

    public static boolean checkQQ(String qq){
        boolean flag = false;

        // Must be 5 to 10 characters long
        if(qq.length() >=5 && qq.length()<=10){
            // Must not start with 0
            if(!qq.startsWith("0")){
                flag = true;

                // Every character must be a digit
                char[] chars = qq.toCharArray();
                for (char aChar : chars) {
                    if(!Character.isDigit(aChar)){
                        flag = false;
                        break;
                    }
                }

            }

        }


        return flag;
    }

    public static boolean checkQQByRegular(String s){

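        // A regular expression makes it easy to search, match, and replace within strings.
        // Rule: first digit 1-9, then 4 to 9 more digits (5 to 10 digits in total).
        String regex = "[1-9][0-9]{4,9}";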
        return s.matches(regex);

    }
}

Why learn regular expressions: they handle complex searching, replacing, matching, and splitting of strings.

1001,xiaohu,18,14

json, xml, html

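Regular expressions are a technology independent of Java. They do not depend on Java: they can be used in Java, and also in Python, JavaScript, and other languages.

Overview of regular expressions

Concept: a single string that describes, and is used to match, a set of strings conforming to some syntax rule.

Steps for use:

1. Study a large number of sample strings, find the pattern, and define a rule.

2. Use the rule to match new strings.

3. On a successful match, perform the corresponding action.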

1165872335@qq.com
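
The three steps above can be sketched in Java roughly as follows. This is a minimal illustration rather than the author's code: the class name `QqMailSketch`, the methods `isQqMail` and `describe`, and the rule itself (a QQ number of 5 to 10 digits not starting with 0, followed by `@qq.com`) are assumptions made for the example.

```java
import java.util.regex.Pattern;

class QqMailSketch {
    // Step 1: the rule, derived from sample addresses (assumed for illustration):
    // a QQ number of 5 to 10 digits that does not start with 0, then "@qq.com".
    private static final Pattern QQ_MAIL = Pattern.compile("[1-9][0-9]{4,9}@qq\\.com");

    // Step 2: match a new string against the rule (the whole string must fit).
    static boolean isQqMail(String candidate) {
        return QQ_MAIL.matcher(candidate).matches();
    }

    // Step 3: act on the result of the match.
    static String describe(String candidate) {
        return (isQqMail(candidate) ? "accepted: " : "rejected: ") + candidate;
    }

    public static void main(String[] args) {
        System.out.println(describe("1165872335@qq.com")); // accepted: 1165872335@qq.com
        System.out.println(describe("0123@qq.com"));       // rejected: 0123@qq.com
    }
}
```

Compiling the pattern once into a `Pattern` constant avoids recompiling it on every call, which `String.matches` would otherwise do.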

Syntax rules of regular expressions

1. Literal characters

A character by itself is a regular expression.

a b c \t \n \r \f

public class RegularDemo2 {
    public static void main(String[] args) {
        // Literal character
        String regex = "a";
        String str = "ab123241dasdasw&;123.";
        System.out.println(str.replaceAll(regex,"X"));

    }
}

Output:

Xb123241dXsdXsw&;123.

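2. Metacharacters

Character	Description
\	Marks the next character as a special character, a literal character, a back-reference, or an octal escape. For example, 'n' matches the character "n", while '\n' matches a newline. The sequence '\\' matches "\" and '\(' matches "(".
^	Matches the start of the input string. If the RegExp object's Multiline property is set (in Java, the Pattern.MULTILINE flag), ^ also matches the position after '\n' or '\r'.
$	Matches the end of the input string. If the RegExp object's Multiline property is set (in Java, the Pattern.MULTILINE flag), $ also matches the position before '\n' or '\r'.
*	Matches the preceding subexpression zero or more times. For example, zo* matches "z" as well as "zoo". * is equivalent to {0,}.
+	Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?	Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}.
{n}	n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}	n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'.
{n,m}	m and n are non-negative integers with n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the two numbers.
?	When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. Non-greedy mode matches as little of the searched string as possible, while the default greedy mode matches as much as possible. For example, for the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
.	Matches any single character except line terminators (\n, \r). To match any character including '\n', use a pattern such as "(.|\n)".
(pattern)	Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript and the $0…$9 properties in JScript (in Java, Matcher.group(n)). To match parenthesis characters, use '\(' or '\)'.
(?:pattern)	Matches pattern but does not capture the result; that is, this is a non-capturing match that is not stored for later use. This is useful when combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more concise expression than 'industry|industries'.
(?=pattern)	Positive lookahead assertion: matches the search string at any position where a string matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, "Windows(?=95|98|NT|2000)" matches "Windows" in "Windows2000" but not "Windows" in "Windows3.1". Lookahead does not consume characters: after a match, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern)	Negative lookahead assertion: matches the search string at any position where a string not matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, "Windows(?!95|98|NT|2000)" matches "Windows" in "Windows3.1" but not "Windows" in "Windows2000". Lookahead does not consume characters: after a match, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?<=pattern)	Positive lookbehind assertion: like positive lookahead, but in the opposite direction. For example, "(?<=95|98|NT|2000)Windows" matches "Windows" in "2000Windows" but not "Windows" in "3.1Windows".
(?<!pattern)	Negative lookbehind assertion: like negative lookahead, but in the opposite direction. For example, "(?<!95|98|NT|2000)Windows" matches "Windows" in "3.1Windows" but not "Windows" in "2000Windows".
x|y	Matches x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz]	Character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".
[^xyz]	Negated character set. Matches any character not enclosed. For example, '[^abc]' matches the 'p', 'l', 'i', and 'n' in "plain".
[a-z]	Character range. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter from 'a' to 'z'.
[^a-z]	Negated character range. Matches any character not in the specified range. For example, '[^a-z]' matches any character that is not in the range 'a' to 'z'.
\b	Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".
\B	Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".
\cx	Matches the control character indicated by x. For example, \cM matches Control-M, a carriage return. x must be one of A-Z or a-z; otherwise c is treated as a literal 'c' character.
\d	Matches a digit character. Equivalent to [0-9].
\D	Matches a non-digit character. Equivalent to [^0-9].
\f	Matches a form feed. Equivalent to \x0c and \cL.
\n	Matches a newline. Equivalent to \x0a and \cJ.
\r	Matches a carriage return. Equivalent to \x0d and \cM.
\s	Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].
\S	Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t	Matches a tab. Equivalent to \x09 and \cI.
\v	Matches a vertical tab. Equivalent to \x0b and \cK.
\w	Matches a letter, digit, or underscore. Equivalent to '[A-Za-z0-9_]'.
\W	Matches anything that is not a letter, digit, or underscore. Equivalent to '[^A-Za-z0-9_]'.
\xn	Matches n, where n is a hexadecimal escape value. The hexadecimal escape must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
\num	Matches num, where num is a positive integer. A reference back to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n	Identifies either an octal escape value or a back-reference. If \n is preceded by at least n captured subexpressions, n is a back-reference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm	Identifies either an octal escape value or a back-reference. If \nm is preceded by at least nm captured subexpressions, nm is a back-reference. If \nm is preceded by at least n captures, n is a back-reference followed by the literal m. If neither condition holds and n and m are both octal digits (0-7), \nm matches the octal escape value nm.
\nml	If n is an octal digit (0-3) and m and l are both octal digits (0-7), matches the octal escape value nml.
\un	Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).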
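
A few rows of the table are easier to see in running code. The sketch below tries the greedy and non-greedy forms of `o+` on "oooo", and the lookahead and negative lookbehind examples with "Windows". The class and method names are hypothetical, chosen only for this illustration.

```java
class MetacharacterSketch {
    // Greedy: o+ takes as many o's as it can, so the first match is the whole run.
    static String replaceFirstGreedy(String s) {
        return s.replaceFirst("o+", "X");
    }

    // Non-greedy (reluctant): o+? takes as few o's as it can, so the first match is one o.
    static String replaceFirstReluctant(String s) {
        return s.replaceFirst("o+?", "X");
    }

    // Positive lookahead: "Windows" only when a listed version follows; the version is not consumed.
    static String shortenBeforeVersion(String s) {
        return s.replaceAll("Windows(?=95|98|NT|2000)", "Win");
    }

    // Negative lookbehind: "Windows" only when no listed version comes right before it.
    static String shortenUnlessAfterVersion(String s) {
        return s.replaceAll("(?<!95|98|NT|2000)Windows", "Win");
    }

    public static void main(String[] args) {
        System.out.println(replaceFirstGreedy("oooo"));                          // X
        System.out.println(replaceFirstReluctant("oooo"));                       // Xooo
        System.out.println(shortenBeforeVersion("Windows2000 Windows3.1"));      // Win2000 Windows3.1
        System.out.println(shortenUnlessAfterVersion("3.1Windows 2000Windows")); // 3.1Win 2000Windows
    }
}
```

Note that inside a Java string literal every regex backslash has to be doubled, which is why none of these patterns need one but `"\\d"` would.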

Character classes:

package com.shujia.wyh.day21;


public class RegularDemo3 {
    public static void main(String[] args) {
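        // Notation: []
        // [] groups characters into a class; it matches any single character listed inside the brackets
        // Every occurrence of a, b, or 2 in the target string is replaced
        String regex = "[ab2]";
        String s = "ab123241dasdasw&;123.";
        System.out.println(s.replaceAll(regex, "_"));

        // Inside brackets, ^ means negation: match every character that is not a, b, or 2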
        regex = "[^ab2]";
        System.out.println(s.replaceAll(regex, "+"));


    }
}

Output:

__1_3_41d_sd_sw&;1_3.
ab+2+2+++a++a+++++2++

Range classes:

These simply add ranges on top of character classes.

package com.shujia.wyh.day21;


public class RegularDemo4 {
    public static void main(String[] args) {
        String regex = "[ab]";
        String s = "abcdefghijklmnBV1232QWE41dasdasw&;123.";
        System.out.println("匹配之前：\n" + s); // "匹配之前" means "before matching"
        System.out.println("====================================");
        System.out.println(s.replaceAll(regex, "_"));

        // [a-z] matches any single lowercase letter from a to z
        regex = "[a-z]";
        System.out.println(s.replaceAll(regex, "_"));

        // [A-Z] matches any single uppercase letter from A to Z
        regex = "[A-Z]";
        System.out.println(s.replaceAll(regex, "+"));

        // Match both lowercase and uppercase letters
        regex = "[a-zA-Z]";
        System.out.println();
        System.out.println(s.replaceAll(regex, "#"));

        // Match digits
        regex = "[0-9]";
        System.out.println(s.replaceAll(regex, "_"));

        // Match lowercase letters, uppercase letters, digits, and the symbols & ; .
        regex = "[a-zA-Z0-9&;.]";
        System.out.println(s.replaceAll(regex, "_"));


    }
}

Output:

匹配之前：
abcdefghijklmnBV1232QWE41dasdasw&;123.
====================================
__cdefghijklmnBV1232QWE41d_sd_sw&;123.
______________BV1232QWE41_______&;123.
abcdefghijklmn++1232+++41dasdasw&;123.

################1232###41#######&;123.
abcdefghijklmnBV____QWE__dasdasw&;___.
______________________________________

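Predefined classes:

Building on the range classes above, real-world development often has recurring needs, such as checking for digits, lowercase letters, or uppercase letters. The corresponding range classes get long, so regular expressions predefine shorthand expressions with special meanings for the most common cases:

\d == [0-9] digit
\D == [^0-9] non-digit
Whitespace: [ \t\n\x0B\f\r] == \s
[^ \t\n\x0B\f\r] == \S
\w == [a-zA-Z0-9_]
\W == [^a-zA-Z0-9_]
. any character (may or may not match line terminators, depending on the DOTALL flag)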

package com.shujia.wyh.day21;


public class RegularDemo5 {
    public static void main(String[] args) {
        String regex = "[0-9]";
        String s = "abcdefghijklmnB V1232Q WE 41dasdasw&;123.";
        System.out.println("匹配之前：\n" + s); // "匹配之前" means "before matching"
        System.out.println("====================================");
        System.out.println(s.replaceAll(regex, "_"));

        regex = "\\d"; // same as [0-9]: matches every digit
        System.out.println(s.replaceAll(regex, "_"));

        regex = "\\D"; // matches every non-digit character
        System.out.println(s.replaceAll(regex, "_"));

        regex = "\\s"; // matches every whitespace character
        System.out.println(s.replaceAll(regex, "!"));

        regex = "\\S"; // matches every non-whitespace character
        System.out.println(s.replaceAll(regex, "_"));

        regex = "\\w"; // matches letters (both cases), digits, and underscore
        System.out.println(s.replaceAll(regex, "_"));

        regex = "\\W"; // matches everything that is not a letter, digit, or underscore
        System.out.println(s.replaceAll(regex, "_"));

        regex = "."; // matches any character
        System.out.println(s.replaceAll(regex, "_"));

        regex = "\\."; // escaped: matches only a literal dot
        System.out.println(s.replaceAll(regex, "_"));
    }
}

Output:

匹配之前：
abcdefghijklmnB V1232Q WE 41dasdasw&;123.
====================================
abcdefghijklmnB V____Q WE __dasdasw&;___.
abcdefghijklmnB V____Q WE __dasdasw&;___.
_________________1232_____41_________123_
abcdefghijklmnB!V1232Q!WE!41dasdasw&;123.
_______________ ______ __ _______________
_______________ ______ __ _________&;___.
abcdefghijklmnB_V1232Q_WE_41dasdasw__123_
_________________________________________
abcdefghijklmnB V1232Q WE 41dasdasw&;123_
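
Two details of the predefined classes are easy to miss: `\w` also covers the underscore, and `.` skips line terminators unless DOTALL mode is enabled. The following is a small sketch of both points; the class and method names are hypothetical and the inputs are made up for the example.

```java
import java.util.regex.Pattern;

class PredefinedClassSketch {
    // \w covers letters, digits, and the underscore, so '_' is masked too.
    static String maskWordChars(String s) {
        return s.replaceAll("\\w", "#");
    }

    // By default '.' does not match line terminators such as '\n'.
    static String maskDot(String s) {
        return s.replaceAll(".", "#");
    }

    // With Pattern.DOTALL, '.' matches line terminators as well.
    static String maskDotAll(String s) {
        return Pattern.compile(".", Pattern.DOTALL).matcher(s).replaceAll("#");
    }

    public static void main(String[] args) {
        System.out.println(maskWordChars("a_b-1"));            // ###-#
        System.out.println(maskDot("a\nb").equals("#\n#"));    // true
        System.out.println(maskDotAll("a\nb").equals("###"));  // true
    }
}
```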

Boundary characters:

^: starts with xxx
$: ends with xxx
\b: word boundary
\B: non-word boundary

package com.shujia.wyh.day21;


public class RegularDemo6 {
    public static void main(String[] args) {
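        // Outside brackets, ^ means "starts with"
        String regex = "^ac";
        String s = "abcdefg";
        System.out.println("匹配之前：\n" + s); // "匹配之前" means "before matching"
        System.out.println("====================================");
        System.out.println(s.replaceAll(regex, "_"));

        // $ means "ends with"
        regex = "fg$";
        System.out.println(s.replaceAll(regex, "_"));

        // \b matches the zero-width position at each word boundary
        regex = "\\b";
        s = "Hello World 888 1 2 & ; 0 a b c d";
        System.out.println(s.replaceAll(regex, "_"));

        // \B matches every position that is not a word boundary
        regex = "\\B";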
        System.out.println(s.replaceAll(regex, "_"));

    }
}

Output:

匹配之前：
abcdefg
====================================
abcdefg
abcde_
_Hello_ _World_ _888_ _1_ _2_ & ; _0_ _a_ _b_ _c_ _d_
H_e_l_l_o W_o_r_l_d 8_8_8 1 2 _&_ _;_ 0 a b c d
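
By default `^` and `$` anchor to the whole input; with multiline mode they anchor to each line. The sketch below shows this, along with `\b` used to replace a whole word only. The `(?m)` prefix is the inline form of `Pattern.MULTILINE`; the class name, method names, and sample strings are assumptions for illustration.

```java
class BoundarySketch {
    // Without multiline mode, ^ matches only at the very start of the input.
    static String quoteFirstLineOnly(String text) {
        return text.replaceAll("^", "> ");
    }

    // With (?m), ^ also matches right after each line terminator.
    static String quoteEveryLine(String text) {
        return text.replaceAll("(?m)^", "> ");
    }

    // \b on both sides restricts the match to the standalone word "cat".
    static String replaceWholeWord(String text) {
        return text.replaceAll("\\bcat\\b", "dog");
    }

    public static void main(String[] args) {
        System.out.println(quoteFirstLineOnly("a\nb").equals("> a\nb"));  // true
        System.out.println(quoteEveryLine("a\nb").equals("> a\n> b"));    // true
        System.out.println(replaceWholeWord("cat concat cat"));           // dog concat dog
    }
}
```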

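Quantifiers:

?: occurs 0 or 1 times
+: occurs 1 or more times
*: occurs any number of times (including 0)
{n}: occurs exactly n times
{n,m}: occurs n to m times
{n,}: occurs at least n times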

package com.shujia.wyh.day21;


public class RegularDemo7 {
    public static void main(String[] args) {
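        // ^x? matches x at the start of the string 0 or 1 times, so it can also match the empty string
//        String regex = "^a?";
        String regex = "^b?";
        String s = "aaaaabcdefaaaaaag";
        System.out.println("匹配之前：\n" + s); // "匹配之前" means "before matching"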
        System.out.println("====================================");
        System.out.println(s.replaceAll(regex, "_"));

        regex = "^a+";
        System.out.println(s.replaceAll(regex, "_"));

        regex = "^b+";
        System.out.println(s.replaceAll(regex, "_"));

        regex = "^a*";
        System.out.println(s.replaceAll(regex, "_"));

        regex = "^b*";
        System.out.println(s.replaceAll(regex, "_"));

        // {n}: occurs exactly n times
        // Match 'a' repeated 6 times in a row
        // aaaaabcdefaaaaaag
        regex = "a{6}";
//        s = "aaaaabcdefaaaaaag";
        System.out.println(s.replaceAll(regex, "_"));

        regex = "a{3}";
        System.out.println(s.replaceAll(regex, "_"));

        // {n,m}: occurs n to m times
        regex = "^a{3,4}";
        System.out.println(s.replaceAll(regex, "_"));

        // {n,}: occurs at least n times
        regex = "a{6,}";
        System.out.println(s.replaceAll(regex, "_"));

    }
}

Output:

匹配之前：
aaaaabcdefaaaaaag
====================================
_aaaaabcdefaaaaaag
_bcdefaaaaaag
aaaaabcdefaaaaaag
_bcdefaaaaaag
_aaaaabcdefaaaaaag
aaaaabcdef_g
_aabcdef__g
_abcdefaaaaaag
aaaaabcdef_g

Grouping:

The grouping symbol is "( )".

package com.shujia.wyh.day21;


public class RegularDemo8 {
    public static void main(String[] args) {
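        // Without grouping, {1,2} applies only to c: matches "ab" followed by one or two c's
        String regex = "abc{1,2}";
        String s =  "abccccABC123123baCBAabcccccABCabcabcabc123";
        System.out.println("匹配之前：\n" + s); // "匹配之前" means "before matching"
        System.out.println("====================================");
        System.out.println(s.replaceAll(regex, "_"));

        // With grouping, "abc" as a whole must occur 1 to 2 times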
        regex = "(abc){1,2}";
        System.out.println(s.replaceAll(regex, "+"));

        regex = "ABC(123|abc){1,}";
        System.out.println(s.replaceAll(regex, "+"));

        System.out.println(s.matches(regex)); //false


    }
}

Output:

匹配之前：
abccccABC123123baCBAabcccccABCabcabcabc123
====================================
_ccABC123123baCBA_cccABC___123
+cccABC123123baCBA+ccccABC++123
abcccc+baCBAabccccc+
false
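
The final `false` comes from `matches`, which succeeds only when the entire string fits the pattern, whereas `replaceAll` works on every substring that fits. A minimal sketch of the difference follows, using the same pattern; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class MatchesVersusFindSketch {
    private static final Pattern GROUPED = Pattern.compile("ABC(123|abc){1,}");

    // True only if the whole input, from first to last character, fits the pattern.
    static boolean wholeStringMatches(String s) {
        return GROUPED.matcher(s).matches();
    }

    // True if any substring of the input fits the pattern.
    static boolean containsMatch(String s) {
        return GROUPED.matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "abccccABC123123baCBAabcccccABCabcabcabc123";
        System.out.println(wholeStringMatches(s));           // false
        System.out.println(containsMatch(s));                // true
        System.out.println(wholeStringMatches("ABC123abc")); // true
    }
}
```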

Back-references:

package com.shujia.wyh.day21;


public class RegularDemo9 {
    public static void main(String[] args) {
        // 2018-04-27 ---> 04/27/2018
        String regex = "(\\d{4})-(\\d{2})-(\\d{2})";
        String str = "2018-04-27 2021-12-17";
        System.out.println(str.replaceAll(regex, "$2/$3/$1"));

        // 2018-04-27 ---> 27/2018
        // A group that should not receive a group number can be written with ?: (non-capturing group)
        regex = "(\\d{4})-(?:\\d{2})-(\\d{2})";
        str = "2018-04-27 2021-12-17";
        System.out.println(str.replaceAll(regex, "$2/$1"));
    }
}

Output:

04/27/2018 12/17/2021
27/2018 17/2021
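
The demo above refers to groups from the replacement string with `$n`. A group can also be referenced inside the pattern itself with `\n`, and groups can be named instead of numbered. The sketch below illustrates both; the class name, method names, and group names (`year`, `month`, `day`) are assumptions for the example, not part of the original demo.

```java
class BackReferenceSketch {
    // \1 inside the pattern must repeat whatever group 1 captured,
    // so this finds a word followed by whitespace and the same word again.
    static String removeDoubledWords(String s) {
        return s.replaceAll("\\b(\\w+)\\s+\\1\\b", "$1");
    }

    // Named groups (?<name>...) are referenced as ${name} in the replacement.
    static String toUsDate(String s) {
        return s.replaceAll(
                "(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})",
                "${month}/${day}/${year}");
    }

    public static void main(String[] args) {
        System.out.println(removeDoubledWords("this is is a test")); // this is a test
        System.out.println(toUsDate("2018-04-27 2021-12-17"));       // 04/27/2018 12/17/2021
    }
}
```

Named groups tend to stay readable when a pattern is edited later, since inserting a new group does not shift the references.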

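Using regular expressions in Java

How does Java use regular expressions to carry out these operations?

1. Searching within a string: Pattern and Matcher

2. Matching a string: the string's matches() method

3. Replacing within a string: the string's replaceAll() and replaceFirst() methods

4. Splitting a string: the string's split() method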

package com.shujia.java.day27.regulardemos;

import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;


public class ReDemo10 {
    public static void main(String[] args) {
        String regex = "\\w{3,}";
        String str = "abcd123";

        // matches() succeeds only if the entire string fits the pattern
        System.out.println(str.matches(regex));

        regex = "[a-z]{2,}";
        str = "abc efgsd hello111";
        // replaceAll() replaces every match; replaceFirst() replaces only the first one
        System.out.println(str.replaceAll(regex,"X"));
        System.out.println(str.replaceFirst(regex,"X"));

        str = "abc,sdf 123ab sa123bds & ";
        // split() treats its argument as a regular expression
        String[] split = str.split(",");
        System.out.println(Arrays.toString(split));

        regex = "[as1]";
        String[] split2 = str.split(regex);
        System.out.println(Arrays.toString(split2));

        // Pattern API: compile the pattern once, then search with a Matcher

        regex = "\\w{3,7}";
        Pattern compile = Pattern.compile(regex);
        Matcher matcher = compile.matcher("abcd123aaaa112321dddd333");
//        System.out.println(matcher.matches());
        // Each find() call advances to the next match; end() and group() describe that match
        System.out.println(matcher.find());
        System.out.println(matcher.end());
        System.out.println(matcher.group());

        System.out.println(matcher.find());
        System.out.println(matcher.end());
        System.out.println(matcher.group());
    }

}
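
The demo calls `find()` twice by hand. In practice the search is usually written as a loop that runs until `find()` returns false. A minimal sketch, assuming a hypothetical helper `findAll` that collects every match:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllSketch {
    // Collects every non-overlapping match of regex in input, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {          // false once no further match exists
            hits.add(matcher.group());    // text of the current match
        }
        return hits;
    }

    public static void main(String[] args) {
        // Greedy \w{3,7} takes up to 7 word characters per match.
        System.out.println(findAll("\\w{3,7}", "abcd123aaaa112321dddd333"));
        // [abcd123, aaaa112, 321dddd, 333]
    }
}
```

`matcher.start()` and `matcher.end()` can be read inside the same loop when the positions of the matches are needed as well.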

A website for checking regular expressions:

https://regexper.com/

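Regular expression exercises

I. Curing a stutter.

Turn the string “我我我我我我我…我…要要要要要…要要要要…学习习习习…习习习习习习习习编程程程程程程…程程程程程程程程程” into “我要学习编程” ("I want to learn programming").

Analysis: first remove the dots with the pattern "\.+", then collapse each run of a repeated character into a single character with the pattern "(.)\1+".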

package p02.Exercise;

public class Demo01 {
    public static void main(String args[])
    {
        String str="我我我我我我我..........我.......要要要要要..................要要要要...学习习习习.......习习习习习习习习编程程程程程程.......程程程程程程程程程";
        // 1. Remove the dots first
        String regex="\\.+";
        str=str.replaceAll(regex, "");
//        System.out.println(str);
        // 2. Collapse repeated characters
        regex="(.)\\1+";
        str=str.replaceAll(regex, "$1");
        System.out.println(str);
    }
}

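2. Enumerated types

When a class has only a finite, fixed set of objects, we can call it an enumeration.

Days of the week: Monday … Sunday

Gender: Man, Woman

Seasons: Spring … Winter

Employment status, and so on.

Whenever you need to define a group of constants, using an enum is strongly recommended.

Before JDK 1.5: hand-written enumeration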

package com.shujia.java.day27.regulardemos;


public class EnumDemo1 {
    public static void main(String[] args) {
        Season spring = Season.SPRING;
        System.out.println(spring);
    }
}


class Season{
    // 2. Declare Season's fields as constants
    private final String SEASON_NAME;
    private final String SEASON_DESC;


    // 1. The number of instances must be finite,
    // so the constructor has to be private
    private Season(String SEASON_NAME,String SEASON_DESC){
        this.SEASON_NAME = SEASON_NAME;
        this.SEASON_DESC = SEASON_DESC;
    }

    // 3. Expose the instances of the enumeration as public static final constants
    public static final Season SPRING = new Season("春天", "万物复苏");
    public static final Season SUMMER = new Season("夏天", "万物复苏2");
    public static final Season AUTUMN = new Season("秋天", "万物复苏3");
    public static final Season WINTER = new Season("冬天", "万物复苏4");

    // 4. Provide getters for SEASON_NAME and SEASON_DESC
    public String getSEASON_NAME() {
        return SEASON_NAME;
    }

    public String getSEASON_DESC() {
        return SEASON_DESC;
    }

    // 5. Override toString()

    @Override
    public String toString() {
        return "Season{" +
                "SEASON_NAME='" + SEASON_NAME + '\'' +
                ", SEASON_DESC='" + SEASON_DESC + '\'' +
                '}';
    }
}

JDK 1.5 and later: define the enum class with the enum keyword

package com.shujia.java.day27.regulardemos;


public class EnumDemo2 {
    public static void main(String[] args) {
        Season2 spring = Season2.SPRING;
        System.out.println(spring);
        System.out.println(Season2.class.getSuperclass());
    }
}


enum Season2{

    // 3. An enum has a finite set of constants, separated by commas and terminated by a semicolon
    // The constants must come first in the enum body
    SPRING("春天", "万物复苏"),
    SUMMER("夏天", "万物复苏2"),
    AUTUMN("秋天", "万物复苏3"),
    WINTER("冬天", "万物复苏4");

    // 2. Declare Season2's fields as constants
    private final String SEASON_NAME;
    private final String SEASON_DESC;


    // 1. The number of instances must be finite,
    // so the constructor has to be private
    private Season2(String SEASON_NAME,String SEASON_DESC){
        this.SEASON_NAME = SEASON_NAME;
        this.SEASON_DESC = SEASON_DESC;
    }


    // 4. Provide getters for SEASON_NAME and SEASON_DESC
    public String getSEASON_NAME() {
        return SEASON_NAME;
    }

    public String getSEASON_DESC() {
        return SEASON_DESC;
    }

    // 5. Override toString() (optional; the default prints the constant's name)

//    @Override
//    public String toString() {
//        return "Season{" +
//                "SEASON_NAME='" + SEASON_NAME + '\'' +
//                ", SEASON_DESC='" + SEASON_DESC + '\'' +
//                '}';
//    }
}
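
Every class declared with `enum` inherits a few methods from `java.lang.Enum` and gets `values()` and `valueOf()` generated by the compiler. The sketch below shows them, together with a `switch` over enum constants. The `Weekday` enum and the `isWeekend` helper are hypothetical, introduced only for this illustration.

```java
class EnumBasicsSketch {
    // A plain enum: no fields or constructor are required.
    enum Weekday { MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY }

    // Enum constants can be used directly as switch labels.
    static boolean isWeekend(Weekday day) {
        switch (day) {
            case SATURDAY:
            case SUNDAY:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // values() returns all constants in declaration order.
        for (Weekday day : Weekday.values()) {
            System.out.println(day.ordinal() + " " + day.name() + " weekend=" + isWeekend(day));
        }
        // valueOf() looks a constant up by its exact name.
        System.out.println(Weekday.valueOf("MONDAY") == Weekday.MONDAY); // true
    }
}
```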

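Enum classes can implement interfaces

1. Implement the interface's abstract methods directly in the enum class

2. Implement them in each enum constant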
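
The two approaches can be sketched as follows. The `Describable` interface, the `Gender` and `Season` enums, and the returned strings are all hypothetical, made up for this illustration.

```java
class EnumInterfaceSketch {
    interface Describable {
        String describe();
    }

    // Approach 1: one implementation in the enum class, shared by all constants.
    enum Gender implements Describable {
        MAN, WOMAN;

        @Override
        public String describe() {
            return "gender: " + name();
        }
    }

    // Approach 2: each constant supplies its own implementation in a constant body.
    enum Season implements Describable {
        SPRING {
            @Override
            public String describe() {
                return "spring: flowers bloom";
            }
        },
        WINTER {
            @Override
            public String describe() {
                return "winter: snow falls";
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(Gender.MAN.describe());    // gender: MAN
        System.out.println(Season.SPRING.describe()); // spring: flowers bloom
        System.out.println(Season.WINTER.describe()); // winter: snow falls
    }
}
```

The first approach fits when every constant behaves the same way; the second fits when the behavior genuinely differs per constant.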
