Python: Regular Expressions Explained (Part 1)

Regular expressions in Python have many uses and are especially common in data processing. This article walks through them one by one.

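Metacharacters:

Character	Meaning
.	Matches any single character except the newline (\n)
*	Matches the preceding element 0 or more times
+	Matches the preceding element 1 or more times
?	Matches the preceding element 0 or 1 times
{}	{n} matches the preceding element exactly n times; {n,m} matches it n to m times
[]	Defines a character set; for example, [0-9a-zA-Z] matches one digit, lowercase letter, or uppercase letter
^	Anchors the match at the start of the string; as the first character inside [], as in [^0-2], it negates the set (any character except 0 to 2)
$	Anchors the match at the end of the string
\	Escapes a special character, or introduces a special sequence
()	Groups the enclosed pattern, so (xyz) is matched as a single unit
|	Alternation: x|y matches either x or y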

The following code demonstrates each one:

import re

# . matches any single character except the newline (\n)
test_str = "hello python hello jython"
pattern = re.compile(r".ython")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['python', 'jython']

# * matches the preceding element 0 or more times
test_str = "ct cat caat caaat caaaat"
pattern = re.compile(r"ca*t")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['ct', 'cat', 'caat', 'caaat', 'caaaat']

# + matches the preceding element 1 or more times
test_str = "ct cat caat caaat caaaat"
pattern = re.compile(r"ca+t")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['cat', 'caat', 'caaat', 'caaaat']

# ? matches the preceding element 0 or 1 times
test_str = "ct cat caat caaat caaaat"
pattern = re.compile(r"ca?t")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['ct', 'cat']

# {n} matches the preceding element exactly n times; {n,m} matches it n to m times
test_str = "ct cat caat caaat caaaat"
pattern = re.compile(r"ca{2,4}t")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['caat', 'caaat', 'caaaat']

# [] defines a character set; [0-9a-zA-Z] matches one digit or letter
test_str = "hello python hello jython"
pattern = re.compile(r"[pj]ython")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['python', 'jython']

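# ^ anchors the match at the start of the string; as the first character
# inside [], as in [^0-2], it negates the set
test_str = "look"
pattern = re.compile(r"[^b].+")  # first character must not be b
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['look']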

# $ anchors the match at the end of the string
test_str = "python"
pattern = re.compile(r".+n$")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['python']

# \ escapes a special character, or introduces a special sequence
test_str = "how are you ?"
pattern = re.compile(r"\?")  # \? matches a literal question mark
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['?']

# () groups the enclosed pattern, so (xyz) is matched as a single unit
test_str = "123ABC123ABC"
pattern = re.compile(r"(123)")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['123', '123']

# | is alternation: x|y matches either x or y
test_str = "AAABBBCCCDDD"
pattern = re.compile(r"A+|C+")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['AAA', 'CCC']
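
The examples above show ^ only inside brackets. The following is a minimal sketch of both roles of ^ alongside $, using a made-up three-line string; the variable names are illustrative, and re.MULTILINE is the standard flag that lets the anchors match at every line boundary.

```python
import re

text = "bat\ncat\nbar"

# Outside brackets, ^ anchors the match at the start of the string.
starts_with_b = re.findall(r"^b\w+", text)
print(starts_with_b)  # ['bat']

# With re.MULTILINE, ^ also matches right after each newline.
lines_with_b = re.findall(r"^b\w+", text, flags=re.MULTILINE)
print(lines_with_b)  # ['bat', 'bar']

# Inside brackets, a leading ^ negates the set: words that do not start with b.
not_b_words = re.findall(r"\b[^b\s]\w*", text)
print(not_b_words)  # ['cat']

# $ anchors at the end; with re.MULTILINE it also matches before each newline.
ends_with_t = re.findall(r"\w+t$", text, flags=re.MULTILINE)
print(ends_with_t)  # ['bat', 'cat']
```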

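Escape sequences

Sequence	Meaning (the bracketed equivalents hold for ASCII text or with re.ASCII)
\d	Matches a digit; same as [0-9]
\D	Matches a non-digit; same as [^0-9]
\w	Matches a digit, letter, or underscore; same as [0-9a-zA-Z_]
\W	Matches any character other than a digit, letter, or underscore; same as [^0-9a-zA-Z_]
\s	Matches any whitespace character; same as [ \f\n\r\t\v]
\S	Matches any non-whitespace character; same as [^ \f\n\r\t\v]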

The following code demonstrates each one:

import re

# \d matches a digit; same as [0-9]
test_str = "123ABC123ABC"
pattern = re.compile(r"\d+")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['123', '123']

# \D matches a non-digit; same as [^0-9]
test_str = "123ABC123ABC"
pattern = re.compile(r"\D+")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['ABC', 'ABC']

# \w matches a digit, letter, or underscore; same as [0-9a-zA-Z_]
test_str = "123ABC——123ABC"
pattern = re.compile(r"\w+")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['123ABC', '123ABC']

# \W matches any character other than a digit, letter, or underscore;
# same as [^0-9a-zA-Z_]
test_str = "123ABC——123ABC"
pattern = re.compile(r"\W+")
matcher = pattern.findall(test_str)
print(matcher)  # Result: one match, the run of non-word characters between the two 123ABC runs

# \s matches any whitespace character; same as [ \f\n\r\t\v]
test_str = "123ABC\n123ABC"
pattern = re.compile(r"\s+")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['\n']

# \S matches any non-whitespace character; same as [^ \f\n\r\t\v]
test_str = "123ABC\n123ABC"
pattern = re.compile(r"\S+")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['123ABC', '123ABC']
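
One caveat about the bracketed equivalents: for str patterns in Python 3, \w, \d and \s are Unicode-aware by default, so \w also matches letters outside a-z. The sketch below, with a made-up sample string written using escapes, compares the default behavior with the re.ASCII flag.

```python
import re

# Sample text "héllo_1 世界", written with escapes to keep the source ASCII-only.
text = "h\u00e9llo_1 \u4e16\u754c"

# Default (Unicode) matching: accented and CJK characters count as word characters.
unicode_words = re.findall(r"\w+", text)
print(unicode_words)  # two matches: the accented word and the CJK word

# With re.ASCII, \w is restricted to [0-9a-zA-Z_].
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)
print(ascii_words)  # ['h', 'llo_1']

# The explicit character class gives the same result as \w under re.ASCII.
class_words = re.findall(r"[0-9a-zA-Z_]+", text)
print(class_words)  # ['h', 'llo_1']
```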

Lookahead: exp1(?=exp2) matches exp1 only where it is followed by exp2
Lookbehind: (?<=exp2)exp1 matches exp1 only where it is preceded by exp2

import re

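# Lookahead:  exp1(?=exp2)   matches exp1 only where exp2 follows it
# Lookbehind: (?<=exp2)exp1  matches exp1 only where exp2 precedes it
# Assumption: the sample text is wrapped in <p>...</p> tags
test_str = "<p>hello python</p>"
# pattern = re.compile(r"<p>[\w ]+</p>")  # Result: ['<p>hello python</p>']
# pattern = re.compile(r"<p>[\w ]+(?=</p>)")  # Result: ['<p>hello python']
# pattern = re.compile(r"(?<=<p>)[\w ]+</p>")  # Result: ['hello python</p>']
pattern = re.compile(r"(?<=<p>).+(?=</p>)")  # Result: ['hello python']
matcher = pattern.findall(test_str)
print(matcher)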
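
As a further illustration of lookaround, here is a small sketch that pulls numbers out of a sentence depending on what sits next to them. The sample sentence and variable names are made up for this example. Note that Python's re module requires a lookbehind pattern to match a fixed number of characters.

```python
import re

text = "cost: $30 or 40 euros"

# Lookbehind: digits that come right after a dollar sign (the $ is not part of the match).
dollar_amounts = re.findall(r"(?<=\$)\d+", text)
print(dollar_amounts)  # ['30']

# Lookahead: digits that are followed by " euros" (the suffix is not part of the match).
euro_amounts = re.findall(r"\d+(?= euros)", text)
print(euro_amounts)  # ['40']
```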

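Greedy mode: the quantifier matches as much as possible, so the match is broad
Lazy mode: the quantifier matches as little as possible, so the match is tight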

import re

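# Greedy: a quantifier matches as much text as it can
# Lazy:   a quantifier followed by ? matches as little text as it can
# Assumption: the sample text uses <p> tags, with one extra closing tag

# Quantifiers are greedy by default
test_str = "<p>hello python</p></p>"
pattern = re.compile(r"(?<=<p>).+(?=</p>)")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['hello python</p>']

# Lazy mode
test_str = "<p>hello python</p></p>"
pattern = re.compile(r"(?<=<p>).+?(?=</p>)")
matcher = pattern.findall(test_str)
print(matcher)  # Result: ['hello python']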
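
The same difference shows up without any lookaround. A minimal sketch, assuming a made-up sentence containing two quoted words:

```python
import re

text = 'say "hi" and "bye"'

# Greedy: .+ runs to the last quote, so both quoted words end up in one match.
greedy = re.findall(r'".+"', text)
print(greedy)  # ['"hi" and "bye"']

# Lazy: .+? stops at the first closing quote, so each quoted word matches separately.
lazy = re.findall(r'".+?"', text)
print(lazy)  # ['"hi"', '"bye"']
```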

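Capturing group: (...) groups a subpattern and captures its text, which findall returns
Non-capturing group: (?:...) groups a subpattern without capturing it, so findall returns the whole match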

import re

# Capturing group:     (...)    findall returns the captured text
# Non-capturing group: (?:...)  findall returns the whole match
test_str = "123@qq.com123@163.com123@126.com"
pattern = re.compile(r"\w+@(qq|163|126)\.com")  # capturing group, result: ['qq', '163', '126']
# pattern = re.compile(r"\w+@(?:qq|163|126)\.com")  # non-capturing group, result: ['123@qq.com', '123@163.com', '123@126.com']
matcher = pattern.findall(test_str)
print(matcher)
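
Two related behaviors are worth sketching here: with more than one capturing group, findall returns a list of tuples, and the whole match is still available through match objects. The sample addresses and variable names below are made up for illustration.

```python
import re

text = "123@qq.com 456@163.com"

# One capturing group: findall returns only the captured text.
domains = re.findall(r"\w+@(qq|163|126)\.com", text)
print(domains)  # ['qq', '163']

# Two capturing groups: findall returns one tuple per match.
pairs = re.findall(r"(\w+)@(qq|163|126)\.com", text)
print(pairs)  # [('123', 'qq'), ('456', '163')]

# finditer yields match objects; group(0) is the whole match even when groups capture.
full_matches = [m.group(0) for m in re.finditer(r"\w+@(qq|163|126)\.com", text)]
print(full_matches)  # ['123@qq.com', '456@163.com']
```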

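That concludes the first part of this walkthrough of regular expressions in Python. Later parts will cover the functions in the re module and some classic examples. If you have questions, feel free to leave a comment or send the author a private message.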
