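This builds on my previous blog post: https://www.imooc.com/search/article?words=迷之语法
It adds two points that are easy to miss as a beginner but matter a great deal when using regex in practice.
Let's look at a few demos: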
    import re

    line = "boooooobby123"
    regex_str = ".*(b.*b).*"  # greedy prefix, greedy group
    match_obj = re.match(regex_str, line)
    if match_obj:
        print(match_obj.group(0))  # the whole match
        print(match_obj.group(1))  # the part captured by the parentheses
Result:
boooooobby123
bb
    import re

    line = "boooooobby123"
    regex_str = ".*?(b.*b).*"  # lazy prefix, greedy group
    match_obj = re.match(regex_str, line)
    if match_obj:
        print(match_obj.group(0))
        print(match_obj.group(1))
Result:
boooooobby123
boooooobb
    import re

    line = "booooooby123"
    regex_str = ".*?(b.*b).*"  # same pattern, but the string has one b fewer
    match_obj = re.match(regex_str, line)
    if match_obj:
        print(match_obj.group(0))
        print(match_obj.group(1))
Result:
booooooby123
boooooob
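These results come from greedy matching. A greedy quantifier such as .* first takes as many characters as it can, then gives them back one at a time until the rest of the pattern matches. Greed therefore shows up on both sides of the group: a greedy .* on the left pushes the group as far right as possible, and the greedy .* inside the group stretches it to the last b on the right.
To turn greed off, put ? after the quantifier; the group then stops at the first b that can close it: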
    import re

    line = "boooooobbbby123"
    regex_str = ".*?(b.*?b).*"  # lazy prefix, lazy group
    match_obj = re.match(regex_str, line)
    if match_obj:
        print(match_obj.group(0))
        print(match_obj.group(1))
Result:
boooooobbbby123
boooooob
    import re

    line = "boooooobbbby123"
    regex_str = ".*?(b.*b)?.*"  # here ? makes the whole group optional; it does not turn greed off
    match_obj = re.match(regex_str, line)
    if match_obj:
        print(match_obj.group(0))
        print(match_obj.group(1))
Result:
boooooobbbby123
boooooobbbb
    import re

    line = "boooooobbbby123"
    regex_str = ".*?(b.*b?).*"  # here ? makes only the final b optional, so .* runs to the end
    match_obj = re.match(regex_str, line)
    if match_obj:
        print(match_obj.group(1))
Result: boooooobbbby123
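To compare the variants side by side, the sketch below runs the patterns from the demos against the same string. The names `patterns` and `results` are introduced here only for illustration.

```python
import re

line = "boooooobbbby123"

# The patterns differ only in where a "?" is placed.
patterns = [
    ".*(b.*b).*",    # greedy prefix, greedy group
    ".*?(b.*b).*",   # lazy prefix, greedy group
    ".*?(b.*?b).*",  # lazy prefix, lazy group
    ".*?(b.*b)?.*",  # lazy prefix, optional greedy group
    ".*?(b.*b?).*",  # lazy prefix, optional trailing b
]

results = {}
for pattern in patterns:
    match_obj = re.match(pattern, line)
    results[pattern] = match_obj.group(1) if match_obj else None
    print(f"{pattern:<14} -> {results[pattern]}")
```

Only the placement of ? changes between rows, which makes it easier to see that ? after a quantifier (.*?) changes greed, while ? after a group or a character makes that item optional.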
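The idea behind greedy matching: with a greedy .* in front, the group ends up with the last (rightmost) substring that satisfies it. For example: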
    import re

    line = "boooooobbbby123"
    regex_str = ".*(b.+b).*"  # .+ requires at least one character between the two b's
    match_obj = re.match(regex_str, line)
    if match_obj:
        print(match_obj.group(1))
Result: bbb (the last substring that satisfies the group is bbb)
    import re

    line = "boooooobbbbbaby123"
    regex_str = ".*(b.+b).*"
    match_obj = re.match(regex_str, line)
    if match_obj:
        print(match_obj.group(1))
Result: bab (the last substring that satisfies the group is bab; the one before it is bbb)
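One way to check that the group really sits at the right-hand end is to look at its position in the string. The sketch below uses `span(1)` on the match object and, for contrast, `re.search` without the greedy prefix; the variable names are assumptions made for this example.

```python
import re

line = "boooooobbbbbaby123"

# With a greedy prefix, group 1 is pushed as far right as possible.
match_obj = re.match(".*(b.+b).*", line)
start, end = match_obj.span(1)  # slice of `line` captured by group 1
print(match_obj.group(1), start, end)

# Without the prefix, re.search returns the leftmost match instead,
# and the greedy .+ stretches it to the last b.
leftmost = re.search("b.+b", line)
print(leftmost.group(0), leftmost.start())
```

The captured slice ends at the last b in the string, while the `re.search` match starts at index 0.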
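Square brackets [ ] have three main uses:
1. Express an "or" relation: matching any one of the listed characters is enough.
2. Define a range, such as [0-9]. (Curly braces { } define a repeat count: \d{1,2} means one or two digits, which matches the month in a date, such as '06' or '6'.)
3. Remove the special meaning of characters such as the dot: in [.] the dot no longer means "any character except a newline"
but a plain literal dot (this is easy to verify in code).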
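The three uses can be checked with a few calls to `re.fullmatch`. This is only a small sketch: the sample strings (`grey`, `a.c`, and so on) and variable names are made up for illustration.

```python
import re

# 1. "Or": any one of the listed characters matches.
or_hit = bool(re.fullmatch("gr[ae]y", "grey"))
or_miss = bool(re.fullmatch("gr[ae]y", "groy"))

# 2. Range plus repeat count: one or two digits, e.g. a month.
month_06 = bool(re.fullmatch(r"[0-9]{1,2}", "06"))
month_6 = bool(re.fullmatch(r"[0-9]{1,2}", "6"))
month_123 = bool(re.fullmatch(r"[0-9]{1,2}", "123"))

# 3. Inside brackets the dot is a literal dot.
bare_dot = bool(re.fullmatch("a.c", "abc"))       # bare dot matches any character
class_dot = bool(re.fullmatch("a[.]c", "abc"))    # [.] does not match "b"
class_dot_literal = bool(re.fullmatch("a[.]c", "a.c"))

print(or_hit, or_miss)                         # True False
print(month_06, month_6, month_123)            # True True False
print(bare_dot, class_dot, class_dot_literal)  # True False True
```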
Extracting Chinese characters: [\u4E00-\u9FA5]
    import re

    line = "study in 南京大学"
    regex_str = ".*([\u4E00-\u9FA5]+大学)"  # greedy prefix
    match_obj = re.match(regex_str, line)
    if match_obj:
        print(match_obj.group(1))
Result: 京大学
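The reason is greedy matching again: the leading .* swallows as much as it can, including "南",
and leaves only "京" for the character class. Adding ? after the leading .* turns greed off: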
    import re

    line = "study in 南京大学"
    regex_str = ".*?([\u4E00-\u9FA5]+大学)"  # .*? skips the characters in front
    match_obj = re.match(regex_str, line)
    if match_obj:
        print(match_obj.group(1))
Result: 南京大学
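As an alternative, and not the approach used in the demos here, `re.search` and `findall` scan the string themselves, so the leading .*? is not needed at all. The sketch below assumes the same [\u4E00-\u9FA5] range, which covers the commonly used Chinese characters; the name `han_university` and the second sample string are hypothetical.

```python
import re

line = "study in 南京大学"

# Same character class as above, but with no prefix to skip leading text.
han_university = re.compile("[\u4E00-\u9FA5]+大学")

found = han_university.search(line)  # leftmost match anywhere in the string
print(found.group(0) if found else None)

# findall collects every non-overlapping match.
all_found = han_university.findall("南京大学 and 北京大学")
print(all_found)
```

Because `search` starts from the leftmost position that can match, the greedy-prefix problem does not arise here.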
Reference tutorial: https://coding.imooc.com/lesson/92.html#mid=2844



