Why Python's re.sub fails to replace every match: a root-cause analysis



Definition of the re.sub function

def sub(pattern, repl, string, count=0, flags=0):
    """Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the Match object and must return
    a replacement string to be used."""
    return _compile(pattern, flags).sub(repl, string, count)

Both count and flags are int parameters.

Looking up the definition of re.S gives:

DOTALL = sre_compile.SRE_FLAG_DOTALL # make dot match newline

S = DOTALL

SRE_FLAG_DOTALL = 16 # treat target as a single string

So re.S ultimately resolves to an int with the value 16.
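The claim above is easy to check interactively. The following is a minimal sketch; the variable names are only illustrative.

```python
import re

# re.S and re.DOTALL are two names for the same flag object.
same_flag = re.S is re.DOTALL

# The flag is an int (or int subclass), so it is accepted wherever an int is expected.
is_int = isinstance(re.S, int)
flag_value = int(re.S)

print(same_flag, is_int, flag_value)  # True True 16
```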

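This matters when the argument is passed without its name, for example: re.sub(r'a', '', content, re.S)

Here re.S never takes effect as a flag. It is bound to count, the fourth positional parameter (index 3), so count becomes 16 and flags keeps its default of 0.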
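To make the argument binding visible, here is a small sketch using a hypothetical stand-in function, sub_like, that only copies the parameter order of re.sub shown above. It is not part of the re module and performs no substitution; it just reports which parameter received the value.

```python
import re


def sub_like(pattern, repl, string, count=0, flags=0):
    """Hypothetical stand-in with the same parameter order as re.sub."""
    return {"count": int(count), "flags": int(flags)}


# Fourth positional argument lands in count, not flags.
positional = sub_like(r"a", "", "aaa", re.S)

# Naming the argument sends it where it was meant to go.
keyword = sub_like(r"a", "", "aaa", flags=re.S)

print(positional)  # {'count': 16, 'flags': 0}
print(keyword)     # {'count': 0, 'flags': 16}
```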

Example:

import re

content = 'aaaaaaaaaaaaaaaaaaaa'
# re.S is passed positionally, so it is treated as count=16
content = re.sub(r'a', '', content, re.S)
print(content)

Output:
aaaa

Exactly 16 of the 20 a's were replaced, and 16 is precisely the value of re.S.
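The number of replacements can also be read off directly with re.subn, which returns the new string together with the substitution count. This is a sketch under the assumption that the interpreter still accepts count positionally; newer Python releases may emit a DeprecationWarning for that call style, which is silenced here only to keep the output clean.

```python
import re
import warnings

text = "a" * 20

with warnings.catch_warnings():
    # Positional count/flags may trigger a DeprecationWarning on newer Pythons.
    warnings.simplefilter("ignore", DeprecationWarning)
    # re.S lands in count, so at most 16 substitutions are made.
    limited, n_limited = re.subn(r"a", "", text, re.S)

# With the keyword, count stays 0 (unlimited) and every match is replaced.
full, n_full = re.subn(r"a", "", text, flags=re.S)

print(repr(limited), n_limited)  # 'aaaa' 16
print(repr(full), n_full)        # '' 20
```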

Next, check whether re.S still does its real job of letting the dot match newline characters.

content1='aaaaaaaaaaaaaa\naaaaaa'
content1=re.sub(r'.*?','x',content1,re.S)
print(content1)

Output:
xxxxxxxxxxxxxxxxaaaaaa
aaaaaa

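re.S has lost its original function: the value was consumed as count, so only 16 replacements were made and the newline was never touched. Passing it by keyword gives the expected result: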


content1='aaaaaaaaaaaaaa\naaaaaa'
content1=re.sub(r'.*?','x',content1,flags=re.S)
print(content1)

Output:
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
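Because the lazy pattern .*? also produces empty matches, the effect of DOTALL is somewhat hard to see in the outputs above. A simpler sketch, using an illustrative pattern a.b that is not from the original example, isolates the newline behavior:

```python
import re
import warnings

text = "a\nb"

# Without DOTALL the dot does not match "\n", so nothing is replaced.
no_flag = re.sub(r"a.b", "X", text)

# With flags=re.S the dot matches the newline and the whole string is replaced.
with_flag = re.sub(r"a.b", "X", text, flags=re.S)

with warnings.catch_warnings():
    # Positional count/flags may trigger a DeprecationWarning on newer Pythons.
    warnings.simplefilter("ignore", DeprecationWarning)
    # Passed positionally, re.S becomes count=16 and DOTALL is not applied.
    positional = re.sub(r"a.b", "X", text, re.S)

print(repr(no_flag))     # 'a\nb'
print(repr(with_flag))   # 'X'
print(repr(positional))  # 'a\nb'
```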

Solution:

Pass the argument by name when calling the function: flags=re.S.

By default (count=0), re.sub replaces every match.
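Besides the keyword argument, two other spellings avoid the positional trap altogether. The sketch below is one possible approach rather than the only fix: re.compile takes flags as its second parameter, and the inline flag (?s) embeds DOTALL in the pattern itself. The pattern a.b and the sample text are illustrative.

```python
import re

text = "a\nb"

# Option 1: keyword argument, as recommended above.
by_keyword = re.sub(r"a.b", "X", text, flags=re.S)

# Option 2: compile first; flags is the second parameter of re.compile,
# and the compiled pattern's sub() has no flags parameter to mix up.
dotall_pattern = re.compile(r"a.b", re.S)
by_compile = dotall_pattern.sub("X", text)

# Option 3: inline flag, so DOTALL is part of the pattern text.
by_inline = re.sub(r"(?s)a.b", "X", text)

print(by_keyword, by_compile, by_inline)  # X X X
```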



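Copyright notice: This is an original article by az9996, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.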