• Re: Possible re bug when using ".*"

    From Roel Schroeven@21:1/5 to Alexander Richert - NOAA Affiliate on Wed Dec 28 19:59:09 2022
    Alexander Richert - NOAA Affiliate via Python-list wrote on 28/12/2022
    at 19:42:
    In a couple recent versions of Python (including 3.8 and 3.10), the following code:
    import re
    print(re.sub(".*", "replacement", "pattern"))
    yields the output "replacementreplacement".

    This behavior does not occur in 3.6.

    Which behavior is the desired one? Perhaps relatedly, I noticed that even
    in 3.6, the code
    print(re.findall(".*","pattern"))
    yields ['pattern',''] which is not what I was expecting.
    The documentation for re.sub() and re.findall() has these notes:
    "Changed in version 3.7: Empty matches for the pattern are replaced when
    adjacent to a previous non-empty match." and "Changed in version 3.7:
    Non-empty matches can now start just after a previous empty match."
    That probably describes the behavior you're seeing. ".*" first matches
    "pattern", which is a non-empty match; then it matches the empty string
    at the end, which is an empty match but is replaced because it is
    adjacent to a non-empty match.
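
    To make the two matches visible, here is a minimal sketch, assuming
    Python 3.7 or later. The variable names are only illustrative; the last
    call is the example given in the re.sub() documentation.

```python
import re

text = "pattern"

# Every match of ".*" with its (start, end) span: first the whole string,
# then an empty match at the very end of the string.
matches = [(m.span(), m.group()) for m in re.finditer(".*", text)]
print(matches)      # [((0, 7), 'pattern'), ((7, 7), '')]

# re.sub() replaces both matches, so the replacement appears twice.
result = re.sub(".*", "replacement", text)
print(result)       # replacementreplacement

# Same rule with a pattern that can match the empty string: the empty
# match right after "x" is replaced as well.
doc_example = re.sub("x*", "-", "abxd")
print(doc_example)  # -a-b--d-
```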

    Seems somewhat counter-intuitive to me, but AFAICS it's the intended
    behavior.

    --
    "Programming today is a race between software engineers striving to build bigger
    and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning."
    -- Douglas Adams

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Roel Schroeven@21:1/5 to Roel Schroeven on Wed Dec 28 20:03:04 2022
    Roel Schroeven wrote on 28/12/2022 at 19:59:
    Alexander Richert - NOAA Affiliate via Python-list wrote on
    28/12/2022 at 19:42:
    In a couple recent versions of Python (including 3.8 and 3.10), the
    following code:
    import re
    print(re.sub(".*", "replacement", "pattern"))
    yields the output "replacementreplacement".

    This behavior does not occur in 3.6.

    Which behavior is the desired one? Perhaps relatedly, I noticed that even
    in 3.6, the code
    print(re.findall(".*","pattern"))
    yields ['pattern',''] which is not what I was expecting.
    The documentation for re.sub() and re.findall() has these notes:
    "Changed in version 3.7: Empty matches for the pattern are replaced
    when adjacent to a previous non-empty match." and "Changed in version
    3.7: Non-empty matches can now start just after a previous empty match."
    That probably describes the behavior you're seeing. ".*" first
    matches "pattern", which is a non-empty match; then it matches the
    empty string at the end, which is an empty match but is replaced
    because it is adjacent to a non-empty match.

    Seems somewhat counter-intuitive to me, but AFAICS it's the intended
    behavior.

    For what it's worth, there's some discussion about this in this GitHub
    issue: https://github.com/python/cpython/issues/76489

    --
    "I disapprove of what you say, but I will defend to the death your right to
    say it."
    -- Attributed to Voltaire

  • From Alexander Richert - NOAA Affiliate@21:1/5 to All on Wed Dec 28 10:42:29 2022
    In a couple recent versions of Python (including 3.8 and 3.10), the
    following code:
    import re
    print(re.sub(".*", "replacement", "pattern"))
    yields the output "replacementreplacement".

    This behavior does not occur in 3.6.

    Which behavior is the desired one? Perhaps relatedly, I noticed that even
    in 3.6, the code
    print(re.findall(".*","pattern"))
    yields ['pattern',''] which is not what I was expecting.

    Thanks,
    Alex Richert

    --
    Alexander Richert, PhD
    *RedLine Performance Systems*

  • From MRAB@21:1/5 to All on Wed Dec 28 19:07:06 2022
    On 2022-12-28 18:42, Alexander Richert - NOAA Affiliate via Python-list
    wrote:
    In a couple recent versions of Python (including 3.8 and 3.10), the following code:
    import re
    print(re.sub(".*", "replacement", "pattern"))
    yields the output "replacementreplacement".

    This behavior does not occur in 3.6.

    Which behavior is the desired one? Perhaps relatedly, I noticed that even
    in 3.6, the code
    print(re.findall(".*","pattern"))
    yields ['pattern',''] which is not what I was expecting.

    It's not a bug, it's a change in behaviour to bring it more into line
    with other regex implementations in other languages.
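
    If a single replacement is what is wanted, a few possible ways to get
    it are sketched below, assuming Python 3.7 or later. The variable names
    are hypothetical, and the variants are not equivalent in general: ".+"
    replaces nothing in an empty string, and "." does not match a newline
    by default, so each one treats empty or multi-line input differently.

```python
import re

text = "pattern"

# Require at least one character, so the empty match at the end of the
# string is never found.
with_plus = re.sub(".+", "replacement", text)

# Anchor the pattern at the start of the string; without re.MULTILINE,
# "^" cannot match a second time at the end.
anchored = re.sub("^.*", "replacement", text)

# Keep the pattern as it is, but limit the number of replacements.
counted = re.sub(".*", "replacement", text, count=1)

print(with_plus, anchored, counted)
```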

  • From Ethan Furman@21:1/5 to MRAB on Wed Dec 28 12:37:20 2022
    On 12/28/22 11:07, MRAB wrote:
    On 2022-12-28 18:42, Alexander Richert - NOAA Affiliate via Python-list wrote:
    In a couple recent versions of Python (including 3.8 and 3.10), the
    following code:
    import re
    print(re.sub(".*", "replacement", "pattern"))
    yields the output "replacementreplacement".

    This behavior does not occur in 3.6.

    Which behavior is the desired one? Perhaps relatedly, I noticed that even
    in 3.6, the code
    print(re.findall(".*","pattern"))
    yields ['pattern',''] which is not what I was expecting.

    It's not a bug, it's a change in behaviour to bring it more into line with other regex implementations in other languages.

    The new behavior makes no sense to me, but better to be consistent with
    the other regex engines than not -- I still get thrown off by vim's regex.

    --
    ~Ethan~

  • From Peter J. Holzer@21:1/5 to MRAB on Sun Jan 1 18:47:31 2023
    On 2022-12-28 19:07:06 +0000, MRAB wrote:
    On 2022-12-28 18:42, Alexander Richert - NOAA Affiliate via Python-list wrote:
    print(re.sub(".*", "replacement", "pattern"))
    yields the output "replacementreplacement".
    [...]
    It's not a bug, it's a change in behaviour to bring it more into line with other regex implementations in other languages.

    Interesting. Perl does indeed behave that way, too. Never noticed that
    in 28 years of using it.

    hp

    --
       _  | Peter J. Holzer    | Story must make more sense than reality.
    |_|_) |                    |
    | |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
    __/   | http://www.hjp.at/ |       challenge!"
