• Re: Did Python 3.12 developers honestly broke special regexp sequences?

    From Simon McVittie@21:1/5 to Andreas Tille on Tue Feb 13 18:40:02 2024
    On Tue, 13 Feb 2024 at 18:21:17 +0100, Andreas Tille wrote:
    > SyntaxWarning: invalid escape sequence '\.'
    > 573s CLI_INPUT_RE = re.compile('[a-zA-Z0-9_:\.\-\+; /#%]')

    This should be:

    re.compile(r'[a-zA-Z0-9_:\.\-\+; /#%]')
               ^

    a raw string, where the backslashes are not interpreted by the
    Python parser, allowing them to be passed through to the re module for
    parsing; or alternatively

    re.compile('[a-zA-Z0-9_:\\.\\-\\+; /#%]')
                            ^^ ^^ ^^

    like you would have to write in the C equivalent.
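
    As a quick sanity check, the two spellings above can be compared
    directly. This is only a sketch: the names raw and doubled are
    made up for illustration, and CLI_INPUT_RE is reused from the
    quoted log line.

```python
import re

# Raw string: the Python parser leaves the backslashes alone.
raw = r'[a-zA-Z0-9_:\.\-\+; /#%]'

# Regular string: every backslash is doubled, as in a C string literal.
doubled = '[a-zA-Z0-9_:\\.\\-\\+; /#%]'

# Both spellings give the re module exactly the same pattern text.
assert raw == doubled

CLI_INPUT_RE = re.compile(raw)

# '+' is in the character class, '!' is not.
assert CLI_INPUT_RE.fullmatch('+') is not None
assert CLI_INPUT_RE.fullmatch('!') is None
```

    Either form silences the warning; the raw string is usually the
    easier one to read.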

    Reference:

    """
    Regular expressions use the backslash character ('\') to indicate
    special forms or to allow special characters to be used without
    invoking their special meaning. This collides with Python’s usage
    of the same character for the same purpose in string literals;
    for example, to match a literal backslash, one might have to write
    '\\\\' as the pattern string, because the regular expression must
    be \\, and each backslash must be expressed as \\ inside a regular
    Python string literal. Also, please note that any invalid escape
    sequences in Python’s usage of the backslash in string literals
    now generate a SyntaxWarning and in the future this will become a
    SyntaxError. This behaviour will happen even if it is a valid escape
    sequence for a regular expression.

    The solution is to use Python’s raw string notation for regular
    expression patterns; backslashes are not handled in any special way
    in a string literal prefixed with 'r'. So r"\n" is a two-character
    string containing '\' and 'n', while "\n" is a one-character string
    containing a newline. Usually patterns will be expressed in Python
    code using this raw string notation.
    """
    —re module docs
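
    The two claims in that excerpt (the length of r"\n" versus "\n",
    and the four-backslash pattern for a literal backslash) can be
    checked with a few assertions. The sample string path is an
    assumption chosen for illustration.

```python
import re

# r"\n" is two characters (backslash, 'n'); "\n" is a single newline.
assert len(r"\n") == 2
assert len("\n") == 1

# A sample string containing exactly one backslash: C : \ t e m p
path = "C:\\temp"
assert len(path) == 7

# Matching that literal backslash: the regex must be \\ (two characters),
# so a regular string literal needs four backslashes and a raw string two.
assert '\\\\' == r'\\'
assert re.search('\\\\', path).group() == '\\'
assert re.search(r'\\', path).group() == '\\'
```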

    > which makes me scratching my head what else we should write
    > for "any kind of space" now in Python3.12.

    \s continues to be correct for "any kind of space", but Python now
    complains if you do the backslash-escapes in the wrong layer of syntax.
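
    A small sketch of both halves of that statement follows. The
    helper escape_warnings is hypothetical, written only for this
    illustration; it compiles a snippet of source and reports which
    warning categories the parser raised. The category is expected to
    be SyntaxWarning on 3.12 (older releases used DeprecationWarning),
    and a future release may raise SyntaxError instead, which the
    helper also reports.

```python
import re
import warnings

# \s still means "any kind of whitespace" once it reaches the re module.
WS = re.compile(r'\s+')
assert WS.split('a b\tc\nd') == ['a', 'b', 'c', 'd']


def escape_warnings(source):
    """Compile source and return the names of any warnings raised.

    Hypothetical helper for illustration only.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            compile(source, "<demo>", "exec")
        except SyntaxError:
            # Possible in a future Python where this becomes an error.
            return ["SyntaxError"]
        return [w.category.__name__ for w in caught]


# Non-raw literal: the Python parser sees the unknown escape \s.
print(escape_warnings(r"p = '\s'"))

# Raw literal: the parser passes the backslash through untouched.
print(escape_warnings(r"p = r'\s'"))
```

    The first call should report a warning and the second an empty
    list, which is the "wrong layer of syntax" distinction in
    practice.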

    smcv
