• regex

    From George Bouras@21:1/5 to All on Wed Apr 7 13:12:52 2021
    spaces inside the <...> to _
    e.g.
    "add of <Number A> and <Number B> = " . ( <Number A> + <Number B>
    to
    "add of <Number_A> and <Number_B> = " . ( <Number_A> + <Number_B>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From George Bouras@21:1/5 to All on Wed Apr 7 13:57:20 2021
    On 7/4/2021 at 1:12 PM, George Bouras wrote:

    This looks OK. Any better ideas?

    my $var = '"add of <Number A> and <Number B> = " . ( <Number A> + <Number B> )';
    my $tmp = '';

    say $var;
    $var =~s/<([^>]+)(?{ $tmp=$^N; $tmp=~s|\W+|_|g })>/<$tmp>/g;
    say $var;

  • From Ben Bacarisse@21:1/5 to George Bouras on Wed Apr 7 12:45:02 2021
    George Bouras <foo@example.com> writes:

    spaces inside the <...> to _
    e.g.
    "add of <Number A> and <Number B> = " . ( <Number A> + <Number B>
    to
    "add of <Number_A> and <Number_B> = " . ( <Number_A> + <Number_B>

    s/(<[^<>]*) (?=[^<>]*>)/\1_/g

    The (?= ... ) part is a "zero-width positive look ahead". It consumes
    no characters (so to speak) but must match for the space to match.

    Perl 5.30 has, experimentally, variable-length, zero-width positive look
    behind patterns (it's the variable-length part that is experimental),
    currently limited to 255 characters. That permits

    s/(?<=<[^<>]{0,254}) (?=[^<>]*>)/_/g
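    A minimal sketch of how one might use the look-behind form only where the running perl accepts it, and fall back to an s///e substitution otherwise. The helper name underscore_in_brackets is hypothetical; experimental::vlb is assumed to be the warnings category covering this feature (it is in Perl 5.30).

```perl
use strict;
use warnings;

# Hypothetical helper: replace spaces inside <...> with underscores.
sub underscore_in_brackets {
    my ($text) = @_;
    my $copy = $text;

    # Compile the look-behind pattern in a string eval, so that a perl
    # without variable-length look-behind fails here instead of at load time.
    my $ok = eval q{
        no warnings 'experimental::vlb';
        $copy =~ s/(?<=<[^<>]{0,254}) (?=[^<>]*>)/_/g;
        1;
    };
    return $copy if $ok;

    # Fallback: rewrite each whole <...> block with tr///r (needs Perl 5.14+).
    $text =~ s/(<[^<>]*>)/$1 =~ tr| |_|r/eg;
    return $text;
}

print underscore_in_brackets('add of <Number A> and <Number B>'), "\n";
```

    Both branches leave spaces outside the angle brackets alone, so the result should not depend on which one ran.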

    --
    Ben.

  • From Rainer Weikusat@21:1/5 to George Bouras on Wed Apr 7 16:04:17 2021
    George Bouras <foo@example.com> writes:
    On 7/4/2021 at 1:12 PM, George Bouras wrote:

    This looks OK. Any better ideas?

    my $var = '"add of <Number A> and <Number B> = " . ( <Number A> + <Number B> )';
    my $tmp = '';

    say $var;
    $var =~s/<([^>]+)(?{ $tmp=$^N; $tmp=~s|\W+|_|g })>/<$tmp>/g;
    say $var;

    Assuming your original description is what you want

    ------
    my $a = "add of <Number A> and <Number B>, <a sentence in angle brackets>";

    $a =~ s/(<[^>]+>)/$1=~y| |_|r/eg;

    print $a, "\n";
    ------

    Closer to your code:

    -------
    my $a = "add of <Number A> and <Number B>, <a sentence in angle brackets>";

    $a =~ s/(<[^>]+>)/$1=~s|\s+|_|gr/eg;

    print $a, "\n";
    -------

  • From Eli the Bearded@21:1/5 to ben.usenet@bsb.me.uk on Wed Apr 7 16:55:17 2021
    In comp.lang.perl.misc, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
    George Bouras <foo@example.com> writes:
    spaces inside the <...> to _
    e.g.
    "add of <Number A> and <Number B> = " . ( <Number A> + <Number B>
    to
    "add of <Number_A> and <Number_B> = " . ( <Number_A> + <Number_B>

    s/(<[^<>]*) (?=[^<>]*>)/\1_/g

    $ perl -wle '$_="<Number 100 000>"; s/(<[^<>]*) (?=[^<>]*>)/\1_/g; print'
    \1 better written as $1 at -e line 1.
    <Number 100_000>

    The (?= ... ) part is a "zero-width positive look ahead". It consumes
    no characters (so to speak) but must match for the space to match.

    Since you force a match on '<', your code can only change one space
    per <...> block. I run into this all the time with similar fixes in
    vi search and replace. The fix is to loop until it stops matching.
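    The loop-until-it-stops-matching fix can be sketched as follows. The helper name fix_spaces is hypothetical, and the replacement uses ${1}_ rather than \1_ to avoid the warning shown above.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the loop-until-no-match fix.
sub fix_spaces {
    my ($s) = @_;

    # Each pass changes at most one space per <...> block (the last one
    # remaining, because [^<>]* is greedy). s///g returns the number of
    # substitutions, so the loop ends once a pass changes nothing.
    1 while $s =~ s/(<[^<>]*) (?=[^<>]*>)/${1}_/g;

    return $s;
}

print fix_spaces('<Number 100 000> and <a b c d>'), "\n";
```

    The number of passes grows with the largest number of spaces in any one block, which is one reason the single-pass s///e approach may be preferable.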

    Or use another method. I like the else-thread suggested one with a tr
    in an s///eg framework, like:

    s/( < [^>]+ > )/ local $_ = $1; tr| |_|; $_ /xeg

    I also like to not perl-golf it.

    Perl 5.30 has, experimentally, variable-length, zero-width positive look
    behind patterns (it's the variable-length part that is experimental),
    currently limited to 255 characters. That permits

    s/(?<=<[^<>]{0,254}) (?=[^<>]*>)/_/g

    Sounds computationally expensive.

    Elijah
    ------
    tries not to use the bleeding edge features

  • From Otto J. Makela@21:1/5 to All on Tue May 4 15:10:36 2021
    It would be nice to have bal() which was a pattern matching primitive
    which matched balanced quote-like separators (including ones where you
    had different left and right quotes, e.g. () or []). This was a standard
    primitive in languages like Snobol4 and Icon, where the pattern matching
    wasn't regex-derived.

    Of course making an ersatz version in regex isn't impossible:

    https://www.andrewzammit.com/blog/regexp-matching-balanced-parenthesis-and-quotes-greedy-non-recursive/
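    For comparison, a sketch of a different technique from the non-recursive one in the linked article: Perl 5.10 and later support recursive subpatterns, so a balanced (...) group can be matched with (?1). The names $balanced and first_balanced are hypothetical.

```perl
use strict;
use warnings;

# Group 1 matches one balanced parenthesised group; (?1) recurses into
# group 1. If this pattern is embedded in a larger one, the group number
# in (?1) has to be adjusted to match.
my $balanced = qr{
    (                   # group 1
        \(
        (?: [^()]++     # a run of non-parenthesis characters (possessive)
          | (?1)        # or a nested balanced group
        )*
        \)
    )
}x;

# Return the first balanced group found in $text, or undef if there is none.
sub first_balanced {
    my ($text) = @_;
    return $text =~ $balanced ? $1 : undef;
}

print first_balanced('f(a(b)c) + g(d)'), "\n";
```

    This handles a single bracket pair only; a general bal() with several delimiter kinds would need one alternative per pair.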
    --
    /* * * Otto J. Makela <om@iki.fi> * * * * * * * * * */
    /* Phone: +358 40 765 5772, ICBM: N 60 10' E 24 55' */
    /* Mail: Mechelininkatu 26 B 27, FI-00100 Helsinki */
    /* * * Computers Rule 01001111 01001011 * * * * * * */

  • From Jim Gibson@21:1/5 to All on Wed May 5 05:37:02 2021
    On May 4, 2021 at 5:10:36 AM PDT, "Otto J. Makela" <Otto J. Makela> wrote:

    It would be nice to have bal() which was a pattern matching primitive
    which matched balanced quote-like separators (including ones where you
    had different left and right quotes, e.g. () or []). This was a standard
    primitive in languages like Snobol4 and Icon, where the pattern matching
    wasn't regex-derived.

    Of course making an ersatz version in regex isn't impossible:


    https://www.andrewzammit.com/blog/regexp-matching-balanced-parenthesis-and-quotes-greedy-non-recursive/

    There is a module for that:

    https://metacpan.org/pod/Text::Balanced
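    A minimal sketch of what using the module might look like, assuming the extract_bracketed function with '<>' as the delimiter set. The sample text is made up for illustration.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Sample input with a nested pair of angle brackets.
my $text = '<a <b> c> rest';

# In list context the first two return values are the extracted
# bracketed substring and the remainder of the input.
my ($extracted, $remainder) = extract_bracketed($text, '<>');

print "$extracted\n";
```

    The same call accepts other delimiter sets such as '()' or '[]', which covers the different-left-and-right-quote case mentioned above.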

    --
    Jim Gibson
