• How to split input on \040 ?

    From Janis Papanagnou@21:1/5 to Luuk on Sun Apr 25 12:00:17 2021
    On 25.04.2021 11:28, Luuk wrote:
    I was looking at a question on stackoverflow about "MySQL history show a
    lot of \040" https://stackoverflow.com/questions/67112091/mysql-history-show-a-lot-of-040/67248377#67248377


    I know one can change the '\040' for a ' ' by doing:
    ~/tmp> sed -e 's/\\040/ /g' testfile
    test test

    But, why not do it using GAWK ?

    Because it's overkill?


    ~/tmp> cat testfile
    test\040test

    Attempt 1a:
    ~/tmp> gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }' testfile
    test\ test

    Attempt 1b:
    ~/tmp> gawk -F \\040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }' testfile
    test\040test

    Attempt 2:
    ~/tmp> echo test\040test | gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }'
    test test

    Attempt 3:
    ~/tmp> gawk '{ gsub(/\\040/," "); }1' testfile
    test test

    In the first attempt (1a) I do see a backslash in the output, but if I
    add another backslash (1b) the FS is no longer recognized.

    If I change from reading an inputfile (1a) to reading standard input
    (2), then the output changes....

    My final attempt (3) shows a workaround not using FS

    Question:
    How can FS be set, so GAWK will split input on the four characters \040 ?

    gawk -F '\\\\040' '...'


    Janis

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Luuk@21:1/5 to All on Sun Apr 25 11:28:27 2021
    I was looking at a question on stackoverflow about "MySQL history show a
    lot of \040" https://stackoverflow.com/questions/67112091/mysql-history-show-a-lot-of-040/67248377#67248377

    I know one can change the '\040' for a ' ' by doing:
    ~/tmp> sed -e 's/\\040/ /g' testfile
    test test

    But, why not do it using GAWK ?


    ~/tmp> cat testfile
    test\040test

    Attempt 1a:
    ~/tmp> gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }' testfile
    test\ test

    Attempt 1b:
    ~/tmp> gawk -F \\040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }' testfile
    test\040test

    Attempt 2:
    ~/tmp> echo test\040test | gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }'
    test test

    Attempt 3:
    ~/tmp> gawk '{ gsub(/\\040/," "); }1' testfile
    test test

    In the first attempt (1a) I do see a backslash in the output, but if I
    add another backslash (1b) the FS is no longer recognized.

    If I change from reading an inputfile (1a) to reading standard input
    (2), then the output changes....

    My final attempt (3) shows a workaround not using FS

    Question:
    How can FS be set, so GAWK will split input on the four characters \040 ?

  • From Janis Papanagnou@21:1/5 to Luuk on Sun Apr 25 14:14:08 2021
    On 25.04.2021 13:55, Luuk wrote:
    On 25-4-2021 12:00, Janis Papanagnou wrote:
    On 25.04.2021 11:28, Luuk wrote:
    I was looking at a question on stackoverflow about "MySQL history show a
    lot of \040"
    https://stackoverflow.com/questions/67112091/mysql-history-show-a-lot-of-040/67248377#67248377



    I know one can change the '\040' for a ' ' by doing:
    ~/tmp> sed -e 's/\\040/ /g' testfile
    test test

    But, why not do it using GAWK ?

    Because it's overkill?


    ~/tmp> cat testfile
    test\040test

    Attempt 1a:
    ~/tmp> gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }' testfile
    test\ test

    Attempt 1b:
    ~/tmp> gawk -F \\040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }' testfile
    test\040test

    Attempt 2:
    ~/tmp> echo test\040test | gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }'
    test test

    Attempt 3:
    ~/tmp> gawk '{ gsub(/\\040/," "); }1' testfile
    test test

    In the first attempt (1a) I do see a backslash in the output, but if I
    add another backslash (1b) the FS is no longer recognized.

    If I change from reading an inputfile (1a) to reading standard input
    (2), then the output changes....

    My final attempt (3) shows a workaround not using FS

    Question:
    How can FS be set, so GAWK will split input on the four characters
    \040 ?

    gawk -F '\\\\040' '...'


    Janis


    wow, that helps....

    ~/tmp> cat testfile | gawk -F '\\\\040' '1'
    test\040test
    ~/tmp> gawk -F '\\\\040' '1' testfile
    test\040test

    or not?

    No. You have to enter your awk code where I wrote '...'.
    I omitted that because you just asked for "setting FS".

    But note my comment about being [lexically] an overkill.
    Your code also changes the input data format when doing
    $1=$1, which is most likely undesired behavior.
    You'd have to use gsub() or similar to avoid that.
    Much easier with sed, since it's only a substitution.

    Janis

  • From Luuk@21:1/5 to Janis Papanagnou on Sun Apr 25 13:55:38 2021
    On 25-4-2021 12:00, Janis Papanagnou wrote:
    On 25.04.2021 11:28, Luuk wrote:
    I was looking at a question on stackoverflow about "MySQL history show a
    lot of \040"
    https://stackoverflow.com/questions/67112091/mysql-history-show-a-lot-of-040/67248377#67248377


    I know one can change the '\040' for a ' ' by doing:
    ~/tmp> sed -e 's/\\040/ /g' testfile
    test test

    But, why not do it using GAWK ?

    Because it's overkill?


    ~/tmp> cat testfile
    test\040test

    Attempt 1a:
    ~/tmp> gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }' testfile
    test\ test

    Attempt 1b:
    ~/tmp> gawk -F \\040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }' testfile
    test\040test

    Attempt 2:
    ~/tmp> echo test\040test | gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }'
    test test

    Attempt 3:
    ~/tmp> gawk '{ gsub(/\\040/," "); }1' testfile
    test test

    In the first attempt (1a) I do see a backslash in the output, but if I
    add another backslash (1b) the FS is no longer recognized.

    If I change from reading an inputfile (1a) to reading standard input
    (2), then the output changes....

    My final attempt (3) shows a workaround not using FS

    Question:
    How can FS be set, so GAWK will split input on the four characters \040 ?

    gawk -F '\\\\040' '...'


    Janis


    wow, that helps....

    ~/tmp> cat testfile | gawk -F '\\\\040' '1'
    test\040test
    ~/tmp> gawk -F '\\\\040' '1' testfile
    test\040test

    or not?

  • From Ben Bacarisse@21:1/5 to Luuk on Sun Apr 25 13:21:32 2021
    Luuk <luuk@invalid.lan> writes:

    On 25-4-2021 12:00, Janis Papanagnou wrote:
    On 25.04.2021 11:28, Luuk wrote:
    <cut>
    I know one can change the '\040' for a ' ' by doing:
    ~/tmp> sed -e 's/\\040/ /g' testfile
    test test

    But, why not do it using GAWK ?
    <cut>
    Question:
    How can FS be set, so GAWK will split input on the four characters \040 ?

    gawk -F '\\\\040' '...'

    wow, that helps....

    ~/tmp> cat testfile | gawk -F '\\\\040' '1'
    test\040test
    ~/tmp> gawk -F '\\\\040' '1' testfile
    test\040test

    or not?

    Well, it does what you ask for, which is a start. It may not do what you
    want, but that's not obvious.

    Since you offer sed -e 's/\\040/ /g' as an option, it seems you don't
    want to do what the subject line says (and your question asks for).

    If sed -e 's/\\040/ /g' does what you want, why try to do it in gawk?

    --
    Ben.

  • From Luuk@21:1/5 to Janis Papanagnou on Sun Apr 25 14:54:14 2021
    On 25-4-2021 14:14, Janis Papanagnou wrote:
    On 25.04.2021 13:55, Luuk wrote:
    On 25-4-2021 12:00, Janis Papanagnou wrote:
    On 25.04.2021 11:28, Luuk wrote:
    I was looking at a question on stackoverflow about "MySQL history show a
    lot of \040"
    https://stackoverflow.com/questions/67112091/mysql-history-show-a-lot-of-040/67248377#67248377



    I know one can change the '\040' for a ' ' by doing:
    ~/tmp> sed -e 's/\\040/ /g' testfile
    test test

    But, why not do it using GAWK ?

    Because it's overkill?


    ~/tmp> cat testfile
    test\040test

    Attempt 1a:
    ~/tmp> gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }' testfile
    test\ test

    Attempt 1b:
    ~/tmp> gawk -F \\040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }' testfile
    test\040test

    Attempt 2:
    ~/tmp> echo test\040test | gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }'
    test test

    Attempt 3:
    ~/tmp> gawk '{ gsub(/\\040/," "); }1' testfile
    test test

    In the first attempt (1a) I do see a backslash in the output, but if I
    add another backslash (1b) the FS is no longer recognized.

    If I change from reading an inputfile (1a) to reading standard input
    (2), then the output changes....

    My final attempt (3) shows a workaround not using FS

    Question:
    How can FS be set, so GAWK will split input on the four characters
    \040 ?

    gawk -F '\\\\040' '...'


    Janis


    wow, that helps....

    ~/tmp> cat testfile | gawk -F '\\\\040' '1'
    test\040test
    ~/tmp> gawk -F '\\\\040' '1' testfile
    test\040test

    or not?

    No. You have to enter your awk code where I wrote '...'.
    I omitted that because you just asked for "setting FS".

    But note my comment about being [lexically] an overkill.
    Your code also changes the input data format when doing
    $1=$1, which is most likely undesired behavior.
    You'd have to use gsub() or similar to avoid that.
    Much easier with sed, since it's only a substitution.

    Janis


    OK, thanks,

    I will ignore the comment "Much easier with sed" because:
    - I want to do it with (g)AWK, just to learn about it
    - I showed how I could do it with sed.
    - I am still confused about the difference in output of my attempts (1a)
    and (2), which I will repeat below:

    Attempt 1a:
    ~/tmp> gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }' testfile
    test\ test

    Attempt 2:
    ~/tmp> echo test\040test | gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }'
    test test

  • From Janis Papanagnou@21:1/5 to Janis Papanagnou on Sun Apr 25 14:23:57 2021
    On 25.04.2021 14:14, Janis Papanagnou wrote:

    $1=$1, which is most likely undesired behavior.

    Ignore that part of my reply for the given case.

    Janis


  • From Ed Morton@21:1/5 to Luuk on Sun Apr 25 08:46:58 2021
    On 4/25/2021 4:28 AM, Luuk wrote:
    I was looking at a question on stackoverflow about "MySQL history show a
    lot of \040" https://stackoverflow.com/questions/67112091/mysql-history-show-a-lot-of-040/67248377#67248377


    I know one can change the '\040' for a ' ' by doing:
    ~/tmp> sed -e 's/\\040/ /g' testfile
    test test

    But, why not do it using GAWK ?


    ~/tmp> cat testfile
    test\040test

    Attempt 1a:
    ~/tmp> gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }' testfile
    test\ test

    I'm about 95% sure we previously had a discussion on SO about the
    importance of quoting strings in shell.

    Ed.

  • From Luuk@21:1/5 to Ed Morton on Sun Apr 25 16:50:55 2021
    On 25-4-2021 15:46, Ed Morton wrote:
    On 4/25/2021 4:28 AM, Luuk wrote:
    I was looking at a question on stackoverflow about "MySQL history show
    a lot of \040"
    https://stackoverflow.com/questions/67112091/mysql-history-show-a-lot-of-040/67248377#67248377


    I know one can change the '\040' for a ' ' by doing:
    ~/tmp> sed -e 's/\\040/ /g' testfile
    test test

    But, why not do it using GAWK ?


    ~/tmp> cat testfile
    test\040test

    Attempt 1a:
    ~/tmp> gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }' testfile
    test\ test

    I'm about 95% sure we previously had a discussion on SO about the
    importance of quoting strings in shell.

        Ed.

    You are right, but (bad excuse), I am not doing this often enough ... 😉

  • From Ed Morton@21:1/5 to Luuk on Sun Apr 25 09:26:57 2021
    On 4/25/2021 7:54 AM, Luuk wrote:
    On 25-4-2021 14:14, Janis Papanagnou wrote:
    On 25.04.2021 13:55, Luuk wrote:
    On 25-4-2021 12:00, Janis Papanagnou wrote:
    On 25.04.2021 11:28, Luuk wrote:
    I was looking at a question on stackoverflow about "MySQL history show a
    lot of \040"
    https://stackoverflow.com/questions/67112091/mysql-history-show-a-lot-of-040/67248377#67248377




    I know one can change the '\040' for a ' ' by doing:
    ~/tmp> sed -e 's/\\040/ /g' testfile
    test test

    But, why not do it using GAWK ?

    Because it's overkill?


    ~/tmp> cat testfile
    test\040test

    Attempt 1a:
    ~/tmp> gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }' testfile
    test\ test

    Attempt 1b:
    ~/tmp> gawk -F \\040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }' testfile
    test\040test

    Attempt 2:
    ~/tmp> echo test\040test | gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }'
    test test

    Attempt 3:
    ~/tmp> gawk '{ gsub(/\\040/," "); }1' testfile
    test test

    In the first attempt (1a) I do see a backslash in the output, but if I
    add another backslash (1b) the FS is no longer recognized.

    If I change from reading an inputfile (1a) to reading standard input
    (2), then the output changes....

    My final attempt (3) shows a workaround not using FS

    Question:
    How can FS be set, so GAWK will split input on the four characters
    \040 ?

        gawk -F '\\\\040' '...'


    Janis


    wow, that helps....

    ~/tmp> cat testfile | gawk -F '\\\\040' '1'
    test\040test
    ~/tmp> gawk -F '\\\\040' '1' testfile
    test\040test

    or not?

    No. You have to enter your awk code where I wrote '...'.
    I omitted that because you just asked for "setting FS".

    But note my comment about being [lexically] an overkill.
    Your code also changes the input data format when doing
    $1=$1, which is most likely undesired behavior.
    You'd have to use gsub() or similar to avoid that.
    Much easier with sed, since it's only a substitution.

    Janis


    OK, thanks,

    I will ignore the comment "Much easier with sed" because:
    - I want to do it with (g)AWK, just to learn about it
    - I showed how I could do it with sed.
    - I am still confused about the difference in output of my attempts (1a)
    and (2), which I will repeat below:

    Attempt 1a:
    ~/tmp> gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }' testfile
    test\ test

    Attempt 2:
    ~/tmp> echo test\040test | gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }'
    test test

    Take a look at the output of `echo test\040test` and then you'll
    understand. You're again removing the single quotes from a string, this
    time the one that you want to pass as an arg to echo, and so again
    asking the shell to interpret escapes and so the backslash is stripped
    before awk even gets to see the input. So your 2nd script is equivalent to:

    echo 'test040test' | gawk -F '040' 'BEGIN{ OFS=" " }{ $1=$1; print $0 }'
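
    The effect described above can be checked without awk at all. The following is only a sketch: the variable names `unquoted` and `quoted` are made up for illustration, and `printf '%s\n'` is used instead of `echo` because some shells' `echo` interprets backslash escapes itself.

```shell
# Unquoted: the shell removes the backslash before the command ever runs.
unquoted=$(printf '%s\n' test\040test)

# Single-quoted: the four characters \040 are passed through untouched.
quoted=$(printf '%s\n' 'test\040test')

printf '%s\n' "$unquoted" "$quoted"
```

    The first value contains no backslash, which is why splitting on 040 in attempt 2 appears to work.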

    Regards,

    Ed.

  • From Ed Morton@21:1/5 to Luuk on Sun Apr 25 09:21:43 2021
    On 4/25/2021 4:28 AM, Luuk wrote:
    I was looking at a question on stackoverflow about "MySQL history show a
    lot of \040" https://stackoverflow.com/questions/67112091/mysql-history-show-a-lot-of-040/67248377#67248377


    I know one can change the '\040' for a ' ' by doing:
    ~/tmp> sed -e 's/\\040/ /g' testfile
    test test

    But, why not do it using GAWK ?


    ~/tmp> cat testfile
    test\040test

    Attempt 1a:
    ~/tmp> gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }' testfile
    test\ test

    Attempt 1b:
    ~/tmp> gawk -F \\040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }' testfile
    test\040test

    Attempt 2:
    ~/tmp> echo test\040test | gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }'
    test test

    Attempt 3:
    ~/tmp> gawk '{ gsub(/\\040/," "); }1' testfile
    test test

    In the first attempt (1a) I do see a backslash in the output, but if I
    add another backslash (1b) the FS is no longer recognized.

    If I change from reading an inputfile (1a) to reading standard input
    (2), then the output changes....

    My final attempt (3) shows a workaround not using FS

    Question:
    How can FS be set, so GAWK will split input on the four characters \040 ?



    There are 3 things to consider here:

    1) By removing the single quotes that should be present by default
    around '\040' you're telling the shell you want it to, among other
    things, interpret escape sequences. Don't do that. **ALWAYS** use single
    quotes around all strings (including scripts) in shell unless you
    **need** to make them double quotes instead (e.g. to allow the shell to
    expand variables) and then use double quotes unless you **need** to use
    no quotes (e.g. to allow the shell to do word splitting and globbing).
    See https://mywiki.wooledge.org/Quotes.

    2) A string passed using `-F` or `-v` has escape sequences interpreted
    by design so, for example, `\t` in the assignment becomes a `<tab>` char
    in the string stored in the variable.

    3) A string used in a regexp (including field separator) context is
    parsed twice by awk, once to convert the string to a regexp and then the
    2nd time to use it as a regexp.
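
    Points 2 and 3 can be seen in isolation with a small sketch. The variable names `len` and `parts` and the sample strings are invented for illustration; the behaviour shown is what POSIX-style awks are expected to do, so treat it as a check to run rather than a guarantee for every implementation.

```shell
# Point 2: escapes in a -v (or -F) value are processed, so \t becomes one
# tab character and the stored string is 3 characters long, not 4.
len=$(awk -v s='a\tb' 'BEGIN { print length(s) }')

# Point 3: a string used as a regexp is parsed twice. The literal "\\." is
# first reduced to the two-character string \. and only then read as a
# regexp that matches a literal dot.
parts=$(awk 'BEGIN { print split("a.b.c", p, "\\.") }')

printf '%s %s\n' "$len" "$parts"
```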

    See https://stackoverflow.com/questions/19075671/how-do-i-use-shell-variables-in-an-awk-script
    for how to pass literal strings to awk from shell and then you'll
    understand why you can avoid having to specify multiple pairs of escapes
    when using `-F` to set FS to a string that includes a literal backslash
    and can, for example, do:

    $ awk 'BEGIN{FS=ARGV[1]; ARGV[1]=""} {$1=$1} 1' '\\040' file
    test test

    or:

    $ fs='\\040' awk 'BEGIN{FS=ENVIRON["fs"]} {$1=$1} 1' file
    test test

    instead of this if you just quote your string properly but still want to
    deal with awk interpreting escapes with this kind of assignment:

    $ awk -F'\\\\040' '{$1=$1} 1' file
    test test

    or this if you choose to ask the shell to also interpret escapes (so
    every escape char in the above has to be escaped again) before awk sees
    them:

    $ awk -F \\\\\\\\040 '{$1=$1} 1' file
    test test
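
    To see where the backslashes go in the last two commands, each layer can be inspected on its own. This is a rough sketch with made-up variable names (`after_shell`, `after_awk`, `fields`); it uses `-v` instead of `-F` only so the intermediate string can be printed.

```shell
# Layer 1: the shell halves unquoted backslashes (8 -> 4).
after_shell=$(printf '%s\n' \\\\\\\\040)

# Layer 2: awk's escape processing of a -v/-F value halves them again (4 -> 2).
after_awk=$(awk -v s='\\\\040' 'BEGIN { print s }')

# Layer 3: used as a dynamic regexp, the remaining \\ matches one literal
# backslash, so the sample line splits into two fields.
fields=$(printf '%s\n' 'test\040test' | awk -v s='\\\\040' '{ print split($0, a, s) }')

printf '%s\n' "$after_shell" "$after_awk" "$fields"
```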

    Regards,

    Ed.

  • From Janis Papanagnou@21:1/5 to Luuk on Sun Apr 25 19:57:57 2021
    On 25.04.2021 14:54, Luuk wrote:

    I know one can change the '\040' for a ' ' by doing:
    ~/tmp> sed -e 's/\\040/ /g' testfile
    test test

    gawk -F '\\\\040' '...'

    OK, thanks,

    I will ignore the comment "Much easier with sed" because:
    - I want to do it with (g)AWK, just to learn about it

    Okay. First: you called your "Attempt 3" a "workaround".
    Actually all your field-separator based approaches (1a/1b/2)
    are hacks that work around what sed is doing; a substitution.
    You can formulate that substitution also in awk - you actually
    did already in your "Attempt 3". That could of course be done
    a little bit terser, e.g. by

    gawk 'gsub(/\\040/," ")+1'


    Janis

  • From Luuk@21:1/5 to Janis Papanagnou on Sun Apr 25 21:01:27 2021
    On 25-4-2021 19:57, Janis Papanagnou wrote:
    On 25.04.2021 14:54, Luuk wrote:

    I know one can change the '\040' for a ' ' by doing:
    ~/tmp> sed -e 's/\\040/ /g' testfile
    test test

    gawk -F '\\\\040' '...'

    OK, thanks,

    I will ignore the comment "Much easier with sed" because:
    - I want to do it with (g)AWK, just to learn about it

    Okay. First: you called your "Attempt 3" a "workaround".
    Actually all your field-separator based approaches (1a/1b/2)
    are hacks that work around what sed is doing; a substitution.
    You can formulate that substitution also in awk - you actually
    did already in your "Attempt 3". That could of course be done
    a little bit terser, e.g. by

    gawk 'gsub(/\\040/," ")+1'



    I had to look up what 'terser' means (I do not speak English very well)
    terser = Brief and to the point

    so, terser would also have to leave out the meaningless '+' sign

    Because this does the same:
    gawk 'gsub(/\\040/," ")1'


    Janis


  • From Janis Papanagnou@21:1/5 to Luuk on Sun Apr 25 22:04:57 2021
    On 25.04.2021 21:01, Luuk wrote:
    You can formulate that substitution also in awk - you actually
    did already in your "Attempt 3". That could of course be done
    a little bit terser, e.g. by

    gawk 'gsub(/\\040/," ")+1'



    I had to look up what 'terser' means (I do not speak English very well)
    terser = Brief and to the point

    so, terser would also have to leave out the meaningless '+' sign

    It's not meaningless; it makes a possible 0 result always positive.

    Because this does the same:
    gawk 'gsub(/\\040/," ")1'

    This saves one character (is lexically terser) but is more complex
    than a simple increment due to the implicit type conversions that
    are performed and the implicit string concatenation. - YMMV, but
    there's anyway not much difference.

    My preference [in awk] would anyway be "wasting" yet more characters

    gawk 'gsub(/\\040/," ") || 1'

    instead of relying on implicit conversions or on the +1, because I
    think it's the clearest without an explicit action block.
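
    A small sketch of why the pattern needs forcing to true at all. The two-line sample input and the variable names (`input`, `bare`, `forced`, `concat`) are invented for illustration.

```shell
input=$(printf '%s\n' 'a\040b' 'plain')

# Pattern is the bare gsub() count: a line with no match yields 0 and is dropped.
bare=$(printf '%s\n' "$input" | awk 'gsub(/\\040/," ")')

# "|| 1" forces the pattern true, so every line is printed after substitution.
forced=$(printf '%s\n' "$input" | awk 'gsub(/\\040/," ") || 1')

# The concatenation variant turns a count of 0 into the non-empty string "01",
# which is true because it is a string, not because of its numeric value.
concat=$(awk 'BEGIN { s = "x"; print (gsub(/y/, "z", s) 1) }')

printf '%s\n' "$bare" "$forced" "$concat"
```

    With the bare pattern only the substituted line survives, while the `|| 1` form keeps the untouched line as well.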

    If your intention is minimizing the number of characters (a most
    terse solution) there's always the original solution available

    sed 's/\\040/ /g'


    Janis

  • From Ed Morton@21:1/5 to Janis Papanagnou on Mon Apr 26 08:36:06 2021
    On 4/25/2021 3:04 PM, Janis Papanagnou wrote:
    On 25.04.2021 21:01, Luuk wrote:
    You can formulate that substitution also in awk - you actually
    did already in your "Attempt 3". That could of course be done
    a little bit terser, e.g. by

    gawk 'gsub(/\\040/," ")+1'



    I had to look up what 'terser' means (I do not speak English very well)
    terser = Brief and to the point

    so, terser would also have to leave out the meaningless '+' sign

    It's not meaningless; it makes a possible 0 result always positive.

    Because this does the same:
    gawk 'gsub(/\\040/," ")1'

    This saves one character (is lexically terser) but is more complex
    than a simple increment due to the implicit type conversions that
    are performed and the implicit string concatenation. - YMMV, but
    there's anyway not much difference.

    My preference [in awk] would anyway be "wasting" yet more characters

    gawk 'gsub(/\\040/," ") || 1'

    instead of relying on implicit conversions or on the +1, because I
    think it's the clearest without an explicit action block.
    <snip>

    Several people on this forum discussed using the result of an action in
    the condition context years ago (maybe 20 years ago?) and IIRC came to
    the consensus that:

    awk '{gsub(/\\040/," ")} 1'

    was the clearest way to write such code.

    Ed.

  • From Ed Morton@21:1/5 to Ed Morton on Mon Apr 26 09:07:02 2021
    On 4/26/2021 8:36 AM, Ed Morton wrote:
    On 4/25/2021 3:04 PM, Janis Papanagnou wrote:
    On 25.04.2021 21:01, Luuk wrote:
    You can formulate that substitution also in awk - you actually
    did already in your "Attempt 3". That could of course be done
    a little bit terser, e.g. by

        gawk 'gsub(/\\040/," ")+1'



    I had to look up what 'terser' means (I do not speak English very well)
    terser = Brief and to the point

    so, terser would also have to leave out the meaningless '+' sign

    It's not meaningless; it makes a possible 0 result always positive.

    Because this does the same:
    gawk 'gsub(/\\040/," ")1'

    This saves one character (is lexically terser) but is more complex
    than a simple increment due to the implicit type conversions that
    are performed and the implicit string concatenation. - YMMV, but
    there's anyway not much difference.

    My preference [in awk] would anyway be "wasting" yet more characters

       gawk 'gsub(/\\040/," ") || 1'

    instead of relying on implicit conversions or on the +1, because I
    think it's the clearest without an explicit action block.
    <snip>

    Several people on this forum discussed using the result of an action in
    the condition context years ago (maybe 20 years ago?) and IIRC came to
    the consensus that:

       awk '{gsub(/\\040/," ")} 1'

    was the clearest way to write such code.


    To be clear I'm not saying that's how to use the result of an action as
    a condition, I'm saying don't do that unless you actually have some need
    to do so other than possibly saving a character or 2 in your script.

    Ed.

  • From Janis Papanagnou@21:1/5 to Ed Morton on Mon Apr 26 16:17:50 2021
    On 26.04.2021 15:36, Ed Morton wrote:
    <snip>

    Several people on this forum discussed using the result of an action in
    the condition context years ago (maybe 20 years ago?) and IIRC came to
    the consensus that:

    awk '{gsub(/\\040/," ")} 1'

    was the clearest way to write such code.

    awk '{ gsub(/\\040/," ") ; print }'

    is certainly the clearest way, readable and understandable not only
    by Awk nerds.

    As soon as we introduce idioms like the '1' above there's not really
    any significant difference whether we have the function also in the
    condition part.

    The way from {gsub(/\\040/," ")} 1 to gsub(/\\040/," ")+1 is
    just an insignificant small step, at least for people like both of us
    and others here.

    Beyond that this post starts making the topic a religious affair, and
    I am not inclined to support that.

    WRT the original code source of the topic - strictly speaking off-topic
    here - I still think that
    sed 's/\\040/ /g'
    is the clearest formulation, in addition to being terse and more than
    twice as fast in execution as any of the very similar awk code
    pattern variants.

    Janis


    Ed.
