• GAWK printf strings to numbers seems inconsistent

    From J Naman@21:1/5 to Janis Papanagnou on Thu Feb 4 10:10:21 2021
    On Wednesday, 3 February 2021 at 22:12:55 UTC-5, Janis Papanagnou wrote:
    On 04.02.2021 03:02, J Naman wrote:
    On Wednesday, 3 February 2021 at 20:56:57 UTC-5, J Naman wrote:
    I was surprised by printf behavior when coercing strings to numbers. Not saying it is a bug, just surprised me.
    BEGIN{
    #CONVFMT="%.6g"; GAWK default
    a="202102.01.1234"
    printf("s{%s} d{%d} f{%f}\n",a,a,a) # s{202102.01.1234} d{202102} f{202102.010000}
    printf("s{%s} d{%d} f{%f}\n",+a,+a,+a) # s{202102} d{202102} f{202102.010000}
    printf("s{%s} d{%d} f{%f}\n",0+a,0+a,0+a) # s{202102} d{202102} f{202102.010000}
    printf("s{%s} d{%d} f{%f}\n","" a,"" a,"" a) # s{202102.01.1234} d{202102} f{202102.010000}
    a="20210201.1234"
    printf("s{%s} d{%d} f{%f}\n",a,a,a) # s{20210201.1234} d{20210201} f{20210201.123400}
    printf("s{%s} d{%d} f{%f}\n",+a,+a,+a) # s{2.02102e+007} d{20210201} f{20210201.123400}
    printf("s{%s} d{%d} f{%f}\n",0+a,0+a,0+a) # s{2.02102e+007} d{20210201} f{20210201.123400}
    printf("s{%s} d{%d} f{%f}\n","" a,"" a,"" a) # s{20210201.1234} d{20210201} f{20210201.123400}
    exit;} # eoBegin
    #============
    Whoops. The entire second set is correct: CONVFMT %.6g creates the 2.02102e+007.
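    A minimal sketch of why the second set switches to exponent notation,
    assuming a POSIX awk on a glibc-based system (exponent_demo is only a
    label for this illustration). "%.6g" keeps six significant digits and
    uses exponent form when the decimal exponent (7 for 20210201.1234) is at
    least the precision. The three-digit exponent in 2.02102e+007 most likely
    comes from a Microsoft C runtime; glibc prints 2.02102e+07. %d, by
    contrast, truncates the number itself and never consults CONVFMT.

```shell
# Convert the same number to a string under two CONVFMT settings,
# then print it with %d (exponent_demo is a hypothetical name).
exponent_demo() {
  for fmt in '%.6g' '%.12g'; do
    # concatenating with "" forces a number-to-string conversion via CONVFMT
    awk -v CONVFMT="$fmt" 'BEGIN { n = 20210201.1234; print (n "") }'
  done
  # %d truncates the numeric value directly; CONVFMT is not involved
  awk 'BEGIN { printf "%d\n", 20210201.1234 }'
}
exponent_demo
```

    With glibc this should print 2.02102e+07, 20210201.1234 and 20210201,
    one per line.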
    The first set surprised me a little because there was no error.
    I wouldn't expect an error, but two of the lines surprise me as well
    in one of these output fields.
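    For the first set it may help to separate the two steps. A sketch,
    assuming any POSIX awk (prefix_demo is a hypothetical name): a string
    that is not a valid number as a whole converts, without an error, to the
    value of its leading numeric prefix (here "202102.01"), and that number
    is converted back to a string with CONVFMT whenever a string is needed.

```shell
# prefix_demo is a hypothetical name; the awk program is plain POSIX awk.
prefix_demo() {
  awk 'BEGIN {
    a = "202102.01.1234"      # not a valid number as a whole
    n = a + 0                 # numeric value of the leading prefix "202102.01"
    printf "%.2f\n", n        # the trailing ".1234" was ignored
    print (n "")              # number -> string via CONVFMT ("%.6g" by default)
    CONVFMT = "%.10g"
    print ((a + 0) "")        # a fresh conversion under the wider CONVFMT
  }'
}
prefix_demo
```

    This should print 202102.01, 202102 and 202102.01: the s{202102} in the
    original output is CONVFMT at work, not a failed conversion.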

    But it does not seem to be an issue of the string conversions; you can
    also see the effects with floats, obviously depending on the size of the
    number and the number of decimals. I played around with floats, reduced
    the number of decimals, adjusted CONVFMT, etc., as in

    awk '
    BEGIN{a=20219.01;print a;print a+0;a=0+a;print a;printf "%s\n", a}
    '

    awk -v CONVFMT="%.8g" '
    BEGIN{a=20219.01;print a;print a+0;a=0+a;print a;printf "%s\n", a}
    '

    The behavior and the observed output do not inspire confidence.
    Some CSV had YYYYMM.DD.HHMM sigh ...
    That's an issue of your data. It can be fixed beforehand or when doing
    the awk processing.

    Janis
    CSV: I easily deal with CSV double dots. Just threw one in to see what printf() did with it. Forget CSV, try GAWK Version:
    printf("s{%s} d{%d} f{%f}\n",PROCINFO["version"],PROCINFO["version"],PROCINFO["version"]);
    # s{5.1.0} d{5} f{5.100000}
    gawkver=a="5.1.987654321"
    printf("s{%s} d{%d} f{%f}\n",a,a,a) # s{5.1.987654321} d{5} f{5.100000} # doesn't round up (good!, I guess)
    # Use with care!
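    The version-string case follows the same prefix rule: only "5.1" is
    numeric, so %d gives 5 and %f gives 5.100000, and the digits after the
    second dot are ignored rather than rounded. If the individual components
    matter, one option (a sketch in portable awk; version_demo is a
    hypothetical name) is to split the string on literal dots:

```shell
# version_demo is a hypothetical name used only for this illustration.
version_demo() {
  awk 'BEGIN {
    v = "5.1.987654321"
    printf "%d %.1f\n", v, v      # only the numeric prefix "5.1" is used
    n = split(v, part, /\./)      # split on literal dots
    printf "major=%d minor=%d patch=%s fields=%d\n", part[1], part[2], part[3], n
  }'
}
version_demo
```

    Expected output: "5 5.1", then "major=5 minor=1 patch=987654321 fields=3".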

    Not at all sure any of this is a "bug", so Arnold Robbins et al. (the maintainers) don't have to "fix" it, whatever that means, unless they feel like it. I don't have any other *awk implementations to check how they handle these cases.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ben Bacarisse@21:1/5 to Janis Papanagnou on Thu Feb 4 22:16:50 2021
    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:


    But it does not seem to be an issue of the string conversions;

    I think it is to do with string conversion -- specifically when they
    occur and when the default numeric output format is used instead.

    you can also see the effects with floats, obviously depending on the
    size of the number and the number of decimals. I played around with
    floats, reduced the number of decimals, adjusted CONVFMT, etc., as in

    awk '
    BEGIN{a=20219.01;print a;print a+0;a=0+a;print a;printf "%s\n", a}
    '

    All produce 20219 which is the expected result using the default output
    and conversion formats: "%.6g".

    awk -v CONVFMT="%.8g" '
    BEGIN{a=20219.01;print a;print a+0;a=0+a;print a;printf "%s\n", a}
    '

    Only the last uses CONVFMT. All the others use OFMT which remains at
    the default value of "%.6g".
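    Ben's distinction can be checked directly. A sketch for any POSIX awk
    (formats_demo is a hypothetical name): print formats a number with
    OFMT, while converting it to a string, for example by concatenation,
    uses CONVFMT, so raising only CONVFMT changes only the second line.

```shell
# formats_demo is a hypothetical name; both programs are plain POSIX awk.
formats_demo() {
  awk -v CONVFMT='%.8g' 'BEGIN {
    a = 20219.01
    print a                   # print uses OFMT, still the default "%.6g"
    print (a "")              # string conversion uses CONVFMT ("%.8g")
  }'
  awk -v CONVFMT='%.8g' -v OFMT='%.8g' 'BEGIN {
    a = 20219.01
    print a                   # with OFMT widened too, print agrees
  }'
}
formats_demo
```

    Expected output: 20219, 20219.01 and 20219.01, one per line.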

    The behavior and the observed output do not inspire confidence.

    It's surprising that there are two formats (one for printing and one
    for conversion to a string), but having only one might not be so
    convenient.

    The manual is not perfect here. Someone using it for reference would
    see that it says that numbers are converted to strings for printing,
    and the section on "How awk Converts Between Strings and Numbers"
    refers (naturally enough) to CONVFMT. You have to read a few sections
    further, in the explanation of print, to see that it has its own
    format.

    --
    Ben.

  • From J Naman@21:1/5 to Ben Bacarisse on Thu Feb 4 22:13:32 2021
    Excellent insight: when (and where) conversions happen versus when they are printed. I conflated them.
    "but having only one might not be so convenient." Agreed, the potential consequences of doing anything should be well thought out.
    Who knows what might "break". Awk has been doing well for 44 years without any(?) complaints about what I stumbled across.

  • From Janis Papanagnou@21:1/5 to J Naman on Fri Feb 5 12:48:02 2021
    On 05.02.2021 07:13, J Naman wrote:
    Awk has been doing well for 44 years without any(?) complaints about
    what I stumbled across.

    Over that time range this is not true. Had you said 36 years, or more
    accurately 34 years, that would probably be more realistic, though
    still arguable, given that enhancements have been made in the active
    GNU awk branch in exactly the area (at least partly) covered in this
    thread (I'm thinking of the STRING, NUMERIC and STRNUM distinction).
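    A small illustration of the STRING/NUMERIC/STRNUM distinction, as a
    sketch in portable awk (strnum_demo is a hypothetical name). POSIX calls
    input that looks like a number a "numeric string"; gawk 4.2 and later
    report such a value as "strnum" via typeof(). Numeric strings compare
    numerically, whereas string constants and concatenation results compare
    as strings.

```shell
# strnum_demo is a hypothetical name; fields come from input, so they
# are numeric strings when they look like numbers.
strnum_demo() {
  echo '10 9' | awk '{
    print ($1 < $2)               # numeric strings: 10 < 9 is false -> 0
    print ("10" < "9")            # string constants: "1" sorts before "9" -> 1
    print (($1 "") < ($2 ""))     # concatenation yields plain strings -> 1
  }'
}
strnum_demo
```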

    Janis

  • From Ben Bacarisse@21:1/5 to J Naman on Fri Feb 5 15:03:59 2021
    J Naman <jnaman2@gmail.com> writes:

    Excellent insight: when (and where) conversions happen versus when
    they are printed. I conflated them. "but having only one might not be
    so convenient." Agreed, the potential consequences of doing anything
    should be well thought out. Who knows what might "break". Awk has
    been doing well for 44 years without any(?) complaints about what I
    stumbled across.

    But, as it happens, the problem is a new(ish) one. The original AWK
    book (1988, a mere 33 years ago) specified only one format: OFMT.
    Printing and conversion to a string were, back then, consistent.

    --
    Ben.

  • From J Naman@21:1/5 to Aharon Robbins on Mon Feb 8 20:59:41 2021
    On Monday, 8 February 2021 at 00:47:21 UTC-5, Aharon Robbins wrote:
    POSIX separated the semantics of general number to string conversion
    from the semantics of printing numbers. IMHO this was a good thing.
    Because CONVFMT and OFMT both have the same default value, almost
    all programs continued to work unchanged.
    --
    Aharon (Arnold) Robbins arnold AT skeeve DOT com
    Thank you very much Aharon! Two different semantics explain why there
    can be a difference. And, after carefully rereading your carefully
    written GAWK: Effective AWK Programming, I realized that you indirectly
    explained part of this when you wrote, "If you need to represent a
    floating-point constant at a higher precision than the default and
    cannot use a command-line assignment to PREC, you should either specify
    the constant as a string, or as a rational number, whenever possible."
    Got it: conversion of a string to floating point can differ from a
    simple assignment. Plus, I LIKE that a constant specified as a string
    can have a higher precision. "To boldly go where no program has gone
    before, into the twilight zone where ALMOST all programs continue to
    work unchanged. Except for my silly explorations to see into dark
    places." Hope I didn't waste people's time. I learned a lot and
    actually reread the guide more carefully than on my first pass. I think
    we are at the end of this analysis of printf(). Thanks to everyone who
    pitched in. John (c.f. Jonas)

  • From Aharon Robbins@21:1/5 to ben.usenet@bsb.me.uk on Mon Feb 8 05:47:19 2021
    In article <87im76mu8w.fsf@bsb.me.uk>,
    Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
    But, as it happens, the problem is a new(ish) one. The original AWK
    book (1988, a mere 33 years ago) specified only one format: OFMT.
    Printing and conversion to a string were, back then, consistent.

    POSIX separated the semantics of general number to string conversion
    from the semantics of printing numbers. IMHO this was a good thing.
    Because CONVFMT and OFMT both have the same default value, almost
    all programs continued to work unchanged.
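    Why almost nothing changed in practice follows from the shared default.
    A sketch for any POSIX awk (defaults_demo is a hypothetical name): with
    both formats at "%.6g", print and string conversion agree; changing
    only CONVFMT makes them diverge. Integral values are converted as
    integers under either format, which is why 1e6 does not come out as
    "%.6g" alone would format it (1e+06).

```shell
# defaults_demo is a hypothetical name; both programs are plain POSIX awk.
defaults_demo() {
  awk 'BEGIN {
    x = 3.14159265
    print x, (x "")           # OFMT and CONVFMT both default to "%.6g"
  }'
  awk -v CONVFMT='%.2f' 'BEGIN {
    x = 3.14159265
    print x, (x "")           # only the conversion format changed
    y = 1e6
    print y, (y "")           # integral values ignore both formats
  }'
}
defaults_demo
```

    Expected output: "3.14159 3.14159", "3.14159 3.14", "1000000 1000000".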
    --
    Aharon (Arnold) Robbins arnold AT skeeve DOT com
