• Finding POSIX standard updates for awk

    From Ed Morton@21:1/5 to All on Thu Mar 4 07:36:17 2021
    There are a few awk changes I see planned for a future POSIX standards
    update, e.g.:

    delete(array): https://www.austingroupbugs.net/view.php?id=544
    nextfile: https://www.austingroupbugs.net/view.php?id=607
    fflush(): https://www.austingroupbugs.net/view.php?id=634

    Disappointingly (since it's common, useful, and wouldn't break any
    existing scripts if implemented) I don't see length(array) listed for
    inclusion in the standard so I'm wondering if I'm just missing it and
    what other awk changes might be coming.

    Q18 in the FAQ at http://www.opengroup.org/austin/papers/posix_faq.html
    says that the next POSIX standard revision is planned to be issued in 2022
    and that seems like it will be Issue 8 given the Issue numbers I see at
    the top of:

    2004 Edition (https://pubs.opengroup.org/onlinepubs/009696799/) = Issue 6
    2018 Edition (https://pubs.opengroup.org/onlinepubs/9699919799/) = Issue 7

    The above 3 awk changes are tagged with "issue 8" so I assume they are
    all going to be present in that 2022 issue of the standard. It appears I
    can see all changes associated with "issue 8" (2022) at
    https://www.austingroupbugs.net/tag_view_page.php?tag_id=8 and clicking
    on the "Attached Issues" link there lists 177 issues.

    I can manually search for "awk" in the Summary for each issue but the
    tool name isn't always present in the summary text so - is there a
    robust way to see all planned changes for a given tool such as awk?

    Ed.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Janis Papanagnou@21:1/5 to Janis Papanagnou on Thu Mar 4 16:20:40 2021
    On 04.03.2021 16:16, Janis Papanagnou wrote:
    On 04.03.2021 14:36, Ed Morton wrote:
    There are a few awk changes I see planned for a future POSIX standards
    update, e.g.:

    delete(array): https://www.austingroupbugs.net/view.php?id=544
    nextfile: https://www.austingroupbugs.net/view.php?id=607
    fflush(): https://www.austingroupbugs.net/view.php?id=634

    Disappointingly (since it's common, useful, and wouldn't break any
    existing scripts if implemented) I don't see length(array) listed for
    inclusion in the standard so I'm wondering if I'm just missing it [...]

    The referenced issues carry comments about the features being widely available in other awks. Is that also true for 'length(array)'?

    And to add: also efficiency seems to be a concern (at least in
    'delete(array)' and 'nextfile'). Dynamically determining the
    'length(array)' might not qualify in that respect (also adding
    an implicit counter might not qualify for obvious reasons).

    Janis


  • From Janis Papanagnou@21:1/5 to Ed Morton on Thu Mar 4 16:16:26 2021
    On 04.03.2021 14:36, Ed Morton wrote:
    There are a few awk changes I see planned for a future POSIX standards update, e.g.:

    delete(array): https://www.austingroupbugs.net/view.php?id=544
    nextfile: https://www.austingroupbugs.net/view.php?id=607
    fflush(): https://www.austingroupbugs.net/view.php?id=634

    Disappointingly (since it's common, useful, and wouldn't break any
    existing scripts if implemented) I don't see length(array) listed for inclusion in the standard so I'm wondering if I'm just missing it [...]

    The referenced issues carry comments about the features being widely
    available in other awks. Is that also true for 'length(array)'?

    Janis

  • From Ed Morton@21:1/5 to Janis Papanagnou on Thu Mar 4 09:53:00 2021
    On 3/4/2021 9:20 AM, Janis Papanagnou wrote:
    On 04.03.2021 16:16, Janis Papanagnou wrote:
    On 04.03.2021 14:36, Ed Morton wrote:
    There are a few awk changes I see planned for a future POSIX standards
    update, e.g.:

    delete(array): https://www.austingroupbugs.net/view.php?id=544
    nextfile: https://www.austingroupbugs.net/view.php?id=607
    fflush(): https://www.austingroupbugs.net/view.php?id=634

    Disappointingly (since it's common, useful, and wouldn't break any
    existing scripts if implemented) I don't see length(array) listed for
    inclusion in the standard so I'm wondering if I'm just missing it [...]

    The referenced issues carry comments about the features being widely
    available in other awks. Is that also true for 'length(array)'?

    And to add: also efficiency seems to be a concern (at least in 'delete(array)' and 'nextfile'). Dynamically determining the
    'length(array)' might not qualify in that respect (also adding
    an implicit counter might not qualify for obvious reasons).


    The description of `delete(array)` in that link is misleading as it's
    really replacing `split("",array)` rather than `for (i in array) delete array[i]` in common use.
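
    A minimal sketch of the three array-clearing idioms mentioned above, side by side. It assumes the awk in use already accepts the whole-array `delete` form; `clear_demo` and `count()` are hypothetical names used only for this illustration.

```shell
# Empty an array three ways and print the element count left by each.
clear_demo() {
  awk '
    # count() is a hypothetical helper that walks the array indices.
    function count(arr,   k, n) { n = 0; for (k in arr) n++; return n }
    BEGIN {
      split("a b c", x); split("", x)              # idiom: split an empty string
      split("a b c", y); for (k in y) delete y[k]  # element-by-element delete
      split("a b c", z); delete z                  # whole-array delete
      print count(x), count(y), count(z)
    }'
}
clear_demo
```

    All three arrays should end up empty, so the line printed is `0 0 0`.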

    Idk how commonly length(array) is implemented but it's commonly used
    when handling arrays in gawk scripts, would presumably be implemented
    more efficiently than `c=0; for (i in array) c++`, and can't break any
    existing scripts when introduced so it seems to me like an excellent
    candidate for the standard.
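
    As a sketch of the portable fallback being compared against here: the counting loop can be wrapped in a function. `alen()` and `array_len_demo` are hypothetical names; in an awk that provides the extension, the call could presumably be replaced by `length(arr)`.

```shell
# Count array elements without relying on length(array).
array_len_demo() {
  awk '
    # alen() is a hypothetical stand-in for length(array).
    function alen(arr,   k, n) {
      n = 0
      for (k in arr) n++   # one pass over every index
      return n
    }
    BEGIN {
      split("f o o", arr)  # three elements: arr[1], arr[2], arr[3]
      delete arr[2]        # remove one of them
      print alen(arr)
    }'
}
array_len_demo
```

    The loop visits every index, so its cost grows with the array size, which is the overhead a built-in could avoid if the implementation keeps a count.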

    Ed.

  • From Janis Papanagnou@21:1/5 to Ed Morton on Thu Mar 4 17:14:09 2021
    On 04.03.2021 16:53, Ed Morton wrote:
    On 3/4/2021 9:20 AM, Janis Papanagnou wrote:
    On 04.03.2021 16:16, Janis Papanagnou wrote:
    On 04.03.2021 14:36, Ed Morton wrote:
    There are a few awk changes I see planned for a future POSIX standards
    update, e.g.:

    delete(array): https://www.austingroupbugs.net/view.php?id=544
    nextfile: https://www.austingroupbugs.net/view.php?id=607
    fflush(): https://www.austingroupbugs.net/view.php?id=634

    Disappointingly (since it's common, useful, and wouldn't break any
    existing scripts if implemented) I don't see length(array) listed for
    inclusion in the standard so I'm wondering if I'm just missing it [...]

    The referenced issues carry comments about the features being widely
    available in other awks. Is that also true for 'length(array)'?

    And to add: also efficiency seems to be a concern (at least in
    'delete(array)' and 'nextfile'). Dynamically determining the
    'length(array)' might not qualify in that respect (also adding
    an implicit counter might not qualify for obvious reasons).


    The description of `delete(array)` in that link is misleading as it's
    really replacing `split("",array)` rather than `for (i in array) delete array[i]` in common use.

    Idk how commonly length(array) is implemented but it's commonly used
    when handling arrays in gawk scripts, would presumably be implemented
    more efficiently than `c=0; for (i in array) c++`, and can't break any existing scripts when introduced so it seems to me like an excellent candidate for the standard.

    Well, what I was aiming at was that in cases where you need to
    interrogate the number of elements in the array you don't need
    to traverse the whole array (that may be costly) but can count
    on insertion and on deletion of elements. Or, of course, if it
    fits better you can also loop-count (in cases where you'd not
    need to interrogate that number often). A built-in length(arr)
    function would need to impose costs on *any* user - either to
    increment/decrement a counter, or loop across the whole array,
    which may unnecessarily be of bad performance. I think that
    characteristic makes it not a perfect feature candidate for a
    standard implementation.
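
    A minimal sketch of the "count on insertion and on deletion" approach described above. The wrapper functions `add()` and `del()`, the array `seen`, and the counter `cnt` are hypothetical names for illustration only.

```shell
# Keep a running element count instead of traversing the array.
counter_demo() {
  awk '
    # add() and del() are hypothetical wrappers that keep cnt in step with seen[].
    function add(key) { if (!(key in seen)) { seen[key] = 1; cnt++ } }
    function del(key) { if (key in seen) { delete seen[key]; cnt-- } }
    BEGIN {
      add("x"); add("y"); add("x")   # the duplicate insert is not counted twice
      del("y"); del("z")             # deleting a missing key changes nothing
      print cnt
    }'
}
counter_demo
```

    The membership tests are what keep the counter honest; without them a repeated insert or a delete of a missing key would make `cnt` drift from the real element count.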

    Janis


    Ed.

  • From Ed Morton@21:1/5 to Janis Papanagnou on Thu Mar 4 11:25:56 2021
    On 3/4/2021 10:14 AM, Janis Papanagnou wrote:
    On 04.03.2021 16:53, Ed Morton wrote:
    On 3/4/2021 9:20 AM, Janis Papanagnou wrote:
    On 04.03.2021 16:16, Janis Papanagnou wrote:
    On 04.03.2021 14:36, Ed Morton wrote:
    There are a few awk changes I see planned for a future POSIX standards
    update, e.g.:

    delete(array): https://www.austingroupbugs.net/view.php?id=544
    nextfile: https://www.austingroupbugs.net/view.php?id=607
    fflush(): https://www.austingroupbugs.net/view.php?id=634

    Disappointingly (since it's common, useful, and wouldn't break any
    existing scripts if implemented) I don't see length(array) listed for
    inclusion in the standard so I'm wondering if I'm just missing it [...]

    The referenced issues carry comments about the features being widely
    available in other awks. Is that also true for 'length(array)'?

    And to add: also efficiency seems to be a concern (at least in
    'delete(array)' and 'nextfile'). Dynamically determining the
    'length(array)' might not qualify in that respect (also adding
    an implicit counter might not qualify for obvious reasons).


    The description of `delete(array)` in that link is misleading as it's
    really replacing `split("",array)` rather than `for (i in array) delete
    array[i]` in common use.

    Idk how commonly length(array) is implemented but it's commonly used
    when handling arrays in gawk scripts, would presumably be implemented
    more efficiently than `c=0; for (i in array) c++`, and can't break any
    existing scripts when introduced so it seems to me like an excellent
    candidate for the standard.

    Well, what I was aiming at was that in cases where you need to
    interrogate the number of elements in the array you don't need
    to traverse the whole array (that may be costly) but can count
    on insertion and on deletion of elements. Or, of course, if it
    fits better you can also loop-count (in cases where you'd not
    need to interrogate that number often). A built-in length(arr)
    function would need to impose costs on *any* user - either to
    increment/decrement a counter, or loop across the whole array,
    which may unnecessarily be of bad performance. I think that
    characteristic makes it not a perfect feature candidate for a
    standard implementation.


    I wouldn't be surprised if existing implementations already track how
    many elements are in an array, in which case length(array) just
    returns that value. In the absolute worst case, where that isn't true
    and adding such a counter on insertions/deletions isn't trivial with
    negligible performance impact, the provider of that awk variant could
    implement length(array) as a loop so they don't have to do anything
    extra at all on insertions/deletions. That would be no more expensive
    than the user manually writing such a loop (probably much faster), and
    even IF that's unacceptably slow the user still has the option of just
    not calling it and instead keeping a counter of insertions/deletions
    manually.

    There's simply no down side to having length(array) in the language,
    other than that some awk implementations would need a trivial tweak to
    provide it and it lets us write scripts that can benefit from it (by
    reduced code and/or improved efficiency and/or ability to write more
    general functions that can operate on arrays) more portably.

    Regards,

    Ed.

  • From Janis Papanagnou@21:1/5 to Ed Morton on Fri Mar 5 09:04:56 2021
    On 04.03.2021 18:25, Ed Morton wrote:

    There's simply no down side to having length(array) in the language,
    other than that some awk implementations would need a trivial tweak to provide it and it lets us write scripts that can benefit from it (by
    reduced code and/or improved efficiency and/or ability to write more
    general functions that can operate on arrays) more portably.

    All practical applications I pondered about and those grep'ed in
    my sources did not show the necessity. All are typically solvable
    in easy ways with negligible overhead without performance issues.

    Maybe you can provide a concrete example from your practices that
    obviously demonstrate the necessity of a standard 'length(array)'?

    Janis

  • From Aharon Robbins@21:1/5 to janis_papanagnou@hotmail.com on Fri Mar 5 09:04:45 2021
    In article <s1qtka$a90$1@news-1.m-online.net>,
    Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
    On 04.03.2021 14:36, Ed Morton wrote:
    There are a few awk changes I see planned for a future POSIX standards
    update, e.g.:

    delete(array): https://www.austingroupbugs.net/view.php?id=544
    nextfile: https://www.austingroupbugs.net/view.php?id=607
    fflush(): https://www.austingroupbugs.net/view.php?id=634

    Disappointingly (since it's common, useful, and wouldn't break any
    existing scripts if implemented) I don't see length(array) listed for
    inclusion in the standard so I'm wondering if I'm just missing it [...]

    I have a good sized list of comments / enhancement requests for POSIX
    and length(array) is one of them. I just need a few hours to enter
    them into the Austin Group bug system.

    The referenced issues carry comments about the features being widely
    available in other awks. Is that also true for 'length(array)'?

    It's implemented in gawk and BWK awk, as well as in mawk 1.9.9.6.

    Gawk and BWK awk keep a count of the number of elements in an array,
    so length(array) has negligible cost.

    It's been in BWK awk since 2002 (!), and was in gawk even before then. So,
    it's about time it got standardized.
    --
    Aharon (Arnold) Robbins arnold AT skeeve DOT com

  • From Geoff Clare@21:1/5 to Ed Morton on Fri Mar 5 12:08:03 2021
    Ed Morton wrote:

    delete(array): https://www.austingroupbugs.net/view.php?id=544
    nextfile: https://www.austingroupbugs.net/view.php?id=607
    fflush(): https://www.austingroupbugs.net/view.php?id=634

    [...]
    The above 3 awk changes are tagged with "issue 8" so I assume they are
    all going to be present in that 2022 issue of the standard, it appears I
    can see all changes associated with "issue 8" (2022) at https://www.austingroupbugs.net/tag_view_page.php?tag_id=8 and clicking
    on the "Attached Issues" link there lists 177 issues.

    I can manually search for "awk" in the Summary for each issue but the
    tool name isn't always present in the summary text so - is there a
    robust way to see all planned changes for a given tool such as awk?

    Restricting your search to bugs tagged "issue8" could mean you miss
    some things, for two reasons:

    1. The "issue8" tag is added when a bug is resolved (with a change to
    be made), so there could be as yet unresolved bugs that will result in
    a change in Issue 8.

    2. There is also a "tc3-2008" tag which is for bugs that would be
    suitable for inclusion in a 3rd TC for Issue 7, if the Austin Group
    decides to produce one. (It doesn't currently plan to, but is keeping
    the option open.) The Issue 8 draft has these applied as well as the
    "issue8" tagged bugs.

    As regards a more robust way to list the awk-related bugs of interest to
    you: if you are primarily looking for feature additions (as opposed to
    minor bug fixes etc.) then I would suggest you expand the filters and
    click on "Advanced Filters" then select all of the Status values except "Closed", and all of the Section values that include awk (there are 4 at
    the moment - use your browser's "find in page" feature to find them
    quickly), then click "Apply Filter". This will find bugs that were
    reported specifically about awk (or awk and other things) - currently
    13 bugs.

    If you really want to find everything that affects awk, then instead of
    using the Section values you should just put "awk" in the search box.
    This will, of course, produce some false positives from words like
    awkward, but there are only 36 results so not a huge amount to weed
    through.

    The reason for omitting "Closed" status is because old bugs that were
    fixed in Issue 7 or one of its TCs will be on Closed status.

    --
    Geoff Clare <netnews@gclare.org.uk>

  • From Ed Morton@21:1/5 to Janis Papanagnou on Fri Mar 5 06:39:41 2021
    On 3/5/2021 2:04 AM, Janis Papanagnou wrote:
    On 04.03.2021 18:25, Ed Morton wrote:

    There's simply no down side to having length(array) in the language,
    other than that some awk implementations would need a trivial tweak to
    provide it and it lets us write scripts that can benefit from it (by
    reduced code and/or improved efficiency and/or ability to write more
    general functions that can operate on arrays) more portably.

    All practical applications I pondered about and those grep'ed in
    my sources did not show the necessity. All are typically solvable
    in easy ways with negligible overhead without performance issues.

    Maybe you can provide a concrete example from your practices that
    obviously demonstrate the necessity of a standard 'length(array)'?


    Of course it's not necessary. Neither is `length(string)`:

    $ awk 'BEGIN{str="foo"; for (c=0; substr(str,c+1,1) != ""; c++); print c}'
    3

    nor `delete(array)` nor several other useful constructs.

    Ed.

  • From Ed Morton@21:1/5 to Geoff Clare on Fri Mar 5 07:31:25 2021
    On 3/5/2021 6:08 AM, Geoff Clare wrote:
    Ed Morton wrote:

    delete(array): https://www.austingroupbugs.net/view.php?id=544
    nextfile: https://www.austingroupbugs.net/view.php?id=607
    fflush(): https://www.austingroupbugs.net/view.php?id=634

    [...]
    The above 3 awk changes are tagged with "issue 8" so I assume they are
    all going to be present in that 2022 issue of the standard, it appears I
    can see all changes associated with "issue 8" (2022) at
    https://www.austingroupbugs.net/tag_view_page.php?tag_id=8 and clicking
    on the "Attached Issues" link there lists 177 issues.

    I can manually search for "awk" in the Summary for each issue but the
    tool name isn't always present in the summary text so - is there a
    robust way to see all planned changes for a given tool such as awk?

    Restricting your search to bugs tagged "issue8" could mean you miss
    some things, for two reasons:

    1. The "issue8" tag is added when a bug is resolved (with a change to
    be made), so there could be as yet unresolved bugs that will result in
    a change in Issue 8.

    2. There is also a "tc3-2008" tag which is for bugs that would be
    suitable for inclusion in a 3rd TC for Issue 7, if the Austin Group
    decides to produce one. (It doesn't currently plan to, but is keeping
    the option open.) The Issue 8 draft has these applied as well as the "issue8" tagged bugs.

    As regards a more robust way to list the awk-related bugs of interest to
    you: if you are primarily looking for feature additions (as opposed to
    minor bug fixes etc.) then I would suggest you expand the filters and
    click on "Advanced Filters" then select all of the Status values except "Closed", and all of the Section values that include awk (there are 4 at
    the moment - use your browser's "find in page" feature to find them
    quickly), then click "Apply Filter". This will find bugs that were
    reported specifically about awk (or awk and other things) - currently
    13 bugs.

    If you really want to find everything that affects awk, then instead of
    using the Section values you should just put "awk" in the search box.
    This will, of course, produce some false positives from words like
    awkward, but there are only 36 results so not a huge amount to weed
    through.

    The reason for omitting "Closed" status is because old bugs that were
    fixed in Issue 7 or one of its TCs will be on Closed status.


    Thanks for the info!

    Ed.

  • From Ed Morton@21:1/5 to Aharon Robbins on Fri Mar 5 07:30:43 2021
    On 3/5/2021 3:04 AM, Aharon Robbins wrote:
    In article <s1qtka$a90$1@news-1.m-online.net>,
    Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
    On 04.03.2021 14:36, Ed Morton wrote:
    There are a few awk changes I see planned for a future POSIX standards
    update, e.g.:

    delete(array): https://www.austingroupbugs.net/view.php?id=544
    nextfile: https://www.austingroupbugs.net/view.php?id=607
    fflush(): https://www.austingroupbugs.net/view.php?id=634

    Disappointingly (since it's common, useful, and wouldn't break any
    existing scripts if implemented) I don't see length(array) listed for
    inclusion in the standard so I'm wondering if I'm just missing it [...]

    I have a good sized list of comments / enhancement requests for POSIX
    and length(array) is one of them. I just need a few hours to enter
    them into the Austin Group bug system.

    The referenced issues carry comments about the features being widely
    available in other awks. Is that also true for 'length(array)'?

    It's implemented in gawk and BWK awk, as well as in mawk 1.9.9.6.

    Gawk and BWK awk keep a count of the number of elements in an array,
    so length(array) has negligible cost.

    It's been in BWK awk since 2002 (!), and was in gawk even before then. So, it's about time it got standardized.


    Good, thanks!

    Ed.

  • From Janis Papanagnou@21:1/5 to Ed Morton on Fri Mar 5 16:06:15 2021
    On 05.03.2021 13:39, Ed Morton wrote:
    On 3/5/2021 2:04 AM, Janis Papanagnou wrote:
    On 04.03.2021 18:25, Ed Morton wrote:

    There's simply no down side to having length(array) in the language,
    other than that some awk implementations would need a trivial tweak to
    provide it and it lets us write scripts that can benefit from it (by
    reduced code and/or improved efficiency and/or ability to write more
    general functions that can operate on arrays) more portably.

    All practical applications I pondered about and those grep'ed in
    my sources did not show the necessity. All are typically solvable
    in easy ways with negligible overhead without performance issues.

    Maybe you can provide a concrete example from your practices that
    obviously demonstrate the necessity of a standard 'length(array)'?


    Of course it's not necessary. Neither is `length(string)`:

    $ awk 'BEGIN{str="foo"; for (c=0; substr(str,c+1,1) != ""; c++); print c}'
    3

    This is silly.

    nor `delete(array)` nor several other useful constructs.

    Ed.


  • From Ed Morton@21:1/5 to Janis Papanagnou on Fri Mar 5 10:01:39 2021
    On 3/5/2021 9:06 AM, Janis Papanagnou wrote:
    On 05.03.2021 13:39, Ed Morton wrote:
    On 3/5/2021 2:04 AM, Janis Papanagnou wrote:
    On 04.03.2021 18:25, Ed Morton wrote:

    There's simply no down side to having length(array) in the language,
    other than that some awk implementations would need a trivial tweak to
    provide it and it lets us write scripts that can benefit from it (by
    reduced code and/or improved efficiency and/or ability to write more
    general functions that can operate on arrays) more portably.

    All practical applications I pondered about and those grep'ed in
    my sources did not show the necessity. All are typically solvable
    in easy ways with negligible overhead without performance issues.

    Maybe you can provide a concrete example from your practices that
    obviously demonstrate the necessity of a standard 'length(array)'?


    Of course it's not necessary. Neither is `length(string)`:

    $ awk 'BEGIN{str="foo"; for (c=0; substr(str,c+1,1) != ""; c++); print c}'
    3

    This is silly.

    It's conceptually the same as and therefore no more silly than the
    equivalent for arrays:

    $ awk 'BEGIN{split("f o o",arr); for (i in arr) c++; print c}'
    3

    Obviously I'm populating `str` and `arr` that way for brevity for this
    example so while _in this case_ you could save the output of split()
    that's completely irrelevant to the general case where at some point in
    your code you need to know the length of a string or an array,
    regardless of how that string or array was constructed or modified
    leading up to that point.

    You can use a loop to get the length of a string or array and yet right
    now we have `length(string)` in POSIX but not `length(array)`.

    Anyway, looks like length(array) is present in [almost] all modern awks
    and Arnold is taking care of requesting it be added to POSIX so there's
    no point discussing it further.

    Ed.

    nor `delete(array)` nor several other useful constructs.

    Ed.



  • From Janis Papanagnou@21:1/5 to Ed Morton on Fri Mar 5 17:23:08 2021
    On 05.03.2021 17:01, Ed Morton wrote:
    On 3/5/2021 9:06 AM, Janis Papanagnou wrote:
    On 05.03.2021 13:39, Ed Morton wrote:
    On 3/5/2021 2:04 AM, Janis Papanagnou wrote:
    On 04.03.2021 18:25, Ed Morton wrote:

    There's simply no down side to having length(array) in the language,
    other than that some awk implementations would need a trivial tweak to
    provide it and it lets us write scripts that can benefit from it (by
    reduced code and/or improved efficiency and/or ability to write more
    general functions that can operate on arrays) more portably.

    All practical applications I pondered about and those grep'ed in
    my sources did not show the necessity. All are typically solvable
    in easy ways with negligible overhead without performance issues.

    Maybe you can provide a concrete example from your practices that
    obviously demonstrate the necessity of a standard 'length(array)'?


    Of course it's not necessary. Neither is `length(string)`:

    $ awk 'BEGIN{str="foo"; for (c=0; substr(str,c+1,1) != ""; c++); print c}'
    3

    This is silly.

    It's conceptually the same as and therefore no more silly than the
    equivalent for arrays:

    I cannot take you seriously with this sidetrack; my question was
    simple and could be supported by (any existing) evidence.

    Instead of writing such a silly statement or vacuous nonsense like...

    nor `delete(array)` nor several other useful constructs.

    ...I had expected that you could provide some examples from your
    practical contexts - I'd expected you have some. So it seems you
    have none. That would have been a sufficient answer, either way.
    Just claiming it's useful isn't helpful for the question asked.

    Awk is a terse language. My reflexive expectation would have been
    that 'length(array)' is a useful feature. But really, I could not
    find any sensible example. (That's a different experience I have
    from other programming languages that have a strong emphasis on
    data structures!) But with Awk's restrictions and idioms, really,
    that feature appears to me to not be of any importance. Therefore
    I was asking for substantial application cases.

    Janis

    Ed.




  • From Ed Morton@21:1/5 to Janis Papanagnou on Fri Mar 5 11:16:07 2021
    On 3/5/2021 10:23 AM, Janis Papanagnou wrote:
    On 05.03.2021 17:01, Ed Morton wrote:
    On 3/5/2021 9:06 AM, Janis Papanagnou wrote:
    On 05.03.2021 13:39, Ed Morton wrote:
    On 3/5/2021 2:04 AM, Janis Papanagnou wrote:
    On 04.03.2021 18:25, Ed Morton wrote:

    There's simply no down side to having length(array) in the language,
    other than that some awk implementations would need a trivial tweak to
    provide it and it lets us write scripts that can benefit from it (by
    reduced code and/or improved efficiency and/or ability to write more
    general functions that can operate on arrays) more portably.

    All practical applications I pondered about and those grep'ed in
    my sources did not show the necessity. All are typically solvable
    in easy ways with negligible overhead without performance issues.

    Maybe you can provide a concrete example from your practices that
    obviously demonstrate the necessity of a standard 'length(array)'?


    Of course it's not necessary. Neither is `length(string)`:

    $ awk 'BEGIN{str="foo"; for (c=0; substr(str,c+1,1) != ""; c++); print c}'
    3

    This is silly.

    It's conceptually the same as and therefore no more silly than the
    equivalent for arrays:

    I cannot take you seriously with this sidetrack; my question was
    simple and could be supported by (any existing) evidence.

    Instead of writing such a silly statement or vacuous nonsense like...

    What sidetrack? I have no idea what you're talking about by saying that
    nor why you're behaving like this. I've explained clearly and simply in response to your questions why length(array) is useful and appropriate
    to be part of POSIX by showing how its behavior relates to existing and
    pending functionality for strings and arrays. How you can claim that's
    silly or vacuous is beyond me and frankly quite insulting.


    nor `delete(array)` nor several other useful constructs.

    ...I had expected that you could provide some examples from your
    practical contexts - I'd expected you have some. So it seems you
    have none. That would have been a sufficient answer, either way.
    Just claiming it's useful isn't helpful for the question asked.

    I've given you enough examples. Apparently you find them silly and
    vacuous despite them clearly and directly addressing the issue so
    needless to say I won't be providing any more.

    Awk is a terse language. My reflexive expectation would have been
    that 'length(array)' is a useful feature. But really, I could not
    find any sensible example. (That's a different experience I have
    from other programming languages that have a strong emphasis on
    data structures!) But with Awk's restrictions and idioms, really,
    that feature appears to me to not be of any importance. Therefore
    I was asking for substantial application cases.

    You are, apparently deliberately and purposefully, confusing "necessary"
    with "useful and consistent", and becoming rude so there's no point
    continuing this conversation. I look forward to seeing length(array)
    become part of the POSIX standard just as it's already part of almost
    all modern awks for the reasons I've already stated.

    Ed.


    Janis

    Ed.





  • From Ben Bacarisse@21:1/5 to Ed Morton on Fri Mar 5 20:20:40 2021
    Ed Morton <mortonspam@gmail.com> writes:

    On 3/5/2021 10:23 AM, Janis Papanagnou wrote:

    ...I had expected that you could provide some examples from your
    practical contexts - I'd expected you have some. So it seems you
    have none. That would have been a sufficient answer, either way.
    Just claiming it's useful isn't helpful for the question asked.

    I've given you enough examples.

    Things have got a bit heated. I did not know about (and had not missed)
    gawk's length(array) extension so I too was hoping to see what sort of
    scripts use it. In case I'd missed them, I looked through all your
    posts in this thread and could not see any examples where length(array)
    was used.

    Obviously I don't expect you to put in any work looking out such
    examples just because I ask, but you might have some to hand, given the
    interest you have in including it in a future POSIX standard. Use cases
    would be useful in that context too.

    --
    Ben.

  • From Ed Morton@21:1/5 to Ed Morton on Fri Mar 5 15:30:23 2021
    On 3/5/2021 3:25 PM, Ed Morton wrote:
    <snip>
        function paste(left,right,      lgthLeft,lgthRight,i,n,out) {
            for (i=1; i in left; i++) {
    that can just be `for (i in left)` of course and ditto for right.

    Ed.
                lgthLeft++
            }
            for (i=1; i in right; i++) {
                lgthRight++
            }
            n = ( lgthLeft > lgthRight ? lgthLeft : lgthRight )
            for (i=1; i<=n; i++) {
                out = out "<"left[i]":"right[i]">" OFS
            }
            return out
        }

  • From Ed Morton@21:1/5 to Ben Bacarisse on Fri Mar 5 15:25:34 2021
    On 3/5/2021 2:20 PM, Ben Bacarisse wrote:
    Ed Morton <mortonspam@gmail.com> writes:

    On 3/5/2021 10:23 AM, Janis Papanagnou wrote:

    ...I had expected that you could provide some examples from your
    practical contexts - I'd expected you have some. So it seems you
    have none. That would have been a sufficient answer, either way.
    Just claiming it's useful isn't helpful for the question asked.

    I've given you enough examples.

    Things have got a bit heated. I did not know about (and had not missed)
    gawk's length(array) extension so I too was hoping to see what sort of
    scripts use it. In case I'd missed them, I looked through all your
    posts in this thread and could not see any examples where length(array)
    was used.

    True, the code I posted were examples of cases where length(array) would
    be useful instead of the shown code and cases where length(string) is
    similarly useful, not necessary, to draw the obvious comparison between
    the 2 uses for length().

    Obviously I don't expect you to put in any work looking out such
    examples just because I ask, but you might have some to hand, given the
    interest you have in including it in a future POSIX standard. Use cases
    would be useful in that context too.

    No problem. Consider, for example, this code using length(array) that
    stores your input in an array and if some condition is encountered
    deletes the entry at index 17 if it exists:

    { arr[++idx] = $0 }
    some_condition { delete arr[17] }
    END { print "Number of elements:", length(arr) }

    vs this if you want to have to introduce a separate variable (`cnt`) to
    track the number of elements in the array and remember to
    increment/decrement it everywhere in your code that the array changes:

    { arr[++idx] = $0; cnt++ }
    some_condition {
        if ( 17 in arr ) {
            delete arr[17]
            cnt--
        }
    }
    END { print "Number of elements:", cnt+0 }

    or this otherwise:

    { arr[++idx] = $0 }
    some_condition { delete arr[17] }
    END {
        for ( i in arr ) {
            cnt++
        }
        print "Number of elements:", cnt+0
    }
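
    A minimal sketch, not taken from the thread, that runs the three
    counting variants above side by side. The input lines and the
    `$0 == "DROP"` test are hypothetical stand-ins for real data and for
    some_condition, and it assumes an awk that accepts length(array)
    (gawk does; other current awks are believed to as well).

```shell
# Hypothetical demo: count the surviving elements three ways.
count_demo() {
    printf '%s\n' one two DROP three |
    awk '
        { arr[++idx] = $0; cnt++ }              # bookkeeping variant
        $0 == "DROP" && (2 in arr) {            # stand-in for some_condition
            delete arr[2]
            cnt--
        }
        END {
            for (i in arr) looped++             # counting-loop variant
            # length(array) variant first, then the two hand-rolled counts
            print length(arr), cnt + 0, looped + 0
        }'
}
count_demo    # prints: 3 3 3
```

    All three numbers agree; the difference is only in how much
    bookkeeping the script has to carry.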

    As another example, let's imagine you want a function to paste the
    elements of 2 numerically indexed arrays side by side so you could do:

    BEGIN {
        split("sue bob jan",a)
        split("sue joe jan alf",b)

        print paste(a,b)
    }

    and get the output:

    <sue:sue> <bob:joe> <jan:jan> <:alf>

    Here is that function using length(array):

    function paste(left,right,      i,n,out) {
        n = ( length(left) > length(right) ? length(left) : length(right) )
        for (i=1; i<=n; i++) {
            out = out "<"left[i]":"right[i]">" OFS
        }
        return out
    }

    and here it is otherwise, assuming you don't want to have to keep
    separate counters of the sizes of every array, incrementing and
    decrementing them every time the arrays change, and change the
    arguments to the function to pass those in:

    function paste(left,right,      lgthLeft,lgthRight,i,n,out) {
        for (i=1; i in left; i++) {
            lgthLeft++
        }
        for (i=1; i in right; i++) {
            lgthRight++
        }
        n = ( lgthLeft > lgthRight ? lgthLeft : lgthRight )
        for (i=1; i<=n; i++) {
            out = out "<"left[i]":"right[i]">" OFS
        }
        return out
    }

    Regards,

    Ed.

  • From Ben Bacarisse@21:1/5 to Ed Morton on Fri Mar 5 22:41:35 2021
    Ed Morton <mortonspam@gmail.com> writes:

    On 3/5/2021 2:20 PM, Ben Bacarisse wrote:
    Ed Morton <mortonspam@gmail.com> writes:

    On 3/5/2021 10:23 AM, Janis Papanagnou wrote:

    ...I had expected that you could provide some examples from your
    practical contexts - I'd expected you have some. So it seems you
    have none. That would have been a sufficient answer, either way.
    Just claiming it's useful isn't helpful for the question asked.

    I've given you enough examples.

    Things have got a bit heated. I did not know about (and had not missed)
    gawk's length(array) extension so I too was hoping to see what sort of
    scripts use it. In case I'd missed them, I looked through all your
    posts in this thread and could not see any examples where length(array)
    was used.

    True, the code I posted were examples of cases where length(array)
    would be useful instead of the shown code and cases where
    length(string) is similarly useful, not necessary, to draw the obvious
    comparison between the 2 uses for length().

    Obviously I don't expect you to put in any work looking out such
    examples just because I ask, but you might have some to hand, given the
    interest you have in including it in a future POSIX standard. Use cases
    would be useful in that context too.

    No problem. Consider, for example, this code using length(array) that
    stores your input in an array and if some condition is encountered
    deletes the entry at index 17 if it exists:

    { arr[++idx] = $0 }
    some_condition { delete arr[17] }
    END { print "Number of elements:", length(arr) }

    vs this if you want to have to introduce a separate variable (`cnt`)
    to track the number of elements in the array and remember to
    increment/decrement it everywhere in your code that the array changes:

    Is this a case you've come across? It looks rather unusual.

    { arr[++idx] = $0; cnt++ }
    some_condition {
        if ( 17 in arr ) {
            delete arr[17]
            cnt--
        }
    }
    END { print "Number of elements:", cnt+0 }

    My preference would be

    { arr[++added] = $0 }
    some_condition {
        if ( 17 in arr ) {
            delete arr[17]
            deleted++
        }
    }
    END { print "Number of elements:", added-deleted }

    As another example, lets imagine you want a function to paste the
    elements of 2 numerically indexed arrays side by side so you could do:

    BEGIN {
    split("sue bob jan",a)
    split("sue joe jan alf",b)

    print paste(a,b)
    }

    Good example. I can imagine this coming up in real code -- it's just
    the function often called zip.

    and get the output:

    <sue:sue> <bob:joe> <jan:jan> <:alf>

    Here is that function using length(array):

    function paste(left,right, i,n,out) {
    n = ( length(left) > length(right) ? length(left) : length(right) )
    for (i=1; i<=n; i++) {
    out = out "<"left[i]":"right[i]">" OFS
    }
    return out
    }

    I'd write

    function paste(left, right,    i, out) {
        for (i = 1; i in left || i in right; i++)
            out = out "<" left[i] ":" right[i] ">" OFS
        return out
    }

    It's shorter and, to me, more obvious without length. It also works
    when elements have been deleted.


    --
    Ben.

  • From Ed Morton@21:1/5 to Ben Bacarisse on Fri Mar 5 17:04:56 2021
    On 3/5/2021 4:41 PM, Ben Bacarisse wrote:
    Ed Morton <mortonspam@gmail.com> writes:

    On 3/5/2021 2:20 PM, Ben Bacarisse wrote:
    Ed Morton <mortonspam@gmail.com> writes:

    On 3/5/2021 10:23 AM, Janis Papanagnou wrote:

    ...I had expected that you could provide some examples from your
    practical contexts - I'd expected you have some. So it seems you
    have none. That would have been a sufficient answer, either way.
    Just claiming it's useful isn't helpful for the question asked.

    I've given you enough examples.

    Things have got a bit heated. I did not know about (and had not missed)
    gawk's length(array) extension so I too was hoping to see what sort of
    scripts use it. In case I'd missed them, I looked through all your
    posts in this thread and could not see any examples where length(array)
    was used.

    True, the code I posted were examples of cases where length(array)
    would be useful instead of the shown code and cases where
    length(string) is similarly useful, not necessary, to draw the obvious
    comparison between the 2 uses for length().

    Obviously I don't expect you to put in any work looking out such
    examples just because I ask, but you might have some to hand, given the
    interest you have in including it in a future POSIX standard. Use cases
    would be useful in that context too.

    No problem. Consider, for example, this code using length(array) that
    stores your input in an array and if some condition is encountered
    deletes the entry at index 17 if it exists:

    { arr[++idx] = $0 }
    some_condition { delete arr[17] }
    END { print "Number of elements:", length(arr) }

    vs this if you want to have to introduce a separate variable (`cnt`)
    to track the number of elements in the array and remember to
    increment/decrement it everywhere in your code that the array changes:

    Is this a case you've come across? It looks rather unusual.

    Yes, adding elements to an array and later deleting some of them under
    various conditions before finally needing to know how many elements are
    in the array is common-place, I just minimized the code.


    { arr[++idx] = $0; cnt++ }
    some_condition {
    if ( 17 in arr ) {
    delete arr[17]
    cnt--
    }
    }
    END { print "Number of elements:", cnt+0 }

    My preference would be

    { arr[++added] = $0 }
    some_condition {
    if ( 17 in arr ) {
    delete arr[17]
    deleted++
    }
    }
    END { print "Number of elements:", added-deleted }

    My preference would be for a single variable if length() wasn't an
    option, but either way. Keep in mind you'd be doing this for the N
    arrays that you might need the length() of, and so you'll have N * both
    variables in general.

    As another example, lets imagine you want a function to paste the
    elements of 2 numerically indexed arrays side by side so you could do:

    BEGIN {
    split("sue bob jan",a)
    split("sue joe jan alf",b)

    print paste(a,b)
    }

    Good example. I can imagine this coming up in real code -- it's just
    the function often called zip.

    and get the output:

    <sue:sue> <bob:joe> <jan:jan> <:alf>

    Here is that function using length(array):

    function paste(left,right,      i,n,out) {
        n = ( length(left) > length(right) ? length(left) : length(right) )
        for (i=1; i<=n; i++) {
            out = out "<"left[i]":"right[i]">" OFS
        }
        return out
    }

    I'd write

    function paste(left, right, i, out) {
    for (i = 1; i in left || i in right; i++)
    out = out "<" left[i] ":" right[i] ">" OFS
    return out
    }

    It's shorter and, to me, more obvious without length. It also works
    when elements have been deleted.

    That would fail if the same index had been deleted from both arrays as
    the loop would exit at that deleted index instead of continuing to
    process the indices after that point.
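
    A small sketch of that failure mode, not from the thread. It reuses the
    sample data from the paste() example, deletes index 2 from both arrays,
    and runs the `i in left || i in right` loop. The body uses guarded
    lookups so that the loop itself does not create elements; the wrapper
    name gap_demo is hypothetical.

```shell
# Hypothetical demo: the "in"-driven loop stops at the first index that is
# missing from BOTH arrays, so later elements are never visited.
gap_demo() {
    awk '
        function paste(left, right,    i, out) {
            for (i = 1; (i in left) || (i in right); i++) {
                out = out "<" (i in left ? left[i] : "") ":" \
                              (i in right ? right[i] : "") ">" OFS
            }
            return out
        }
        BEGIN {
            split("sue bob jan", a)
            split("sue joe jan alf", b)
            delete a[2]
            delete b[2]
            print paste(a, b)       # only index 1 is reported
        }'
}
gap_demo    # prints "<sue:sue> " (with the trailing OFS)
```

    Indices 3 and 4 still exist in the arrays but are never reached.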

    Ed.




  • From Ed Morton@21:1/5 to Ben Bacarisse on Sat Mar 6 08:07:33 2021
    On 3/6/2021 7:38 AM, Ben Bacarisse wrote:
    Ed Morton <mortonspam@gmail.com> writes:

    On 3/5/2021 4:41 PM, Ben Bacarisse wrote:
    Ed Morton <mortonspam@gmail.com> writes:

    As another example, let's imagine you want a function to paste the
    elements of 2 numerically indexed arrays side by side so you could do:

    BEGIN {
    split("sue bob jan",a)
    split("sue joe jan alf",b)

    print paste(a,b)
    }

    Good example. I can imagine this coming up in real code -- it's just
    the function often called zip.

    and get the output:

    <sue:sue> <bob:joe> <jan:jan> <:alf>

    Here is that function using length(array):

    function paste(left,right, i,n,out) {
    n = ( length(left) > length(right) ? length(left) : length(right) )
    for (i=1; i<=n; i++) {
    out = out "<"left[i]":"right[i]">" OFS
    }
    return out
    }

    I'd write

    function paste(left, right, i, out) {
    for (i = 1; i in left || i in right; i++)
    out = out "<" left[i] ":" right[i] ">" OFS
    return out
    }

    It's shorter and, to me, more obvious without length. It also works
    when elements have been deleted.

    That would fail if the same index had been deleted from both arrays as
    the loop would exit at that deleted index instead of continuing to
    process the indices after that point.

    Thanks for pointing out the bug. I had, in fact, tested this case but I
    missed it because both functions have another bug: they modify the
    arrays. I'd called yours in the test before mine, and that masked the
    bug. Both functions should probably use

    out = out "<" (i in left ? left[i] : "") ":"\
                  (i in right ? right[i] : "") ">" OFS

    to avoid altering either the length or the result of any "in" tests.

    But the key point is that your version also fails in the face of deleted
    elements (though obviously in a different way).

    It's an interesting problem, but so far, not one that shows that length
    helps. The fact that the length is adjusted by a delete is unhelpful
    when what you really want is the maximum index.

    Yeah, you're right, it was a crap example, I shouldn't have rushed to
    put something together without really thinking it through. My enthusiasm
    level is pretty low to invest any more time in this but just think of
    any situation where you have N arrays in your code and a function that
    takes an array (or multiple arrays) and needs to know how many elements
    are in it/them to do something with it/them - your options are to:

    a) keep "numberOfElementsN" variables (or numberAddedN and
    numberDeletedN as you suggested previously) for every such array and
    pass those as args to the function or
    b) write a loop in the function to count the elements in each array or
    c) call length(array) in the function.

    IMHO "c" is the clear winner. Again, it's not necessary, it's simply
    useful and consistent with other functionality. If I feel a burning
    desire later to actually look around for or come up with a specific
    example I may do.

    Ed.




  • From Ben Bacarisse@21:1/5 to Ed Morton on Sat Mar 6 13:38:03 2021
    Ed Morton <mortonspam@gmail.com> writes:

    On 3/5/2021 4:41 PM, Ben Bacarisse wrote:
    Ed Morton <mortonspam@gmail.com> writes:

    As another example, lets imagine you want a function to paste the
    elements of 2 numerically indexed arrays side by side so you could do:

    BEGIN {
    split("sue bob jan",a)
    split("sue joe jan alf",b)

    print paste(a,b)
    }

    Good example. I can imagine this coming up in real code -- it's just
    the function often called zip.

    and get the output:

    <sue:sue> <bob:joe> <jan:jan> <:alf>

    Here is that function using length(array):

    function paste(left,right, i,n,out) {
    n = ( length(left) > length(right) ? length(left) : length(right) )
    for (i=1; i<=n; i++) {
    out = out "<"left[i]":"right[i]">" OFS
    }
    return out
    }

    I'd write

    function paste(left, right, i, out) {
    for (i = 1; i in left || i in right; i++)
    out = out "<" left[i] ":" right[i] ">" OFS
    return out
    }

    It's shorter and, to me, more obvious without length. It also works
    when elements have been deleted.

    That would fail if the same index had been deleted from both arrays as
    the loop would exit at that deleted index instead of continuing to
    process the indices after that point.

    Thanks for pointing out the bug. I had, in fact, tested this case but I
    missed it because both functions have another bug: they modify the
    arrays. I'd called yours in the test before mine, and that masked the
    bug. Both functions should probably use

    out = out "<" (i in left ? left[i] : "") ":"\
                  (i in right ? right[i] : "") ">" OFS

    to avoid altering either the length or the result of any "in" tests.
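
    The side effect described above is easy to reproduce. In this sketch
    (the names autoviv_demo and count are hypothetical), merely reading
    a[4] adds an element to the array, while an `in` test on index 5 does
    not.

```shell
# Hypothetical demo: referencing a missing element creates it; "in" does not.
autoviv_demo() {
    awk '
        # Portable element count, used instead of length(array).
        function count(arr,    k, n) { for (k in arr) n++; return n + 0 }
        BEGIN {
            split("sue bob jan", a)
            before = count(a)
            unused = a[4]           # plain reference: a[4] now exists, empty
            after = count(a)
            print before, after, ((5 in a) ? "yes" : "no")
        }'
}
autoviv_demo    # prints: 3 4 no
```

    This is why a function that only reads `left[i]` and `right[i]` can
    still change what a later length() or `in` test reports.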

    But the key point is that your version also fails in the face of deleted
    elements (though obviously in a different way).

    It's an interesting problem, but so far, not one that shows that length
    helps. The fact that the length is adjusted by a delete is unhelpful
    when what you really want is the maximum index.

    TL;DR: If we can assume contiguous indexes, I prefer my solution above,
    and in the presence of deletes, this is my best attempt:

    function paste(left, right,    i, ub, out) {
        ub = 0;
        for (i in left) if (i > ub) ub = i;
        for (i in right) if (i > ub) ub = i;
        for (i = 1; i <= ub; i++)
            out = out "<" (i in left ? left[i] : "") ":"\
                          (i in right ? right[i] : "") ">" OFS
        return out
    }

    Can length(arr) help with this?

    --
    Ben.

  • From Ben Bacarisse@21:1/5 to Ed Morton on Sat Mar 6 15:38:08 2021
    Ed Morton <mortonspam@gmail.com> writes:

    ... My enthusiasm level is pretty low to invest any more time in this
    but just think of any situation where you have N arrays in your code
    and a function that takes an array (or multiple arrays) and needs to
    know how many elements are in it/them to do something with it/them -
    your options are to:

    a) keep "numberOfElementsN" variables (or numberAddedN and
    numberDeletedN as you suggested previously) for every such array and
    pass those as args to the function or
    b) write a loop in the function to count the elements in each array or
    c) call length(array) in the function.

    (a) suggests you are considering deleted elements, but (c) (the number
    of elements) is not the right number to let you "do something with
    it/them".

    IMHO "c" is the clear winner.

    I agree, provided no elements might have been deleted. Then you might
    want maxindex(arr) to get highest numeric index. That would solve the
    zip problem too. In fact, for arrays with numeric indexes (those where
    you need to know how many elements are in it to do something with
    it/them), maxindex(arr) (possibly with a matching minindex(arr)) might
    be better in almost every case.

    For arrays without numeric keys, maxindex(arr) would be useless, but
    then length(arr) is not going to be so useful either.

    I'd argue for both, but maxindex(arr) is probably not going to be as
    simple for implementations to add.
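
    To make the distinction concrete, here is a sketch of a user-defined
    maxindex(). No such built-in exists; the name is borrowed from the
    paragraph above and the implementation is only one possible reading of
    it. It assumes all subscripts are positive integers, and it uses
    `k + 0` to force a numeric comparison because awk array subscripts are
    strings.

```shell
# Hypothetical demo: element count and highest index differ after deletes.
maxindex_demo() {
    awk '
        # Highest numeric subscript; assumes positive integer subscripts.
        function maxindex(arr,    k, max) {
            max = 0
            for (k in arr)
                if (k + 0 > max) max = k + 0
            return max
        }
        # Portable element count, equivalent to length(array) where supported.
        function count(arr,    k, n) { for (k in arr) n++; return n + 0 }
        BEGIN {
            split("a b c d e f g h i j k l", arr)   # subscripts 1..12
            delete arr[3]
            delete arr[12]
            print count(arr), maxindex(arr)
        }'
}
maxindex_demo    # prints: 10 11
```

    A loop bounded by the count (10) would miss index 11, while one bounded
    by the maximum index has to tolerate the hole at index 3.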

    --
    Ben.

  • From Ed Morton@21:1/5 to Ben Bacarisse on Sat Mar 6 09:54:54 2021
    On 3/6/2021 9:38 AM, Ben Bacarisse wrote:
    Ed Morton <mortonspam@gmail.com> writes:

    ... My enthusiasm level is pretty low to invest any more time in this
    but just think of any situation where you have N arrays in your code
    and a function that takes an array (or multiple arrays) and needs to
    know how many elements are in it/them to do something with it/them -
    your options are to:

    a) keep "numberOfElementsN" variables (or numberAddedN and
    numberDeletedN as you suggested previously) for every such array and
    pass those as args to the function or
    b) write a loop in the function to count the elements in each array or
    c) call length(array) in the function.

    (a) suggests you are considering deleted elements, but (c) (the number
    of elements) is not the right number to let you "do something with
    it/them".

    IMHO "c" is the clear winner.

    I agree, provided no elements might have been deleted. Then you might
    want maxindex(arr) to get highest numeric index.

    Yeah you might but that's simply a different problem. Sometimes you want
    the number of elements in any array (as length(array) provides) and
    other times you want the min and/or max indices in a numeric array but
    that's not the case I'm referring to in this discussion, it's simply the
    common or garden case where you need to know how many elements are in
    an array.

    Ed.



  • From Ed Morton@21:1/5 to Ben Bacarisse on Mon Mar 8 08:14:26 2021
    On 3/5/2021 2:20 PM, Ben Bacarisse wrote:
    Ed Morton <mortonspam@gmail.com> writes:

    On 3/5/2021 10:23 AM, Janis Papanagnou wrote:

    ...I had expected that you could provide some examples from your
    practical contexts - I'd expected you have some. So it seems you
    have none. That would have been a sufficient answer, either way.
    Just claiming it's useful isn't helpful for the question asked.

    I've given you enough examples.

    Things have got a bit heated. I did not know about (and had not missed)
    gawk's length(array) extension so I too was hoping to see what sort of
    scripts use it. In case I'd missed them, I looked through all your
    posts in this thread and could not see any examples where length(array)
    was used.

    Obviously I don't expect you to put in any work looking out such
    examples just because I ask, but you might have some to hand, given the
    interest you have in including it in a future POSIX standard. Use cases
    would be useful in that context too.


    FWIW I just used length(array) so figured I'd share how given the above
    request for use cases. I have an existing (about 10 years old) ~300 line
    awk script that analyzes the emails I get at a given account and takes
    various actions. It looks like this

    {
        # Collect data and, among other things, populate cnt[] with
        # a count of emails of a specific type received per sender
        # email address.
    }
    END {
        for (addr in cnt) {
            if (cnt[addr] > threshold) {
                # do something
            }
        }
    }

    Today it was taking longer than usual to run so I decided it'd be useful
    to know how many addresses are about to be processed before starting
    that loop so that I get an indication of where it's at in the execution,
    how much output to expect and how long it'll take to run and so tweaked
    it to add:

    printf "About to process %d addresses\n", length(cnt) | "cat>&2"

    as the first line of the END section. Clean, simple, succinct, with no
    need to add other variables or a loop or make any changes to the rest of
    the 10-year-old code.
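
    A runnable sketch of that tweak, under stated assumptions: the sample
    addresses, the threshold of 1 and the wrapper name progress_demo are
    all made up, and the real script's "do something" is reduced to a
    counter. The close() call is an addition not in the original line; it
    is there so the progress message is flushed before the loop starts
    rather than whenever awk exits.

```shell
# Hypothetical demo: report the number of keys on stderr before looping.
progress_demo() {
    printf '%s\n' a@x a@x b@y c@z c@z c@z |
    awk -v threshold=1 '
        { cnt[$1]++ }               # stand-in for the data collection step
        END {
            printf "About to process %d addresses\n", length(cnt) | "cat>&2"
            close("cat>&2")         # flush the message before the loop runs
            for (addr in cnt)
                if (cnt[addr] > threshold)
                    n++             # stand-in for "do something"
            print n + 0, "over threshold"
        }'
}
progress_demo
```

    The progress line goes to stderr and the result to stdout, so the two
    can be redirected independently.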

    Ed.

  • From Kenny McCormack@21:1/5 to mortonspam@gmail.com on Mon Mar 8 15:53:08 2021
    In article <s25bg2$dg8$1@dont-email.me>,
    Ed Morton <mortonspam@gmail.com> wrote:
    ...
    printf "About to process %d addresses\n", length(cnt) | "cat>&2"

    print "About to process",length(cnt),"addresses" > "/dev/stderr"

    HTH

    --

    "If God wanted us to believe in him, he'd exist."

    (Linda Smith on "10 Funniest Londoners", TimeOut, 23rd June, 2005.)

  • From Kaz Kylheku@21:1/5 to Ed Morton on Mon Mar 8 17:34:45 2021
    On 2021-03-08, Ed Morton <mortonspam@gmail.com> wrote:
    FWIW I just used length(array) so figured I'd share how given the above
    request for use cases. I have an existing (about 10 years old) ~300 line
    awk script that analyzes the emails I get at a given account and takes
    various actions. It looks like this

    My experiments with GNU Awk's length(array) reveal a serious flaw.

    If array is not yet defined, then length(array) mutates its state,
    turning it into a scalar.

    For instance, this program has an issue when the input stream is empty,
    unless we remove the length(a) line:

    function report (a,
                     i)
    {
        printf("length(a) = %d\n", length(a))

        for (i in a) {
            printf("a[%s] = %s\n", i, a[i])
        }
    }

    { array[NR] = $0 }


    END { report(array) }

    $ awk -f length.awk
    a
    b
    c
    length(a) = 3
    a[1] = a
    a[2] = b
    a[3] = c
    $ awk -f length.awk
    length(a) = 0
    awk: length.awk:6: fatal: attempt to use scalar parameter `a' as an array

    The requirement is that if x is undefined, length(x) should just leave
    it undefined.

    I can obtain a length function with this property if I make it
    user-defined.

    function array_len(a,
                       len, i)
    {
        len = 0
        for (i in a)
            len++
        return len
    }

    function report (a,
                     i)
    {
        printf("length(a) = %d\n", array_len(a))
        for (i in a) {
            printf("a[%s] = %s\n", i, a[i])
        }
    }

    { array[NR] = $0 }

    END { report(array) }

    Now the program no longer dies with an error message for an empty
    input.

    Why does the built-in length function/operator have to mutate the
    variable, but a user-defined function appears to be pure pass-by-value?

    That behavior is acceptable if the only operand to which we can apply
    length is the character string. If length works only for character
    strings, and those are scalar, then it makes sense to infer that x
    must be scalar if length(x) has been applied.

    (Well, or at least, it makes "sense" if we accept the premise that
    strings cannot be traversed with for (i in str), nor accessed
    with str[i].)

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

  • From Ed Morton@21:1/5 to Kaz Kylheku on Mon Mar 8 15:50:21 2021
    On 3/8/2021 11:34 AM, Kaz Kylheku wrote:
    On 2021-03-08, Ed Morton <mortonspam@gmail.com> wrote:
    FWIW I just used length(array) so figured I'd share how given the above
    request for use cases. I have an existing (about 10 years old) ~300 line
    awk script that analyzes the emails I get at a given account and takes
    various actions. It looks like this

    My experiments with GNU Awk's length(array) reveal a serious flaw.

    If array is not yet defined, then length(array) mutates its state,
    turning it into a scalar.

    The gawk guys could provide C&V on this but FWIW - that behavior is
    stated in the gawk manual
    (https://www.gnu.org/software/gawk/manual/gawk.html#String-Functions):

    ------
    If length() is called with a variable that has not been used, gawk
    forces the variable to be a scalar. Other implementations of awk leave
    the variable without a type. (d.c.) Consider:

    $ gawk 'BEGIN { print length(x) ; x[1] = 1 }'
    -| 0
    error→ gawk: fatal: attempt to use scalar `x' as array

    $ nawk 'BEGIN { print length(x) ; x[1] = 1 }'
    -| 0

    If --lint has been specified on the command line, gawk issues a warning
    about this.
    ------

    It doesn't explain "why" that's the case, just states that it is, but I
    wouldn't be surprised if some or all versions of awk that don't support
    `length(array)` force the type of any variable passed to `length()` to
    be string and, if so, then gawk would be consistent with that behavior
    and so not break existing scripts being ported to gawk.

    Ed.



