Forum: >>> Magnum BBS <<<

Finding POSIX standard updates for awk

From Ed Morton@21:1/5 to All on Thu Mar 4 07:36:17 2021

There are a few awk changes I see planned for a future POSIX standards
update, e.g.:

delete(array): https://www.austingroupbugs.net/view.php?id=544
nextfile: https://www.austingroupbugs.net/view.php?id=607
fflush(): https://www.austingroupbugs.net/view.php?id=634

Disappointingly (since it's common, useful, and wouldn't break any
existing scripts if implemented) I don't see length(array) listed for
inclusion in the standard so I'm wondering if I'm just missing it and
what other awk changes might be coming.

Q18 in the FAQ at http://www.opengroup.org/austin/papers/posix_faq.html
says that next POSIX standard revision is planned to be issued in 2022
and that seems like it will be Issue 8 given the Issue numbers I see at
the top of:

2004 Edition (https://pubs.opengroup.org/onlinepubs/009696799/) = Issue 6
2018 Edition (https://pubs.opengroup.org/onlinepubs/9699919799/) = Issue 7

The above 3 awk changes are tagged with "issue 8" so I assume they are
all going to be present in that 2022 issue of the standard. It appears I
can see all changes associated with "issue 8" (2022) at
https://www.austingroupbugs.net/tag_view_page.php?tag_id=8 and clicking
on the "Attached Issues" link there lists 177 issues.

I can manually search for "awk" in the Summary for each issue but the
tool name isn't always present in the summary text so - is there a
robust way to see all planned changes for a given tool such as awk?

Ed.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Janis Papanagnou@21:1/5 to Janis Papanagnou on Thu Mar 4 16:20:40 2021

On 04.03.2021 16:16, Janis Papanagnou wrote:

On 04.03.2021 14:36, Ed Morton wrote:

There are a few awk changes I see planned for a future POSIX standards
update, e.g.:

delete(array): https://www.austingroupbugs.net/view.php?id=544
nextfile: https://www.austingroupbugs.net/view.php?id=607
fflush(): https://www.austingroupbugs.net/view.php?id=634

Disappointingly (since it's common, useful, and wouldn't break any
existing scripts if implemented) I don't see length(array) listed for
inclusion in the standard so I'm wondering if I'm just missing it [...]

The referenced issues carry comments about the features being widely available in other awks. Is that also true for 'length(array)'?

And to add: also efficiency seems to be a concern (at least in
'delete(array)' and 'nextfile'). Dynamically determining the
'length(array)' might not qualify in that respect (also adding
an implicit counter might not qualify for obvious reasons).

Janis


From Janis Papanagnou@21:1/5 to Ed Morton on Thu Mar 4 16:16:26 2021

On 04.03.2021 14:36, Ed Morton wrote:

There are a few awk changes I see planned for a future POSIX standards update, e.g.:

delete(array): https://www.austingroupbugs.net/view.php?id=544
nextfile: https://www.austingroupbugs.net/view.php?id=607
fflush(): https://www.austingroupbugs.net/view.php?id=634

Disappointingly (since it's common, useful, and wouldn't break any
existing scripts if implemented) I don't see length(array) listed for inclusion in the standard so I'm wondering if I'm just missing it [...]

The referenced issues carry comments about the features being widely
available in other awks. Is that also true for 'length(array)'?

Janis


From Ed Morton@21:1/5 to Janis Papanagnou on Thu Mar 4 09:53:00 2021

On 3/4/2021 9:20 AM, Janis Papanagnou wrote:

On 04.03.2021 16:16, Janis Papanagnou wrote:

On 04.03.2021 14:36, Ed Morton wrote:

There are a few awk changes I see planned for a future POSIX standards
update, e.g.:

delete(array): https://www.austingroupbugs.net/view.php?id=544
nextfile: https://www.austingroupbugs.net/view.php?id=607
fflush(): https://www.austingroupbugs.net/view.php?id=634

Disappointingly (since it's common, useful, and wouldn't break any
existing scripts if implemented) I don't see length(array) listed for
inclusion in the standard so I'm wondering if I'm just missing it [...]

The referenced issues carry comments about the features being widely
available in other awks. Is that also true for 'length(array)'?

And to add: also efficiency seems to be a concern (at least in 'delete(array)' and 'nextfile'). Dynamically determining the
'length(array)' might not qualify in that respect (also adding
an implicit counter might not qualify for obvious reasons).

The description of `delete(array)` in that link is misleading as it's
really replacing `split("",array)` rather than `for (i in array) delete array[i]` in common use.
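
A minimal sketch of the three ways of emptying an array mentioned above, for comparison. It assumes the awk in use already accepts the whole-array `delete arr` form (gawk, mawk and BWK awk are believed to); the array name `arr` and the counters `c1`..`c3` are only illustrative.

```shell
out=$(awk 'BEGIN {
    # Idiom 1: split an empty string into the array, which leaves it empty.
    split("a b c", arr); split("", arr)
    c1 = 0; for (k in arr) c1++

    # Idiom 2: delete the elements one at a time.
    split("a b c", arr); for (k in arr) delete arr[k]
    c2 = 0; for (k in arr) c2++

    # Shorthand discussed in Austin Group bug 544: delete the whole array.
    split("a b c", arr); delete arr
    c3 = 0; for (k in arr) c3++

    # Each counter is the number of elements left after clearing.
    print c1, c2, c3
}')
echo "$out"
```

All three counters should come out as 0, since each idiom leaves the array empty.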

Idk how commonly length(array) is implemented but it's commonly used
when handling arrays in gawk scripts, would presumably be implemented
more efficiently than `c=0; for (i in array) c++`, and can't break any
existing scripts when introduced so it seems to me like an excellent
candidate for the standard.

Ed.


From Janis Papanagnou@21:1/5 to Ed Morton on Thu Mar 4 17:14:09 2021

On 04.03.2021 16:53, Ed Morton wrote:

On 3/4/2021 9:20 AM, Janis Papanagnou wrote:

On 04.03.2021 16:16, Janis Papanagnou wrote:

On 04.03.2021 14:36, Ed Morton wrote:

There are a few awk changes I see planned for a future POSIX standards
update, e.g.:

delete(array): https://www.austingroupbugs.net/view.php?id=544
nextfile: https://www.austingroupbugs.net/view.php?id=607
fflush(): https://www.austingroupbugs.net/view.php?id=634

Disappointingly (since it's common, useful, and wouldn't break any
existing scripts if implemented) I don't see length(array) listed for
inclusion in the standard so I'm wondering if I'm just missing it [...]

The referenced issues carry comments about the features being widely
available in other awks. Is that also true for 'length(array)'?

And to add: also efficiency seems to be a concern (at least in
'delete(array)' and 'nextfile'). Dynamically determining the
'length(array)' might not qualify in that respect (also adding
an implicit counter might not qualify for obvious reasons).

The description of `delete(array)` in that link is misleading as it's
really replacing `split("",array)` rather than `for (i in array) delete array[i]` in common use.

Idk how commonly length(array) is implemented but it's commonly used
when handling arrays in gawk scripts, would presumably be implemented
more efficiently than `c=0; for (i in array) c++`, and can't break any existing scripts when introduced so it seems to me like an excellent candidate for the standard.

Well, what I was aiming at was that in cases where you need to
interrogate the number of elements in the array you don't need
to traverse the whole array (that may be costly) but can count
on insertion and on deletion of elements. Or, of course, if it
fits better you can also loop-count (in cases where you'd not
need to interrogate that number often). A built-in length(arr)
function would need to impose costs on *any* user - either to
increment/decrement a counter, or loop across the whole array,
which may be needlessly slow. I think that
characteristic makes it not a perfect feature candidate for a
standard implementation.
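
A rough sketch of the "count on insertion and on deletion" bookkeeping described above, under the assumption that all changes to the array go through two wrapper functions. The names `add`, `drop`, `seen` and `n` are hypothetical, and the counter has to be a global because awk passes scalars by value.

```shell
out=$(printf 'a\nb\na\nc\n' | awk '
# Insert a key only if it is new, and bump the global counter n.
function add(key)  { if (!(key in seen)) { seen[key] = 1; n++ } }

# Remove a key only if it exists, and decrement the global counter n.
function drop(key) { if (key in seen) { delete seen[key]; n-- } }

{ add($0) }

END {
    drop("b")      # present, so the count goes down by one
    drop("zzz")    # absent, so the count is unchanged
    print n + 0    # +0 so that an untouched counter prints as 0
}')
echo "$out"
```

The counter is only correct as long as nothing modifies `seen` directly, which is the maintenance burden being weighed in this exchange.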

Janis

Ed.


From Ed Morton@21:1/5 to Janis Papanagnou on Thu Mar 4 11:25:56 2021

On 3/4/2021 10:14 AM, Janis Papanagnou wrote:

On 04.03.2021 16:53, Ed Morton wrote:

On 3/4/2021 9:20 AM, Janis Papanagnou wrote:

On 04.03.2021 16:16, Janis Papanagnou wrote:

On 04.03.2021 14:36, Ed Morton wrote:

There are a few awk changes I see planned for a future POSIX standards
update, e.g.:

delete(array): https://www.austingroupbugs.net/view.php?id=544
nextfile: https://www.austingroupbugs.net/view.php?id=607
fflush(): https://www.austingroupbugs.net/view.php?id=634

Disappointingly (since it's common, useful, and wouldn't break any
existing scripts if implemented) I don't see length(array) listed for
inclusion in the standard so I'm wondering if I'm just missing it [...]

The referenced issues carry comments about the features being widely
available in other awks. Is that also true for 'length(array)'?

And to add: also efficiency seems to be a concern (at least in
'delete(array)' and 'nextfile'). Dynamically determining the
'length(array)' might not qualify in that respect (also adding
an implicit counter might not qualify for obvious reasons).

The description of `delete(array)` in that link is misleading as it's
really replacing `split("",array)` rather than `for (i in array) delete
array[i]` in common use.

Idk how commonly length(array) is implemented but it's commonly used
when handling arrays in gawk scripts, would presumably be implemented
more efficiently than `c=0; for (i in array) c++`, and can't break any
existing scripts when introduced so it seems to me like an excellent
candidate for the standard.

Well, what I was aiming at was that in cases where you need to
interrogate the number of elements in the array you don't need
to traverse the whole array (that may be costly) but can count
on insertion and on deletion of elements. Or, of course, if it
fits better you can also loop-count (in cases where you'd not
need to interrogate that number often). A built-in length(arr)
function would need to impose costs on *any* user - either to
increment/decrement a counter, or loop across the whole array,
which may be needlessly slow. I think that
characteristic makes it not a perfect feature candidate for a
standard implementation.

I wouldn't be surprised if existing implementations already track how
many elements are in an array, in which case length(array) just returns
that value. In the absolute worst case, where that isn't true and adding
such a counter on insertions/deletions isn't trivial with negligible
performance impact, the provider of that awk variant could implement
length(array) as a loop so they don't have to do anything extra at all
on insertions/deletions. That would be no more expensive than the user
manually writing such a loop (and probably much faster). Even if that's
unacceptably slow, the user still has the option of just not calling it
and instead keeping a counter of insertions/deletions manually.

There's simply no down side to having length(array) in the language,
other than that some awk implementations would need a trivial tweak to
provide it and it lets us write scripts that can benefit from it (by
reduced code and/or improved efficiency and/or ability to write more
general functions that can operate on arrays) more portably.

Regards,

Ed.


From Janis Papanagnou@21:1/5 to Ed Morton on Fri Mar 5 09:04:56 2021

On 04.03.2021 18:25, Ed Morton wrote:

There's simply no down side to having length(array) in the language,
other than that some awk implementations would need a trivial tweak to provide it and it lets us write scripts that can benefit from it (by
reduced code and/or improved efficiency and/or ability to write more
general functions that can operate on arrays) more portably.

All practical applications I pondered about and those grep'ed in
my sources did not show the necessity. All are typically solvable
in easy ways with negligible overhead without performance issues.

Maybe you can provide a concrete example from your practices that
obviously demonstrate the necessity of a standard 'length(array)'?

Janis


From Aharon Robbins@21:1/5 to janis_papanagnou@hotmail.com on Fri Mar 5 09:04:45 2021

In article <s1qtka$a90$1@news-1.m-online.net>,
Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:

On 04.03.2021 14:36, Ed Morton wrote:

There are a few awk changes I see planned for a future POSIX standards
update, e.g.:

delete(array): https://www.austingroupbugs.net/view.php?id=544
nextfile: https://www.austingroupbugs.net/view.php?id=607
fflush(): https://www.austingroupbugs.net/view.php?id=634

Disappointingly (since it's common, useful, and wouldn't break any
existing scripts if implemented) I don't see length(array) listed for
inclusion in the standard so I'm wondering if I'm just missing it [...]

I have a good sized list of comments / enhancement requests for POSIX
and length(array) is one of them. I just need a few hours to enter
them into the Austin Group bug system.

The referenced issues carry comments about the features being widely
available in other awks. Is that also true for 'length(array)'?

It's implemented in gawk and BWK awk, as well as in mawk 1.9.9.6.

Gawk and BWK awk keep a count of the number of elements in an array,
so length(array) has negligible cost.

It's been in BWK awk since 2002 (!), and was in gawk even before then. So,
it's about time it got standardized.
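
One way to check whether a particular awk already provides length(array) is to probe it from the shell. This is only a sketch: the variable name `result` is made up, and it treats any error or unexpected output from the probe as "unsupported".

```shell
# Create a two-element array and ask awk for its length. Diagnostics from
# an awk that rejects length(array) are discarded.
if [ "$(awk 'BEGIN { a["x"]; a["y"]; print length(a) }' 2>/dev/null)" = "2" ]; then
    result="supported"
else
    result="unsupported"
fi
echo "length(array): $result"
```

A script that has to run on several awks could use a probe like this to decide whether to fall back to a counting loop.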
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com


From Geoff Clare@21:1/5 to Ed Morton on Fri Mar 5 12:08:03 2021

Ed Morton wrote:

delete(array): https://www.austingroupbugs.net/view.php?id=544
nextfile: https://www.austingroupbugs.net/view.php?id=607
fflush(): https://www.austingroupbugs.net/view.php?id=634

[...]

The above 3 awk changes are tagged with "issue 8" so I assume they are
all going to be present in that 2022 issue of the standard. It appears I
can see all changes associated with "issue 8" (2022) at
https://www.austingroupbugs.net/tag_view_page.php?tag_id=8 and clicking
on the "Attached Issues" link there lists 177 issues.

I can manually search for "awk" in the Summary for each issue but the
tool name isn't always present in the summary text so - is there a
robust way to see all planned changes for a given tool such as awk?

Restricting your search to bugs tagged "issue8" could mean you miss
some things, for two reasons:

1. The "issue8" tag is added when a bug is resolved (with a change to
be made), so there could be as yet unresolved bugs that will result in
a change in Issue 8.

2. There is also a "tc3-2008" tag which is for bugs that would be
suitable for inclusion in a 3rd TC for Issue 7, if the Austin Group
decides to produce one. (It doesn't currently plan to, but is keeping
the option open.) The Issue 8 draft has these applied as well as the
"issue8" tagged bugs.

As regards a more robust way to list the awk-related bugs of interest to
you: if you are primarily looking for feature additions (as opposed to
minor bug fixes etc.) then I would suggest you expand the filters and
click on "Advanced Filters" then select all of the Status values except "Closed", and all of the Section values that include awk (there are 4 at
the moment - use your browser's "find in page" feature to find them
quickly), then click "Apply Filter". This will find bugs that were
reported specifically about awk (or awk and other things) - currently
13 bugs.

If you really want to find everything that affects awk, then instead of
using the Section values you should just put "awk" in the search box.
This will, of course, produce some false positives from words like
awkward, but there are only 36 results so not a huge amount to weed
through.
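
If the search results are copied into a text file, the "awkward" false positives can be weeded out mechanically. The sketch below assumes one bug summary per line. The three summaries are invented sample input, not real Austin Group bug titles. It keeps only lines where "awk" appears as a whole word.

```shell
out=$(printf '%s\n' \
    'awk: example summary about nextfile' \
    'awkward wording in some other utility' \
    'Example summary mentioning fflush() in awk' |
awk '
# Pad the lowercased line with spaces so that "awk" at the very start or
# end of a line is still surrounded by non-word characters.
{ line = " " tolower($0) " " }

# Print the original line only when "awk" is not part of a longer word.
line ~ /[^a-z0-9_]awk[^a-z0-9_]/
')
echo "$out"
```

With this sample input the first and third lines are kept and the "awkward" line is dropped.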

The reason for omitting "Closed" status is because old bugs that were
fixed in Issue 7 or one of its TCs will be on Closed status.

--
Geoff Clare <netnews@gclare.org.uk>


From Ed Morton@21:1/5 to Janis Papanagnou on Fri Mar 5 06:39:41 2021

On 3/5/2021 2:04 AM, Janis Papanagnou wrote:

On 04.03.2021 18:25, Ed Morton wrote:

There's simply no down side to having length(array) in the language,
other than that some awk implementations would need a trivial tweak to
provide it and it lets us write scripts that can benefit from it (by
reduced code and/or improved efficiency and/or ability to write more
general functions that can operate on arrays) more portably.

All practical applications I pondered about and those grep'ed in
my sources did not show the necessity. All are typically solvable
in easy ways with negligible overhead without performance issues.

Maybe you can provide a concrete example from your practices that
obviously demonstrate the necessity of a standard 'length(array)'?

Of course it's not necessary. Neither is `length(string)`:

$ awk 'BEGIN{str="foo"; for (c=0; substr(str,c+1,1) != ""; c++); print c}'
3

nor `delete(array)` nor several other useful constructs.

Ed.


From Ed Morton@21:1/5 to Geoff Clare on Fri Mar 5 07:31:25 2021

On 3/5/2021 6:08 AM, Geoff Clare wrote:

Ed Morton wrote:

delete(array): https://www.austingroupbugs.net/view.php?id=544
nextfile: https://www.austingroupbugs.net/view.php?id=607
fflush(): https://www.austingroupbugs.net/view.php?id=634

[...]

The above 3 awk changes are tagged with "issue 8" so I assume they are
all going to be present in that 2022 issue of the standard. It appears I
can see all changes associated with "issue 8" (2022) at
https://www.austingroupbugs.net/tag_view_page.php?tag_id=8 and clicking
on the "Attached Issues" link there lists 177 issues.

I can manually search for "awk" in the Summary for each issue but the
tool name isn't always present in the summary text so - is there a
robust way to see all planned changes for a given tool such as awk?

Restricting your search to bugs tagged "issue8" could mean you miss
some things, for two reasons:

1. The "issue8" tag is added when a bug is resolved (with a change to
be made), so there could be as yet unresolved bugs that will result in
a change in Issue 8.

2. There is also a "tc3-2008" tag which is for bugs that would be
suitable for inclusion in a 3rd TC for Issue 7, if the Austin Group
decides to produce one. (It doesn't currently plan to, but is keeping
the option open.) The Issue 8 draft has these applied as well as the "issue8" tagged bugs.

As regards a more robust way to list the awk-related bugs of interest to
you: if you are primarily looking for feature additions (as opposed to
minor bug fixes etc.) then I would suggest you expand the filters and
click on "Advanced Filters" then select all of the Status values except "Closed", and all of the Section values that include awk (there are 4 at
the moment - use your browser's "find in page" feature to find them
quickly), then click "Apply Filter". This will find bugs that were
reported specifically about awk (or awk and other things) - currently
13 bugs.

If you really want to find everything that affects awk, then instead of
using the Section values you should just put "awk" in the search box.
This will, of course, produce some false positives from words like
awkward, but there are only 36 results so not a huge amount to weed
through.

The reason for omitting "Closed" status is because old bugs that were
fixed in Issue 7 or one of its TCs will be on Closed status.

Thanks for the info!

Ed.


From Ed Morton@21:1/5 to Aharon Robbins on Fri Mar 5 07:30:43 2021

On 3/5/2021 3:04 AM, Aharon Robbins wrote:

In article <s1qtka$a90$1@news-1.m-online.net>,
Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:

On 04.03.2021 14:36, Ed Morton wrote:

There are a few awk changes I see planned for a future POSIX standards
update, e.g.:

delete(array): https://www.austingroupbugs.net/view.php?id=544
nextfile: https://www.austingroupbugs.net/view.php?id=607
fflush(): https://www.austingroupbugs.net/view.php?id=634

Disappointingly (since it's common, useful, and wouldn't break any
existing scripts if implemented) I don't see length(array) listed for
inclusion in the standard so I'm wondering if I'm just missing it [...]

I have a good sized list of comments / enhancement requests for POSIX
and length(array) is one of them. I just need a few hours to enter
them into the Austin Group bug system.

The referenced issues carry comments about the features being widely
available in other awks. Is that also true for 'length(array)'?

It's implemented in gawk and BWK awk, as well as in mawk 1.9.9.6.

Gawk and BWK awk keep a count of the number of elements in an array,
so length(array) has negligible cost.

It's been in BWK awk since 2002 (!), and was in gawk even before then. So, it's about time it got standardized.

Good, thanks!

Ed.


From Janis Papanagnou@21:1/5 to Ed Morton on Fri Mar 5 16:06:15 2021

On 05.03.2021 13:39, Ed Morton wrote:

On 3/5/2021 2:04 AM, Janis Papanagnou wrote:

On 04.03.2021 18:25, Ed Morton wrote:

There's simply no down side to having length(array) in the language,
other than that some awk implementations would need a trivial tweak to
provide it and it lets us write scripts that can benefit from it (by
reduced code and/or improved efficiency and/or ability to write more
general functions that can operate on arrays) more portably.

All practical applications I pondered about and those grep'ed in
my sources did not show the necessity. All are typically solvable
in easy ways with negligible overhead without performance issues.

Maybe you can provide a concrete example from your practices that
obviously demonstrate the necessity of a standard 'length(array)'?

Of course it's not necessary. Neither is `length(string)`:

$ awk 'BEGIN{str="foo"; for (c=0; substr(str,c+1,1) != ""; c++); print c}'
3

This is silly.

nor `delete(array)` nor several other useful constructs.

Ed.


From Ed Morton@21:1/5 to Janis Papanagnou on Fri Mar 5 10:01:39 2021

On 3/5/2021 9:06 AM, Janis Papanagnou wrote:

On 05.03.2021 13:39, Ed Morton wrote:

On 3/5/2021 2:04 AM, Janis Papanagnou wrote:

On 04.03.2021 18:25, Ed Morton wrote:

There's simply no down side to having length(array) in the language,
other than that some awk implementations would need a trivial tweak to
provide it and it lets us write scripts that can benefit from it (by
reduced code and/or improved efficiency and/or ability to write more
general functions that can operate on arrays) more portably.

All practical applications I pondered about and those grep'ed in
my sources did not show the necessity. All are typically solvable
in easy ways with negligible overhead without performance issues.

Maybe you can provide a concrete example from your practices that
obviously demonstrate the necessity of a standard 'length(array)'?

Of course it's not necessary. Neither is `length(string)`:

$ awk 'BEGIN{str="foo"; for (c=0; substr(str,c+1,1) != ""; c++); print c}'
3

This is silly.

It's conceptually the same as and therefore no more silly than the
equivalent for arrays:

$ awk 'BEGIN{split("f o o",arr); for (i in arr) c++; print c}'
3

Obviously I'm populating `str` and `arr` that way for brevity for this
example so while _in this case_ you could save the output of split()
that's completely irrelevant to the general case where at some point in
your code you need to know the length of a string or an array,
regardless of how that string or array was constructed or modified
leading up to that point.

You can use a loop to get the length of a string or array and yet right
now we have `length(string)` in POSIX but not `length(array)`.
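
For awks without length(array), the loop can at least be wrapped once in a helper so callers do not repeat it. A minimal sketch follows; the function name `alen` and the array names are hypothetical.

```shell
out=$(awk '
# Portable fallback: count the elements by iterating over the indices.
# k and n are locals, declared with the usual extra-parameter convention.
function alen(arr,    k, n) {
    for (k in arr) n++
    return n + 0    # +0 so an empty array yields 0 rather than ""
}

BEGIN {
    split("f o o", arr)     # three elements
    split("", empty)        # no elements
    print alen(arr), alen(empty)
}')
echo "$out"
```

This prints `3 0`. Unlike a built-in that returns a stored count, the helper has to visit every element on each call.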

Anyway, looks like length(array) is present in [almost] all modern awks
and Arnold is taking care of requesting it be added to POSIX so there's
no point discussing it further.

Ed.

nor `delete(array)` nor several other useful constructs.

Ed.


From Janis Papanagnou@21:1/5 to Ed Morton on Fri Mar 5 17:23:08 2021

On 05.03.2021 17:01, Ed Morton wrote:

On 3/5/2021 9:06 AM, Janis Papanagnou wrote:

On 05.03.2021 13:39, Ed Morton wrote:

On 3/5/2021 2:04 AM, Janis Papanagnou wrote:

On 04.03.2021 18:25, Ed Morton wrote:

There's simply no down side to having length(array) in the language,
other than that some awk implementations would need a trivial tweak to
provide it and it lets us write scripts that can benefit from it (by
reduced code and/or improved efficiency and/or ability to write more
general functions that can operate on arrays) more portably.

All practical applications I pondered about and those grep'ed in
my sources did not show the necessity. All are typically solvable
in easy ways with negligible overhead without performance issues.

Maybe you can provide a concrete example from your practices that
obviously demonstrate the necessity of a standard 'length(array)'?

Of course it's not necessary. Neither is `length(string)`:

$ awk 'BEGIN{str="foo"; for (c=0; substr(str,c+1,1) != ""; c++);
print c}'
3

This is silly.

It's conceptually the same as and therefore no more silly than the
equivalent for arrays:

I cannot take you seriously with this sidetrack; my question was
simple and could be supported by (any existing) evidence.

Instead of writing such a silly statement or vacuous nonsense like...

nor `delete(array)` nor several other useful constructs.

...I had expected that you could provide some examples from your
practical contexts - I'd expected you have some. So it seems you
have none. That would have been a sufficient answer, either way.
Just claiming it's useful isn't helpful for the question asked.

Awk is a terse language. My reflexive expectation would have been
that 'length(array)' is a useful feature. But really, I could not
find any sensible example. (That's a different experience I have
from other programming languages that have a strong emphasis on
data structures!) But with Awk's restrictions and idioms, really,
that feature appears to me to not be of any importance. Therefore
I was asking for substantial application cases.

Janis

Ed.


From Ed Morton@21:1/5 to Janis Papanagnou on Fri Mar 5 11:16:07 2021

On 3/5/2021 10:23 AM, Janis Papanagnou wrote:

On 05.03.2021 17:01, Ed Morton wrote:

On 3/5/2021 9:06 AM, Janis Papanagnou wrote:

On 05.03.2021 13:39, Ed Morton wrote:

On 3/5/2021 2:04 AM, Janis Papanagnou wrote:

On 04.03.2021 18:25, Ed Morton wrote:

There's simply no down side to having length(array) in the language,
other than that some awk implementations would need a trivial tweak to
provide it and it lets us write scripts that can benefit from it (by
reduced code and/or improved efficiency and/or ability to write more
general functions that can operate on arrays) more portably.

All practical applications I pondered about and those grep'ed in
my sources did not show the necessity. All are typically solvable
in easy ways with negligible overhead without performance issues.

Maybe you can provide a concrete example from your practices that
obviously demonstrate the necessity of a standard 'length(array)'?

Of course it's not necessary. Neither is `length(string)`:

$ awk 'BEGIN{str="foo"; for (c=0; substr(str,c+1,1) != ""; c++);
print c}'
3

This is silly.

It's conceptually the same as and therefore no more silly than the
equivalent for arrays:

I cannot take you seriously with this sidetrack; my question was
simple and could be supported by (any existing) evidence.

Instead of writing such a silly statement or vacuous nonsense like...

What sidetrack? I have no idea what you're talking about by saying that
nor why you're behaving like this. I've explained clearly and simply in response to your questions why length(array) is useful and appropriate
to be part of POSIX by showing how its behavior relates to existing and
pending functionality for strings and arrays. How you can claim that's
silly or vacuous is beyond me and frankly quite insulting.

nor `delete(array)` nor several other useful constructs.

...I had expected that you could provide some examples from your
practical contexts - I'd expected you have some. So it seems you
have none. That would have been a sufficient answer, either way.
Just claiming it's useful isn't helpful for the question asked.

I've given you enough examples. Apparently you find them silly and
vacuous despite them clearly and directly addressing the issue so
needless to say I won't be providing any more.

Awk is a terse language. My reflexive expectation would have been
that 'length(array)' is a useful feature. But really, I could not
find any sensible example. (That's a different experience I have
from other programming languages that have a strong emphasis on
data structures!) But with Awk's restrictions and idioms, really,
that feature appears to me to not be of any importance. Therefore
I was asking for substantial application cases.

You are, apparently deliberately and purposefully, confusing "necessary"
with "useful and consistent", and becoming rude so there's no point
continuing this conversation. I look forward to seeing length(array)
become part of the POSIX standard just as it's already part of almost
all modern awks for the reasons I've already stated.

Ed.

Janis

Ed.


From Ben Bacarisse@21:1/5 to Ed Morton on Fri Mar 5 20:20:40 2021

Ed Morton <mortonspam@gmail.com> writes:

On 3/5/2021 10:23 AM, Janis Papanagnou wrote:

...I had expected that you could provide some examples from your
practical contexts - I'd expected you have some. So it seems you
have none. That would have been a sufficient answer, either way.
Just claiming it's useful isn't helpful for the question asked.

I've given you enough examples.

Things have got a bit heated. I did not know about (and had not missed)
gawk's length(array) extension so I too was hoping to see what sort of
scripts use it. In case I'd missed them, I looked through all your
posts in this thread and could not see any examples where length(array)
was used.

Obviously I don't expect you to put in any work looking out such
examples just because I ask, but you might have some to hand, given the interest you have in including it in a future POSIX standard. Use cases
would be useful in that context too.

--
Ben.


From Ed Morton@21:1/5 to Ed Morton on Fri Mar 5 15:30:23 2021

On 3/5/2021 3:25 PM, Ed Morton wrote:
<snip>

    function paste(left,right,      lgthLeft,lgthRight,i,n,out) {
        for (i=1; i in left; i++) {

that can just be `for (i in left)` of course and ditto for right.

Ed.

            lgthLeft++
        }
        for (i=1; i in right; i++) {
            lgthRight++
        }
        n = ( lgthLeft > lgthRight ? lgthLeft : lgthRight )
        for (i=1; i<=n; i++) {
            out = out "<"left[i]":"right[i]">" OFS
        }
        return out
    }


From Ed Morton@21:1/5 to Ben Bacarisse on Fri Mar 5 15:25:34 2021

On 3/5/2021 2:20 PM, Ben Bacarisse wrote:

Ed Morton <mortonspam@gmail.com> writes:

On 3/5/2021 10:23 AM, Janis Papanagnou wrote:

...I had expected that you could provide some examples from your
practical contexts - I'd expected you have some. So it seems you
have none. That would have been a sufficient answer, either way.
Just claiming it's useful isn't helpful for the question asked.

I've given you enough examples.

Things have got a bit heated. I did not know about (and had not missed) gawk's length(array) extension so I too was hoping to see what sort of scripts use it. In case I'd missed them, I looked through all your
posts in this thread and could not see any examples where length(array)
was used.

True, the code I posted were examples of cases where length(array) would
be useful instead of the shown code and cases where length(string) is
similarly useful, not necessary, to draw the obvious comparison between
the 2 uses for length().

Obviously I don't expect you to put in any work looking out such
examples just because I ask, but you might have some to hand, given the
interest you have in including it in a future POSIX standard. Use cases
would be useful in that context too.

No problem. Consider, for example, this code using length(array) that
stores your input in an array and if some condition is encountered
deletes the entry at index 17 if it exists:

    { arr[++idx] = $0 }
    some_condition { delete arr[17] }
    END { print "Number of elements:", length(arr) }

vs this if you want to have to introduce a separate variable (`cnt`) to
track the number of elements in the array and remember to
increment/decrement it everywhere in your code that the array changes:

    { arr[++idx] = $0; cnt++ }
    some_condition {
        if ( 17 in arr ) {
            delete arr[17]
            cnt--
        }
    }
    END { print "Number of elements:", cnt+0 }

or this otherwise:

    { arr[++idx] = $0 }
    some_condition { delete arr[17] }
    END {
        for ( i in arr ) {
            cnt++
        }
        print "Number of elements:", cnt+0
    }

As another example, let's imagine you want a function to paste the
elements of 2 numerically indexed arrays side by side so you could do:

    BEGIN {
        split("sue bob jan",a)
        split("sue joe jan alf",b)

        print paste(a,b)
    }

and get the output:

    <sue:sue> <bob:joe> <jan:jan> <:alf>

Here is that function using length(array):

    function paste(left,right,      i,n,out) {
        n = ( length(left) > length(right) ? length(left) : length(right) )
        for (i=1; i<=n; i++) {
            out = out "<"left[i]":"right[i]">" OFS
        }
        return out
    }

and here it is otherwise assuming you don't want to have to keep
separate counters of the sizes of every array, incrementing and
decrementing them every time the arrays change, and change the
arguments to the function to pass those in:

    function paste(left,right,      lgthLeft,lgthRight,i,n,out) {
        for (i=1; i in left; i++) {
            lgthLeft++
        }
        for (i=1; i in right; i++) {
            lgthRight++
        }
        n = ( lgthLeft > lgthRight ? lgthLeft : lgthRight )
        for (i=1; i<=n; i++) {
            out = out "<"left[i]":"right[i]">" OFS
        }
        return out
    }

Regards,


Ed.


From Ben Bacarisse@21:1/5 to Ed Morton on Fri Mar 5 22:41:35 2021

Ed Morton <mortonspam@gmail.com> writes:

On 3/5/2021 2:20 PM, Ben Bacarisse wrote:

Ed Morton <mortonspam@gmail.com> writes:

On 3/5/2021 10:23 AM, Janis Papanagnou wrote:

...I had expected that you could provide some examples from your
practical contexts - I'd expected you have some. So it seems you
have none. That would have been a sufficient answer, either way.
Just claiming it's useful isn't helpful for the question asked.

I've given you enough examples.

Things have got a bit heated. I did not know about (and had not missed)
gawk's length(array) extension so I too was hoping to see what sort of
scripts use it. In case I'd missed them, I looked through all your
posts in this thread and could not see any examples where length(array)
was used.

True, the code I posted were examples of cases where length(array)
would be useful instead of the shown code and cases where
length(string) is similarly useful, not necessary, to draw the obvious
comparison between the 2 uses for length().

Obviously I don't expect you to put in any work looking out such
examples just because I ask, but you might have some to hand, given the
interest you have in including it in a future POSIX standard. Use cases
would be useful in that context too.

No problem. Consider, for example, this code using length(array) that
stores your input in an array and if some condition is encountered
deletes the entry at index 17 if it exists:

    { arr[++idx] = $0 }
    some_condition { delete arr[17] }
    END { print "Number of elements:", length(arr) }

vs this if you want to have to introduce a separate variable (`cnt`)
to track the number of elements in the array and remember to
increment/decrement it everywhere in your code that the array changes:

Is this a case you've come across? It looks rather unusual.

    { arr[++idx] = $0; cnt++ }
    some_condition {
        if ( 17 in arr ) {
            delete arr[17]
            cnt--
        }
    }
    END { print "Number of elements:", cnt+0 }

My preference would be

    { arr[++added] = $0 }
    some_condition {
        if ( 17 in arr ) {
            delete arr[17]
            deleted++
        }
    }
    END { print "Number of elements:", added-deleted }

As another example, let's imagine you want a function to paste the
elements of 2 numerically indexed arrays side by side so you could do:

    BEGIN {
        split("sue bob jan",a)
        split("sue joe jan alf",b)

        print paste(a,b)
    }

Good example. I can imagine this coming up in real code -- it's just
the function often called zip.

and get the output:

<sue:sue> <bob:joe> <jan:jan> <:alf>

Here is that function using length(array):

    function paste(left,right,      i,n,out) {
        n = ( length(left) > length(right) ? length(left) : length(right) )
        for (i=1; i<=n; i++) {
            out = out "<"left[i]":"right[i]">" OFS
        }
        return out
    }

I'd write

    function paste(left, right,    i, out) {
        for (i = 1; i in left || i in right; i++)
            out = out "<" left[i] ":" right[i] ">" OFS
        return out
    }

It's shorter and, to me, more obvious without length. It also works
when elements have been deleted.

and here it is otherwise assuming you don't want to have to keep
separate counters of the sizes of every array, incrementing and
decrementing them every time the arrays change, and change the
arguments to the function to pass those in:

    function paste(left,right,      lgthLeft,lgthRight,i,n,out) {
        for (i=1; i in left; i++) {
            lgthLeft++
        }
        for (i=1; i in right; i++) {
            lgthRight++
        }
        n = ( lgthLeft > lgthRight ? lgthLeft : lgthRight )
        for (i=1; i<=n; i++) {
            out = out "<"left[i]":"right[i]">" OFS
        }
        return out
    }

--
Ben.


From Ed Morton@21:1/5 to Ben Bacarisse on Fri Mar 5 17:04:56 2021

On 3/5/2021 4:41 PM, Ben Bacarisse wrote:

Ed Morton <mortonspam@gmail.com> writes:

On 3/5/2021 2:20 PM, Ben Bacarisse wrote:

Ed Morton <mortonspam@gmail.com> writes:

On 3/5/2021 10:23 AM, Janis Papanagnou wrote:

...I had expected that you could provide some examples from your
practical contexts - I'd expected you have some. So it seems you
have none. That would have been a sufficient answer, either way.
Just claiming it's useful isn't helpful for the question asked.

I've given you enough examples.

Things have got a bit heated. I did not know about (and had not missed)
gawk's length(array) extension so I too was hoping to see what sort of
scripts use it. In case I'd missed them, I looked through all your
posts in this thread and could not see any examples where length(array)
was used.

True, the code I posted were examples of cases where length(array)
would be useful instead of the shown code and cases where
length(string) is similarly useful, not necessary, to draw the obvious
comparison between the 2 uses for length().

Obviously I don't expect you to put in any work looking out such
examples just because I ask, but you might have some to hand, given the
interest you have in including it in a future POSIX standard. Use cases
would be useful in that context too.

No problem. Consider, for example, this code using length(array) that
stores your input in an array and if some condition is encountered
deletes the entry at index 17 if it exists:

    { arr[++idx] = $0 }
    some_condition { delete arr[17] }
    END { print "Number of elements:", length(arr) }

vs this if you want to have to introduce a separate variable (`cnt`)
to track the number of elements in the array and remember to
increment/decrement it everywhere in your code that the array changes:

Is this a case you've come across? It looks rather unusual.

Yes, adding elements to an array and later deleting some of them under
various conditions before finally needing to know how many elements are
in the array is commonplace; I just minimized the code.

    { arr[++idx] = $0; cnt++ }
    some_condition {
        if ( 17 in arr ) {
            delete arr[17]
            cnt--
        }
    }
    END { print "Number of elements:", cnt+0 }

My preference would be

    { arr[++added] = $0 }
    some_condition {
        if ( 17 in arr ) {
            delete arr[17]
            deleted++
        }
    }
    END { print "Number of elements:", added-deleted }

My preference would be for a single variable if length() wasn't an
option, but either way works. Keep in mind you'd be doing this for each
of the N arrays whose length() you might need, so in general you'll
have 2 * N such variables.

As another example, let's imagine you want a function to paste the
elements of 2 numerically indexed arrays side by side so you could do:

    BEGIN {
        split("sue bob jan",a)
        split("sue joe jan alf",b)

        print paste(a,b)
    }

Good example. I can imagine this coming up in real code -- it's just
the function often called zip.

and get the output:

<sue:sue> <bob:joe> <jan:jan> <:alf>

Here is that function using length(array):

    function paste(left,right,      i,n,out) {
        n = ( length(left) > length(right) ? length(left) : length(right) )
        for (i=1; i<=n; i++) {
            out = out "<"left[i]":"right[i]">" OFS
        }
        return out
    }

I'd write

    function paste(left, right,    i, out) {
        for (i = 1; i in left || i in right; i++)
            out = out "<" left[i] ":" right[i] ">" OFS
        return out
    }

It's shorter and, to me, more obvious without length. It also works
when elements have been deleted.

That would fail if the same index had been deleted from both arrays as
the loop would exit at that deleted index instead of continuing to
process the indices after that point.
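
The early exit described above can be reproduced with a short sketch. It assumes a POSIX shell with any POSIX awk on the PATH; the wrapper name demo_gap is made up for this illustration, and the loop body uses guarded lookups so that the demo itself does not create array elements:

```shell
# demo_gap is a hypothetical wrapper name; paste() is the loop-on-"in" variant
# from the quoted post, with guarded lookups so it does not create elements.
demo_gap() {
    awk '
    function paste(left, right,    i, out) {
        for (i = 1; i in left || i in right; i++)
            out = out "<" (i in left ? left[i] : "") ":" (i in right ? right[i] : "") ">" OFS
        return out
    }
    BEGIN {
        split("sue bob jan", a)
        split("sue joe jan alf", b)
        delete a[2]; delete b[2]     # same index removed from both arrays
        print paste(a, b)            # the loop stops at i = 2
    }'
}
demo_gap
```

Only the pair at index 1 is printed; indices 3 and 4 are never reached because the loop condition is false at the shared gap.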

Ed.

and here it is otherwise assuming you don't want to have to keep
separate counters of the sizes of every array, incrementing and
decrementing them every time the arrays change, and change the
arguments to the function to pass those in:

    function paste(left,right,      lgthLeft,lgthRight,i,n,out) {
        for (i=1; i in left; i++) {
            lgthLeft++
        }
        for (i=1; i in right; i++) {
            lgthRight++
        }
        n = ( lgthLeft > lgthRight ? lgthLeft : lgthRight )
        for (i=1; i<=n; i++) {
            out = out "<"left[i]":"right[i]">" OFS
        }
        return out
    }


From Ed Morton@21:1/5 to Ben Bacarisse on Sat Mar 6 08:07:33 2021

On 3/6/2021 7:38 AM, Ben Bacarisse wrote:

Ed Morton <mortonspam@gmail.com> writes:

On 3/5/2021 4:41 PM, Ben Bacarisse wrote:

Ed Morton <mortonspam@gmail.com> writes:

As another example, let's imagine you want a function to paste the
elements of 2 numerically indexed arrays side by side so you could do:

    BEGIN {
        split("sue bob jan",a)
        split("sue joe jan alf",b)

        print paste(a,b)
    }

Good example. I can imagine this coming up in real code -- it's just
the function often called zip.

and get the output:

<sue:sue> <bob:joe> <jan:jan> <:alf>

Here is that function using length(array):

    function paste(left,right,      i,n,out) {
        n = ( length(left) > length(right) ? length(left) : length(right) )
        for (i=1; i<=n; i++) {
            out = out "<"left[i]":"right[i]">" OFS
        }
        return out
    }

I'd write

    function paste(left, right,    i, out) {
        for (i = 1; i in left || i in right; i++)
            out = out "<" left[i] ":" right[i] ">" OFS
        return out
    }

It's shorter and, to me, more obvious without length. It also works
when elements have been deleted.

That would fail if the same index had been deleted from both arrays as
the loop would exit at that deleted index instead of continuing to
process the indices after that point.

Thanks for pointing out the bug. I had, in fact, tested this case but I
missed it because both functions have another bug: they modify the
arrays. I'd called yours in the test before mine, and that masked the
bug. Both functions should probably use

    out = out "<" (i in left  ? left[i]  : "") ":"\
                  (i in right ? right[i] : "") ">" OFS

to avoid altering either the length or the result of any "in" tests.

But the key point is that your version also fails in the face of deleted
elements (though obviously in a different way).

It's an interesting problem, but so far, not one that shows that length
helps. The fact that the length is adjusted by a delete is unhelpful
when what you really want is the maximum index.

Yeah, you're right, it was a crap example, I shouldn't have rushed to
put something together without really thinking it through. My enthusiasm
level is pretty low to invest any more time in this but just think of
any situation where you have N arrays in your code and a function that
takes an array (or multiple arrays) and needs to know how many elements
are in it/them to do something with it/them - your options are to:

a) keep "numberOfElementsN" variables (or numberAddedN and
numberDeletedN as you suggested previously) for every such array and
pass those as args to the function or
b) write a loop in the function to count the elements in each array or
c) call length(array) in the function.

IMHO "c" is the clear winner. Again, it's not necessary, it's simply
useful and consistent with other functionality. If I feel a burning
desire later to actually look around for or come up with a specific
example I may do.

Ed.

TL;DR: If we can assume contiguous indexes, I prefer my solution above,
and in the presence of deletes, this is my best attempt:

    function paste(left, right,    i, ub, out) {
        ub = 0;
        # i + 0: indices from for-in are strings, so force a numeric comparison
        for (i in left)  if (i + 0 > ub) ub = i + 0;
        for (i in right) if (i + 0 > ub) ub = i + 0;
        for (i = 1; i <= ub; i++)
            out = out "<" (i in left  ? left[i]  : "") ":"\
                          (i in right ? right[i] : "") ">" OFS
        return out
    }

Can length(arr) help with this?


From Ben Bacarisse@21:1/5 to Ed Morton on Sat Mar 6 13:38:03 2021

Ed Morton <mortonspam@gmail.com> writes:

On 3/5/2021 4:41 PM, Ben Bacarisse wrote:

Ed Morton <mortonspam@gmail.com> writes:

As another example, let's imagine you want a function to paste the
elements of 2 numerically indexed arrays side by side so you could do:

    BEGIN {
        split("sue bob jan",a)
        split("sue joe jan alf",b)

        print paste(a,b)
    }

Good example. I can imagine this coming up in real code -- it's just
the function often called zip.

and get the output:

<sue:sue> <bob:joe> <jan:jan> <:alf>

Here is that function using length(array):

    function paste(left,right,      i,n,out) {
        n = ( length(left) > length(right) ? length(left) : length(right) )
        for (i=1; i<=n; i++) {
            out = out "<"left[i]":"right[i]">" OFS
        }
        return out
    }

I'd write

    function paste(left, right,    i, out) {
        for (i = 1; i in left || i in right; i++)
            out = out "<" left[i] ":" right[i] ">" OFS
        return out
    }

It's shorter and, to me, more obvious without length. It also works
when elements have been deleted.

That would fail if the same index had been deleted from both arrays as
the loop would exit at that deleted index instead of continuing to
process the indices after that point.

Thanks for pointing out the bug. I had, in fact, tested this case but I
missed it because both functions have another bug: they modify the
arrays. I'd called yours in the test before mine, and that masked the
bug. Both functions should probably use

    out = out "<" (i in left  ? left[i]  : "") ":"\
                  (i in right ? right[i] : "") ">" OFS

to avoid altering either the length or the result of any "in" tests.
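
The reason the unguarded versions modify their arguments is that in awk a plain reference to a nonexistent element, such as right[4], creates that element with an empty value. A minimal sketch of the effect, assuming a POSIX shell with any POSIX awk on the PATH (the wrapper name demo_autocreate and the helper count() are made up for this illustration):

```shell
# demo_autocreate is a hypothetical wrapper name used only for this illustration.
demo_autocreate() {
    awk '
    function count(arr,    k, n) {   # portable element count, no length(array)
        n = 0
        for (k in arr) n++
        return n
    }
    BEGIN {
        split("sue bob jan", a)
        before = count(a)
        unused = a[4]                    # a plain reference creates a[4]
        after_ref = count(a)
        delete a[4]
        guarded = (4 in a) ? a[4] : ""   # the "in" test does not create a[4]
        after_guard = count(a)
        print before, after_ref, after_guard
    }'
}
demo_autocreate
```

The three counts printed are 3, 4 and 3: the bare reference grows the array, while the guarded lookup leaves it alone.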

But the key point is that your version also fails in the face of deleted
elements (though obviously in a different way).

It's an interesting problem, but so far, not one that shows that length
helps. The fact that the length is adjusted by a delete is unhelpful
when what you really want is the maximum index.

TL;DR: If we can assume contiguous indexes, I prefer my solution above,
and in the presence of deletes, this is my best attempt:

    function paste(left, right,    i, ub, out) {
        ub = 0;
        # i + 0: indices from for-in are strings, so force a numeric comparison
        for (i in left)  if (i + 0 > ub) ub = i + 0;
        for (i in right) if (i + 0 > ub) ub = i + 0;
        for (i = 1; i <= ub; i++)
            out = out "<" (i in left  ? left[i]  : "") ":"\
                          (i in right ? right[i] : "") ">" OFS
        return out
    }

Can length(arr) help with this?

--
Ben.


From Ben Bacarisse@21:1/5 to Ed Morton on Sat Mar 6 15:38:08 2021

Ed Morton <mortonspam@gmail.com> writes:

... My enthusiasm level is pretty low to invest any more time in this
but just think of any situation where you have N arrays in your code
and a function that takes an array (or multiple arrays) and needs to
know how many elements are in it/them to do something with it/them -
your options are to:

a) keep "numberOfElementsN" variables (or numberAddedN and
numberDeletedN as you suggested previously) for every such array and
pass those as args to the function or
b) write a loop in the function to count the elements in each array or
c) call length(array) in the function.

(a) suggests you are considering deleted elements, but (c) (the number
of elements) is not the right number to let you "do something with
it/them".

IMHO "c" is the clear winner.

I agree, provided no elements might have been deleted. Then you might
want maxindex(arr) to get highest numeric index. That would solve the
zip problem too. In fact, for arrays with numeric indexes (those where
you need to know how many elements are in it to do something with
it/them), maxindex(arr) (possibly with a matching minindex(arr)) might
be better in almost every case.

For arrays without numeric keys, maxindex(arr) would be useless, but
then length(arr) is not going to be so useful either.

I'd argue for both, but maxindex(arr) is probably not going to be as simple
for implementations to add.
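
No awk mentioned in this thread provides maxindex() as a builtin, but the idea can be sketched as a user-defined function. The following is only an illustration, assuming a POSIX shell with any POSIX awk on the PATH and positive integer indices; the names demo_maxindex, maxindex() and count() are made up here:

```shell
# demo_maxindex is a hypothetical wrapper name; maxindex() is a user-defined
# sketch of the idea, not a builtin.
demo_maxindex() {
    awk '
    # Highest positive numeric index; 0 if there is none.
    function maxindex(arr,    k, max) {
        max = 0
        for (k in arr)
            if (k + 0 > max)         # k + 0 forces a numeric comparison
                max = k + 0
        return max
    }
    function count(arr,    k, n) {   # portable stand-in for length(arr)
        n = 0
        for (k in arr) n++
        return n
    }
    BEGIN {
        split("a b c d e f g h i j k l", arr)    # indices 1..12
        delete arr[3]
        delete arr[12]
        print maxindex(arr), count(arr)
    }'
}
demo_maxindex
```

After the two deletes the highest index is 11 while the element count is 10, which is the gap between "maximum index" and "number of elements" discussed above. The k + 0 matters: loop indices are strings, so a bare comparison would rank "9" above "11".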

--
Ben.


From Ed Morton@21:1/5 to Ben Bacarisse on Sat Mar 6 09:54:54 2021

On 3/6/2021 9:38 AM, Ben Bacarisse wrote:

Ed Morton <mortonspam@gmail.com> writes:

... My enthusiasm level is pretty low to invest any more time in this
but just think of any situation where you have N arrays in your code
and a function that takes an array (or multiple arrays) and needs to
know how many elements are in it/them to do something with it/them -
your options are to:

a) keep "numberOfElementsN" variables (or numberAddedN and
numberDeletedN as you suggested previously) for every such array and
pass those as args to the function or
b) write a loop in the function to count the elements in each array or
c) call length(array) in the function.

(a) suggests you are considering deleted elements, but (c) (the number
of elements) is not the right number to let you "do something with
it/them".

IMHO "c" is the clear winner.

I agree, provided no elements might have been deleted. Then you might
want maxindex(arr) to get highest numeric index.

Yeah, you might, but that's simply a different problem. Sometimes you want
the number of elements in an array (as length(array) provides) and
other times you want the min and/or max indices of a numerically indexed
array. The latter is not the case I'm referring to in this discussion; I
mean the common-or-garden case where you need to know how many elements
are in an array.

Ed.

That would solve the zip problem too. In fact, for arrays with numeric
indexes (those where you need to know how many elements are in it to do
something with it/them), maxindex(arr) (possibly with a matching
minindex(arr)) might be better in almost every case.

For arrays without numeric keys, maxindex(arr) would be useless, but
then length(arr) is not going to be so useful either.

I'd argue for both, but maxindex(arr) is probably not going to be as simple
for implementations to add.


From Ed Morton@21:1/5 to Ben Bacarisse on Mon Mar 8 08:14:26 2021

On 3/5/2021 2:20 PM, Ben Bacarisse wrote:

Ed Morton <mortonspam@gmail.com> writes:

On 3/5/2021 10:23 AM, Janis Papanagnou wrote:

...I had expected that you could provide some examples from your
practical contexts - I'd expected you have some. So it seems you
have none. That would have been a sufficient answer, either way.
Just claiming it's useful isn't helpful for the question asked.

I've given you enough examples.

Things have got a bit heated. I did not know about (and had not missed)
gawk's length(array) extension so I too was hoping to see what sort of
scripts use it. In case I'd missed them, I looked through all your
posts in this thread and could not see any examples where length(array)
was used.

Obviously I don't expect you to put in any work looking out such
examples just because I ask, but you might have some to hand, given the
interest you have in including it in a future POSIX standard. Use cases
would be useful in that context too.

FWIW I just used length(array) so figured I'd share how given the above
request for use cases. I have an existing (about 10 years old) ~300 line
awk script that analyzes the emails I get at a given account and takes
various actions. It looks like this

    {
        # Collect data and, among other things, populate cnt[] with
        # a count of emails of a specific type received per sender
        # email address.
    }
    END {
        for (addr in cnt) {
            if (cnt[addr] > threshold) {
                # do something
            }
        }
    }

Today it was taking longer than usual to run so I decided it'd be useful
to know how many addresses are about to be processed before starting
that loop so that I get an indication of where it's at in the execution,
how much output to expect and how long it'll take to run and so tweaked
it to add:

printf "About to process %d addresses\n", length(cnt) | "cat>&2"

as the first line of the END section. Clean, simple, succinct, with no
need to add other variables or a loop or make any changes to the rest of
the 10-year-old code.

Ed.


From Kenny McCormack@21:1/5 to mortonspam@gmail.com on Mon Mar 8 15:53:08 2021

In article <s25bg2$dg8$1@dont-email.me>,
Ed Morton <mortonspam@gmail.com> wrote:
...

printf "About to process %d addresses\n", length(cnt) | "cat>&2"

print "About to process",length(cnt),"addresses" > "/dev/stderr"

HTH

--

"If God wanted us to believe in him, he'd exist."

(Linda Smith on "10 Funniest Londoners", TimeOut, 23rd June, 2005.)


From Kaz Kylheku@21:1/5 to Ed Morton on Mon Mar 8 17:34:45 2021

On 2021-03-08, Ed Morton <mortonspam@gmail.com> wrote:

FWIW I just used length(array) so figured I'd share how given the above
request for use cases. I have an existing (about 10 years old) ~300 line
awk script that analyzes the emails I get at a given account and takes
various actions. It looks like this

My experiments with GNU Awk's length(array) reveal a serious flaw.

If array is not yet defined, then length(array) mutates its state,
turning it into a scalar.

For instance, this program has an issue when the input stream is empty,
unless we remove the length(a) line:

    function report (a,
                     i)
    {
        printf("length(a) = %d\n", length(a))

        for (i in a) {
            printf("a[%s] = %s\n", i, a[i])
        }
    }

    { array[NR] = $0 }

    END { report(array) }

$ awk -f length.awk
a
b
c
length(a) = 3
a[1] = a
a[2] = b
a[3] = c
$ awk -f length.awk
length(a) = 0
awk: length.awk:6: fatal: attempt to use scalar parameter `a' as an array

The requirement is that if x is undefined, length(x) should just leave
it undefined.

I can obtain a length function with this property if I make it
user-defined.

    function array_len(a,
                       len, i)
    {
        len = 0
        for (i in a)
            len++
        return len
    }

    function report (a,
                     i)
    {
        printf("length(a) = %d\n", array_len(a))
        for (i in a) {
            printf("a[%s] = %s\n", i, a[i])
        }
    }

    { array[NR] = $0 }

    END { report(array) }

Now the program no longer dies with an error message for an empty
input.

Why does the built-in length function/operator have to mutate the
variable, but a user-defined function appears to be pure pass-by-value?

That behavior is acceptable if the only operand to which we can apply
length is the character string. If length works only for character
strings, and those are scalar, then it makes sense to infer that x
must be scalar if length(x) has been applied.

(Well, or at least, it makes "sense" if we accept the premise that
strings cannot be traversed with for (i in str), nor accessed
with str[i].)

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal


From Ed Morton@21:1/5 to Kaz Kylheku on Mon Mar 8 15:50:21 2021

On 3/8/2021 11:34 AM, Kaz Kylheku wrote:

On 2021-03-08, Ed Morton <mortonspam@gmail.com> wrote:

FWIW I just used length(array) so figured I'd share how given the above
request for use cases. I have an existing (about 10 years old) ~300 line
awk script that analyzes the emails I get at a given account and takes
various actions. It looks like this

My experiments with GNU Awk's length(array) reveal a serious flaw.

If array is not yet defined, then length(array) mutates its state,
turning it into a scalar.

The gawk guys could provide C&V on this but FWIW - that behavior is
stated in the gawk manual (https://www.gnu.org/software/gawk/manual/gawk.html#String-Functions):

------
If length() is called with a variable that has not been used, gawk
forces the variable to be a scalar. Other implementations of awk leave
the variable without a type. (d.c.) Consider:

$ gawk 'BEGIN { print length(x) ; x[1] = 1 }'
-| 0
error→ gawk: fatal: attempt to use scalar `x' as array

$ nawk 'BEGIN { print length(x) ; x[1] = 1 }'
-| 0

If --lint has been specified on the command line, gawk issues a warning
about this.
------

It doesn't explain "why" that's the case, just states that it is, but I
wouldn't be surprised if some or all versions of awk that don't support
`length(array)` force the type of any variable passed to `length()` to
be string and, if so, then gawk would be consistent with that behavior
and so not break existing scripts being ported to gawk.

Ed.

For instance, this program has an issue when the input stream is empty,
unless we remove the length(a) line:

    function report (a,
                     i)
    {
        printf("length(a) = %d\n", length(a))

        for (i in a) {
            printf("a[%s] = %s\n", i, a[i])
        }
    }

    { array[NR] = $0 }

    END { report(array) }

$ awk -f length.awk
a
b
c
length(a) = 3
a[1] = a
a[2] = b
a[3] = c
$ awk -f length.awk
length(a) = 0
awk: length.awk:6: fatal: attempt to use scalar parameter `a' as an array

The requirement is that if x is undefined, length(x) should just leave
it undefined.

I can obtain a length function with this property if I make it
user-defined.

    function array_len(a,
                       len, i)
    {
        len = 0
        for (i in a)
            len++
        return len
    }

    function report (a,
                     i)
    {
        printf("length(a) = %d\n", array_len(a))
        for (i in a) {
            printf("a[%s] = %s\n", i, a[i])
        }
    }

    { array[NR] = $0 }

    END { report(array) }

Now the program no longer dies with an error message for an empty
input.

Why does the built-in length function/operator have to mutate the
variable, but a user-defined function appears to be pure pass-by-value?

That behavior is acceptable if the only operand to which we can apply
length is the character string. If length works only for character
strings, and those are scalar, then it makes sense to infer that x
must be scalar if length(x) has been applied.

(Well, or at least, it makes "sense" if we accept the premise that
strings cannot be traversed with for (i in str), nor accessed
with str[i].)

