On 04.03.2021 14:36, Ed Morton wrote:
There are a few awk changes I see planned for a future POSIX standards
update, e.g.:
delete(array): https://www.austingroupbugs.net/view.php?id=544
nextfile: https://www.austingroupbugs.net/view.php?id=607
fflush(): https://www.austingroupbugs.net/view.php?id=634
Disappointingly (since it's common, useful, and wouldn't break any
existing scripts if implemented) I don't see length(array) listed for
inclusion in the standard so I'm wondering if I'm just missing it [...]
The referenced issues carry comments about the features being widely available in other awks. Is that also true for 'length(array)'?
Janis
There are a few awk changes I see planned for a future POSIX standards update, e.g.:
delete(array): https://www.austingroupbugs.net/view.php?id=544
nextfile: https://www.austingroupbugs.net/view.php?id=607
fflush(): https://www.austingroupbugs.net/view.php?id=634
Disappointingly (since it's common, useful, and wouldn't break any
existing scripts if implemented) I don't see length(array) listed for inclusion in the standard so I'm wondering if I'm just missing it [...]
On 04.03.2021 16:16, Janis Papanagnou wrote:
On 04.03.2021 14:36, Ed Morton wrote:
There are a few awk changes I see planned for a future POSIX standards
update, e.g.:
delete(array): https://www.austingroupbugs.net/view.php?id=544
nextfile: https://www.austingroupbugs.net/view.php?id=607
fflush(): https://www.austingroupbugs.net/view.php?id=634
Disappointingly (since it's common, useful, and wouldn't break any
existing scripts if implemented) I don't see length(array) listed for
inclusion in the standard so I'm wondering if I'm just missing it [...]
The referenced issues carry comments about the features being widely
available in other awks. Is that also true for 'length(array)'?
And to add: also efficiency seems to be a concern (at least in 'delete(array)' and 'nextfile'). Dynamically determining the
'length(array)' might not qualify in that respect (also adding
an implicit counter might not qualify for obvious reasons).
On 3/4/2021 9:20 AM, Janis Papanagnou wrote:
On 04.03.2021 16:16, Janis Papanagnou wrote:
On 04.03.2021 14:36, Ed Morton wrote:
There are a few awk changes I see planned for a future POSIX standards
update, e.g.:
delete(array): https://www.austingroupbugs.net/view.php?id=544
nextfile: https://www.austingroupbugs.net/view.php?id=607
fflush(): https://www.austingroupbugs.net/view.php?id=634
Disappointingly (since it's common, useful, and wouldn't break any
existing scripts if implemented) I don't see length(array) listed for
inclusion in the standard so I'm wondering if I'm just missing it [...]
The referenced issues carry comments about the features being widely
available in other awks. Is that also true for 'length(array)'?
And to add: also efficiency seems to be a concern (at least in
'delete(array)' and 'nextfile'). Dynamically determining the
'length(array)' might not qualify in that respect (also adding
an implicit counter might not qualify for obvious reasons).
The description of `delete(array)` in that link is misleading as it's
really replacing `split("",array)` rather than `for (i in array) delete array[i]` in common use.
Idk how commonly length(array) is implemented but it's commonly used
when handling arrays in gawk scripts, would presumably be implemented
more efficiently than `c=0; for (i in array) c++`, and can't break any existing scripts when introduced so it seems to me like an excellent candidate for the standard.
Ed.
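For readers unfamiliar with the two idioms mentioned above, here is a minimal sketch. It assumes only POSIX awk features; the array name `arr` and the sample data are illustrative and not taken from the thread.

```sh
# Count the elements with a loop, then empty the array with split("", arr).
out=$(awk 'BEGIN {
    split("sue bob jan", arr)       # arr[1], arr[2], arr[3]
    c = 0
    for (i in arr) c++              # portable element count
    split("", arr)                  # portable way to empty the array
    d = 0
    for (i in arr) d++              # count again after clearing
    print c, d
}')
echo "$out"    # prints: 3 0
```

The loop count costs one pass over the array each time it is needed, which is the cost a built-in length(array) could avoid if the implementation keeps its own element count.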
On 04.03.2021 16:53, Ed Morton wrote:
On 3/4/2021 9:20 AM, Janis Papanagnou wrote:
On 04.03.2021 16:16, Janis Papanagnou wrote:
On 04.03.2021 14:36, Ed Morton wrote:
There are a few awk changes I see planned for a future POSIX standards
update, e.g.:
delete(array): https://www.austingroupbugs.net/view.php?id=544
nextfile: https://www.austingroupbugs.net/view.php?id=607
fflush(): https://www.austingroupbugs.net/view.php?id=634
Disappointingly (since it's common, useful, and wouldn't break any
existing scripts if implemented) I don't see length(array) listed for
inclusion in the standard so I'm wondering if I'm just missing it [...]
The referenced issues carry comments about the features being widely
available in other awks. Is that also true for 'length(array)'?
And to add: also efficiency seems to be a concern (at least in
'delete(array)' and 'nextfile'). Dynamically determining the
'length(array)' might not qualify in that respect (also adding
an implicit counter might not qualify for obvious reasons).
The description of `delete(array)` in that link is misleading as it's
really replacing `split("",array)` rather than `for (i in array) delete
array[i]` in common use.
Idk how commonly length(array) is implemented but it's commonly used
when handling arrays in gawk scripts, would presumably be implemented
more efficiently than `c=0; for (i in array) c++`, and can't break any
existing scripts when introduced so it seems to me like an excellent
candidate for the standard.
Well, what I was aiming at was that in cases where you need to
interrogate the number of elements in the array you don't need
to traverse the whole array (that may be costly) but can count
on insertion and on deletion of elements. Or, of course, if it
fits better you can also loop-count (in cases where you'd not
need to interrogate that number often). A built-in length(arr)
function would need to impose costs on *any* user - either to
increment/decrement a counter, or to loop across the whole array,
which may be unnecessarily bad for performance. I think that
characteristic makes it not a perfect feature candidate for a
standard implementation.
There's simply no down side to having length(array) in the language,
other than that some awk implementations would need a trivial tweak to provide it and it lets us write scripts that can benefit from it (by
reduced code and/or improved efficiency and/or ability to write more
general functions that can operate on arrays) more portably.
On 04.03.2021 14:36, Ed Morton wrote:
There are a few awk changes I see planned for a future POSIX standards
update, e.g.:
delete(array): https://www.austingroupbugs.net/view.php?id=544
nextfile: https://www.austingroupbugs.net/view.php?id=607
fflush(): https://www.austingroupbugs.net/view.php?id=634
Disappointingly (since it's common, useful, and wouldn't break any
existing scripts if implemented) I don't see length(array) listed for
inclusion in the standard so I'm wondering if I'm just missing it [...]
The referenced issues carry comments about the features being widely
available in other awks. Is that also true for 'length(array)'?
delete(array): https://www.austingroupbugs.net/view.php?id=544
nextfile: https://www.austingroupbugs.net/view.php?id=607
fflush(): https://www.austingroupbugs.net/view.php?id=634
The above 3 awk changes are tagged with "issue 8", so I assume they are
all going to be present in the 2022 issue of the standard. It appears I
can see all changes associated with "issue 8" (2022) at
https://www.austingroupbugs.net/tag_view_page.php?tag_id=8 and clicking
on the "Attached Issues" link there lists 177 issues.
I can manually search for "awk" in the Summary for each issue but the
tool name isn't always present in the summary text so - is there a
robust way to see all planned changes for a given tool such as awk?
On 04.03.2021 18:25, Ed Morton wrote:
There's simply no down side to having length(array) in the language,
other than that some awk implementations would need a trivial tweak to
provide it and it lets us write scripts that can benefit from it (by
reduced code and/or improved efficiency and/or ability to write more
general functions that can operate on arrays) more portably.
All practical applications I pondered about and those grep'ed in
my sources did not show the necessity. All are typically solvable
in easy ways with negligible overhead without performance issues.
Maybe you can provide a concrete example from your practices that
obviously demonstrate the necessity of a standard 'length(array)'?
Ed Morton wrote:
delete(array): https://www.austingroupbugs.net/view.php?id=544[...]
nextfile: https://www.austingroupbugs.net/view.php?id=607
fflush(): https://www.austingroupbugs.net/view.php?id=634
The above 3 awk changes are tagged with "issue 8" so I assume they are
all going to be present in that 2022 issue of the standard, it appears I
can see all changes associated with "issue 8" (2022) at
https://www.austingroupbugs.net/tag_view_page.php?tag_id=8 and clicking
on the "Attached Issues" link there lists 177 issues.
I can manually search for "awk" in the Summary for each issue but the
tool name isn't always present in the summary text so - is there a
robust way to see all planned changes for a given tool such as awk?
Restricting your search to bugs tagged "issue8" could mean you miss
some things, for two reasons:
1. The "issue8" tag is added when a bug is resolved (with a change to
be made), so there could be as yet unresolved bugs that will result in
a change in Issue 8.
2. There is also a "tc3-2008" tag which is for bugs that would be
suitable for inclusion in a 3rd TC for Issue 7, if the Austin Group
decides to produce one. (It doesn't currently plan to, but is keeping
the option open.) The Issue 8 draft has these applied as well as the "issue8" tagged bugs.
As regards a more robust way to list the awk-related bugs of interest to
you: if you are primarily looking for feature additions (as opposed to
minor bug fixes etc.) then I would suggest you expand the filters and
click on "Advanced Filters" then select all of the Status values except "Closed", and all of the Section values that include awk (there are 4 at
the moment - use your browser's "find in page" feature to find them
quickly), then click "Apply Filter". This will find bugs that were
reported specifically about awk (or awk and other things) - currently
13 bugs.
If you really want to find everything that affects awk, then instead of
using the Section values you should just put "awk" in the search box.
This will, of course, produce some false positives from words like
awkward, but there are only 36 results so not a huge amount to weed
through.
The reason for omitting "Closed" status is because old bugs that were
fixed in Issue 7 or one of its TCs will be on Closed status.
In article <s1qtka$a90$1@news-1.m-online.net>,
Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
On 04.03.2021 14:36, Ed Morton wrote:
There are a few awk changes I see planned for a future POSIX standards
update, e.g.:
delete(array): https://www.austingroupbugs.net/view.php?id=544
nextfile: https://www.austingroupbugs.net/view.php?id=607
fflush(): https://www.austingroupbugs.net/view.php?id=634
Disappointingly (since it's common, useful, and wouldn't break any
existing scripts if implemented) I don't see length(array) listed for
inclusion in the standard so I'm wondering if I'm just missing it [...]
I have a good sized list of comments / enhancement requests for POSIX
and length(array) is one of them. I just need a few hours to enter
them into the Austin Group bug system.
The referenced issues carry comments about the features being widely
available in other awks. Is that also true for 'length(array)'?
It's implemented in gawk and BWK awk, as well as in mawk 1.9.9.6.
Gawk and BWK awk keep a count of the number of elements in an array,
so length(array) has negligible cost.
It's been in BWK awk since 2002 (!), and was in gawk even before then. So, it's about time it got standardized.
On 3/5/2021 2:04 AM, Janis Papanagnou wrote:
On 04.03.2021 18:25, Ed Morton wrote:
There's simply no down side to having length(array) in the language,
other than that some awk implementations would need a trivial tweak to
provide it and it lets us write scripts that can benefit from it (by
reduced code and/or improved efficiency and/or ability to write more
general functions that can operate on arrays) more portably.
All practical applications I pondered about and those grep'ed in
my sources did not show the necessity. All are typically solvable
in easy ways with negligible overhead without performance issues.
Maybe you can provide a concrete example from your practices that
obviously demonstrate the necessity of a standard 'length(array)'?
Of course it's not necessary. Neither is `length(string)`:
$ awk 'BEGIN{str="foo"; for (c=0; substr(str,c+1,1) != ""; c++); print c}'
3
nor `delete(array)` nor several other useful constructs.
Ed.
On 05.03.2021 13:39, Ed Morton wrote:
On 3/5/2021 2:04 AM, Janis Papanagnou wrote:
On 04.03.2021 18:25, Ed Morton wrote:
There's simply no down side to having length(array) in the language,
other than that some awk implementations would need a trivial tweak to
provide it and it lets us write scripts that can benefit from it (by
reduced code and/or improved efficiency and/or ability to write more
general functions that can operate on arrays) more portably.
All practical applications I pondered about and those grep'ed in
my sources did not show the necessity. All are typically solvable
in easy ways with negligible overhead without performance issues.
Maybe you can provide a concrete example from your practices that
obviously demonstrate the necessity of a standard 'length(array)'?
Of course it's not necessary. Neither is `length(string)`:
$ awk 'BEGIN{str="foo"; for (c=0; substr(str,c+1,1) != ""; c++); print c}'
3
This is silly.
nor `delete(array)` nor several other useful constructs.
Ed.
On 3/5/2021 9:06 AM, Janis Papanagnou wrote:
On 05.03.2021 13:39, Ed Morton wrote:
On 3/5/2021 2:04 AM, Janis Papanagnou wrote:
On 04.03.2021 18:25, Ed Morton wrote:
There's simply no down side to having length(array) in the language,
other than that some awk implementations would need a trivial tweak to
provide it and it lets us write scripts that can benefit from it (by
reduced code and/or improved efficiency and/or ability to write more
general functions that can operate on arrays) more portably.
All practical applications I pondered about and those grep'ed in
my sources did not show the necessity. All are typically solvable
in easy ways with negligible overhead without performance issues.
Maybe you can provide a concrete example from your practices that
obviously demonstrate the necessity of a standard 'length(array)'?
Of course it's not necessary. Neither is `length(string)`:
$ awk 'BEGIN{str="foo"; for (c=0; substr(str,c+1,1) != ""; c++);
print c}'
3
This is silly.
It's conceptually the same as and therefore no more silly than the
equivalent for arrays:
nor `delete(array)` nor several other useful constructs.
Ed.
On 05.03.2021 17:01, Ed Morton wrote:
On 3/5/2021 9:06 AM, Janis Papanagnou wrote:
On 05.03.2021 13:39, Ed Morton wrote:
On 3/5/2021 2:04 AM, Janis Papanagnou wrote:
On 04.03.2021 18:25, Ed Morton wrote:
There's simply no down side to having length(array) in the language,
other than that some awk implementations would need a trivial tweak to
provide it and it lets us write scripts that can benefit from it (by
reduced code and/or improved efficiency and/or ability to write more
general functions that can operate on arrays) more portably.
All practical applications I pondered about and those grep'ed in
my sources did not show the necessity. All are typically solvable
in easy ways with negligible overhead without performance issues.
Maybe you can provide a concrete example from your practices that
obviously demonstrate the necessity of a standard 'length(array)'?
Of course it's not necessary. Neither is `length(string)`:
$ awk 'BEGIN{str="foo"; for (c=0; substr(str,c+1,1) != ""; c++);
print c}'
3
This is silly.
It's conceptually the same as and therefore no more silly than the
equivalent for arrays:
I cannot take you seriously with this sidetrack; my question was
simple and could be supported by (any existing) evidence.
Instead of writing such silly statements or vacuous nonsense like...
nor `delete(array)` nor several other useful constructs.
...I had expected that you could provide some examples from your
practical contexts - I'd expected you have some. So it seems you
have none. That would have been a sufficient answer, either way.
Just claiming it's useful isn't helpful for the question asked.
Awk is a terse language. My reflexive expectation would have been
that 'length(array)' is a useful feature. But really, I could not
find any sensible example. (That's a different experience I have
from other programming languages that have a strong emphasis on
data structures!) But with Awk's restrictions and idioms, really,
that feature appears to me to not be of any importance. Therefore
I was asking for substantial application cases.
Janis
Ed.
On 3/5/2021 10:23 AM, Janis Papanagnou wrote:
...I had expected that you could provide some examples from your
practical contexts - I'd expected you have some. So it seems you
have none. That would have been a sufficient answer, either way.
Just claiming it's useful isn't helpful for the question asked.
I've given you enough examples.
function paste(left,right, lgthLeft,lgthRight,i,n,out) {
    for (i=1; i in left; i++) {
        lgthLeft++
    }
    for (i=1; i in right; i++) {
        lgthRight++
    }
    n = ( lgthLeft > lgthRight ? lgthLeft : lgthRight )
    for (i=1; i<=n; i++) {
        out = out "<"left[i]":"right[i]">" OFS
    }
    return out
}
The first loop can just be `for (i in left)` of course, and ditto for right.
Ed Morton <mortonspam@gmail.com> writes:
On 3/5/2021 10:23 AM, Janis Papanagnou wrote:
...I had expected that you could provide some examples from your
practical contexts - I'd expected you have some. So it seems you
have none. That would have been a sufficient answer, either way.
Just claiming it's useful isn't helpful for the question asked.
I've given you enough examples.
Things have got a bit heated. I did not know about (and had not missed) gawk's length(array) extension so I too was hoping to see what sort of scripts use it. In case I'd missed them, I looked through all your
posts in this thread and could not see any examples where length(array)
was used.
Obviously I don't expect you to put in any work looking out such
examples just because I ask, but you might have some to hand, given the interest you have in including it in a future POSIX standard. Use cases would be useful in that context too.
On 3/5/2021 2:20 PM, Ben Bacarisse wrote:
Ed Morton <mortonspam@gmail.com> writes:
On 3/5/2021 10:23 AM, Janis Papanagnou wrote:
...I had expected that you could provide some examples from your
practical contexts - I'd expected you have some. So it seems you
have none. That would have been a sufficient answer, either way.
Just claiming it's useful isn't helpful for the question asked.
I've given you enough examples.
Things have got a bit heated. I did not know about (and had not missed)
gawk's length(array) extension so I too was hoping to see what sort of
scripts use it. In case I'd missed them, I looked through all your
posts in this thread and could not see any examples where length(array)
was used.
True, the code I posted were examples of cases where length(array)
would be useful instead of the shown code and cases where
length(string) is similarly useful, not necessary, to draw the obvious comparison between the 2 uses for length().
Obviously I don't expect you to put in any work looking out such
examples just because I ask, but you might have some to hand, given the
interest you have in including it in a future POSIX standard. Use cases
would be useful in that context too.
No problem. Consider, for example, this code using length(array) that
stores your input in an array and if some condition is encountered
deletes the entry at index 17 if it exists:
{ arr[++idx] = $0 }
some_condition { delete arr[17] }
END { print "Number of elements:", length(arr) }
vs this if you want to have to introduce a separate variable (`cnt`)
to track the number of elements in the array and remember to increment/decrement it everywhere in your code that the array changes:
{ arr[++idx] = $0; cnt++ }
some_condition {
    if ( 17 in arr ) {
        delete arr[17]
        cnt--
    }
}
END { print "Number of elements:", cnt+0 }
As another example, let's imagine you want a function to paste the
elements of 2 numerically indexed arrays side by side so you could do:
BEGIN {
    split("sue bob jan",a)
    split("sue joe jan alf",b)
    print paste(a,b)
}
and get the output:
<sue:sue> <bob:joe> <jan:jan> <:alf>
Here is that function using length(array):
function paste(left,right, i,n,out) {
    n = ( length(left) > length(right) ? length(left) : length(right) )
    for (i=1; i<=n; i++) {
        out = out "<"left[i]":"right[i]">" OFS
    }
    return out
}
and here it is otherwise assuming you don't want to have to keep
separate counters of the sizes of every array, incrementing and
decrementing them every time the arrays change, and change the
arguments to the function to pass those in:
function paste(left,right, lgthLeft,lgthRight,i,n,out) {
    for (i=1; i in left; i++) {
        lgthLeft++
    }
    for (i=1; i in right; i++) {
        lgthRight++
    }
    n = ( lgthLeft > lgthRight ? lgthLeft : lgthRight )
    for (i=1; i<=n; i++) {
        out = out "<"left[i]":"right[i]">" OFS
    }
    return out
}
Ed Morton <mortonspam@gmail.com> writes:
On 3/5/2021 2:20 PM, Ben Bacarisse wrote:
Ed Morton <mortonspam@gmail.com> writes:
On 3/5/2021 10:23 AM, Janis Papanagnou wrote:
...I had expected that you could provide some examples from your
practical contexts - I'd expected you have some. So it seems you
have none. That would have been a sufficient answer, either way.
Just claiming it's useful isn't helpful for the question asked.
I've given you enough examples.
Things have got a bit heated. I did not know about (and had not missed)
gawk's length(array) extension so I too was hoping to see what sort of
scripts use it. In case I'd missed them, I looked through all your
posts in this thread and could not see any examples where length(array)
was used.
True, the code I posted were examples of cases where length(array)
would be useful instead of the shown code and cases where
length(string) is similarly useful, not necessary, to draw the obvious
comparison between the 2 uses for length().
Obviously I don't expect you to put in any work looking out such
examples just because I ask, but you might have some to hand, given the
interest you have in including it in a future POSIX standard. Use cases
would be useful in that context too.
No problem. Consider, for example, this code using length(array) that
stores your input in an array and if some condition is encountered
deletes the entry at index 17 if it exists:
{ arr[++idx] = $0 }
some_condition { delete arr[17] }
END { print "Number of elements:", length(arr) }
vs this if you want to have to introduce a separate variable (`cnt`)
to track the number of elements in the array and remember to
increment/decrement it everywhere in your code that the array changes:
Is this a case you've come across? It looks rather unusual.
{ arr[++idx] = $0; cnt++ }
some_condition {
if ( 17 in arr ) {
delete arr[17]
cnt--
}
}
END { print "Number of elements:", cnt+0 }
My preference would be
{ arr[++added] = $0 }
some_condition {
    if ( 17 in arr ) {
        delete arr[17]
        deleted++
    }
}
END { print "Number of elements:", added-deleted }
As another example, lets imagine you want a function to paste the
elements of 2 numerically indexed arrays side by side so you could do:
BEGIN {
split("sue bob jan",a)
split("sue joe jan alf",b)
print paste(a,b)
}
Good example. I can imagine this coming up in real code -- it's just
the function often called zip.
and get the output:
<sue:sue> <bob:joe> <jan:jan> <:alf>
Here is that function using length(array):
function paste(left,right, i,n,out) {
    n = ( length(left) > length(right) ? length(left) : length(right) )
    for (i=1; i<=n; i++) {
        out = out "<"left[i]":"right[i]">" OFS
    }
    return out
}
I'd write
function paste(left, right, i, out) {
    for (i = 1; i in left || i in right; i++)
        out = out "<" left[i] ":" right[i] ">" OFS
    return out
}
It's shorter and, to me, more obvious without length. It also works
when elements have been deleted.
and here it is otherwise assuming you don't want to have to keep
separate counters of the sizes of every array, incrementing and
decrementing them every time the array s change, and change the
arguments to the function to pass those in:
function paste(left,right, lgthLeft,lgthRight,i,n,out) {
for (i=1; i in left; i++) {
lgthLeft++
}
for (i=1; i in right; i++) {
lgthRight++
}
n = ( lgthLeft > lgthRight ? lgthLeft : lgthRight )
for (i=1; i<=n; i++) {
out = out "<"left[i]":"right[i]">" OFS
}
return out
}
Ed Morton <mortonspam@gmail.com> writes:
On 3/5/2021 4:41 PM, Ben Bacarisse wrote:
Ed Morton <mortonspam@gmail.com> writes:
As another example, lets imagine you want a function to paste the
elements of 2 numerically indexed arrays side by side so you could do:
BEGIN {
split("sue bob jan",a)
split("sue joe jan alf",b)
print paste(a,b)
}
Good example. I can imagine this coming up in real code -- it's just
the function often called zip.
and get the output:
<sue:sue> <bob:joe> <jan:jan> <:alf>
Here is that function using length(array):
function paste(left,right, i,n,out) {
n = ( length(left) > length(right) ? length(left) : length(right) )
for (i=1; i<=n; i++) {
out = out "<"left[i]":"right[i]">" OFS
}
return out
}
I'd write
function paste(left, right, i, out) {
for (i = 1; i in left || i in right; i++)
out = out "<" left[i] ":" right[i] ">" OFS
return out
}
It's shorter and, to me, more obvious without length. It also works
when elements have been deleted.
That would fail if the same index had been deleted from both arrays as
the loop would exit at that deleted index instead of continuing to
process the indices after that point.
Thanks for pointing out the bug. I had, in fact, tested this case but I
missed it because both functions have another bug: they modify the
arrays. I'd called yours in the test before mine, and that masked the
bug. Both functions should probably use
out = out "<" (i in left ? left[i] : "") ":"\
(i in right ? right[i] : "") ">" OFS
to avoid altering either the length or the result of any "in" tests.
But the key point is that your version also fails in the face of deleted elements (though obviously in a different way).
It's an interesting problem, but so far, not one that shows that length helps. The fact that the length is adjusted by a delete is unhelpful
when what you really want is the maximum index.
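The side effect described above (a bare reference to a missing element creates it, whereas an `in` test does not) can be seen in a small sketch. The helper name `count` and the sample data are assumptions made for illustration only.

```sh
out=$(awk '
# Hypothetical helper: portable element count, no length(array) needed.
function count(arr,    i, c) { c = 0; for (i in arr) c++; return c }
BEGIN {
    split("sue bob jan", a)
    before = count(a)
    if (5 in a) found = 1       # membership test: creates nothing
    middle = count(a)
    x = a[5]                    # bare reference: creates a[5] as empty
    after = count(a)
    print before, middle, after
}')
echo "$out"    # prints: 3 3 4
```

This is why guarding each access with `(i in arr ? arr[i] : "")` keeps both the element count and later `in` tests unchanged.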
TL;DR: If we can assume contiguous indexes, I prefer my solution above,
and in the presence of deletes, this is my best attempt:
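function paste(left, right, i, ub, out) {
    ub = 0;
    # Indices from "for (i in arr)" are strings; i+0 forces a numeric
    # comparison so that, e.g., index 10 ranks above index 9.
    for (i in left) if (i+0 > ub) ub = i+0;
    for (i in right) if (i+0 > ub) ub = i+0;
    for (i = 1; i <= ub; i++)
        out = out "<" (i in left ? left[i] : "") ":"\
                      (i in right ? right[i] : "") ">" OFS
    return out
}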
Can length(arr) help with this?
On 3/5/2021 4:41 PM, Ben Bacarisse wrote:
Ed Morton <mortonspam@gmail.com> writes:
As another example, lets imagine you want a function to paste the
elements of 2 numerically indexed arrays side by side so you could do:
BEGIN {
split("sue bob jan",a)
split("sue joe jan alf",b)
print paste(a,b)
}
Good example. I can imagine this coming up in real code -- it's just
the function often called zip.
and get the output:
<sue:sue> <bob:joe> <jan:jan> <:alf>
Here is that function using length(array):
function paste(left,right, i,n,out) {
n = ( length(left) > length(right) ? length(left) : length(right) )
for (i=1; i<=n; i++) {
out = out "<"left[i]":"right[i]">" OFS
}
return out
}
I'd write
function paste(left, right, i, out) {
for (i = 1; i in left || i in right; i++)
out = out "<" left[i] ":" right[i] ">" OFS
return out
}
It's shorter and, to me, more obvious without length. It also works
when elements have been deleted.
That would fail if the same index had been deleted from both arrays as
the loop would exit at that deleted index instead of continuing to
process the indices after that point.
... My enthusiasm level is pretty low to invest any more time in this
but just think of any situation where you have N arrays in your code
and a function that takes an array (or multiple arrays) and needs to
know how many elements are in it/them to do something with it/them -
your options are to:
a) keep "numberOfElementsN" variables (or numberAddedN and
numberedDeletedN as you suggested previously) for every such array and
pass those as args to the function or
b) write a loop in the function to count the elements in each array or
c) call length(array) in the function.
IMHO "c" is the clear winner.
Ed Morton <mortonspam@gmail.com> writes:
... My enthusiasm level is pretty low to invest any more time in this
but just think of any situation where you have N arrays in your code
and a function that takes an array (or multiple arrays) and needs to
know how many elements are in it/them to do something with it/them -
your options are to:
a) keep "numberOfElementsN" variables (or numberAddedN and
numberedDeletedN as you suggested previously) for every such array and
pass those as args to the function or
b) write a loop in the function to count the elements in each array or
c) call length(array) in the function.
(a) suggests you are considering deleted elements, but (c) (the number
of elements) is not the right number to let you "do something with
it/them".
IMHO "c" is the clear winner.
I agree, provided no elements might have been deleted. Then you might
want maxindex(arr) to get the highest numeric index. That would help with
the zip problem too. In fact, for arrays with numeric indexes (those where
you need to know how many elements are in it to do something with
it/them), maxindex(arr) (possibly with a matching minindex(arr)) might
be better in almost every case.
For arrays without numeric keys, maxindex(arr) would be useless, but
then length(arr) is not going to be so useful either.
I'd argue for both, but maxindex(arr) is probably not going to be as simple
for implementations to add.
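Until something like that exists, the maxindex(arr) idea can be approximated in user code. The sketch below is hypothetical: the thread proposes the name `maxindex` but does not define it, and this version assumes all indices are positive integers. Loop indices are strings in awk, so `i + 0` is used to force a numeric comparison.

```sh
out=$(awk '
# Hypothetical helper: largest numeric index in arr, or 0 if arr is empty.
function maxindex(arr,    i, max) {
    max = 0
    for (i in arr)
        if (i + 0 > max) max = i + 0
    return max
}
BEGIN {
    for (i = 1; i <= 12; i++) a[i] = i
    delete a[12]                # highest index is now 11
    delete a[3]                 # a gap does not affect the maximum
    print maxindex(a)
}')
echo "$out"    # prints: 11
```

Unlike a stored element count, this still needs a full pass over the array, which matches the remark that maxindex(arr) would be harder for implementations to provide cheaply.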
Ed Morton <mortonspam@gmail.com> writes:
On 3/5/2021 10:23 AM, Janis Papanagnou wrote:
...I had expected that you could provide some examples from your
practical contexts - I'd expected you have some. So it seems you
have none. That would have been a sufficient answer, either way.
Just claiming it's useful isn't helpful for the question asked.
I've given you enough examples.
Things have got a bit heated. I did not know about (and had not missed) gawk's length(array) extension so I too was hoping to see what sort of scripts use it. In case I'd missed them, I looked through all your
posts in this thread and could not see any examples where length(array)
was used.
Obviously I don't expect you to put in any work looking out such
examples just because I ask, but you might have some to hand, given the interest you have in including it in a future POSIX standard. Use cases would be useful in that context too.
printf "About to process %d addresses\n", length(cnt) | "cat>&2"
FWIW I just used length(array) so figured I'd share how given the above request for use cases. I have an existing (about 10 years old) ~300 line
awk script that analyzes the emails I get at a given account and takes various actions. It looks like this
On 2021-03-08, Ed Morton <mortonspam@gmail.com> wrote:
FWIW I just used length(array) so figured I'd share how given the above
request for use cases. I have an existing (about 10 years old) ~300 line
awk script that analyzes the emails I get at a given account and takes
various actions. It looks like this
My experiments with GNU Awk's length(array) reveal a serious flaw.
If array is not yet defined, then length(array) mutates its state,
turning it into a scalar.
For instance, this program has an issue when the input stream is empty, unless we remove the length(a) line:
function report (a,
                 i)
{
    printf("length(a) = %d\n", length(a))
    for (i in a) {
        printf("a[%s] = %s\n", i, a[i])
    }
}
{ array[NR] = $0 }
END { report(array) }
$ awk -f length.awk
a
b
c
length(a) = 3
a[1] = a
a[2] = b
a[3] = c
$ awk -f length.awk
length(a) = 0
awk: length.awk:6: fatal: attempt to use scalar parameter `a' as an array
The requirement is that if x is undefined, length(x) should just leave
it undefined.
I can obtain a length function with this property if I make it
user-defined.
function array_len(a,
                   len, i)
{
    len = 0
    for (i in a)
        len++
    return len
}
function report (a,
                 i)
{
    printf("length(a) = %d\n", array_len(a))
    for (i in a) {
        printf("a[%s] = %s\n", i, a[i])
    }
}
{ array[NR] = $0 }
END { report(array) }
Now the program no longer dies with an error message for an empty input.
Why does the built-in length function/operator have to mutate the
variable, but a user-defined function appears to be pure pass-by-value?
That behavior is acceptable if the only operand to which we can apply
length is the character string. If length works only for character
strings, and those are scalar, then it makes sense to infer that x
must be scalar if length(x) has been applied.
(Well, or at least, it makes "sense" if we accept the premise that
strings cannot be traversed with for (i in str), nor accessed
with str[i].)