Under what circumstances will sorting with a "how" value of
"@val_type_asc" produce different results from sorting with
"@val_str_asc"?
On 4/2/2021 5:06 PM, Igenlode Wordsmith wrote:
> Under what circumstances will sorting with a "how" value of
> "@val_type_asc" produce different results from sorting with
> "@val_str_asc"?
Note how @val_type_asc sorts the values numerically because they look like numbers, while @val_str_asc sorts them alphabetically.
On 3 Apr 2021 Ed Morton wrote:
> [snip]
> Note how the latter sorts the values numerically because they look like
> numbers while the former sorts alphabetically.
Thanks. Further experiment:
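BEGIN {
    split("21 3 red black",a)   # a[1..4]: strnum 21, strnum 3, string "red", string "black"
    a[100]=10                   # a plain number
    a[20][1]="dot"              # a subarray (gawk arrays of arrays)
    sortdemo("@val_str_asc",a)
    sortdemo("@val_type_asc",a)
    sortdemo("@val_num_asc",a)
}
function sortdemo(format,arr, i)
{
    PROCINFO["sorted_in"] = format   # traversal order for the for-in loop below
    print format
    for (i in arr) {
        printf("%8s ",typeof(arr[i]))
        if (isarray(arr[i])==0)      # a subarray has no scalar value to print
            print arr[i]
    }
    print "\n-----------"
}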
@val_str_asc
number 10
strnum 21
strnum 3
string black
string red
array
-----------
@val_type_asc
strnum 3
number 10
strnum 21
string black
string red
array
-----------
@val_num_asc
string black
string red
strnum 3
number 10
strnum 21
array
-----------
So @val_type_asc does exactly the same sort as @val_num_asc, except that
it sorts strings after numbers instead of sorting them according to
their numerical value of zero. The documentation doesn't say that @val_num_asc sorts strings in alphabetical order, but my previous tests suggested that in fact it always does.
All sorts place sub-arrays last.
And, confusingly, despite its name @val_type_asc *doesn't* separate the
type 'strnum' from the type 'number', which would seem to be the
main useful feature of 'sorting by type', but sorts them together. What
is it used for? The only references I could find online were to
scanning FUNCTAB.
On 4/3/2021 4:08 PM, Igenlode Wordsmith wrote:
> All sorts place sub-arrays last.
Well, they have to go first or last, right? I mean arrays can't suddenly appear in the middle of the sorted strings or sorted numbers and it doesn't make sense for them to appear between the two.
> And, confusingly, despite its name @val_type_asc *doesn't* separate the
> type 'strnum' from the type 'number', which would seem to be the
> main useful feature of 'sorting by type', but sorts them together. What
> is it used for? The only references I could find online were to
> scanning FUNCTAB.
I think you're overthinking this. @val_type_asc means if the values are scalar numbers they are sorted as numbers, otherwise if they're scalar they're sorted as strings. Apparently numbers get printed before strings (maybe a locale thing, idk) and subarrays get printed last.
Given this script:
$ cat tst.awk
{ a[NR] = $1 }
END { sortdemo(fmt,a) }
function sortdemo(format,arr, i)
{
    PROCINFO["sorted_in"] = format
    print format
    for (i in arr) {
        printf("%8s ",typeof(arr[i]))
        if (isarray(arr[i])==0)
            print arr[i]
    }
}
Look at the consistency with sort for string and numeric sorting:
$ printf '2\n11\nb\nc\n' | sort
11
2
b
c
$
$ printf '2\n11\nb\nc\n' | awk -v fmt='@val_str_asc' -f tst.awk
@val_str_asc
strnum 11
strnum 2
string b
string c
---------------
$ printf '2\n11\nb\nc\n' | sort -n
b
c
2
11
$
$ printf '2\n11\nb\nc\n' | awk -v fmt='@val_num_asc' -f tst.awk
@val_num_asc
string b
string c
strnum 2
strnum 11
and then it's very obvious what the "type" sort will do, i.e. print
numbers before strings just like a string sort but this time with the
numbers sorted numerically rather than alphabetically:
$ printf '2\n11\nb\nc\n' | awk -v fmt='@val_type_asc' -f tst.awk
@val_type_asc
strnum 2
strnum 11
string b
string c
Hope that helps.
Ed.
On Saturday, 3 April 2021 at 20:05:03 UTC-4, Ed Morton wrote:
> I think you're overthinking this. @val_type_asc means if the values are
> scalar numbers they are sorted as numbers, otherwise if they're scalar
> they're sorted as strings.
In my tests of seemingly IDENTICALLY numerically valued data,
the sort order of "@val_str_asc" == (as strings, then INDEX strings)
('INDEX strings' appears to be sorted by "@ind_str_asc")
versus sort order of "@val_num_asc" == (as numeric, then as strings, then INDEX strings)
thus essentially "@val_num_asc" == (as numeric, then "@val_str_asc").
The output order of @val_type_asc is not the same, as expected.
Since all my numeric values seem to be the same, "@val_num_asc" shifts to "@val_str_asc", which sorts the string versions into SUBGROUPS of identical strings
and, lastly, sorts the elements within each subgroup by the index string values of the array ("@ind_str_asc").
Scrambling the index string values for the same data values sorts as expected, as above.
This behavior appears to be EXACTLY as advertised. (? true for POSIX, now or future ?)
Conclusion: WATCH your array[indices] = (identical data values). Hope this helps someone, somewhere.
Below is one of my tests to illustrate "seemingly IDENTICALLY numerically valued data" (except that "321.0" is a "string").
The spaces are intentional: strings of length 5 with whitespace sort into a different subgroup than ones of length 4, etc.
BEGIN{
    split(" 321 ;321 ;321 ",strnumbr,";")   # strnum values " 321 ", "321 ", "321 "
    a["A"]="321.0"
    a["G"]=strnumbr[3]
    a["C"]=321
    a["D"]=strnumbr[1]
    a["E"]="321 "
    a["B"]=strnumbr[2]
    a["F"]="321"
    a["x"]=strnumbr[3]
    a["z"]=strnumbr[1]
    a["y"]=strnumbr[2]
    sortdemo("@val_num_asc",a,"scalars as num, then all as string, then INDEX string values")
    sortdemo("@val_str_asc",a,"scalar as string, then INDEX string values")
    sortdemo("@val_type_asc",a,"numeric before string (strnum before|after numeric?)")
}
output sort: @val_str_asc == @val_num_asc == D z C F B E G x y A
On 4/4/2021 3:00 PM, J Naman wrote:
> Since all my numeric values seem to be the same, "@val_num_asc" shifts to "@val_str_asc", which sorts string versions into SUBGROUPS of identical strings
> and, lastly, sorts the subgroups by the index string values of the array. ("@ind_str_asc")
> output sort: @val_str_asc == @val_num_asc == D z C F B E G x y A
It seems like what you're saying above is that awk does the equivalent
for arrays as GNU sort does when called with `-s` for "stable sorting",
i.e. preserves original (index) order for duplicate key values:
$ printf ' 321 z\n321 y\n321 w\n' | sort -k1,1n
321 z
321 w
321 y
$ printf ' 321 z\n321 y\n321 w\n' | sort -s -k1,1n
321 z
321 y
321 w
I wonder if that's true, though, or if you just happened to get that
order for duplicates just like you will often get alphabetic or
numerically ordered output from `for (i in arr)` without specifying any order at all just due to how they fell in the hash.
I don't see anything about that in the gawk manual so I suspect the
ordering you're seeing for duplicates is just coincidence as I doubt the gawk implementers would sacrifice speed of execution to do a
second-level string sort on indices that the user didn't ask for and is probably not useful.
Ed.
On Monday, 5 April 2021 at 08:53:13 UTC-4, Ed Morton wrote:
> I wonder if that's true, though, or if you just happened to get that
> order for duplicates just like you will often get alphabetic or
> numerically ordered output from `for (i in arr)` without specifying any
> order at all just due to how they fell in the hash.
Ed, I made a lot of tests, including on the subgroup array indices, and have 42 test output files. However, no matter what index sort order, unsorted or some sort, programmers should at least 1) be reminded that the User Guide says indices are used last (to break ties) and 2) keep aware of their subgroup index order (maybe want to index sort themselves) and how that may change if they append new data. Just a caution sign to save a lot of debugging if their results seemed odd or seemingly
My short answer is that I ran the SAME VALUE DATA twice, only changing the indices so that I could see if the index sort was "unsorted" (hypothesis 0) or any real sort. Ver 1 indices are "A" -> "z" (testing upper and lower case) & Ver 2 is the SAME alpha index with a digit in front, e.g. "2z", "1y", "5B", etc. The results are summarized below:
Srt order 1 (Ver 2, digit-prefixed indices) @val_num_asc= 2z 7D 4F 8C 1y 3x 5B 6E 9G aA
Srt order 2 (Ver 1, alpha indices)          @val_num_asc= D z C F B E G x y A
The subgroup indices sorted in digit order vs alpha order. It can't be a coincidence. I didn't want to bother Arnold/Aharon Robbins. If it is important for POSIX, etc. he can decide about further documenting it for GAWK, *Awks, etc.
I'll be happy to email the actual code and let you or others see the actual results. It seemed too long for a forum post. One set of test input lines is:
split(" 321 ;321 ;321 ",strnumbr,";")
a["aA"]="321.0"
a["9G"]=strnumbr[3]
a["8C"]=321
a["7D"]=strnumbr[1]
a["6E"]="321 "
a["5B"]=strnumbr[2]
a["4F"]="321"
a["3x"]=strnumbr[3]
a["2z"]=strnumbr[1]
a["1y"]=strnumbr[2]
split(" 321 ;321 ;321 ",strnumbr2,";")
b["A"]="321.0"
b["G"]=strnumbr2[3]
b["C"]=321
b["D"]=strnumbr2[1]
b["E"]="321 "
b["B"]=strnumbr2[2]
b["F"]="321"
b["x"]=strnumbr2[3]
b["z"]=strnumbr2[1]
b["y"]=strnumbr2[2]
Let me know if anyone wants the full actual source, 54 lines w/out comments, to this group or individual email.
Best, John Naman, retired PhD who (over)analyzes just about everything.
In article <s4f17n$6e9$1@dont-email.me>,
Ed Morton <mortonspam@gmail.com> wrote:
> I wonder if that's true, though, or if you just happened to get that
> order for duplicates just like you will often get alphabetic or
> numerically ordered output from `for (i in arr)` without specifying any
> order at all just due to how they fell in the hash.
> I don't see anything about that in the gawk manual so I suspect the
> ordering you're seeing for duplicates is just coincidence as I doubt the
> gawk implementers would sacrifice speed of execution to do a
> second-level string sort on indices that the user didn't ask for and is
> probably not useful.
Ed,
Consider:
    a[1] = 42
    a[2] = 24
    a[3] = 42
If sorting by numeric value, how do you determine whether a[1]'s value
comes first, or a[3]'s?
The code has to make a choice. It uses the
value of the index string to do that, essentially as a last resort.
This is documented at https://www.gnu.org/software/gawk/manual/gawk.html#Controlling-Scanning
which says:
When sorting an array by element values, if a value happens to be
a subarray then it is considered to be greater than any string or
numeric value, regardless of what the subarray itself contains,
and all subarrays are treated as being equal to each other. Their
order relative to each other is determined by their index strings.
The code is available to peruse and is also pretty clear.
Let's put this thread to rest please.
Arnold
In article <s4fjq4$jcm$1@dont-email.me>,
Ed Morton <mortonspam@gmail.com> wrote:
>> If sorting by numeric value, how do you determine whether a[1]'s value
>> comes first, or a[3]'s?
> The same way you determine which value to print if you _aren't_ sorting
> by values, i.e. hash order I think is common, would seem to be the most
> obvious approach.
Hash order isn't available at that point in the code. The index values provide a sensible and unambiguous way to order values when values
are identical.
>> The code has to make a choice. It uses the
>> value of the index string to do that, essentially as a last resort.
> There's no reason to think the string order of the indices is any more
> appropriate than any other order as the indices typically have nothing
> at all to do with the order of the values. It's exactly the same
> argument that explains the order of a plain old `for (i in array)` being
> undefined.
No, we're sorting. That's not the same as for (i in array).
>> This is documented at
>> https://www.gnu.org/software/gawk/manual/gawk.html#Controlling-Scanning
>> which says:
>> When sorting an array by element values, if a value happens to be
>> a subarray then it is considered to be greater than any string or
>> numeric value, regardless of what the subarray itself contains,
>> and all subarrays are treated as being equal to each other. Their
>> order relative to each other is determined by their index strings.
> That paragraph seems to be describing the order that subarrays will be
> visited, not about the order of duplicate scalar values.
Those same keywords may be used with asort and asorti.
I'm done.
In article <s4fq2i$eg$1@dont-email.me>,
Ed Morton <mortonspam@gmail.com> wrote:
>>> There's no reason to think the string order of the indices is any more
>>> appropriate than any other order as the indices typically have nothing
>>> at all to do with the order of the values. It's exactly the same
>>> argument that explains the order of a plain old `for (i in array)` being
>>> undefined.
>> No, we're sorting. That's not the same as for (i in array).
> Yes it is the same because the sorting that the caller asked for has
> already been done at that point,
No. The code under discussion is the comparison function used by the C
qsort function to *do the sorting*. Given two identical values, it has
to decide which one sorts before the other. It uses the index value to
do this.
If I tell you gawk works a particular way for a particular reason,
there's a very good chance that I wrote the code and that I know what
I'm talking about.