• val_type_asc and val_str_asc

    From Igenlode Wordsmith@21:1/5 to All on Fri Apr 2 17:06:39 2021
    Under what circumstances will sorting with a "how" value of
    "@val_type_asc" produce different results from sorting with
    "@val_str_asc"? I thought that it would at least differentiate between numbers and strnum values, but in practice it doesn't seem to do so.
    And both sorts appear to separate out subarrays at the end of the
    listing.

    The only online references I can find to "@val_type_asc" seem to be in
    the context of traversing the FUNCTAB predefined array.

    --
    Igenlode Visit the Ivory Tower http://ivory.ueuo.com/Tower/

    Those jaded in their emotions demand monstrous things to arouse them

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ed Morton@21:1/5 to Igenlode Wordsmith on Sat Apr 3 14:26:09 2021
    On 4/2/2021 5:06 PM, Igenlode Wordsmith wrote:
    Under what circumstances will sorting with a "how" value of
    "@val_type_asc" produce different results from sorting with
    "@val_str_asc"?

    $ cat tst.awk
    BEGIN {
        split("21 3",a)

        PROCINFO["sorted_in"] = "@val_str_asc"
        for (i in a) {
            print PROCINFO["sorted_in"], a[i]
        }

        print "-----------"

        PROCINFO["sorted_in"] = "@val_type_asc"
        for (i in a) {
            print PROCINFO["sorted_in"], a[i]
        }
    }

    $ awk -f tst.awk
    @val_str_asc 21
    @val_str_asc 3
    -----------
    @val_type_asc 3
    @val_type_asc 21

    Note how the latter sorts the values numerically because they look like
    numbers while the former sorts alphabetically.
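
The difference comes down to how awk compares values that look like numbers. As a minimal sketch (the function name `compare_fields` is made up for illustration), the following compares the same two input fields once as strnums and once as strings forced by concatenating an empty string. It uses only POSIX awk comparisons, so it should not need gawk:

```shell
# compare_fields is a hypothetical helper: it takes one line of input
# holding two fields and prints two flags for field 1 versus field 2,
# numeric less-than first, then string less-than.
compare_fields() {
    printf '%s\n' "$1" | awk '{
        num = ($1 < $2)              # both fields look numeric, so compared as numbers
        str = (($1 "") < ($2 ""))    # appending "" forces a string comparison
        print num, str
    }'
}

compare_fields "21 3"
```

For the input `21 3` this prints `0 1`: 21 is not less than 3 numerically, but "21" sorts before "3" as a string, which matches the two orderings in the output above.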

    Ed.

  • From Igenlode Wordsmith@21:1/5 to All on Sat Apr 3 22:08:28 2021
    On 3 Apr 2021 Ed Morton wrote:

    Note how the latter sorts the values numerically because they look like
    numbers while the former sorts alphabetically.

    Thanks. Further experiment:

    BEGIN {
        split("21 3 red black",a)
        a[100]=10
        a[20][1]="dot"

        sortdemo("@val_str_asc",a)
        sortdemo("@val_type_asc",a)
        sortdemo("@val_num_asc",a)
    }

    function sortdemo(format,arr, i)
    {
        PROCINFO["sorted_in"] = format
        print format
        for (i in arr) {
            printf("%8s ",typeof(arr[i]))
            if (isarray(arr[i])==0)
                print arr[i]
        }
        print "\n-----------"
    }

    @val_str_asc
    number 10
    strnum 21
    strnum 3
    string black
    string red
    array
    -----------
    @val_type_asc
    strnum 3
    number 10
    strnum 21
    string black
    string red
    array
    -----------
    @val_num_asc
    string black
    string red
    strnum 3
    number 10
    strnum 21
    array
    -----------



    So @val_type_asc does exactly the same sort as @val_num_asc, except that
    it sorts strings after numbers instead of sorting them according to
    their numerical value of zero. The documentation doesn't say that
    @val_num_asc sorts strings in alphabetical order, but my previous tests suggested that in fact it always does.

    All sorts place sub-arrays last.

    And, confusingly, despite its name @val_type_asc *doesn't* separate the
    type 'strnum' from the type 'number', which would seem to be the
    main useful feature of 'sorting by type', but sorts them together. What
    is it used for? The only references I could find online were to
    scanning FUNCTAB.
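
If the goal really is to keep 'number' apart from 'strnum', one option is a user-defined comparison function, which gawk accepts as the value of PROCINFO["sorted_in"]. The sketch below is an illustration under assumptions, not a built-in mode: `rank`, `by_type` and `by_type_demo` are made-up names, it needs a gawk recent enough to have typeof(), and it leaves subarrays out of the test data.

```shell
# by_type_demo is a hypothetical wrapper; it requires gawk, since typeof()
# and PROCINFO["sorted_in"] are gawk extensions.
by_type_demo() {
    gawk '
    # rank: made-up helper mapping a gawk type name to a sort position.
    function rank(v,    t) {
        t = typeof(v)
        return (t == "number") ? 1 : (t == "strnum") ? 2 : (t == "string") ? 3 : 4
    }

    # by_type: comparison function; gawk passes the index and value of two
    # elements and expects a negative, zero or positive result.
    function by_type(i1, v1, i2, v2,    r1, r2) {
        r1 = rank(v1)
        r2 = rank(v2)
        if (r1 != r2)
            return r1 - r2
        if (r1 > 3)
            return 0                          # any other type: leave unordered
        return (v1 < v2) ? -1 : (v1 > v2)     # same type: ordinary comparison
    }

    BEGIN {
        split("21 3 red black", a)            # two strnums, two strings
        a[100] = 10                           # one true number
        PROCINFO["sorted_in"] = "by_type"
        for (i in a)
            print typeof(a[i]), a[i]
    }'
}
```

With gawk installed, calling `by_type_demo` should list the true number first, then the strnums in numeric order, then the strings in string order.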


    --
    Igenlode Visit the Ivory Tower http://ivory.ueuo.com/Tower/

    Buster Keaton fan http://uk.imdb.com/mymovies/list?l=17884208

  • From Ed Morton@21:1/5 to Igenlode Wordsmith on Sat Apr 3 19:05:00 2021
    On 4/3/2021 4:08 PM, Igenlode Wordsmith wrote:
    [snip]

    All sorts place sub-arrays last.

    Well, they have to go first or last, right? I mean arrays can't suddenly
    appear in the middle of the sorted strings or sorted numbers and it
    doesn't make sense for them to appear between the two.

    And, confusingly, despite its name @val_type_asc *doesn't* separate the
    type 'strnum' from the type 'number', which would seem to be the
    main useful feature of 'sorting by type', but sorts them together. What
    is it used for? The only references I could find online were to
    scanning FUNCTAB.

    I think you're overthinking this. @val_type_asc means if the values are
    scalar numbers they are sorted as numbers, otherwise if they're scalar
    they're sorted as strings. Apparently numbers get printed before strings
    (maybe a locale thing, idk) and subarrays get printed last.

    Given this script:

    $ cat tst.awk
    { a[NR] = $1 }
    END { sortdemo(fmt,a) }

    function sortdemo(format,arr, i)
    {
        PROCINFO["sorted_in"] = format
        print format
        for (i in arr) {
            printf("%8s ",typeof(arr[i]))
            if (isarray(arr[i])==0)
                print arr[i]
        }
    }


    Look at the consistency with sort for string and numeric sorting:

    $ printf '2\n11\nb\nc\n' | sort
    11
    2
    b
    c
    $
    $ printf '2\n11\nb\nc\n' | awk -v fmt='@val_str_asc' -f tst.awk
    @val_str_asc
    strnum 11
    strnum 2
    string b
    string c

    ---------------

    $ printf '2\n11\nb\nc\n' | sort -n
    b
    c
    2
    11
    $
    $ printf '2\n11\nb\nc\n' | awk -v fmt='@val_num_asc' -f tst.awk
    @val_num_asc
    string b
    string c
    strnum 2
    strnum 11

    and then it's very obvious what the "type" sort will do, i.e. print
    numbers before strings just like a string sort but this time with the
    numbers sorted numerically rather than alphabetically:

    $ printf '2\n11\nb\nc\n' | awk -v fmt='@val_type_asc' -f tst.awk
    @val_type_asc
    strnum 2
    strnum 11
    string b
    string c

    Hope that helps.

    Ed.

  • From J Naman@21:1/5 to Ed Morton on Sun Apr 4 13:00:14 2021
    On Saturday, 3 April 2021 at 20:05:03 UTC-4, Ed Morton wrote:
    [snip]

    In my tests of seemingly IDENTICALLY numerically valued data,
    the sort order of "@val_str_asc" == (as strings, then INDEX strings)
    ('INDEX strings' appears to be sorted by "@ind_str_asc")
    versus sort order of "@val_num_asc" == (as numeric, then as strings, then INDEX strings)
    thus essentially "@val_num_asc"==(as numeric, then "@val_str_asc")
    The output order of @val_type_asc is not the same, as expected.

    Since all my numeric values seem to be the same, "@val_num_asc" shifts to "@val_str_asc", which sorts string versions into SUBGROUPS of identical strings and, lastly, sorts the subgroups by the index string values of the array. ("@ind_str_asc")
    Scrambling the index string values for the same data values sorts as expected, as above.

    This behavior appears to be EXACTLY as advertised. (? true for POSIX, now or future ?)

    Conclusion: WATCH your array[indices]= (identical data values). Hope this helps someone, somewhere.

    Below is one of my tests to illustrate "seemingly IDENTICALLY numerically valued data" (except 321.0 is a "string")
    The spaces are intentional: strings of length 5 (including whitespace) sort into a different subgroup than those of length 4, etc.
    BEGIN{
        split(" 321 ;321 ;321 ",strnumbr,";")
        a["A"]="321.0"
        a["G"]=strnumbr[3]
        a["C"]=321
        a["D"]=strnumbr[1]
        a["E"]="321 "
        a["B"]=strnumbr[2]
        a["F"]="321"
        a["x"]=strnumbr[3]
        a["z"]=strnumbr[1]
        a["y"]=strnumbr[2]
        sortdemo("@val_num_asc",a,"scalars as num, then all as string, then INDEX string values")
        sortdemo("@val_str_asc",a,"scalar as string, then INDEX string values")
        sortdemo("@val_type_asc",a,"numeric before string (strnum before|after numeric?)")
    }
    output sort: @val_str_asc == @val_num_asc == D z C F B E G x y A
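
The "seemingly identical" values differ only as strings. A small sketch of that point, using plain POSIX awk comparisons rather than the sort itself (the function name `same_value` is made up for illustration): two `;`-separated fields that differ only in surrounding blanks compare equal as numbers and unequal as strings.

```shell
# same_value is a hypothetical helper: it takes one line with two
# ";"-separated fields and prints two flags, numeric equality first,
# then string equality.
same_value() {
    printf '%s\n' "$1" | awk -F';' '{
        num = ($1 == $2)             # both fields are strnums: compared as numbers
        str = (($1 "") == ($2 ""))   # appending "" forces a string comparison
        print num, str
    }'
}

same_value " 321 ;321 "
```

This prints `1 0`: the blanks do not affect the numeric value, but they do make the strings different, which is why such values can land in different subgroups once a sort falls back to string comparison.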

  • From Ed Morton@21:1/5 to J Naman on Mon Apr 5 07:53:11 2021
    On 4/4/2021 3:00 PM, J Naman wrote:
    [snip]


    It seems like what you're saying above is that awk does the equivalent
    for arrays as GNU sort does when called with `-s` for "stable sorting",
    i.e. preserves original (index) order for duplicate key values:

    $ printf ' 321 z\n321 y\n321 w\n' | sort -k1,1n
    321 z
    321 w
    321 y

    $ printf ' 321 z\n321 y\n321 w\n' | sort -s -k1,1n
    321 z
    321 y
    321 w

    I wonder if that's true, though, or if you just happened to get that
    order for duplicates just like you will often get alphabetic or
    numerically ordered output from `for (i in arr)` without specifying any
    order at all just due to how they fell in the hash.

    I don't see anything about that in the gawk manual so I suspect the
    ordering you're seeing for duplicates is just coincidence as I doubt the
    gawk implementers would sacrifice speed of execution to do a
    second-level string sort on indices that the user didn't ask for and is probably not useful.
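
Whichever way the built-in orders treat duplicates, the tie-break can be made explicit rather than relied upon. A minimal sketch, assuming gawk and using made-up names (`num_then_index`, `tie_break_demo`): a user-defined comparison function that orders by numeric value and falls back to the index, compared as a string, when the values are equal.

```shell
# tie_break_demo is a hypothetical wrapper; it requires gawk because
# PROCINFO["sorted_in"] is a gawk extension.
tie_break_demo() {
    gawk '
    # num_then_index: compare values as numbers; on a tie, compare the
    # indices as strings so the order of duplicates is fully specified.
    function num_then_index(i1, v1, i2, v2) {
        if (v1 + 0 != v2 + 0)
            return (v1 + 0 < v2 + 0) ? -1 : 1
        return ((i1 "") < (i2 "")) ? -1 : ((i1 "") > (i2 ""))
    }

    BEGIN {
        # Three values that are numerically equal, plus one smaller value.
        a["z"] = 321; a["y"] = " 321 "; a["w"] = "321.0"; a["k"] = 7
        PROCINFO["sorted_in"] = "num_then_index"
        for (i in a)
            out = out (out == "" ? "" : " ") i
        print out
    }'
}
```

Calling `tie_break_demo` should print the indices as `k w y z`: the smaller value first, then the three equal values in index order, independent of how the built-in orders happen to resolve ties.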

    Ed.

  • From J Naman@21:1/5 to Ed Morton on Mon Apr 5 08:17:29 2021
    On Monday, 5 April 2021 at 08:53:13 UTC-4, Ed Morton wrote:
    [snip]

    Ed, I made a lot of tests, including on the subgroup array indices, and have 42 test output files. However, whatever the index order turns out to be (unsorted or some sort), programmers should at least 1) be reminded that the User Guide says indices are used last (to break ties) and 2) stay aware of their subgroup index order (they may want to sort by index themselves) and how it may change if they append new data. It is just a caution sign to save a lot of debugging if their results seem odd or inexplicable.

    My short answer is that I ran the SAME VALUE DATA twice, changing only the indices, so that I could see whether the index order was "unsorted" (hypothesis 0) or a real sort. Ver 1 indices are "A" -> "z" (testing upper and lower case) and Ver 2 uses the SAME alpha index with a digit in front, e.g. "2z", "1y", "5B", etc. The results are summarized below:
    Srt order 1 @val_num_asc= 2z 7D 4F 8C 1y 3x 5B 6E 9G aA
    Srt order 2 @val_num_asc= D z C F B E G x y A

    The subgroup indices sorted in digit order vs alpha order. It can't be a coincidence. I didn't want to bother Arnold/Aharon Robbins. If it is important for POSIX, etc. he can decide about further documenting it for GAWK, *Awks, etc.

    I'll be happy to email actual code and let you or others see actual results. It seemed too long for a Forum post. One set of test input lines is:
    split(" 321 ;321 ;321 ",strnumbr,";");  split(" 321 ;321 ;321 ",strnumbr2,";")
    a["aA"]="321.0";       b["A"]="321.0"
    a["9G"]=strnumbr[3];   b["G"]=strnumbr2[3]
    a["8C"]=321;           b["C"]=321
    a["7D"]=strnumbr[1];   b["D"]=strnumbr2[1]
    a["6E"]="321 ";        b["E"]="321 "
    a["5B"]=strnumbr[2];   b["B"]=strnumbr2[2]
    a["4F"]="321";         b["F"]="321"
    a["3x"]=strnumbr[3];   b["x"]=strnumbr2[3]
    a["2z"]=strnumbr[1];   b["z"]=strnumbr2[1]
    a["1y"]=strnumbr[2];   b["y"]=strnumbr2[2]
    Let me know if anyone wants the full actual source, 54 lines w/out comments, to this group or individual email.
    Best, John Naman, retired PhD who (over)analyzes just about everything.

  • From Ed Morton@21:1/5 to J Naman on Mon Apr 5 11:21:23 2021
    On 4/5/2021 10:17 AM, J Naman wrote:
    On Monday, 5 April 2021 at 08:53:13 UTC-4, Ed Morton wrote:
    On 4/4/2021 3:00 PM, J Naman wrote:
    On Saturday, 3 April 2021 at 20:05:03 UTC-4, Ed Morton wrote:
    On 4/3/2021 4:08 PM, Igenlode Wordsmith wrote:
    On 3 Apr 2021 Ed Morton wrote:

    On 4/2/2021 5:06 PM, Igenlode Wordsmith wrote:
    Under what circumstances will sorting with a "how" value of
    "@val_type_asc" produce different results from sorting with
    "@val_str_asc"?

    [snip]


    Note how the latter sorts the values numerically because they look like >>>>>> numbers while the former sorts alphabetically.

    Thanks. Further experiment:

    BEGIN {
    split("21 3 red black",a)
    a[100]=10
    a[20][1]="dot"

    sortdemo("@val_str_asc",a)
    sortdemo("@val_type_asc",a)
    sortdemo("@val_num_asc",a)
    }

    function sortdemo(format,arr, i)
    {
    PROCINFO["sorted_in"] = format
    print format
    for (i in arr) {
    printf("%8s ",typeof(arr[i]))
    if (isarray(arr[i])==0)
    print arr[i]
    }
    print "\n-----------"
    }

    @val_str_asc
    number 10
    strnum 21
    strnum 3
    string black
    string red
    array
    -----------
    @val_type_asc
    strnum 3
    number 10
    strnum 21
    string black
    string red
    array
    -----------
    @val_num_asc
    string black
    string red
    strnum 3
    number 10
    strnum 21
    array
    -----------



    So @val_type_asc does exactly the same sort as @val_num_asc, except that >>>>> it sorts strings after numbers instead of sorting them according to
    their numerical value of zero. The documentation doesn't say that
    @val_num_asc sorts strings in alphabetical order, but my previous tests >>>>> suggested that in fact it always does.

    All sorts place sub-arrays last.
    Well, they have to go first or last, right? I mean arrays can't suddenly >>>> appear in the middle of the sorted strings or sorted numbers and it
    doesn't make sense for them to appear between the two.
    And, confusingly, despite its name @val_type_asc *doesn't* separate the >>>>> type 'strnum' from the type 'number', which would seem to be the
    main useful feature of 'sorting by type', but sorts them together. What >>>>> is it used for? The only references I could find online were to
    scanning FUNCTAB.
    I think you're overthinking this. @val_type_asc means if the values are >>>> scalar numbers they are sorted as numbers, otherwise if they're scalar >>>> they're sorted as strings. Apparently numbers get printed before strings >>>> (maybe a locale thing, idk) and subarrays get printed last.

    Given this script:

    $ cat tst.awk
    { a[NR] = $1 }
    END { sortdemo(fmt,a) }
    function sortdemo(format,arr, i)
    {
    PROCINFO["sorted_in"] = format
    print format
    for (i in arr) {
    printf("%8s ",typeof(arr[i]))
    if (isarray(arr[i])==0)
    print arr[i]
    }
    }
    Look at the consistency with sort for string and numeric sorting:

    $ printf '2\n11\nb\nc\n' | sort
    11
    2
    b
    c
    $
    $ printf '2\n11\nb\nc\n' | awk -v fmt='@val_str_asc' -f tst.awk
    @val_str_asc
    strnum 11
    strnum 2
    string b
    string c

    ---------------

    $ printf '2\n11\nb\nc\n' | sort -n
    b
    c
    2
    11
    $
    $ printf '2\n11\nb\nc\n' | awk -v fmt='@val_num_asc' -f tst.awk
    @val_num_asc
    string b
    string c
    strnum 2
    strnum 11

    and then it's very obvious what the "type" sort will do, i.e. print
    numbers before strings just like a string sort but this time with the
    numbers sorted numerically rather than alphabetically:

    $ printf '2\n11\nb\nc\n' | awk -v fmt='@val_type_asc' -f tst.awk
    @val_type_asc
    strnum 2
    strnum 11
    string b
    string c

    Hope that helps.

    Ed.
    In my tests of seemingly IDENTICALLY numerically valued data,
    the sort order of "@val_str_asc" == (as strings, then INDEX strings)
    ('INDEX strings' appears to be sorted by "@ind_str_asc")
    versus sort order of "@val_num_asc" == (as numeric, then as strings, then INDEX strings)
    thus essentially "@val_num_asc"==(as numeric, then "@val_str_asc")
    The output order of @val_type_asc is not the same, as expected.

    Since all my numeric values seem to be the same, "@val_num_asc" shifts to
    "@val_str_asc", which sorts string versions into SUBGROUPS of identical strings
    and, lastly, sorts the subgroups by the index string values of the array. ("@ind_str_asc")
    Scrambling the index string values for the same data values sorts as expected, as above.

    This behavior appears to be EXACTLY as advertised. (? true for POSIX, now or future ?)

    Conclusion: WATCH your array[indices]= (identical data values). Hope this helps someone, somewhere.

    Below is one of my tests to illustrate "seemingly IDENTICALLY numerically valued data" (except 321.0 is a "string")
    The spaces are intentional and strings of length 5 with whitespace sorts into a different subgroup than ones with 4, etc.
    BEGIN{
    split(" 321 ;321 ;321 ",strnumbr,";")
    a["A"]="321.0"
    a["G"]=strnumbr[3]
    a["C"]=321
    a["D"]=strnumbr[1]
    a["E"]="321 "
    a["B"]=strnumbr[2]
    a["F"]="321"
    a["x"]=strnumbr[3]
    a["z"]=strnumbr[1]
    a["y"]=strnumbr[2]
    sortdemo("@val_num_asc",a,"scalars as num, then all as string, then INDEX string values")
    sortdemo("@val_str_asc",a,"scalar as string, then INDEX string values")
    sortdemo("@val_type_asc",a,"numeric before string (strnum before|after numeric?)")
    sortdemo("@val_type_asc",a,"numeric before string (strnum before|after numeric?)")
    }
    output sort: @val_str_asc == @val_num_asc == D z C F B E G x y A

    It seems like what you're saying above is that awk does the equivalent
    for arrays as GNU sort does when called with `-s` for "stable sorting",
    i.e. preserves original (index) order for duplicate key values:

    $ printf ' 321 z\n321 y\n321 w\n' | sort -k1,1n
    321 z
    321 w
    321 y

    $ printf ' 321 z\n321 y\n321 w\n' | sort -s -k1,1n
    321 z
    321 y
    321 w

    I wonder if that's true, though, or if you just happened to get that
    order for duplicates just like you will often get alphabetic or
    numerically ordered output from `for (i in arr)` without specifying any
    order at all just due to how they fell in the hash.

    I don't see anything about that in the gawk manual so I suspect the
    ordering you're seeing for duplicates is just coincidence as I doubt the
    gawk implementers would sacrifice speed of execution to do a
    second-level string sort on indices that the user didn't ask for and is
    probably not useful.

    Ed.
    Ed, I made a lot of tests, including on the subgroup array indices, and
    have 42 test output files. However, no matter what the index sort order
    is (unsorted or some sort), programmers should at least 1) be reminded
    that the User Guide says indices are used last (to break ties) and
    2) stay aware of their subgroup index order (they may want to sort the
    indices themselves) and how that may change if they append new data.
    Just a caution sign to save a lot of debugging if their results seem odd
    or inexplicable.
    My short answer is that I ran the SAME VALUE DATA twice, only changing
    the indices so that I could see if the index sort was "unsorted"
    (hypothesis 0) or any real sort. Ver 1 indices are "A" -> "z" (testing
    upper and lower case) and Ver 2 uses the SAME alpha indices with a digit
    in front, e.g. "2z", "1y", "5B", etc. The results are summarized below:
    Srt order 1 @val_num_asc= 2z 7D 4F 8C 1y 3x 5B 6E 9G aA
    Srt order 2 @val_num_asc= D z C F B E G x y A
    The subgroup indices sorted in digit order vs alpha order. It can't be a coincidence. I didn't want to bother Arnold/Aharon Robbins. If it is important for POSIX, etc. he can decide about further documenting it for GAWK, *Awks, etc.
    I'll be happy to email actual code and let you or others see actual results. It seemed too long for a Forum post. One set of test input lines are:
    split(" 321 ;321 ;321 ",strnumbr,";") split(" 321 ;321 ;321 ",strnumbr2,";")
    a["aA"]="321.0" b["A"]="321.0"
    a["9G"]=strnumbr[3] b["G"]=strnumbr2[3]
    a["8C"]=321 b["C"]=321
    a["7D"]=strnumbr[1] b["D"]=strnumbr2[1]
    a["6E"]="321 " b["E"]="321 "
    a["5B"]=strnumbr[2] b["B"]=strnumbr2[2]
    a["4F"]="321" b["F"]="321"
    a["3x"]=strnumbr[3] b["x"]=strnumbr2[3]
    a["2z"]=strnumbr[1] b["z"]=strnumbr2[1]
    a["1y"]=strnumbr[2] b["y"]=strnumbr2[2]
    Let me know if anyone wants the full actual source, 54 lines w/out comments, to this group or individual email.
    Best, John Naman, retired PhD who (over)analyzes just about everything.


    I'm sorry, I'm really not 100% sure I understand what you're trying to
    tell us above and I don't know what we're supposed to get out of the
    data you posted. It seems like you're saying that you ran your script
    twice with the same data and got the same output - OK, I would hope so.
    It seems like you're also saying that there was an alphabetic order of
    indices for the output of same-string values - OK, that is often the
    case when you do `for (i in arr)` for a few values with no requested
    ordering, e.g.

    $ seq 5 | awk '{a[NR]=$0} END{for (i in a) print i}'
    1
    2
    3
    4
    5

    but that doesn't mean they are always sorted alphabetically or
    otherwise, just that that's the order they happen to appear in the hash.
    It's not a coincidence and it's not unsorted, it's the hash order.

    Could you provide a _minimal_ script with _minimal_ input/output that
    just demonstrates whatever it is you're trying to show us? Can you
    provide a link (and maybe a quote?) to where the user guide says array
    indices are used last to break ties?

    Ed.

  • From Ed Morton@21:1/5 to Aharon Robbins on Mon Apr 5 13:10:10 2021
    On 4/5/2021 12:20 PM, Aharon Robbins wrote:
    In article <s4f17n$6e9$1@dont-email.me>,
    Ed Morton <mortonspam@gmail.com> wrote:
    I wonder if that's true, though, or if you just happened to get that
    order for duplicates just like you will often get alphabetic or
    numerically ordered output from `for (i in arr)` without specifying any
    order at all just due to how they fell in the hash.

    I don't see anything about that in the gawk manual so I suspect the
    ordering you're seeing for duplicates is just coincidence as I doubt the
    gawk implementers would sacrifice speed of execution to do a
    second-level string sort on indices that the user didn't ask for and is
    probably not useful.

    Ed,

    Consider:

    a[1] = 42
    a[2] = 24
    a[3] = 42

    If sorting by numeric value, how do you determine whether a[1]'s value
    comes first, or a[3]'s?

    The same way you determine which value to print if you _aren't_ sorting
    by values, i.e. hash order I think is common, would seem to be the most
    obvious approach.

    The code has to make a choice. It uses the
    value of the index string to do that, essentially as a last resort.

    There's no reason to think the string order of the indices is any more
    appropriate than any other order as the indices typically have nothing
    at all to do with the order of the values. It's exactly the same
    argument that explains the order of a plain old `for (i in array)` being
    undefined.


    This is documented at https://www.gnu.org/software/gawk/manual/gawk.html#Controlling-Scanning
    which says:

    When sorting an array by element values, if a value happens to be
    a subarray then it is considered to be greater than any string or
    numeric value, regardless of what the subarray itself contains,
    and all subarrays are treated as being equal to each other. Their
    order relative to each other is determined by their index strings.

    That paragraph seems to be describing the order that subarrays will be
    visited, not about the order of duplicate scalar values.

    Ed.


    The code is available to peruse and is also pretty clear.

    Let's put this thread to rest please.

    Arnold


  • From Aharon Robbins@21:1/5 to mortonspam@gmail.com on Mon Apr 5 17:20:27 2021
    In article <s4f17n$6e9$1@dont-email.me>,
    Ed Morton <mortonspam@gmail.com> wrote:
    I wonder if that's true, though, or if you just happened to get that
    order for duplicates just like you will often get alphabetic or
    numerically ordered output from `for (i in arr)` without specifying any
    order at all just due to how they fell in the hash.

    I don't see anything about that in the gawk manual so I suspect the
    ordering you're seeing for duplicates is just coincidence as I doubt the
    gawk implementers would sacrifice speed of execution to do a
    second-level string sort on indices that the user didn't ask for and is
    probably not useful.

    Ed,

    Consider:

    a[1] = 42
    a[2] = 24
    a[3] = 42

    If sorting by numeric value, how do you determine whether a[1]'s value
    comes first, or a[3]'s? The code has to make a choice. It uses the
    value of the index string to do that, essentially as a last resort.

    This is documented at https://www.gnu.org/software/gawk/manual/gawk.html#Controlling-Scanning
    which says:

    When sorting an array by element values, if a value happens to be
    a subarray then it is considered to be greater than any string or
    numeric value, regardless of what the subarray itself contains,
    and all subarrays are treated as being equal to each other. Their
    order relative to each other is determined by their index strings.

    The code is available to peruse and is also pretty clear.

    Let's put this thread to rest please.

    Arnold
    --
    Aharon (Arnold) Robbins arnold AT skeeve DOT com

  • From Ed Morton@21:1/5 to Aharon Robbins on Mon Apr 5 14:57:05 2021
    On 4/5/2021 2:18 PM, Aharon Robbins wrote:
    In article <s4fjq4$jcm$1@dont-email.me>,
    Ed Morton <mortonspam@gmail.com> wrote:
    If sorting by numeric value, how do you determine whether a[1]'s value
    comes first, or a[3]'s?

    The same way you determine which value to print if you _aren't_ sorting
    by values, i.e. hash order I think is common, would seem to be the most
    obvious approach.

    Hash order isn't available at that point in the code. The index values
    provide a sensible and unambiguous way to order values when values
    are identical.

    That makes sense then.


    The code has to make a choice. It uses the
    value of the index string to do that, essentially as a last resort.

    There's no reason to think the string order of the indices is any more
    appropriate than any other order as the indices typically have nothing
    at all to do with the order of the values. It's exactly the same
    argument that explains the order of a plain old `for (i in array)` being
    undefined.

    No, we're sorting. That's not the same as for (i in array).

    Yes it is the same because the sorting that the caller asked for has
    already been done at that point, what you have to do now for the
    duplicate values is exactly the same as what you have to do given `for
    (i in array)` - just access those values as efficiently as possible as
    there's no way to know what order the caller would want. If the
    implementation decides to do that using the values of the indices in
    some order, that's absolutely fine of course, and the chosen order in
    this case makes sense since it's presumably as efficient as any other
    order at that point in the code, more efficient than some, and will be
    executed relatively rarely (just in the duplicate cases) anyway.


    This is documented at
    https://www.gnu.org/software/gawk/manual/gawk.html#Controlling-Scanning
    which says:

    When sorting an array by element values, if a value happens to be
    a subarray then it is considered to be greater than any string or
    numeric value, regardless of what the subarray itself contains,
    and all subarrays are treated as being equal to each other. Their
    order relative to each other is determined by their index strings.

    That paragraph seems to be describing the order that subarrays will be
    visited, not about the order of duplicate scalar values.

    Those same keywords may be used with asort and asorti.

    Looks like the manual has been updated recently as it now says:

    -----
    "@val_str_asc"

    Order by element values in ascending order (rather than by
    indices). Scalar values are compared as strings. If the string values
    are identical, the index string values are compared instead.
    -----

    which I haven't seen on previous reads of that section.


    I'm done.


    Thanks for the explanation.

    Ed.

  • From Aharon Robbins@21:1/5 to mortonspam@gmail.com on Mon Apr 5 19:18:27 2021
    In article <s4fjq4$jcm$1@dont-email.me>,
    Ed Morton <mortonspam@gmail.com> wrote:
    If sorting by numeric value, how do you determine whether a[1]'s value
    comes first, or a[3]'s?

    The same way you determine which value to print if you _aren't_ sorting
    by values, i.e. hash order I think is common, would seem to be the most
    obvious approach.

    Hash order isn't available at that point in the code. The index values
    provide a sensible and unambiguous way to order values when values
    are identical.

    The code has to make a choice. It uses the
    value of the index string to do that, essentially as a last resort.

    There's no reason to think the string order of the indices is any more
    appropriate than any other order as the indices typically have nothing
    at all to do with the order of the values. It's exactly the same
    argument that explains the order of a plain old `for (i in array)` being
    undefined.

    No, we're sorting. That's not the same as for (i in array).

    This is documented at
    https://www.gnu.org/software/gawk/manual/gawk.html#Controlling-Scanning
    which says:

    When sorting an array by element values, if a value happens to be
    a subarray then it is considered to be greater than any string or
    numeric value, regardless of what the subarray itself contains,
    and all subarrays are treated as being equal to each other. Their
    order relative to each other is determined by their index strings.

    That paragraph seems to be describing the order that subarrays will be
    visited, not about the order of duplicate scalar values.

    Those same keywords may be used with asort and asorti.

    I'm done.
    --
    Aharon (Arnold) Robbins arnold AT skeeve DOT com

  • From Ed Morton@21:1/5 to Aharon Robbins on Tue Apr 6 13:15:05 2021
    On 4/6/2021 12:16 PM, Aharon Robbins wrote:
    In article <s4fq2i$eg$1@dont-email.me>,
    Ed Morton <mortonspam@gmail.com> wrote:
    There's no reason to think the string order of the indices is any more
    appropriate than any other order as the indices typically have nothing
    at all to do with the order of the values. It's exactly the same
    argument that explains the order of a plain old `for (i in array)` being
    undefined.

    No, we're sorting. That's not the same as for (i in array).

    Yes it is the same because the sorting that the caller asked for has
    already been done at that point,

    No. The code under discussion is the comparison function used by the C
    qsort function to *do the sorting*. Given two identical values, it has
    to decide which one sorts before the other. It uses the index value to
    do this.

    If I tell you gawk works a particular way for a particular reason,
    there's a very good chance that I wrote the code and that I know what
    I'm talking about.


    I'm not questioning your knowledge of the gawk code. You're talking
    about how you chose to implement the code (visit duplicate values sorted
    by index which is a perfectly reasonable implementation, no argument
    there), while I'm talking abstractly about what the code has to do (i.e.
    visit duplicate values as quickly as possible in any order) and I also
    know what I'm talking about.

    Ed.

  • From Aharon Robbins@21:1/5 to mortonspam@gmail.com on Tue Apr 6 17:16:57 2021
    In article <s4fq2i$eg$1@dont-email.me>,
    Ed Morton <mortonspam@gmail.com> wrote:
    There's no reason to think the string order of the indices is any more
    appropriate than any other order as the indices typically have nothing
    at all to do with the order of the values. It's exactly the same
    argument that explains the order of a plain old `for (i in array)` being
    undefined.

    No, we're sorting. That's not the same as for (i in array).

    Yes it is the same because the sorting that the caller asked for has
    already been done at that point,

    No. The code under discussion is the comparison function used by the C
    qsort function to *do the sorting*. Given two identical values, it has
    to decide which one sorts before the other. It uses the index value to
    do this.

    If I tell you gawk works a particular way for a particular reason,
    there's a very good chance that I wrote the code and that I know what
    I'm talking about.
    --
    Aharon (Arnold) Robbins arnold AT skeeve DOT com

  • From Igenlode Wordsmith@21:1/5 to All on Wed Apr 7 16:28:45 2021
    On 4 Apr 2021 Ed Morton wrote:

    On 4/3/2021 4:08 PM, Igenlode Wordsmith wrote:


    So @val_type_asc does exactly the same sort as @val_num_asc, except that
    it sorts strings after numbers instead of sorting them according to
    their numerical value of zero.
    [snip]

    And, confusingly, despite its name @val_type_asc *doesn't* separate the
    type 'strnum' from the type 'number', which would seem to be the
    main useful feature of 'sorting by type', but sorts them together. What
    is it used for? The only references I could find online were to
    scanning FUNCTAB.

    I think you're overthinking this. @val_type_asc means if the values are
    scalar numbers they are sorted as numbers, otherwise if they're scalar
    they're sorted as strings.

    Yes, but @val_num_asc does that too.

    Apparently numbers get printed before strings (maybe a locale thing,
    idk) and subarrays get printed last.

    I probably am overthinking it -- I was just trying to understand what
    the function of this 'extra' sort that only exists for values and not
    indices (?because an array index is always a string?) could be.
    Especially the 'type' element, because the other sorts already sort
    strings, numbers and arrays separately, and this one doesn't seem to
    distinguish between types to any greater degree than they do.

    Maybe it's because I'm not used to an OS with a command-line 'sort'
    utility, so the third option doesn't strike me as being intrinsically
    missing.


    It took me long enough to get my head round what the difference was
    between sorting an array using asort and @ind_str_asc (force it to sort
    on indices) and sorting it using asorti (sorts on indices by default) --
    the answer being that this affects whether it is the values or indices
    that get *preserved*, rather than what the sort order is!

    --
    Igenlode Visit the Ivory Tower http://ivory.ueuo.com/Tower/

    -Yes, it hurts. The trick is not *minding* that it hurts.
