• Fast lookup of 234 possible 1/2/3 character codes

    From Robert Prins@21:1/5 to All on Mon Feb 13 18:00:11 2023
    How likely is it that that there's a faster way than doing just a binary lookup,
    i.e. something like the current

    mov edx, in_plate
    bswap edx

    mov esi, 1
    mov edi, type plates / type plate

    @01:
    lea eax, [esi + edi]
    and eax, -2
    shl eax, 5

    mov ebx, dword ptr plates[eax - type plate]
    bswap ebx

    shr eax, 6

    cmp ebx, edx
    je @04

    jl @02

    lea edi, [eax - 1]
    jmp @03

    @02:
    lea esi, [eax + 1]

    @03:
    cmp esi, edi
    jle @01

    dw op_ud2

    @04:
    ret

    where plates is an array of plate:

    //------------------------------------------------
    // List of all countries / nationalities
    //
    // - sign : license plate sign
    //
    // - stat : status
    // ' ': official
    // '*': unofficial
    // - visit:
    // '!': visited (non-hitching, comment only)
    // 'V': visited
    // ' ': unvisited
    // - hitch:
    // 'H': hitched-in
    // '*': hitched through
    // - local:
    // '+': in-country ride with local
    // '-': out-of-country ride with nationality //------------------------------------------------
    type
    plate = record
    sign : tycona; // array [1..4] of char
    stat : char;
    visit: char;
    hitch: char;
    local: char;
    cntyl: byte;
    cntys: array [1..55] of char;
    end;

    Thanks,

    Robert
    --
    Robert AH Prins
    robert(a)prino(d)org
    The hitchhiking grandfather - https://prino.neocities.org/
    Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Terje Mathisen@21:1/5 to Robert Prins on Wed Feb 15 09:14:32 2023
    This one cries out for a hash table!

    Since you have a fixed set of codes, you can spend a little time finding
    a fast hash that converts this into a single byte value with zero
    collisions, at which point the hash entry is your country code?

    Try something like c[0]+c[1]*2+c[2]*4 as a first attempt, then mix in a
    few XORs or rotations instead of those SHL only ops.

    It should be doable in well under 10 cycles and maybe under 5...

    If a near-perfect hash is too slow or hard to find, then you could allow
    a fixed maximal number of collisions (like max two entries) and scan
    those, but that needs room for the original values in the hash table.

    Terje

    Robert Prins wrote:
    How likely is it that that there's a faster way than doing just a binary lookup, i.e. something like the current

    mov edx, in_plate
    bswap edx

    mov esi, 1
    mov edi, type plates / type plate

    @01:
    lea eax, [esi + edi]
    and eax, -2
    shl eax, 5

    mov ebx, dword ptr plates[eax - type plate]
    bswap ebx

    shr eax, 6

    cmp ebx, edx
    je @04

    jl @02

    lea edi, [eax - 1]
    jmp @03

    @02:
    lea esi, [eax + 1]

    @03:
    cmp esi, edi
    jle @01

    dw op_ud2

    @04:
    ret

    where plates is an array of plate:

    //------------------------------------------------
    // List of all countries / nationalities
    //
    // - sign : license plate sign
    //
    // - stat : status
    // ' ': official
    // '*': unofficial
    // - visit:
    // '!': visited (non-hitching, comment only)
    // 'V': visited
    // ' ': unvisited
    // - hitch:
    // 'H': hitched-in
    // '*': hitched through
    // - local:
    // '+': in-country ride with local
    // '-': out-of-country ride with nationality //------------------------------------------------
    type
    plate = record
    sign : tycona; // array [1..4] of char
    stat : char;
    visit: char;
    hitch: char;
    local: char;
    cntyl: byte;
    cntys: array [1..55] of char;
    end;

    Thanks,

    Robert


    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Robert Prins@21:1/5 to Terje Mathisen on Wed Feb 15 20:02:24 2023
    On 2023-02-15 08:14, Terje Mathisen wrote:
    This one cries out for a hash table!

    Yes, but with the 1/2/3 letter codes being the (un)official car licence plate codes of all 195 independent countries, with some extra odd-man-out ones like GBJ/GBG, and the oddest, SMA, I've given this a try, but GNU's GPERF doesn't come up with anything. And yes, I've also read <http://courses.cs.vt.edu/~cs3114/Fall10/Notes/Supplemental/p17-cichelli.pdf> ;)

    Since you have a fixed set of codes, you can spend a little time finding a fast
    hash that converts this into a single byte value with zero collisions, at which
    point the hash entry is your country code?

    I've tried I don't know how many variations, but nothing has come close, and although it will be slower than hashing, it might well be that a simple lookup of the first three of four most common countries/nationalities (D/B/LT/GB) might
    be a decent compromise, I did use this technique at a Belgian employer where the
    three most common nationalities were B, NL, and F, accounting for more than 90% of the insured. The best I got out of GPERF was a table of 425 entries, but the email from 2016 didn't include the hash function. :( But being someone who rarely, if ever, deletes something, it might still be around.

    Try something like c[0]+c[1]*2+c[2]*4 as a first attempt, then mix in a few XORs
    or rotations instead of those SHL only ops.

    It should be doable in well under 10 cycles and maybe under 5...
    Which would handsomely beat a binary lookup...

    If a near-perfect hash is too slow or hard to find, then you could allow a fixed
    maximal number of collisions (like max two entries) and scan those, but that needs room for the original values in the hash table.

    I've got the code that Borland used in TP (4?/5?6/7) to do hashing, and that should be (relatively) easy to adapt.

    Robert
    --
    Robert AH Prins
    robert(a)prino(d)org
    The hitchhiking grandfather - https://prino.neocities.org/
    Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html

    Robert Prins wrote:
    How likely is it that that there's a faster way than doing just a binary
    lookup, i.e. something like the current

       mov     edx, in_plate
       bswap   edx

       mov     esi, 1
       mov     edi, type plates / type plate

    @01:
       lea     eax, [esi + edi]
       and     eax, -2
       shl     eax, 5

       mov     ebx, dword ptr plates[eax - type plate]
       bswap   ebx

       shr     eax, 6

       cmp     ebx, edx
       je      @04

       jl      @02

       lea     edi, [eax - 1]
       jmp     @03

    @02:
       lea     esi, [eax + 1]

    @03:
       cmp     esi, edi
       jle     @01

       dw      op_ud2

    @04:
       ret

    where plates is an array of plate:

    //------------------------------------------------
    // List of all countries / nationalities
    //
    // - sign      : license plate sign
    //
    // - stat : status
    //    ' ': official
    //    '*': unofficial
    // - visit:
    //    '!': visited (non-hitching, comment only)
    //    'V': visited
    //    ' ': unvisited
    // - hitch:
    //    'H': hitched-in
    //    '*': hitched through
    // - local:
    //    '+': in-country ride with local
    //    '-': out-of-country ride with nationality
    //------------------------------------------------
    type
       plate = record
                 sign : tycona; // array [1..4] of char
                 stat : char;
                 visit: char;
                 hitch: char;
                 local: char;
                 cntyl: byte;
                 cntys: array [1..55] of char;
               end;

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Terje Mathisen@21:1/5 to Robert Prins on Wed Feb 15 22:15:17 2023
    Can you please post the full list? comma, space or newline separated?

    Terje

    Robert Prins wrote:
    On 2023-02-15 08:14, Terje Mathisen wrote:
    This one cries out for a hash table!

    Yes, but with the 1/2/3 letter codes being the (un)official car licence
    plate codes of all 195 independent countries, with some extra
    odd-man-out ones like GBJ/GBG, and the oddest, SMA, I've given this a
    try, but GNU's GPERF doesn't come up with anything. And yes, I've also
    read <http://courses.cs.vt.edu/~cs3114/Fall10/Notes/Supplemental/p17-cichelli.pdf> ;)

    Since you have a fixed set of codes, you can spend a little time
    finding a fast hash that converts this into a single byte value with
    zero collisions, at which point the hash entry is your country code?

    I've tried I don't know how many variations, but nothing has come close,
    and although it will be slower than hashing, it might well be that a
    simple lookup of the first three of four most common
    countries/nationalities (D/B/LT/GB) might be a decent compromise, I did
    use this technique at a Belgian employer where the three most common nationalities were B, NL, and F, accounting for more than 90% of the
    insured. The best I got out of GPERF was a table of 425 entries, but the email from 2016 didn't include the hash function. :( But being someone
    who rarely, if ever, deletes something, it might still be around.

    Try something like c[0]+c[1]*2+c[2]*4 as a first attempt, then mix in
    a few XORs or rotations instead of those SHL only ops.

    It should be doable in well under 10 cycles and maybe under 5...
    Which would handsomely beat a binary lookup...

    If a near-perfect hash is too slow or hard to find, then you could
    allow a fixed maximal number of collisions (like max two entries) and
    scan those, but that needs room for the original values in the hash
    table.

    I've got the code that Borland used in TP (4?/5?6/7) to do hashing, and
    that should be (relatively) easy to adapt.

    Robert


    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Robert Prins@21:1/5 to Terje Mathisen on Thu Feb 16 00:21:17 2023
    On 2023-02-15 21:15, Terje Mathisen wrote:
    Can you please post the full list? comma, space or newline separated?

    (sign: 'A '; cnty: 'Austria ' ),
    (sign: 'AFG '; cnty: 'Afghanistan ' ),
    (sign: 'AG '; cnty: 'Antigua and Barbuda ' ),
    (sign: 'AL '; cnty: 'Albania ' ),
    (sign: 'AM '; cnty: 'Armenia ' ),
    (sign: 'AND '; cnty: 'Andorra ' ),
    (sign: 'ANG '; cnty: 'Angola ' ),
    (sign: 'ARK '; cnty: 'Antarctica ' ),
    (sign: 'ARU '; cnty: 'Aruba ' ),
    (sign: 'AS '; cnty: 'Asturias ' ),
    (sign: 'AUA '; cnty: 'Aruba ' ),
    (sign: 'AUS '; cnty: 'Australia ' ),
    (sign: 'AX '; cnty: 'Aland Islands ' ),
    (sign: 'AXA '; cnty: 'Anguilla ' ),
    (sign: 'AZ '; cnty: 'Azerbaijan ' ),
    (sign: 'B '; cnty: 'Belgium ' ),
    (sign: 'BD '; cnty: 'Bangladesh ' ),
    (sign: 'BDS '; cnty: 'Barbados ' ),
    (sign: 'BF '; cnty: 'Burkina Faso ' ),
    (sign: 'BG '; cnty: 'Bulgaria ' ),
    (sign: 'BH '; cnty: 'Belize ' ),
    (sign: 'BHT '; cnty: 'Bhutan ' ),
    (sign: 'BIH '; cnty: 'Bosnia and Herzegovina ' ),
    (sign: 'BOL '; cnty: 'Bolivia ' ),
    (sign: 'BR '; cnty: 'Brazil ' ),
    (sign: 'BRN '; cnty: 'Bahrain ' ),
    (sign: 'BRU '; cnty: 'Brunei ' ),
    (sign: 'BS '; cnty: 'Bahamas ' ),
    (sign: 'BUR '; cnty: 'Myanmar ' ),
    (sign: 'BVI '; cnty: 'British Virgin Islands ' ),
    (sign: 'BW '; cnty: 'Botswana ' ),
    (sign: 'BY '; cnty: 'Belarus ' ),
    (sign: 'BZ '; cnty: 'Belize ' ),
    (sign: 'C '; cnty: 'Cuba ' ),
    (sign: 'CAM '; cnty: 'Cameroon ' ),
    (sign: 'CDN '; cnty: 'Canada ' ),
    (sign: 'CGO '; cnty: 'Democratic Republic of the Congo ' ),
    (sign: 'CH '; cnty: 'Switzerland ' ),
    (sign: 'CHN '; cnty: 'People''s Republic of China ' ),
    (sign: 'CI '; cnty: 'Ivory Coast (Cote d''Ivoire) ' ),
    (sign: 'CL '; cnty: 'Sri Lanka ' ),
    (sign: 'CO '; cnty: 'Colombia ' ),
    (sign: 'COM '; cnty: 'Comoros ' ),
    (sign: 'CR '; cnty: 'Costa Rica ' ),
    (sign: 'CV '; cnty: 'Cape Verde ' ),
    (sign: 'CY '; cnty: 'Cyprus ' ),
    (sign: 'CYM '; cnty: 'Wales ' ),
    (sign: 'CZ '; cnty: 'Czech Republic ' ),
    (sign: 'D '; cnty: 'Germany ' ),
    (sign: 'DJI '; cnty: 'Djibouti ' ),
    (sign: 'DK '; cnty: 'Denmark ' ),
    (sign: 'DOM '; cnty: 'Dominican Republic ' ),
    (sign: 'DY '; cnty: 'Benin ' ),
    (sign: 'DZ '; cnty: 'Algeria ' ),
    (sign: 'E '; cnty: 'Spain ' ),
    (sign: 'EAK '; cnty: 'Kenya ' ),
    (sign: 'EAT '; cnty: 'Tanzania ' ),
    (sign: 'EAU '; cnty: 'Uganda ' ),
    (sign: 'EAZ '; cnty: 'Zanzibar ' ),
    (sign: 'EC '; cnty: 'Ecuador ' ),
    (sign: 'ENG '; cnty: 'England ' ),
    (sign: 'ER '; cnty: 'Eritrea ' ),
    (sign: 'ES '; cnty: 'El Salvador ' ),
    (sign: 'EST '; cnty: 'Estonia ' ),
    (sign: 'ET '; cnty: 'Egypt ' ),
    (sign: 'ETH '; cnty: 'Ethiopia ' ),
    (sign: 'F '; cnty: 'France ' ),
    (sign: 'FIN '; cnty: 'Finland ' ),
    (sign: 'FJI '; cnty: 'Fiji ' ),
    (sign: 'FL '; cnty: 'Liechtenstein ' ),
    (sign: 'FO '; cnty: 'Faroe Islands ' ),
    (sign: 'FSM '; cnty: 'Federated States of Micronesia ' ),
    (sign: 'G '; cnty: 'Gabon ' ),
    (sign: 'GB '; cnty: 'United Kingdom (UK) ' ),
    (sign: 'GBA '; cnty: 'Alderney ' ),
    (sign: 'GBG '; cnty: 'Guernsey ' ),
    (sign: 'GBJ '; cnty: 'Jersey ' ),
    (sign: 'GBM '; cnty: 'Isle of Man ' ),
    (sign: 'GBZ '; cnty: 'Gibraltar ' ),
    (sign: 'GCA '; cnty: 'Guatemala ' ),
    (sign: 'GE '; cnty: 'Georgia ' ),
    (sign: 'GH '; cnty: 'Ghana ' ),
    (sign: 'GQ '; cnty: 'Equatorial Guinea ' ),
    (sign: 'GR '; cnty: 'Greece ' ),
    (sign: 'GUY '; cnty: 'Guyana ' ),
    (sign: 'GW '; cnty: 'Guinea-Bissau ' ),
    (sign: 'H '; cnty: 'Hungary ' ),
    (sign: 'HK '; cnty: 'Hong Kong ' ),
    (sign: 'HKJ '; cnty: 'Jordan ' ),
    (sign: 'HN '; cnty: 'Honduras ' ),
    (sign: 'HR '; cnty: 'Croatia ' ),
    (sign: 'I '; cnty: 'Italy ' ),
    (sign: 'IL '; cnty: 'Israel ' ),
    (sign: 'IND '; cnty: 'India ' ),
    (sign: 'IR '; cnty: 'Iran ' ),
    (sign: 'IRL '; cnty: 'Ireland ' ),
    (sign: 'IRQ '; cnty: 'Iraq ' ),
    (sign: 'IS '; cnty: 'Iceland ' ),
    (sign: 'J '; cnty: 'Japan ' ),
    (sign: 'JA '; cnty: 'Jamaica ' ),
    (sign: 'K '; cnty: 'Cambodia ' ),
    (sign: 'KAN '; cnty: 'Saint Kitts and Nevis ' ),
    (sign: 'KGZ '; cnty: 'Kyrgyzstan ' ),
    (sign: 'KIR '; cnty: 'Kiribati ' ),
    (sign: 'KN '; cnty: 'Greenland ' ),
    (sign: 'KP '; cnty: 'Democratic People''s Republic of Korea ' ),
    (sign: 'KS '; cnty: 'Kyrgyzstan ' ),
    (sign: 'KSA '; cnty: 'Saudi Arabia ' ),
    (sign: 'KWT '; cnty: 'Kuwait ' ),
    (sign: 'KZ '; cnty: 'Kazakhstan ' ),
    (sign: 'L '; cnty: 'Luxembourg ' ),
    (sign: 'LAO '; cnty: 'Laos ' ),
    (sign: 'LAR '; cnty: 'Libya ' ),
    (sign: 'LB '; cnty: 'Liberia ' ),
    (sign: 'LS '; cnty: 'Lesotho ' ),
    (sign: 'LT '; cnty: 'Lithuania ' ),
    (sign: 'LV '; cnty: 'Latvia ' ),
    (sign: 'M '; cnty: 'Malta ' ),
    (sign: 'MA '; cnty: 'Morocco ' ),
    (sign: 'MAL '; cnty: 'Malaysia ' ),
    (sign: 'MC '; cnty: 'Monaco ' ),
    (sign: 'MD '; cnty: 'Moldova ' ),
    (sign: 'MEX '; cnty: 'Mexico ' ),
    (sign: 'MGL '; cnty: 'Mongolia ' ),
    (sign: 'MH '; cnty: 'Marshall Islands ' ),
    (sign: 'MK '; cnty: 'Macedonia (NMK) ' ),
    (sign: 'MNE '; cnty: 'Montenegro ' ),
    (sign: 'MO '; cnty: 'Macau ' ),
    (sign: 'MOC '; cnty: 'Mozambique ' ),
    (sign: 'MS '; cnty: 'Mauritius ' ),
    (sign: 'MV '; cnty: 'Maldives ' ),
    (sign: 'MW '; cnty: 'Malawi ' ),
    (sign: 'N '; cnty: 'Norway ' ),
    (sign: 'NA '; cnty: 'Netherlands Antilles ' ),
    (sign: 'NAM '; cnty: 'Namibia ' ),
    (sign: 'NAU '; cnty: 'Nauru ' ),
    (sign: 'NC '; cnty: 'New Caledonia ' ),
    (sign: 'NEP '; cnty: 'Nepal ' ),
    (sign: 'NI '; cnty: 'Northern Ireland ' ),
    (sign: 'NIC '; cnty: 'Nicaragua ' ),
    (sign: 'NL '; cnty: 'Netherlands ' ),
    (sign: 'NMK '; cnty: 'North Macedonia ' ),
    (sign: 'NZ '; cnty: 'New Zealand ' ),
    (sign: 'OM '; cnty: 'Oman ' ),
    (sign: 'P '; cnty: 'Portugal ' ),
    (sign: 'PA '; cnty: 'Panama ' ),
    (sign: 'PAL '; cnty: 'Palau ' ),
    (sign: 'PE '; cnty: 'Peru ' ),
    (sign: 'PK '; cnty: 'Pakistan ' ),
    (sign: 'PL '; cnty: 'Poland ' ),
    (sign: 'PMR '; cnty: 'Transnistria ' ),
    (sign: 'PNG '; cnty: 'Papua New Guinea ' ),
    (sign: 'PR '; cnty: 'Puerto Rico ' ),
    (sign: 'PS '; cnty: 'Palestine ' ),
    (sign: 'PY '; cnty: 'Paraguay ' ),
    (sign: 'Q '; cnty: 'Qatar ' ),
    (sign: 'RA '; cnty: 'Argentina ' ),
    (sign: 'RB '; cnty: 'Botswana ' ),
    (sign: 'RC '; cnty: 'Republic of China (Taiwan) ' ),
    (sign: 'RCA '; cnty: 'Central African Republic ' ),
    (sign: 'RCB '; cnty: 'Republic of the Congo ' ),
    (sign: 'RCH '; cnty: 'Chile ' ),
    (sign: 'RG '; cnty: 'Guinea ' ),
    (sign: 'RGB '; cnty: 'Guinea-Bissau ' ),
    (sign: 'RH '; cnty: 'Haiti ' ),
    (sign: 'RI '; cnty: 'Indonesia ' ),
    (sign: 'RIM '; cnty: 'Mauritania ' ),
    (sign: 'RKS '; cnty: 'Kosovo ' ),
    (sign: 'RL '; cnty: 'Lebanon ' ),
    (sign: 'RM '; cnty: 'Madagascar ' ),
    (sign: 'RMM '; cnty: 'Mali ' ),
    (sign: 'RN '; cnty: 'Niger ' ),
    (sign: 'RO '; cnty: 'Romania ' ),
    (sign: 'ROK '; cnty: 'Republic of Korea ' ),
    (sign: 'ROU '; cnty: 'Uruguay (UY) ' ),
    (sign: 'RP '; cnty: 'Philippines ' ),
    (sign: 'RSM '; cnty: 'San Marino ' ),
    (sign: 'RU '; cnty: 'Burundi ' ),
    (sign: 'RUS '; cnty: 'Russia ' ),
    (sign: 'RWA '; cnty: 'Rwanda ' ),
    (sign: 'S '; cnty: 'Sweden ' ),
    (sign: 'SCO '; cnty: 'Scotland ' ),
    (sign: 'SCV '; cnty: 'Vatican City ' ),
    (sign: 'SD '; cnty: 'Swaziland ' ),
    (sign: 'SF '; cnty: 'Finland (FIN) ' ),
    (sign: 'SGP '; cnty: 'Singapore ' ),
    (sign: 'SK '; cnty: 'Slovakia ' ),
    (sign: 'SLE '; cnty: 'Sierra Leone ' ),
    (sign: 'SLO '; cnty: 'Slovenia ' ),
    (sign: 'SME '; cnty: 'Suriname ' ),
    (sign: 'SMO '; cnty: 'Sovereign Military Order of Malta ' ),
    (sign: 'SN '; cnty: 'Senegal ' ),
    (sign: 'SO '; cnty: 'Somalia ' ),
    (sign: 'SOL '; cnty: 'Solomon Islands ' ),
    (sign: 'SRB '; cnty: 'Serbia ' ),
    (sign: 'STP '; cnty: 'S o Tom‚ and Pr¡ncipe ' ),
    (sign: 'SUD '; cnty: 'Sudan ' ),
    (sign: 'SY '; cnty: 'Seychelles ' ),
    (sign: 'SYR '; cnty: 'Syria ' ),
    (sign: 'T '; cnty: 'Thailand ' ),
    (sign: 'TCH '; cnty: 'Chad ' ),
    (sign: 'TD '; cnty: 'Chad ' ),
    (sign: 'TG '; cnty: 'Togo ' ),
    (sign: 'TJ '; cnty: 'Tajikistan ' ),
    (sign: 'TL '; cnty: 'Timor-Leste ' ),
    (sign: 'TM '; cnty: 'Turkmenistan ' ),
    (sign: 'TN '; cnty: 'Tunisia ' ),
    (sign: 'TO '; cnty: 'Tonga ' ),
    (sign: 'TR '; cnty: 'Turkey ' ),
    (sign: 'TT '; cnty: 'Trinidad and Tobago ' ),
    (sign: 'TUV '; cnty: 'Tuvalu ' ),
    (sign: 'UA '; cnty: 'Ukraine ' ),
    (sign: 'UAE '; cnty: 'United Arab Emirates ' ),
    (sign: 'USA '; cnty: 'United States ' ),
    (sign: 'UY '; cnty: 'Uruguay ' ),
    (sign: 'UZ '; cnty: 'Uzbekistan ' ),
    (sign: 'V '; cnty: 'Vatican City ' ),
    (sign: 'VN '; cnty: 'Vietnam ' ),
    (sign: 'VU '; cnty: 'Vanuatu ' ),
    (sign: 'WAG '; cnty: 'Gambia ' ),
    (sign: 'WAL '; cnty: 'Sierra Leone ' ),
    (sign: 'WAN '; cnty: 'Nigeria ' ),
    (sign: 'WD '; cnty: 'Dominica ' ),
    (sign: 'WG '; cnty: 'Grenada ' ),
    (sign: 'WL '; cnty: 'Saint Lucia ' ),
    (sign: 'WS '; cnty: 'Samoa ' ),
    (sign: 'WSA '; cnty: 'Western Sahara ' ),
    (sign: 'WV '; cnty: 'Saint Vincent and the Grenadines ' ),
    (sign: 'YAR '; cnty: 'Yemen ' ),
    (sign: 'YU '; cnty: 'Yugoslavia ' ),
    (sign: 'YV '; cnty: 'Venezuela ' ),
    (sign: 'Z '; cnty: 'Zambia ' ),
    (sign: 'ZA '; cnty: 'South Africa ' ),
    (sign: 'ZW '; cnty: 'Zimbabwe ' ),

    And as you can see, there are multiple entries for four countries,

    GB/UK,
    MK/NMK,
    ROU/UY, and
    SF/FIN

    which have decided to change their country codes after I had a first ride with a
    national of them, the few other users of my program might be using the new codes, but as long as only one of the two is used,it should not matter, but returning the same hash for both would be magic. The table should possibly also have included DDR, SU to name just two countries that no longer exist. Are there
    more, don't really know...

    Robert
    --
    Robert AH Prins
    robert(a)prino(d)org
    The hitchhiking grandfather - https://prino.neocities.org/
    Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html





    Terje

    Robert Prins wrote:
    On 2023-02-15 08:14, Terje Mathisen wrote:
    This one cries out for a hash table!

    Yes, but with the 1/2/3 letter codes being the (un)official car licence plate
    codes of all 195 independent countries, with some extra odd-man-out ones like
    GBJ/GBG, and the oddest, SMA, I've given this a try, but GNU's GPERF doesn't >> come up with anything. And yes, I've also read
    <http://courses.cs.vt.edu/~cs3114/Fall10/Notes/Supplemental/p17-cichelli.pdf> ;)

    Since you have a fixed set of codes, you can spend a little time finding a >>> fast hash that converts this into a single byte value with zero collisions, >>> at which point the hash entry is your country code?

    I've tried I don't know how many variations, but nothing has come close, and >> although it will be slower than hashing, it might well be that a simple lookup
    of the first three of four most common countries/nationalities (D/B/LT/GB) >> might be a decent compromise, I did use this technique at a Belgian employer >> where the three most common nationalities were B, NL, and F, accounting for >> more than 90% of the insured. The best I got out of GPERF was a table of 425 >> entries, but the email from 2016 didn't include the hash function. :( But
    being someone who rarely, if ever, deletes something, it might still be around.

    Try something like c[0]+c[1]*2+c[2]*4 as a first attempt, then mix in a few >>> XORs or rotations instead of those SHL only ops.

    It should be doable in well under 10 cycles and maybe under 5...
    Which would handsomely beat a binary lookup...

    If a near-perfect hash is too slow or hard to find, then you could allow a >>> fixed maximal number of collisions (like max two entries) and scan those, but
    that needs room for the original values in the hash table.

    I've got the code that Borland used in TP (4?/5?6/7) to do hashing, and that >> should be (relatively) easy to adapt.

    Robert



    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Terje Mathisen@21:1/5 to Robert Prins on Thu Feb 23 21:59:54 2023
    Robert, are you intentionally trying to annoy me? :-)

    My own N, Norway is missing!

    I'll take a look. In the meantime, have you considered an extremely
    simple single-entry cache: Last country used?

    I am very confident that it is likely that any given user will tend to
    pick up subsequent rides from another car in the same area, right?

    If you want to be really fancy then you back that single entry up with 3
    or 7 more most recently used, pushing the oldest out when you find a new country. You resort on each new country

    Robert Prins wrote:
    On 2023-02-15 21:15, Terje Mathisen wrote:
    Can you please post the full list? comma, space or newline separated?

    (sign: 'A '; cnty: 'Austria ' ),
    (sign: 'AFG '; cnty: 'Afghanistan ' ),
    (sign: 'AG '; cnty: 'Antigua and Barbuda ' ),
    (sign: 'AL '; cnty: 'Albania ' ),
    (sign: 'AM '; cnty: 'Armenia ' ),
    (sign: 'AND '; cnty: 'Andorra ' ),
    (sign: 'ANG '; cnty: 'Angola ' ),
    (sign: 'ARK '; cnty: 'Antarctica ' ),
    (sign: 'ARU '; cnty: 'Aruba ' ),
    (sign: 'AS '; cnty: 'Asturias ' ),
    (sign: 'AUA '; cnty: 'Aruba ' ),
    (sign: 'AUS '; cnty: 'Australia ' ),
    (sign: 'AX '; cnty: 'Aland Islands ' ),
    (sign: 'AXA '; cnty: 'Anguilla ' ),
    (sign: 'AZ '; cnty: 'Azerbaijan ' ),
    (sign: 'B '; cnty: 'Belgium ' ),
    (sign: 'BD '; cnty: 'Bangladesh ' ),
    (sign: 'BDS '; cnty: 'Barbados ' ),
    (sign: 'BF '; cnty: 'Burkina Faso ' ),
    (sign: 'BG '; cnty: 'Bulgaria ' ),
    (sign: 'BH '; cnty: 'Belize ' ),
    (sign: 'BHT '; cnty: 'Bhutan ' ),
    (sign: 'BIH '; cnty: 'Bosnia and Herzegovina ' ),
    (sign: 'BOL '; cnty: 'Bolivia ' ),
    (sign: 'BR '; cnty: 'Brazil ' ),
    (sign: 'BRN '; cnty: 'Bahrain ' ),
    (sign: 'BRU '; cnty: 'Brunei ' ),
    (sign: 'BS '; cnty: 'Bahamas ' ),
    (sign: 'BUR '; cnty: 'Myanmar ' ),
    (sign: 'BVI '; cnty: 'British Virgin Islands ' ),
    (sign: 'BW '; cnty: 'Botswana ' ),
    (sign: 'BY '; cnty: 'Belarus ' ),
    (sign: 'BZ '; cnty: 'Belize ' ),
    (sign: 'C '; cnty: 'Cuba ' ),
    (sign: 'CAM '; cnty: 'Cameroon ' ),
    (sign: 'CDN '; cnty: 'Canada ' ),
    (sign: 'CGO '; cnty: 'Democratic Republic of the Congo ' ),
    (sign: 'CH '; cnty: 'Switzerland ' ),
    (sign: 'CHN '; cnty: 'People''s Republic of China ' ),
    (sign: 'CI '; cnty: 'Ivory Coast (Cote d''Ivoire) ' ),
    (sign: 'CL '; cnty: 'Sri Lanka ' ),
    (sign: 'CO '; cnty: 'Colombia ' ),
    (sign: 'COM '; cnty: 'Comoros ' ),
    (sign: 'CR '; cnty: 'Costa Rica ' ),
    (sign: 'CV '; cnty: 'Cape Verde ' ),
    (sign: 'CY '; cnty: 'Cyprus ' ),
    (sign: 'CYM '; cnty: 'Wales ' ),
    (sign: 'CZ '; cnty: 'Czech Republic ' ),
    (sign: 'D '; cnty: 'Germany ' ),
    (sign: 'DJI '; cnty: 'Djibouti ' ),
    (sign: 'DK '; cnty: 'Denmark ' ),
    (sign: 'DOM '; cnty: 'Dominican Republic ' ),
    (sign: 'DY '; cnty: 'Benin ' ),
    (sign: 'DZ '; cnty: 'Algeria ' ),
    (sign: 'E '; cnty: 'Spain ' ),
    (sign: 'EAK '; cnty: 'Kenya ' ),
    (sign: 'EAT '; cnty: 'Tanzania ' ),
    (sign: 'EAU '; cnty: 'Uganda ' ),
    (sign: 'EAZ '; cnty: 'Zanzibar ' ),
    (sign: 'EC '; cnty: 'Ecuador ' ),
    (sign: 'ENG '; cnty: 'England ' ),
    (sign: 'ER '; cnty: 'Eritrea ' ),
    (sign: 'ES '; cnty: 'El Salvador ' ),
    (sign: 'EST '; cnty: 'Estonia ' ),
    (sign: 'ET '; cnty: 'Egypt ' ),
    (sign: 'ETH '; cnty: 'Ethiopia ' ),
    (sign: 'F '; cnty: 'France ' ),
    (sign: 'FIN '; cnty: 'Finland ' ),
    (sign: 'FJI '; cnty: 'Fiji ' ),
    (sign: 'FL '; cnty: 'Liechtenstein ' ),
    (sign: 'FO '; cnty: 'Faroe Islands ' ),
    (sign: 'FSM '; cnty: 'Federated States of Micronesia ' ),
    (sign: 'G '; cnty: 'Gabon ' ),
    (sign: 'GB '; cnty: 'United Kingdom (UK) ' ),
    (sign: 'GBA '; cnty: 'Alderney ' ),
    (sign: 'GBG '; cnty: 'Guernsey ' ),
    (sign: 'GBJ '; cnty: 'Jersey ' ),
    (sign: 'GBM '; cnty: 'Isle of Man ' ),
    (sign: 'GBZ '; cnty: 'Gibraltar ' ),
    (sign: 'GCA '; cnty: 'Guatemala ' ),
    (sign: 'GE '; cnty: 'Georgia ' ),
    (sign: 'GH '; cnty: 'Ghana ' ),
    (sign: 'GQ '; cnty: 'Equatorial Guinea ' ),
    (sign: 'GR '; cnty: 'Greece ' ),
    (sign: 'GUY '; cnty: 'Guyana ' ),
    (sign: 'GW '; cnty: 'Guinea-Bissau ' ),
    (sign: 'H '; cnty: 'Hungary ' ),
    (sign: 'HK '; cnty: 'Hong Kong ' ),
    (sign: 'HKJ '; cnty: 'Jordan ' ),
    (sign: 'HN '; cnty: 'Honduras ' ),
    (sign: 'HR '; cnty: 'Croatia ' ),
    (sign: 'I '; cnty: 'Italy ' ),
    (sign: 'IL '; cnty: 'Israel ' ),
    (sign: 'IND '; cnty: 'India ' ),
    (sign: 'IR '; cnty: 'Iran ' ),
    (sign: 'IRL '; cnty: 'Ireland ' ),
    (sign: 'IRQ '; cnty: 'Iraq ' ),
    (sign: 'IS '; cnty: 'Iceland ' ),
    (sign: 'J '; cnty: 'Japan ' ),
    (sign: 'JA '; cnty: 'Jamaica ' ),
    (sign: 'K '; cnty: 'Cambodia ' ),
    (sign: 'KAN '; cnty: 'Saint Kitts and Nevis ' ),
    (sign: 'KGZ '; cnty: 'Kyrgyzstan ' ),
    (sign: 'KIR '; cnty: 'Kiribati ' ),
    (sign: 'KN '; cnty: 'Greenland ' ),
    (sign: 'KP '; cnty: 'Democratic People''s Republic of Korea ' ),
    (sign: 'KS '; cnty: 'Kyrgyzstan ' ),
    (sign: 'KSA '; cnty: 'Saudi Arabia ' ),
    (sign: 'KWT '; cnty: 'Kuwait ' ),
    (sign: 'KZ '; cnty: 'Kazakhstan ' ),
    (sign: 'L '; cnty: 'Luxembourg ' ),
    (sign: 'LAO '; cnty: 'Laos ' ),
    (sign: 'LAR '; cnty: 'Libya ' ),
    (sign: 'LB '; cnty: 'Liberia ' ),
    (sign: 'LS '; cnty: 'Lesotho ' ),
    (sign: 'LT '; cnty: 'Lithuania ' ),
    (sign: 'LV '; cnty: 'Latvia ' ),
    (sign: 'M '; cnty: 'Malta ' ),
    (sign: 'MA '; cnty: 'Morocco ' ),
    (sign: 'MAL '; cnty: 'Malaysia ' ),
    (sign: 'MC '; cnty: 'Monaco ' ),
    (sign: 'MD '; cnty: 'Moldova ' ),
    (sign: 'MEX '; cnty: 'Mexico ' ),
    (sign: 'MGL '; cnty: 'Mongolia ' ),
    (sign: 'MH '; cnty: 'Marshall Islands ' ),
    (sign: 'MK '; cnty: 'Macedonia (NMK) ' ),
    (sign: 'MNE '; cnty: 'Montenegro ' ),
    (sign: 'MO '; cnty: 'Macau ' ),
    (sign: 'MOC '; cnty: 'Mozambique ' ),
    (sign: 'MS '; cnty: 'Mauritius ' ),
    (sign: 'MV '; cnty: 'Maldives ' ),
    (sign: 'MW '; cnty: 'Malawi ' ),
    (sign: 'N '; cnty: 'Norway ' ),
    (sign: 'NA '; cnty: 'Netherlands Antilles ' ),
    (sign: 'NAM '; cnty: 'Namibia ' ),
    (sign: 'NAU '; cnty: 'Nauru ' ),
    (sign: 'NC '; cnty: 'New Caledonia ' ),
    (sign: 'NEP '; cnty: 'Nepal ' ),
    (sign: 'NI '; cnty: 'Northern Ireland ' ),
    (sign: 'NIC '; cnty: 'Nicaragua ' ),
    (sign: 'NL '; cnty: 'Netherlands ' ),
    (sign: 'NMK '; cnty: 'North Macedonia ' ),
    (sign: 'NZ '; cnty: 'New Zealand ' ),
    (sign: 'OM '; cnty: 'Oman ' ),
    (sign: 'P '; cnty: 'Portugal ' ),
    (sign: 'PA '; cnty: 'Panama ' ),
    (sign: 'PAL '; cnty: 'Palau ' ),
    (sign: 'PE '; cnty: 'Peru ' ),
    (sign: 'PK '; cnty: 'Pakistan ' ),
    (sign: 'PL '; cnty: 'Poland ' ),
    (sign: 'PMR '; cnty: 'Transnistria ' ),
    (sign: 'PNG '; cnty: 'Papua New Guinea ' ),
    (sign: 'PR '; cnty: 'Puerto Rico ' ),
    (sign: 'PS '; cnty: 'Palestine ' ),
    (sign: 'PY '; cnty: 'Paraguay ' ),
    (sign: 'Q '; cnty: 'Qatar ' ),
    (sign: 'RA '; cnty: 'Argentina ' ),
    (sign: 'RB '; cnty: 'Botswana ' ),
    (sign: 'RC '; cnty: 'Republic of China (Taiwan) ' ),
    (sign: 'RCA '; cnty: 'Central African Republic ' ),
    (sign: 'RCB '; cnty: 'Republic of the Congo ' ),
    (sign: 'RCH '; cnty: 'Chile ' ),
    (sign: 'RG '; cnty: 'Guinea ' ),
    (sign: 'RGB '; cnty: 'Guinea-Bissau ' ),
    (sign: 'RH '; cnty: 'Haiti ' ),
    (sign: 'RI '; cnty: 'Indonesia ' ),
    (sign: 'RIM '; cnty: 'Mauritania ' ),
    (sign: 'RKS '; cnty: 'Kosovo ' ),
    (sign: 'RL '; cnty: 'Lebanon ' ),
    (sign: 'RM '; cnty: 'Madagascar ' ),
    (sign: 'RMM '; cnty: 'Mali ' ),
    (sign: 'RN '; cnty: 'Niger ' ),
    (sign: 'RO '; cnty: 'Romania ' ),
    (sign: 'ROK '; cnty: 'Republic of Korea ' ),
    (sign: 'ROU '; cnty: 'Uruguay (UY) ' ),
    (sign: 'RP '; cnty: 'Philippines ' ),
    (sign: 'RSM '; cnty: 'San Marino ' ),
    (sign: 'RU '; cnty: 'Burundi ' ),
    (sign: 'RUS '; cnty: 'Russia ' ),
    (sign: 'RWA '; cnty: 'Rwanda ' ),
    (sign: 'S '; cnty: 'Sweden ' ),
    (sign: 'SCO '; cnty: 'Scotland ' ),
    (sign: 'SCV '; cnty: 'Vatican City ' ),
    (sign: 'SD '; cnty: 'Swaziland ' ),
    (sign: 'SF '; cnty: 'Finland (FIN) ' ),
    (sign: 'SGP '; cnty: 'Singapore ' ),
    (sign: 'SK '; cnty: 'Slovakia ' ),
    (sign: 'SLE '; cnty: 'Sierra Leone ' ),
    (sign: 'SLO '; cnty: 'Slovenia ' ),
    (sign: 'SME '; cnty: 'Suriname ' ),
    (sign: 'SMO '; cnty: 'Sovereign Military Order of Malta ' ),
    (sign: 'SN '; cnty: 'Senegal ' ),
    (sign: 'SO '; cnty: 'Somalia ' ),
    (sign: 'SOL '; cnty: 'Solomon Islands ' ),
    (sign: 'SRB '; cnty: 'Serbia ' ),
    (sign: 'STP '; cnty: 'S o Tom‚ and Pr¡ncipe ' ), (sign: 'SUD '; cnty: 'Sudan ' ),
    (sign: 'SY '; cnty: 'Seychelles ' ),
    (sign: 'SYR '; cnty: 'Syria ' ),
    (sign: 'T '; cnty: 'Thailand ' ),
    (sign: 'TCH '; cnty: 'Chad ' ),
    (sign: 'TD '; cnty: 'Chad ' ),
    (sign: 'TG '; cnty: 'Togo ' ),
    (sign: 'TJ '; cnty: 'Tajikistan ' ),
    (sign: 'TL '; cnty: 'Timor-Leste ' ),
    (sign: 'TM '; cnty: 'Turkmenistan ' ),
    (sign: 'TN '; cnty: 'Tunisia ' ),
    (sign: 'TO '; cnty: 'Tonga ' ),
    (sign: 'TR '; cnty: 'Turkey ' ),
    (sign: 'TT '; cnty: 'Trinidad and Tobago ' ),
    (sign: 'TUV '; cnty: 'Tuvalu ' ),
    (sign: 'UA '; cnty: 'Ukraine ' ),
    (sign: 'UAE '; cnty: 'United Arab Emirates ' ),
    (sign: 'USA '; cnty: 'United States ' ),
    (sign: 'UY '; cnty: 'Uruguay ' ),
    (sign: 'UZ '; cnty: 'Uzbekistan ' ),
    (sign: 'V '; cnty: 'Vatican City ' ),
    (sign: 'VN '; cnty: 'Vietnam ' ),
    (sign: 'VU '; cnty: 'Vanuatu ' ),
    (sign: 'WAG '; cnty: 'Gambia ' ),
    (sign: 'WAL '; cnty: 'Sierra Leone ' ),
    (sign: 'WAN '; cnty: 'Nigeria ' ),
    (sign: 'WD '; cnty: 'Dominica ' ),
    (sign: 'WG '; cnty: 'Grenada ' ),
    (sign: 'WL '; cnty: 'Saint Lucia ' ),
    (sign: 'WS '; cnty: 'Samoa ' ),
    (sign: 'WSA '; cnty: 'Western Sahara ' ),
    (sign: 'WV '; cnty: 'Saint Vincent and the Grenadines ' ),
    (sign: 'YAR '; cnty: 'Yemen ' ),
    (sign: 'YU '; cnty: 'Yugoslavia ' ),
    (sign: 'YV '; cnty: 'Venezuela ' ),
    (sign: 'Z '; cnty: 'Zambia ' ),
    (sign: 'ZA '; cnty: 'South Africa ' ),
    (sign: 'ZW '; cnty: 'Zimbabwe ' ),

    And as you can see, there are multiple entries for four countries,

    GB/UK,
    MK/NMK,
    ROU/UY, and
    SF/FIN

    which have decided to change their country codes after I had a first
    ride with a national of them, the few other users of my program might be using the new codes, but as long as only one of the two is used,it
    should not matter, but returning the same hash for both would be magic.
    The table should possibly also have included DDR, SU to name just two countries that no longer exist. Are there more, don't really know...

    Robert


    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Robert Prins@21:1/5 to Terje Mathisen on Fri Feb 24 02:04:14 2023
    On 2023-02-23 20:59, Terje Mathisen wrote:
    Robert, are you intentionally trying to annoy me? :-)

    My own N, Norway is missing!

    Please look again! My program would crash, in 1982 I hitched all the way up to Nordkapp, that wonderful place where people only go to to wrongly tell their friends that they've been at the northernmost point of Europe. ;)

    I'll take a look. In the meantime, have you considered an extremely simple single-entry cache: Last country used?

    That might very well be the best solution

    I am very confident that it is likely that any given user will tend to pick up
    subsequent rides from another car in the same area, right?

    That's certainly true

    If you want to be really fancy then you back that single entry up with 3 or 7 more most recently used, pushing the oldest out when you find a new country. You
    resort on each new country

    That's what I used in 1996 while working for KLM while building a prototype payload prediction system (in PL/I). There I didn't sort, but just added a counter and threw out the item with the lowest value, and moved the new value into its location, and, obviously, in location 0 of the array.

    In this case the extra overhead might be more than doing the binary lookup? Probably the only way to find out would be to do something in pure DOS using the
    full set of data that's now going into the binary lookup, an extra "writeln()" should produce it, or I could use the "COUNT" option of the old z/OS PL/I compiler to give me the, what else, counts of executed PL/I statements, which would give at least a rough idea as to what's the way to go.

    I'll give it a try, once I've managed to figure out what I'm doing wrong with this mainframe PL/I version. Give me some time, family things are right now sadly taking up rather a lot of time, and don't spoil me with a, if there's one,
    ready made solution!

    Robert
    --
    Robert AH Prins
    robert(a)prino(d)org
    The hitchhiking grandfather - https://prino.neocities.org/
    Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html

    Robert Prins wrote:
    On 2023-02-15 21:15, Terje Mathisen wrote:
    Can you please post the full list? comma, space or newline separated?

    (sign: 'A   '; cnty: 'Austria                                 '   ),
    (sign: 'AFG '; cnty: 'Afghanistan                             '   ),
    (sign: 'AG  '; cnty: 'Antigua and Barbuda                     '   ),
    (sign: 'AL  '; cnty: 'Albania                                 '   ),
    (sign: 'AM  '; cnty: 'Armenia                                 '   ),
    (sign: 'AND '; cnty: 'Andorra                                 '   ),
    (sign: 'ANG '; cnty: 'Angola                                  '   ),
    (sign: 'ARK '; cnty: 'Antarctica                              '   ),
    (sign: 'ARU '; cnty: 'Aruba                                   '   ),
    (sign: 'AS  '; cnty: 'Asturias                                '   ),
    (sign: 'AUA '; cnty: 'Aruba                                   '   ),
    (sign: 'AUS '; cnty: 'Australia                               '   ),
    (sign: 'AX  '; cnty: 'Aland Islands                           '   ),
    (sign: 'AXA '; cnty: 'Anguilla                                '   ),
    (sign: 'AZ  '; cnty: 'Azerbaijan                              '   ),
    (sign: 'B   '; cnty: 'Belgium                                 '   ),
    (sign: 'BD  '; cnty: 'Bangladesh                              '   ),
    (sign: 'BDS '; cnty: 'Barbados                                '   ),
    (sign: 'BF  '; cnty: 'Burkina Faso                            '   ),
    (sign: 'BG  '; cnty: 'Bulgaria                                '   ),
    (sign: 'BH  '; cnty: 'Belize                                  '   ),
    (sign: 'BHT '; cnty: 'Bhutan                                  '   ),
    (sign: 'BIH '; cnty: 'Bosnia and Herzegovina                  '   ),
    (sign: 'BOL '; cnty: 'Bolivia                                 '   ),
    (sign: 'BR  '; cnty: 'Brazil                                  '   ),
    (sign: 'BRN '; cnty: 'Bahrain                                 '   ),
    (sign: 'BRU '; cnty: 'Brunei                                  '   ),
    (sign: 'BS  '; cnty: 'Bahamas                                 '   ),
    (sign: 'BUR '; cnty: 'Myanmar                                 '   ),
    (sign: 'BVI '; cnty: 'British Virgin Islands                  '   ),
    (sign: 'BW  '; cnty: 'Botswana                                '   ),
    (sign: 'BY  '; cnty: 'Belarus                                 '   ),
    (sign: 'BZ  '; cnty: 'Belize                                  '   ),
    (sign: 'C   '; cnty: 'Cuba                                    '   ),
    (sign: 'CAM '; cnty: 'Cameroon                                '   ),
    (sign: 'CDN '; cnty: 'Canada                                  '   ),
    (sign: 'CGO '; cnty: 'Democratic Republic of the Congo        '   ),
    (sign: 'CH  '; cnty: 'Switzerland                             '   ),
    (sign: 'CHN '; cnty: 'People''s Republic of China              '   ),
    (sign: 'CI  '; cnty: 'Ivory Coast (Cote d''Ivoire)             '   ),
    (sign: 'CL  '; cnty: 'Sri Lanka                               '   ),
    (sign: 'CO  '; cnty: 'Colombia                                '   ),
    (sign: 'COM '; cnty: 'Comoros                                 '   ),
    (sign: 'CR  '; cnty: 'Costa Rica                              '   ),
    (sign: 'CV  '; cnty: 'Cape Verde                              '   ),
    (sign: 'CY  '; cnty: 'Cyprus                                  '   ),
    (sign: 'CYM '; cnty: 'Wales                                   '   ),
    (sign: 'CZ  '; cnty: 'Czech Republic                          '   ),
    (sign: 'D   '; cnty: 'Germany                                 '   ),
    (sign: 'DJI '; cnty: 'Djibouti                                '   ),
    (sign: 'DK  '; cnty: 'Denmark                                 '   ),
    (sign: 'DOM '; cnty: 'Dominican Republic                      '   ),
    (sign: 'DY  '; cnty: 'Benin                                   '   ),
    (sign: 'DZ  '; cnty: 'Algeria                                 '   ),
    (sign: 'E   '; cnty: 'Spain                                   '   ),
    (sign: 'EAK '; cnty: 'Kenya                                   '   ),
    (sign: 'EAT '; cnty: 'Tanzania                                '   ),
    (sign: 'EAU '; cnty: 'Uganda                                  '   ),
    (sign: 'EAZ '; cnty: 'Zanzibar                                '   ),
    (sign: 'EC  '; cnty: 'Ecuador                                 '   ),
    (sign: 'ENG '; cnty: 'England                                 '   ),
    (sign: 'ER  '; cnty: 'Eritrea                                 '   ),
    (sign: 'ES  '; cnty: 'El Salvador                             '   ),
    (sign: 'EST '; cnty: 'Estonia                                 '   ),
    (sign: 'ET  '; cnty: 'Egypt                                   '   ),
    (sign: 'ETH '; cnty: 'Ethiopia                                '   ),
    (sign: 'F   '; cnty: 'France                                  '   ),
    (sign: 'FIN '; cnty: 'Finland                                 '   ),
    (sign: 'FJI '; cnty: 'Fiji                                    '   ),
    (sign: 'FL  '; cnty: 'Liechtenstein                           '   ),
    (sign: 'FO  '; cnty: 'Faroe Islands                           '   ),
    (sign: 'FSM '; cnty: 'Federated States of Micronesia          '   ),
    (sign: 'G   '; cnty: 'Gabon                                   '   ),
    (sign: 'GB  '; cnty: 'United Kingdom (UK)                     '   ),
    (sign: 'GBA '; cnty: 'Alderney                                '   ),
    (sign: 'GBG '; cnty: 'Guernsey                                '   ),
    (sign: 'GBJ '; cnty: 'Jersey                                  '   ),
    (sign: 'GBM '; cnty: 'Isle of Man                             '   ),
    (sign: 'GBZ '; cnty: 'Gibraltar                               '   ),
    (sign: 'GCA '; cnty: 'Guatemala                               '   ),
    (sign: 'GE  '; cnty: 'Georgia                                 '   ),
    (sign: 'GH  '; cnty: 'Ghana                                   '   ),
    (sign: 'GQ  '; cnty: 'Equatorial Guinea                       '   ),
    (sign: 'GR  '; cnty: 'Greece                                  '   ),
    (sign: 'GUY '; cnty: 'Guyana                                  '   ),
    (sign: 'GW  '; cnty: 'Guinea-Bissau                           '   ),
    (sign: 'H   '; cnty: 'Hungary                                 '   ),
    (sign: 'HK  '; cnty: 'Hong Kong                               '   ),
    (sign: 'HKJ '; cnty: 'Jordan                                  '   ),
    (sign: 'HN  '; cnty: 'Honduras                                '   ),
    (sign: 'HR  '; cnty: 'Croatia                                 '   ),
    (sign: 'I   '; cnty: 'Italy                                   '   ),
    (sign: 'IL  '; cnty: 'Israel                                  '   ),
    (sign: 'IND '; cnty: 'India                                   '   ),
    (sign: 'IR  '; cnty: 'Iran                                    '   ),
    (sign: 'IRL '; cnty: 'Ireland                                 '   ),
    (sign: 'IRQ '; cnty: 'Iraq                                    '   ),
    (sign: 'IS  '; cnty: 'Iceland                                 '   ),
    (sign: 'J   '; cnty: 'Japan                                   '   ),
    (sign: 'JA  '; cnty: 'Jamaica                                 '   ),
    (sign: 'K   '; cnty: 'Cambodia                                '   ),
    (sign: 'KAN '; cnty: 'Saint Kitts and Nevis                   '   ),
    (sign: 'KGZ '; cnty: 'Kyrgyzstan                              '   ),
    (sign: 'KIR '; cnty: 'Kiribati                                '   ),
    (sign: 'KN  '; cnty: 'Greenland                               '   ),
    (sign: 'KP  '; cnty: 'Democratic People''s Republic of Korea   '   ), >> (sign: 'KS  '; cnty: 'Kyrgyzstan                              '   ),
    (sign: 'KSA '; cnty: 'Saudi Arabia                            '   ),
    (sign: 'KWT '; cnty: 'Kuwait                                  '   ),
    (sign: 'KZ  '; cnty: 'Kazakhstan                              '   ),
    (sign: 'L   '; cnty: 'Luxembourg                              '   ),
    (sign: 'LAO '; cnty: 'Laos                                    '   ),
    (sign: 'LAR '; cnty: 'Libya                                   '   ),
    (sign: 'LB  '; cnty: 'Liberia                                 '   ),
    (sign: 'LS  '; cnty: 'Lesotho                                 '   ),
    (sign: 'LT  '; cnty: 'Lithuania                               '   ),
    (sign: 'LV  '; cnty: 'Latvia                                  '   ),
    (sign: 'M   '; cnty: 'Malta                                   '   ),
    (sign: 'MA  '; cnty: 'Morocco                                 '   ),
    (sign: 'MAL '; cnty: 'Malaysia                                '   ),
    (sign: 'MC  '; cnty: 'Monaco                                  '   ),
    (sign: 'MD  '; cnty: 'Moldova                                 '   ),
    (sign: 'MEX '; cnty: 'Mexico                                  '   ),
    (sign: 'MGL '; cnty: 'Mongolia                                '   ),
    (sign: 'MH  '; cnty: 'Marshall Islands                        '   ),
    (sign: 'MK  '; cnty: 'Macedonia (NMK)                         '   ),
    (sign: 'MNE '; cnty: 'Montenegro                              '   ),
    (sign: 'MO  '; cnty: 'Macau                                   '   ),
    (sign: 'MOC '; cnty: 'Mozambique                              '   ),
    (sign: 'MS  '; cnty: 'Mauritius                               '   ),
    (sign: 'MV  '; cnty: 'Maldives                                '   ),
    (sign: 'MW  '; cnty: 'Malawi                                  '   ),
    (sign: 'N   '; cnty: 'Norway                                  '   ),
    (sign: 'NA  '; cnty: 'Netherlands Antilles                    '   ),
    (sign: 'NAM '; cnty: 'Namibia                                 '   ),
    (sign: 'NAU '; cnty: 'Nauru                                   '   ),
    (sign: 'NC  '; cnty: 'New Caledonia                           '   ),
    (sign: 'NEP '; cnty: 'Nepal                                   '   ),
    (sign: 'NI  '; cnty: 'Northern Ireland                        '   ),
    (sign: 'NIC '; cnty: 'Nicaragua                               '   ),
    (sign: 'NL  '; cnty: 'Netherlands                             '   ),
    (sign: 'NMK '; cnty: 'North Macedonia                         '   ),
    (sign: 'NZ  '; cnty: 'New Zealand                             '   ),
    (sign: 'OM  '; cnty: 'Oman                                    '   ),
    (sign: 'P   '; cnty: 'Portugal                                '   ),
    (sign: 'PA  '; cnty: 'Panama                                  '   ),
    (sign: 'PAL '; cnty: 'Palau                                   '   ),
    (sign: 'PE  '; cnty: 'Peru                                    '   ),
    (sign: 'PK  '; cnty: 'Pakistan                                '   ),
    (sign: 'PL  '; cnty: 'Poland                                  '   ),
    (sign: 'PMR '; cnty: 'Transnistria                            '   ),
    (sign: 'PNG '; cnty: 'Papua New Guinea                        '   ),
    (sign: 'PR  '; cnty: 'Puerto Rico                             '   ),
    (sign: 'PS  '; cnty: 'Palestine                               '   ),
    (sign: 'PY  '; cnty: 'Paraguay                                '   ),
    (sign: 'Q   '; cnty: 'Qatar                                   '   ),
    (sign: 'RA  '; cnty: 'Argentina                               '   ),
    (sign: 'RB  '; cnty: 'Botswana                                '   ),
    (sign: 'RC  '; cnty: 'Republic of China (Taiwan)              '   ),
    (sign: 'RCA '; cnty: 'Central African Republic                '   ),
    (sign: 'RCB '; cnty: 'Republic of the Congo                   '   ),
    (sign: 'RCH '; cnty: 'Chile                                   '   ),
    (sign: 'RG  '; cnty: 'Guinea                                  '   ),
    (sign: 'RGB '; cnty: 'Guinea-Bissau                           '   ),
    (sign: 'RH  '; cnty: 'Haiti                                   '   ),
    (sign: 'RI  '; cnty: 'Indonesia                               '   ),
    (sign: 'RIM '; cnty: 'Mauritania                              '   ),
    (sign: 'RKS '; cnty: 'Kosovo                                  '   ),
    (sign: 'RL  '; cnty: 'Lebanon                                 '   ),
    (sign: 'RM  '; cnty: 'Madagascar                              '   ),
    (sign: 'RMM '; cnty: 'Mali                                    '   ),
    (sign: 'RN  '; cnty: 'Niger                                   '   ),
    (sign: 'RO  '; cnty: 'Romania                                 '   ),
    (sign: 'ROK '; cnty: 'Republic of Korea                       '   ),
    (sign: 'ROU '; cnty: 'Uruguay (UY)                            '   ),
    (sign: 'RP  '; cnty: 'Philippines                             '   ),
    (sign: 'RSM '; cnty: 'San Marino                              '   ),
    (sign: 'RU  '; cnty: 'Burundi                                 '   ),
    (sign: 'RUS '; cnty: 'Russia                                  '   ),
    (sign: 'RWA '; cnty: 'Rwanda                                  '   ),
    (sign: 'S   '; cnty: 'Sweden                                  '   ),
    (sign: 'SCO '; cnty: 'Scotland                                '   ),
    (sign: 'SCV '; cnty: 'Vatican City                            '   ),
    (sign: 'SD  '; cnty: 'Swaziland                               '   ),
    (sign: 'SF  '; cnty: 'Finland (FIN)                           '   ),
    (sign: 'SGP '; cnty: 'Singapore                               '   ),
    (sign: 'SK  '; cnty: 'Slovakia                                '   ),
    (sign: 'SLE '; cnty: 'Sierra Leone                            '   ),
    (sign: 'SLO '; cnty: 'Slovenia                                '   ),
    (sign: 'SME '; cnty: 'Suriname                                '   ),
    (sign: 'SMO '; cnty: 'Sovereign Military Order of Malta       '   ), >> (sign: 'SN  '; cnty: 'Senegal                                 '   ),
    (sign: 'SO  '; cnty: 'Somalia                                 '   ),
    (sign: 'SOL '; cnty: 'Solomon Islands                         '   ),
    (sign: 'SRB '; cnty: 'Serbia                                  '   ),
    (sign: 'STP '; cnty: 'S o Tom‚ and Pr¡ncipe                   '   ),
    (sign: 'SUD '; cnty: 'Sudan                                   '   ),
    (sign: 'SY  '; cnty: 'Seychelles                              '   ),
    (sign: 'SYR '; cnty: 'Syria                                   '   ),
    (sign: 'T   '; cnty: 'Thailand                                '   ),
    (sign: 'TCH '; cnty: 'Chad                                    '   ),
    (sign: 'TD  '; cnty: 'Chad                                    '   ),
    (sign: 'TG  '; cnty: 'Togo                                    '   ),
    (sign: 'TJ  '; cnty: 'Tajikistan                              '   ),
    (sign: 'TL  '; cnty: 'Timor-Leste                             '   ),
    (sign: 'TM  '; cnty: 'Turkmenistan                            '   ),
    (sign: 'TN  '; cnty: 'Tunisia                                 '   ),
    (sign: 'TO  '; cnty: 'Tonga                                   '   ),
    (sign: 'TR  '; cnty: 'Turkey                                  '   ),
    (sign: 'TT  '; cnty: 'Trinidad and Tobago                     '   ),
    (sign: 'TUV '; cnty: 'Tuvalu                                  '   ),
    (sign: 'UA  '; cnty: 'Ukraine                                 '   ),
    (sign: 'UAE '; cnty: 'United Arab Emirates                    '   ),
    (sign: 'USA '; cnty: 'United States                           '   ),
    (sign: 'UY  '; cnty: 'Uruguay                                 '   ),
    (sign: 'UZ  '; cnty: 'Uzbekistan                              '   ),
    (sign: 'V   '; cnty: 'Vatican City                            '   ),
    (sign: 'VN  '; cnty: 'Vietnam                                 '   ),
    (sign: 'VU  '; cnty: 'Vanuatu                                 '   ),
    (sign: 'WAG '; cnty: 'Gambia                                  '   ),
    (sign: 'WAL '; cnty: 'Sierra Leone                            '   ),
    (sign: 'WAN '; cnty: 'Nigeria                                 '   ),
    (sign: 'WD  '; cnty: 'Dominica                                '   ),
    (sign: 'WG  '; cnty: 'Grenada                                 '   ),
    (sign: 'WL  '; cnty: 'Saint Lucia                             '   ),
    (sign: 'WS  '; cnty: 'Samoa                                   '   ),
    (sign: 'WSA '; cnty: 'Western Sahara                          '   ),
    (sign: 'WV  '; cnty: 'Saint Vincent and the Grenadines        '   ),
    (sign: 'YAR '; cnty: 'Yemen                                   '   ),
    (sign: 'YU  '; cnty: 'Yugoslavia                              '   ),
    (sign: 'YV  '; cnty: 'Venezuela                               '   ),
    (sign: 'Z   '; cnty: 'Zambia                                  '   ),
    (sign: 'ZA  '; cnty: 'South Africa                            '   ),
    (sign: 'ZW  '; cnty: 'Zimbabwe                                '   ),

    And as you can see, there are multiple entries for four countries,

    GB/UK,
    MK/NMK,
    ROU/UY, and
    SF/FIN

    which have decided to change their country codes after I had a first ride with
    a national of them, the few other users of my program might be using the new >> codes, but as long as only one of the two is used,it should not matter, but >> returning the same hash for both would be magic. The table should possibly >> also have included DDR, SU to name just two countries that no longer exist. >> Are there more, don't really know...

    Robert



    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Terje Mathisen@21:1/5 to Robert Prins on Sat Feb 25 10:33:26 2023
    Robert Prins wrote:
    On 2023-02-23 20:59, Terje Mathisen wrote:
    Robert, are you intentionally trying to annoy me? :-)

    My own N, Norway is missing!

    Please look again! My program would crash, in 1982 I hitched all the way
    up to Nordkapp, that wonderful place where people only go to to wrongly
    tell their friends that they've been at the northernmost point of
    Europe. ;)

    Yeah, by my guess at least 99.99+% of all visitors to Nordkapp think
    they have been to said northernmost point in Europe:

    If you want "continental Europe", then the old ferry, now undersea
    tunnel to the Magerya island would disqualify Nordkapp. If you otoh
    allow this, like I would do as well since there is a small city on the
    island, then your map should show you the 3-5 hour hike needed to get
    from the main road to the real tip of the island:

    https://mapant.no/?#7/71.184/25.674 Knivskjrodden

    You could also make a very good case for

    https://mapant.no/?#8/71.130/27.648 Kinnarodden which is on the actual mainland, just a bit further east and slightly longer south.

    The difference of 0.054 degrees corresponds to about 6 km.

    Terje


    I'll take a look. In the meantime, have you considered an extremely
    simple single-entry cache: Last country used?

    That might very well be the best solution

    I am very confident that it is likely that any given user will tend to
    pick up subsequent rides from another car in the same area, right?

    That's certainly true

    If you want to be really fancy then you back that single entry up with
    3 or 7 more most recently used, pushing the oldest out when you find a
    new country. You resort on each new country

    That's what I used in 1996 while working for KLM while building a
    prototype payload prediction system (in PL/I). There I didn't sort, but
    just added a counter and threw out the item with the lowest value, and
    moved the new value into its location, and, obviously, in location 0 of
    the array.

    In this case the extra overhead might be more than doing the binary
    lookup? Probably the only way to find out would be to do something in
    pure DOS using the full set of data that's now going into the binary
    lookup, an extra "writeln()" should produce it, or I could use the
    "COUNT" option of the old z/OS PL/I compiler to give me the, what else, counts of executed PL/I statements, which would give at least a rough
    idea as to what's the way to go.

    I'll give it a try, once I've managed to figure out what I'm doing wrong
    with this mainframe PL/I version. Give me some time, family things are
    right now sadly taking up rather a lot of time, and don't spoil me with
    a, if there's one, ready made solution!

    Robert


    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Robert Prins@21:1/5 to Terje Mathisen on Sat Feb 25 13:21:26 2023
    On 2023-02-25 09:33, Terje Mathisen wrote:
    Robert Prins wrote:
    On 2023-02-23 20:59, Terje Mathisen wrote:
    Robert, are you intentionally trying to annoy me? :-)

    My own N, Norway is missing!

    Please look again! My program would crash, in 1982 I hitched all the way up to
    Nordkapp, that wonderful place where people only go to to wrongly tell their >> friends that they've been at the northernmost point of Europe. ;)

    Yeah, by my guess at least 99.99+% of all visitors to Nordkapp think they have
    been to said northernmost point in Europe:

    If you want "continental Europe", then the old ferry, now undersea tunnel to the
    Magerøya island would disqualify Nordkapp. If you otoh allow this, like I would
    do as well since there is a small city on the island, then your map should show
    you the 3-5 hour hike needed to get from the main road to the real tip of the island:

    https://mapant.no/?#7/71.184/25.674  Knivskjærodden

    I know, although didn't in 1982, but that year hiking would have been all but impossible, as the place was still covered in a thick layer of snow, even though
    I got there on 20 June and left on 21 June, without having seen the midsummer sun, as the place was also covered in dense fog. :(

    You could also make a very good case for

    https://mapant.no/?#8/71.130/27.648 Kinnarodden which is on the actual mainland,
    just a bit further east and slightly longer south.

    The difference of 0.054 degrees corresponds to about 6 km.
    And after this touristic break, let's return to the lookup ;)

    Here's what comes out of the current PL/I version, where I actually cache the last country/nationality, and still get a 90% cache-miss, ouch.

    /********************************************************
    * SRCH_CNTY: *
    * *
    * Find the name of a country, given the license plate * ********************************************************/
    SRCH_CNTY: PROC(PLATE) RETURNS(FIXED BIN (31)); 3,885
    DCL PLATE CHAR (4); 3,885

    DCL #L FIXED BIN (31); 3,885
    DCL #H FIXED BIN (31); 3,885
    DCL #M FIXED BIN (31); 3,885
    DCL #P FIXED BIN (31) INIT (1) STATIC; 3,885

    /*+-----------------------------------------------------+
    | Start of executable code for srch_cnty |
    +----------------------------------------------------*/
    IF CPLATES(#P).PLATE = PLATE THEN 3,885
    DO;
    RETURN(#P); 395
    END; 0
    ELSE 3,490
    DO;
    #L = LBOUND(CPLATES.PLATE, 1); 3,490
    #H = HBOUND(CPLATES.PLATE, 1); 3,490

    DO WHILE(#L <= #H); 24,726
    #M = (#L + #H) / 2; 24,726

    SELECT; 24,726
    WHEN (CPLATES(#M).PLATE > PLATE) #H = #M - 1 24,726
    WHEN (CPLATES(#M).PLATE < PLATE) #L = #M + 1 13,242
    OTHER 3,490
    DO;
    #P = #M; 3,490
    RETURN(#M); 3,490
    END; 0
    END; 21,236
    END; 21,236
    END; 0

    PUT SKIP LIST('Erroneous plate: <' || PLATE || '>'); 0
    SIGNAL ERROR; 0
    END SRCH_CNTY; 0

    Swapping the two "WHEN" statements makes a bit of difference, the total of executed statements for the procedure goes up from 203148 to 204880, I guess that in a grey past I already looked at all "select-when-other" statements to put them in the (then) best order, it's probably something one would have to do every now and then, which of course rarely happens - the current versions of the
    IBM PL/I compiler sadly no longer have an option to count executed statements, even if these counts should obviously only serve as a very rough guide to what's
    going on, as the above 21,236 times "executed" statement that closes the SELECT doesn't actually generate any code.

    Anyway, I'll double(!) the cache, and see what difference that makes. ;)

    Robert
    --
    Robert AH Prins
    robert(a)prino(d)org
    The hitchhiking grandfather - https://prino.neocities.org/
    Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html


    I'll take a look. In the meantime, have you considered an extremely simple >>> single-entry cache: Last country used?

    That might very well be the best solution

    I am very confident that it is likely that any given user will tend to pick >>> up subsequent rides from another car in the same area, right?

    That's certainly true

    If you want to be really fancy then you back that single entry up with 3 or 7
    more most recently used, pushing the oldest out when you find a new country.
    You resort on each new country

    That's what I used in 1996 while working for KLM while building a prototype >> payload prediction system (in PL/I). There I didn't sort, but just added a >> counter and threw out the item with the lowest value, and moved the new value
    into its location, and, obviously, in location 0 of the array.

    In this case the extra overhead might be more than doing the binary lookup? >> Probably the only way to find out would be to do something in pure DOS using >> the full set of data that's now going into the binary lookup, an extra
    "writeln()" should produce it, or I could use the "COUNT" option of the old >> z/OS PL/I compiler to give me the, what else, counts of executed PL/I
    statements, which would give at least a rough idea as to what's the way to go.

    I'll give it a try, once I've managed to figure out what I'm doing wrong with
    this mainframe PL/I version. Give me some time, family things are right now >> sadly taking up rather a lot of time, and don't spoil me with a, if there's >> one, ready made solution!

    Robert



    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Robert Prins@21:1/5 to Robert Prins on Mon Feb 27 14:10:56 2023
    On 2023-02-25 13:21, Robert Prins wrote:
    On 2023-02-25 09:33, Terje Mathisen wrote:
    Robert Prins wrote:
    On 2023-02-23 20:59, Terje Mathisen wrote:
    Robert, are you intentionally trying to annoy me? :-)

    My own N, Norway is missing!

    Please look again! My program would crash, in 1982 I hitched all the way up >>> to Nordkapp, that wonderful place where people only go to to wrongly tell >>> their friends that they've been at the northernmost point of Europe. ;)

    Yeah, by my guess at least 99.99+% of all visitors to Nordkapp think they have
    been to said northernmost point in Europe:

    If you want "continental Europe", then the old ferry, now undersea tunnel to >> the Magerøya island would disqualify Nordkapp. If you otoh allow this, like I
    would do as well since there is a small city on the island, then your map
    should show you the 3-5 hour hike needed to get from the main road to the real
    tip of the island:

    https://mapant.no/?#7/71.184/25.674  Knivskjærodden

    I know, although didn't in 1982, but that year hiking would have been all but impossible, as the place was still covered in a thick layer of snow, even though
    I got there on 20 June and left on 21 June, without having seen the midsummer sun, as the place was also covered in dense fog. :(

    You could also make a very good case for

    https://mapant.no/?#8/71.130/27.648 Kinnarodden which is on the actual
    mainland, just a bit further east and slightly longer south.

    The difference of 0.054 degrees corresponds to about 6 km.
    And after this touristic break, let's return to the lookup ;)

    Here's what comes out of the current PL/I version, where I actually cache the last country/nationality, and still get a 90% cache-miss, ouch.

    <snip PL/I>

    Anyway, I'll double(!) the cache, and see what difference that makes. ;)

    Not surprisingly, that doubles the hits, but I still miss 80%.

    Any suggestions as to how efficiently quadruple the cache and still make it more
    efficient to look up and discard values than the current cache of just one entry
    and binary lookup would be welcome, but please don't spend/waste a lot of time on it.

    Robert
    --
    Robert AH Prins
    robert(a)prino(d)org
    The hitchhiking grandfather - https://prino.neocities.org/
    Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Terje Mathisen@21:1/5 to Robert Prins on Mon Feb 27 21:51:37 2023
    Since you have a way to measure hit rates, you obviously have some real
    (or synthetic?) access order test data, right?

    In that case you should give us a useful sample of that (maybe 10-25%)
    and let us try various alternatives, then we can measure average access
    time, along with max, min and various percentiles.

    I have a very hard time seeing how this could ever take more than maybe
    20-30 clock cycles, using a combination of an initial table lookup to
    hit the first entry starting with a given letter and then a scan for the
    actual country:

    The worst case is R with 24 entries, so naively

    movzx edi,al
    mov edx,firstletter[edi*4 - 'A'*4]
    mov ecx,25
    repne scasd

    Assuming REPNE is too slow we can convert to simpler code:

    movzx edi,al
    mov edx,firstletter[edi*4 - 'A'*4]
    next:
    cmp eax,[edi]
    lea edi,[edi+4]
    jb next
    je found
    jmp not_found

    This will normally run in a single cycle/iteration, but with two cycles/iteration plus a branch miss at the end, it could take 50-60
    cycles to determine that RZZ does not exist.

    Looking at the country codes I'm tempted to use all 5 bits from the
    first char and augment that with the 3 higher bits from the second, for
    a 256-entry lookup table where the correct entry will be found in a
    maximum of 6 tries (RA, RB, RC, RCA, RCB, RCH). All other country codes
    seems to hit in less than this, so with a 16-bit index into the country
    table I get something like this:

    ; EAX has country code in space-filled field
    mov ebx,eax
    mov ecx,eax
    and ebx,31 ;; 5 bits
    shr ecx,10 ;; Bottom char
    and ecx,7 ;; 3 bits
    lea ebx,[ecx+eax*8] ;; Combine them
    movzx ebx,word ptr country_index[ebx*2] ;; 5-8 cycle prefix
    next:
    cmp eax,country_table[ebx] ;; 4-char country code
    lea ebx,[ebx+COUNTRY_RECORD_SIZE]
    jb next ;; 1-2 cycle iteration
    je found ;; 4-10 cycle branch miss
    jmp invalid_country_code

    This looks like a guaranteed hit in less than 25 cycles, including a
    single branch miss, as long as the country table and country_index[]
    (512 bytes) are in $L1.

    Terje

    Robert Prins wrote:
    On 2023-02-25 13:21, Robert Prins wrote:
    On 2023-02-25 09:33, Terje Mathisen wrote:
    Robert Prins wrote:
    On 2023-02-23 20:59, Terje Mathisen wrote:
    Robert, are you intentionally trying to annoy me? :-)

    My own N, Norway is missing!

    Please look again! My program would crash, in 1982 I hitched all the
    way up to Nordkapp, that wonderful place where people only go to to
    wrongly tell their friends that they've been at the northernmost
    point of Europe. ;)

    Yeah, by my guess at least 99.99+% of all visitors to Nordkapp think
    they have been to said northernmost point in Europe:

    If you want "continental Europe", then the old ferry, now undersea
    tunnel to the Magerøya island would disqualify Nordkapp. If you otoh
    allow this, like I would do as well since there is a small city on
    the island, then your map should show you the 3-5 hour hike needed to
    get from the main road to the real tip of the island:

    https://mapant.no/?#7/71.184/25.674  Knivskjærodden

    I know, although didn't in 1982, but that year hiking would have been
    all but impossible, as the place was still covered in a thick layer of
    snow, even though I got there on 20 June and left on 21 June, without
    having seen the midsummer sun, as the place was also covered in dense
    fog. :(

    You could also make a very good case for

    https://mapant.no/?#8/71.130/27.648 Kinnarodden which is on the
    actual mainland, just a bit further east and slightly longer south.

    The difference of 0.054 degrees corresponds to about 6 km.
    And after this touristic break, let's return to the lookup ;)

    Here's what comes out of the current PL/I version, where I actually
    cache the last country/nationality, and still get a 90% cache-miss, ouch.

    <snip PL/I>

    Anyway, I'll double(!) the cache, and see what difference that makes. ;)

    Not surprisingly, that doubles the hits, but I still miss 80%.

    Any suggestions as to how efficiently quadruple the cache and still make
    it more efficient to look up and discard values than the current cache
    of just one entry and binary lookup would be welcome, but please don't spend/waste a lot of time on it.

    Robert


    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Robert Prins@21:1/5 to Terje Mathisen on Tue Feb 28 13:15:58 2023
    On 2023-02-27 20:51, Terje Mathisen wrote:
    Since you have a way to measure hit rates, you obviously have some real (or synthetic?) access order test data, right?

    I run the PL/I version of the program, with the "COUNT" compiler option on a(n emulated) z/OS system, and that gives me the number of times each statement is executed, and with two cached values in two separate "return(cache(1|2));" statements, I know that my cache hits double when I keep the two last most recently used values, evicting the one with the lowest hit counter, probably not
    very sophisticated, but it works.

    In that case you should give us a useful sample of that (maybe 10-25%) and let
    us try various alternatives, then we can measure average access time, along with
    max, min and various percentiles.

    I can give you the full set of data, I'll just add a writeln() to the Pascal version (I don't want to re-IPL z/OS at the moment), zip it up and put it in the
    temp directory of my website, and have done just now. The redirected output of the Pascal version "lift" can be found @ <https://prino.neocities.org/@temp/lift-cache.zip>, with the table of countries in <https://prino.neocities.org/@temp/hhcommon.zip>

    5,408 not cached values, 4,387 cached values, need to figure out why the PL/I figures are so different.

    I have a very hard time seeing how this could ever take more than maybe 20-30 clock cycles, using a combination of an initial table lookup to hit the first entry starting with a given letter and then a scan for the actual country:

    The worst case is R with 24 entries, so naively

      movzx edi,al
      mov edx,firstletter[edi*4 - 'A'*4]
      mov ecx,25
      repne scasd

    Assuming REPNE is too slow we can convert to simpler code:

      movzx edi,al
      mov edx,firstletter[edi*4 - 'A'*4]
    next:
      cmp eax,[edi]
      lea edi,[edi+4]
      jb next
      je found
      jmp not_found

    This will normally run in a single cycle/iteration, but with two cycles/iteration plus a branch miss at the end, it could take 50-60 cycles to determine that RZZ does not exist.

    From a performance point of view, I could/should of course only keep the current 95 countries/nationalities, and let the program warn me with an "UD2" that I need to add a new one.

    Looking at the country codes I'm tempted to use all 5 bits from the first char
    and augment that with the 3 higher bits from the second, for a 256-entry lookup
    table where the correct entry will be found in a maximum of 6 tries (RA, RB, RC,
    RCA, RCB, RCH). All other country codes seems to hit in less than this, so with
    a 16-bit index into the country table I get something like this:

    ; EAX has country code in space-filled field
      mov ebx,eax
      mov ecx,eax
      and ebx,31    ;; 5 bits
      shr ecx,10    ;; Bottom char
      and ecx,7    ;; 3 bits
      lea ebx,[ecx+eax*8]    ;; Combine them
      movzx ebx,word ptr country_index[ebx*2]    ;; 5-8 cycle prefix
    next:
      cmp eax,country_table[ebx]    ;; 4-char country code
      lea ebx,[ebx+COUNTRY_RECORD_SIZE]
       jb next                ;; 1-2 cycle iteration
       je found                ;; 4-10 cycle branch miss
       jmp invalid_country_code

    This looks like a guaranteed hit in less than 25 cycles, including a single branch miss, as long as the country table and country_index[] (512 bytes) are in
    $L1.

    I have no clue as how to figure out how to determine that, I know the CPU (an i7-4710MQ) has 6Gb cache, and I also now that the z/OS PL/I version uses about 20Mb of heap to keep the input data in various linked lists, and that the x86 versions is likely to use more, as I've rounded up many of the list structures to 32n bytes, to allow them to be initialised by VMOVDQU instruction.

    I'll give your code a try, thanks,

    Robert
    --
    Robert AH Prins
    robert(a)prino(d)org
    The hitchhiking grandfather - https://prino.neocities.org/
    Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html

    Terje

    Robert Prins wrote:
    On 2023-02-25 13:21, Robert Prins wrote:
    On 2023-02-25 09:33, Terje Mathisen wrote:
    Robert Prins wrote:
    On 2023-02-23 20:59, Terje Mathisen wrote:
    Robert, are you intentionally trying to annoy me? :-)

    My own N, Norway is missing!

    Please look again! My program would crash, in 1982 I hitched all the way up
    to Nordkapp, that wonderful place where people only go to to wrongly tell >>>>> their friends that they've been at the northernmost point of Europe. ;) >>>>
    Yeah, by my guess at least 99.99+% of all visitors to Nordkapp think they >>>> have been to said northernmost point in Europe:

    If you want "continental Europe", then the old ferry, now undersea tunnel to
    the Magerøya island would disqualify Nordkapp. If you otoh allow this, like
    I would do as well since there is a small city on the island, then your map
    should show you the 3-5 hour hike needed to get from the main road to the >>>> real tip of the island:

    https://mapant.no/?#7/71.184/25.674  Knivskjærodden

    I know, although didn't in 1982, but that year hiking would have been all but
    impossible, as the place was still covered in a thick layer of snow, even >>> though I got there on 20 June and left on 21 June, without having seen the >>> midsummer sun, as the place was also covered in dense fog. :(

    You could also make a very good case for

    https://mapant.no/?#8/71.130/27.648 Kinnarodden which is on the actual >>>> mainland, just a bit further east and slightly longer south.

    The difference of 0.054 degrees corresponds to about 6 km.
    And after this touristic break, let's return to the lookup ;)

    Here's what comes out of the current PL/I version, where I actually cache the
    last country/nationality, and still get a 90% cache-miss, ouch.

    <snip PL/I>

    Anyway, I'll double(!) the cache, and see what difference that makes. ;)

    Not surprisingly, that doubles the hits, but I still miss 80%.

    Any suggestions as to how efficiently quadruple the cache and still make it >> more efficient to look up and discard values than the current cache of just >> one entry and binary lookup would be welcome, but please don't spend/waste a >> lot of time on it.

    Robert



    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Robert Prins@21:1/5 to Terje Mathisen on Wed Mar 1 20:16:33 2023
    On 2023-02-27 20:51, Terje Mathisen wrote:
    Since you have a way to measure hit rates, you obviously have some real (or synthetic?) access order test data, right?

    In that case you should give us a useful sample of that (maybe 10-25%) and let
    us try various alternatives, then we can measure average access time, along with
    max, min and various percentiles.

    I have a very hard time seeing how this could ever take more than maybe 20-30 clock cycles, using a combination of an initial table lookup to hit the first entry starting with a given letter and then a scan for the actual country:

    The worst case is R with 24 entries, so naively

      movzx edi,al
      mov edx,firstletter[edi*4 - 'A'*4]
      mov ecx,25
      repne scasd

    This will work, as it's a straight compare for equality

    Assuming REPNE is too slow we can convert to simpler code:

      movzx edi,al
      mov edx,firstletter[edi*4 - 'A'*4]
    next:
      cmp eax,[edi]
      lea edi,[edi+4]
      jb next
      je found
      jmp not_found

    I may be wrong, but should you not have included BSWAP instructions, like in my original binary lookup? (I cannot use MOVBE)

    This will normally run in a single cycle/iteration, but with two cycles/iteration plus a branch miss at the end, it could take 50-60 cycles to determine that RZZ does not exist.

    Looking at the country codes I'm tempted to use all 5 bits from the first char
    and augment that with the 3 higher bits from the second, for a 256-entry lookup
    table where the correct entry will be found in a maximum of 6 tries (RA, RB, RC,
    RCA, RCB, RCH). All other country codes seems to hit in less than this, so with
    a 16-bit index into the country table I get something like this:

    ; EAX has country code in space-filled field
      mov ebx,eax
      mov ecx,eax
      and ebx,31    ;; 5 bits
      shr ecx,10    ;; Bottom char
      and ecx,7    ;; 3 bits
      lea ebx,[ecx+eax*8]    ;; Combine them

    lea ebx,[ecx+ebx*8] ; I presume?

      movzx ebx,word ptr country_index[ebx*2]    ;; 5-8 cycle prefix
    next:
      cmp eax,country_table[ebx]    ;; 4-char country code
      lea ebx,[ebx+COUNTRY_RECORD_SIZE]
       jb next                ;; 1-2 cycle iteration
       je found                ;; 4-10 cycle branch miss
       jmp invalid_country_code

    This looks like a guaranteed hit in less than 25 cycles, including a single branch miss, as long as the country table and country_index[] (512 bytes) are in
    $L1.

    OK, now I just have to figure out what's in those two tables.

    Robert
    --
    Robert AH Prins
    robert(a)prino(d)org
    The hitchhiking grandfather - https://prino.neocities.org/
    Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Terje Mathisen@21:1/5 to Robert Prins on Thu Mar 2 09:06:48 2023
    To make this work with 32-bit compares we need the most significant
    letter (i.e. the first) to be located in the last byte, i.e. the
    4-letter country codes must be stored in BE order in the table! We could
    also use a secondary table consisting of just the reversed country codes
    and a pointer to the actual record: With ~400 countries this would be
    3.2KB, then we add 512 bytes for the ~1.6 letter lookup table which
    contains 16-bit offsets into the country table. We end up with less than
    4 KB total plus the 32/64 bytes per country for the actual table.

    It is much better to require a BSWAP when using it later for other stuff
    (or simply storing both forms as in that extra helper table) than to
    require a BSWAP in the critical path.

    BTW, if you actually implement this idea, then I would also compress the country table by stripping out all the extra spaces: The inverted
    country lookup table would contain the actual address of each packed
    record. Doing this means that you might halve the size of the table and
    avoid the potentially bad issue of having all records aligned on a
    larger pwoer of two, since this can easily lead to cache trashing via
    limited associativity. (Normally not an issue for anything that fits in
    $L1!)

    Terje

    Robert Prins wrote:
    On 2023-02-27 20:51, Terje Mathisen wrote:
    Since you have a way to measure hit rates, you obviously have some
    real (or synthetic?) access order test data, right?

    In that case you should give us a useful sample of that (maybe 10-25%)
    and let us try various alternatives, then we can measure average
    access time, along with max, min and various percentiles.

    I have a very hard time seeing how this could ever take more than
    maybe 20-30 clock cycles, using a combination of an initial table
    lookup to hit the first entry starting with a given letter and then a
    scan for the actual country:

    The worst case is R with 24 entries, so naively

       movzx edi,al
       mov edx,firstletter[edi*4 - 'A'*4]
       mov ecx,25
       repne scasd

    This will work, as it's a straight compare for equality

    Assuming REPNE is too slow we can convert to simpler code:

       movzx edi,al
       mov edx,firstletter[edi*4 - 'A'*4]
    next:
       cmp eax,[edi]
       lea edi,[edi+4]
       jb next
       je found
       jmp not_found

    I may be wrong, but should you not have included BSWAP instructions,
    like in my original binary lookup? (I cannot use MOVBE)

    This will normally run in a single cycle/iteration, but with two
    cycles/iteration plus a branch miss at the end, it could take 50-60
    cycles to determine that RZZ does not exist.

    Looking at the country codes I'm tempted to use all 5 bits from the
    first char and augment that with the 3 higher bits from the second,
    for a 256-entry lookup table where the correct entry will be found in
    a maximum of 6 tries (RA, RB, RC, RCA, RCB, RCH). All other country
    codes seems to hit in less than this, so with a 16-bit index into the
    country table I get something like this:

    ; EAX has country code in space-filled field
       mov ebx,eax
       mov ecx,eax
       and ebx,31    ;; 5 bits
       shr ecx,10    ;; Bottom char
       and ecx,7    ;; 3 bits
       lea ebx,[ecx+eax*8]    ;; Combine them

    lea ebx,[ecx+ebx*8] ; I presume?

       movzx ebx,word ptr country_index[ebx*2]    ;; 5-8 cycle prefix
    next:
       cmp eax,country_table[ebx]    ;; 4-char country code
       lea ebx,[ebx+COUNTRY_RECORD_SIZE]
        jb next                ;; 1-2 cycle iteration
        je found                ;; 4-10 cycle branch miss
        jmp invalid_country_code

    This looks like a guaranteed hit in less than 25 cycles, including a
    single branch miss, as long as the country table and country_index[]
    (512 bytes) are in $L1.

    OK, now I just have to figure out what's in those two tables.

    Robert


    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Robert Prins@21:1/5 to Terje Mathisen on Wed Mar 8 23:49:34 2023
    OK, I've gone for what I think is a pretty good solution:

    My first try?

    I added an array['A'..'Z'] of array [0..1], which contained [low, high] for each
    entry, reducing the entries in the "do-while low <= high" loop from a current 40,364 to 14,909, or about 63%.

    But then I thought, "Mm, some countries starting with letter "?" seem to occur a
    lot more than other countries starting with letter "?", but will only be found after a few binary chops... So why not expand the ['A'..'Z'] array to an array of [0..3] arrays, containing [low, high, most-used, 0] and on a no-match first try the "most-used" index?

    Result? Only 2,823 entries into the "do-while low <= high" loop, a reduction of just over 93%.

    Do I make sense, and does this come anywhere close to your (Terje) "almost all programming can be viewed as an exercise in caching" mantra?

    Robert
    --
    Robert AH Prins
    robert(a)prino(d)org
    The hitchhiking grandfather - https://prino.neocities.org/
    Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html

    On 2023-03-02 08:06, Terje Mathisen wrote:
    To make this work with 32-bit compares we need the most significant letter (i.e.
    the first) to be located in the last byte, i.e. the 4-letter country codes must
    be stored in BE order in the table! We could also use a secondary table consisting of just the reversed country codes and a pointer to the actual record: With ~400 countries this would be 3.2KB, then we add 512 bytes for the
    ~1.6 letter lookup table which contains 16-bit offsets into the country table.
    We end up with less than 4 KB total plus the 32/64 bytes per country for the actual table.

    It is much better to require a BSWAP when using it later for other stuff (or simply storing both forms as in that extra helper table) than to require a BSWAP
    in the critical path.

    BTW, if you actually implement this idea, then I would also compress the country
    table by stripping out all the extra spaces: The inverted country lookup table
    would contain the actual address of each packed record. Doing this means that you might halve the size of the table and avoid the potentially bad issue of having all records aligned on a larger pwoer of two, since this can easily lead
    to cache trashing via limited associativity. (Normally not an issue for anything
    that fits in $L1!)

    Terje

    Robert Prins wrote:
    On 2023-02-27 20:51, Terje Mathisen wrote:
    Since you have a way to measure hit rates, you obviously have some real (or >>> synthetic?) access order test data, right?

    In that case you should give us a useful sample of that (maybe 10-25%) and >>> let us try various alternatives, then we can measure average access time, >>> along with max, min and various percentiles.

    I have a very hard time seeing how this could ever take more than maybe 20-30
    clock cycles, using a combination of an initial table lookup to hit the first
    entry starting with a given letter and then a scan for the actual country: >>>
    The worst case is R with 24 entries, so naively

       movzx edi,al
       mov edx,firstletter[edi*4 - 'A'*4]
       mov ecx,25
       repne scasd

    This will work, as it's a straight compare for equality

    Assuming REPNE is too slow we can convert to simpler code:

       movzx edi,al
       mov edx,firstletter[edi*4 - 'A'*4]
    next:
       cmp eax,[edi]
       lea edi,[edi+4]
       jb next
       je found
       jmp not_found

    I may be wrong, but should you not have included BSWAP instructions, like in >> my original binary lookup? (I cannot use MOVBE)

    This will normally run in a single cycle/iteration, but with two
    cycles/iteration plus a branch miss at the end, it could take 50-60 cycles to
    determine that RZZ does not exist.

    Looking at the country codes I'm tempted to use all 5 bits from the first >>> char and augment that with the 3 higher bits from the second, for a 256-entry
    lookup table where the correct entry will be found in a maximum of 6 tries >>> (RA, RB, RC, RCA, RCB, RCH). All other country codes seems to hit in less >>> than this, so with a 16-bit index into the country table I get something like
    this:

    ; EAX has country code in space-filled field
       mov ebx,eax
       mov ecx,eax
       and ebx,31    ;; 5 bits
       shr ecx,10    ;; Bottom char
       and ecx,7    ;; 3 bits
       lea ebx,[ecx+eax*8]    ;; Combine them

    lea ebx,[ecx+ebx*8] ; I presume?

       movzx ebx,word ptr country_index[ebx*2]    ;; 5-8 cycle prefix
    next:
       cmp eax,country_table[ebx]    ;; 4-char country code
       lea ebx,[ebx+COUNTRY_RECORD_SIZE]
        jb next                ;; 1-2 cycle iteration
        je found                ;; 4-10 cycle branch miss
        jmp invalid_country_code

    This looks like a guaranteed hit in less than 25 cycles, including a single >>> branch miss, as long as the country table and country_index[] (512 bytes) are
    in $L1.

    OK, now I just have to figure out what's in those two tables.

    Robert



    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Terje Mathisen@21:1/5 to Robert Prins on Thu Mar 9 11:19:33 2023
    This is much better!

    I still think you should pay more attention to the cost of branches,
    i.e. a sequential scan of a shortish segment of the country array is
    very cheap, on the order of 2-3 clock cycles/country, while a single mispredicted branch cost 8-20 cycles depending upon the CPU you are using.

    Assuming this really is so important, why not "waste" a bit more lookup
    table RAM for a full 2-char index?

    This is still only 26x27 (including space)->722 entries, each needing 16
    bits so 1444 bytes.

    Looking at your country list I see the following long IDs that need a
    short scan after the lookup:

    AND,ANG
    ARK,ARU
    AUA,AUS
    AX,AXA

    BD,BDS
    BH,BHT
    BR,BRN,BRU

    CH,CHN
    CO,COM
    CY,CYM

    EAK,EAT,EAU,EAZ
    ES,EST
    ET,ETH

    GB,GBA,GBG,GBJ,GBM,GBZ

    HK,HKJ

    IT,IRL,IRQ

    LAO,LAR

    MA,MAL
    MO,MOC

    NA,NAM,NAU
    NI,NIC

    PA,PAL

    RC,RCA,RCB,RCH
    RG,RGB
    RI,RIM
    RM,RMM
    RO,ROK,ROU
    RU,RUS

    SLE,SLO
    SME,SMO
    SO,SOL
    SY,SYR

    UA,UAE

    WAG,WAL,WAN
    WS,WSA

    All the remaining countries will point to a single entry, and for the
    worst case here, starting with GB which has 6 entries, the first one
    (GB) is by far the most common one, right? There are only 48 countries
    which are not unique or the first entry for a given combination!

    mov ebx,[country] ;; LE byte order, so first byte in AL
    mov eax,ebx
    and ebx,1f1fh
    xor ecx,ecx
    imul cl,bl,27
    shr ebx,8 ;; key = (name[1] & 31) + ((name[0] & 31) * 27)
    add ebx,ecx
    bswap eax ;; Big endian for faster compares

    movzx ebx,word ptr twobyte_table[ebx*2]
    next:
    cmp eax,country_table[ebx] ;; Each entry starts with BE name
    je found ;; This will happen immediately for
    ;; all except 48 countries
    lea ebx,[ebx+COUNTRY_TABLE_ENTRY_SIZE]
    jb next
    not_found: ;; Invalid/missing country code

    BTW, from the size of the worst case entry, GB*, it is obvious that 26*27*6=4212 entries in a hash/lookup table will be enough to find all
    entries directly, with zero searching.

    I really cannot see any circumstance that would require such
    speed/optimization for this trivial problem!

    Terje

    Robert Prins wrote:
    OK, I've gone for what I think is a pretty good solution:

    My first try?

    I added an array['A'..'Z'] of array [0..1], which contained [low, high]
    for each entry, reducing the entries in the "do-while low <= high" loop
    from a current 40,364 to 14,909, or about 63%.

    But then I thought, "Mm, some countries starting with letter "?" seem to occur a lot more than other countries starting with letter "?", but will
    only be found after a few binary chops... So why not expand the
    ['A'..'Z'] array to an array of [0..3] arrays, containing [low, high, most-used, 0] and on a no-match first try the "most-used" index?

    Result? Only 2,823 entries into the "do-while low <= high" loop, a
    reduction of just over 93%.

    Do I make sense, and does this come anywhere close to your (Terje)
    "almost all programming can be viewed as an exercise in caching" mantra?

    Robert


    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Robert Prins@21:1/5 to Terje Mathisen on Thu Mar 9 15:28:52 2023
    On 2023-03-09 10:19, Terje Mathisen wrote:
    This is much better!

    Thank you!

    I still think you should pay more attention to the cost of branches, i.e. a sequential scan of a shortish segment of the country array is very cheap, on the
    order of 2-3 clock cycles/country, while a single mispredicted branch cost 8-20
    cycles depending upon the CPU you are using.

    An AMD FX8150 and an Intel i7-4710MQ.

    Assuming this really is so important, why not "waste" a bit more lookup table RAM for a full 2-char index?

    It's Utterly Unimportant (with two huge "U"s), the program as-is runs in 0.32 seconds clock-time, but it's just interesting to see how far I can go in pure Pascal, to translate that subsequently into in-line assembler, where I'm in many
    cases harking back to Turbo Pascal 3 times, by coding post-Pentium instructions as "db" sequences, the original code dates back to 1994, and pretty amazingly, many of my original record structures are very AVX friendly!

    Anyway, given that my ['A'..'Z'] array had one position open, I've used that to replace the general last-matched country with a per-initial character last-matched country, and that had cut the chop-count even more:

    Original full range : 40,362
    Low/high per initial character (pic): 14,907
    As previous + first most used pic : 2,871
    As previous + cache pic : 1,747

    And now I'm going to think slightly outside the box, but will have to knock up a
    bit of REXX to actually test this, by what might be counter-intuitive, upping the low and/or high value of some ranges into the previous/next (or, unlikely, but never say never, pre-previous or over-next) range, so that an initial first lookup immediately finds the second most-used country, without generating an excessive number of additional lookups for even less used countries.

    E.g. for countries starting with "L" I currently only have three, in order of use, "LT", "LV" (74 lookups), and "L" (45 lookups), both requiring three chops for a total of 357 chops. If I were to extend the "L" range into the "M" range so that a first chop returns "LV" I save 148 chops, which means that even if I need up to three more chops for "L", I still save time, and for this specific case, extending the "Lx" range to "MEX" gives me single-chop hits on "LV" and still the same three-chop hits on "L", so reducing the number of chops to 209. In this specific case, I could go even further, and take the lo-range back into the "K" countries and get to "L" in just two chops.

    Doing the same for the also easy to try three additional (after "GB") "G*" countries saves me another 96 chops if I put "GR", the second most used, in the initial middle of the range (by extending the high value to "IR"), without affecting "GE" and "GH"chop counts.

    I'm going to see how this works out before doing anything else, as this change would be easy to implement in Pascal and in my PL/I version of the same on z/OS,
    I keep the two and they should give the same output, if they don't I know that I've screwed up somewhere!

    This is still only 26x27 (including space)->722 entries, each needing 16 bits so
    1444 bytes.

    Looking at your country list I see the following long IDs that need a short scan
    after the lookup:

    AND,ANG
    ARK,ARU
    AUA,AUS
    AX,AXA

    BD,BDS
    BH,BHT
    BR,BRN,BRU

    CH,CHN
    CO,COM
    CY,CYM

    EAK,EAT,EAU,EAZ
    ES,EST
    ET,ETH

    GB,GBA,GBG,GBJ,GBM,GBZ

    HK,HKJ

    IT,IRL,IRQ

    LAO,LAR

    MA,MAL
    MO,MOC

    NA,NAM,NAU
    NI,NIC

    PA,PAL

    RC,RCA,RCB,RCH
    RG,RGB
    RI,RIM
    RM,RMM
    RO,ROK,ROU
    RU,RUS

    SLE,SLO
    SME,SMO
    SO,SOL
    SY,SYR

    UA,UAE

    WAG,WAL,WAN
    WS,WSA

    …and most of them would not even be required. The most optimal solution would be
    to write a pre-processor that actually builds the plates and index-to-the-plates
    tables from the input file, potentially even with the range extensions, for only
    the countries/nationalities that are used. And of course right now I already could throw out duplicates (GB/UK, MK/NMK, ROU/UY, SF/FIN) and various regions like ENG/SCO/NI/CYM (not the various GBx'es), and stick to the 193 UN countries,
    and the 6 non-official ones.

    <snip code, look at it RSN…>

    As I already wrote, there are abso-(fill in your favourite strong word)-lutely no circumstances whatsoever to do this, other than and it's still too early/cold
    to start work in the garden and greenhouse for the summer. ;) However, it keeps my mind sharp(ish), just in case that elusive z/OS PL/I job comes along sometime
    in the next two or three years, before I retire.

    Anyway thanks for your feedback, I'll keep you posted with the results of my further experiments.

    Robert
    --
    Robert AH Prins
    robert(a)prino(d)org
    The hitchhiking grandfather - https://prino.neocities.org/
    Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html

    Terje

    Robert Prins wrote:
    OK, I've gone for what I think is a pretty good solution:

    My first try?

    I added an array['A'..'Z'] of array [0..1], which contained [low, high] for >> each entry, reducing the entries in the "do-while low <= high" loop from a >> current 40,364 to 14,909, or about 63%.

    But then I thought, "Mm, some countries starting with letter "?" seem to occur
    a lot more than other countries starting with letter "?", but will only be >> found after a few binary chops... So why not expand the ['A'..'Z'] array to an
    array of [0..3] arrays, containing [low, high, most-used, 0] and on a no-match
    first try the "most-used" index?

    Result? Only 2,823 entries into the "do-while low <= high" loop, a reduction >> of just over 93%.

    Do I make sense, and does this come anywhere close to your (Terje) "almost all
    programming can be viewed as an exercise in caching" mantra?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)