• German collation routines for YottaDB UTF-8 mode

    From K.S. Bhaskar@21:1/5 to All on Sun Nov 21 12:20:45 2021
    Unicode code-point order is often not the linguistically or culturally correct order. For example, from YottaDB in UTF-8 mode:

    set sz="ß",SZ="ẞ" write $ascii(sz)," ",$ascii(SZ)
    223 7838
    set umch="äëïöüÿÄËÏÖÜŸ" for i=1:1:$length(umch) write $ascii($extract(umch,i))," "
    228 235 239 246 252 255 196 203 207 214 220 376
    write "Öhman"]"Pfaff"," ","Ohman"]"Pfaff"
    1 0
    write "Öhman"]]"Pfaff"," ","Ohman"]]"Pfaff"
    1 0
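
The same effect can be reproduced outside YottaDB. The following is only an illustrative Python sketch (Python also compares strings by code point); the names `code_points`, `sz_points`, and `naive_order` are made up for this example.

```python
# Illustration only: default string comparison orders by Unicode code point,
# not by any German convention.
names = ["Pfaff", "Öhman", "Ohman"]


def code_points(text):
    """Return the Unicode code point of every character in text."""
    return [ord(ch) for ch in text]


sz_points = code_points("ßẞ")   # small and capital sharp s
naive_order = sorted(names)     # plain code-point ordering

print(sz_points)
print(naive_order)
```

With plain code-point ordering, "Öhman" lands after "Pfaff", which is the behavior the `]` and `]]` results above show.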


    Has anyone developed collation routines (https://docs.yottadb.com/ProgrammersGuide/internatn.html#creating-the-alternate-collation-routines) so that YottaDB can correctly order German words and names? Thank you very much.

    Regards
    – Bhaskar

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From ed de moel@21:1/5 to All on Mon Nov 22 12:13:16 2021
    I don't have any code for this "on the shelf", but I'd start by going through the strings and replacing all the compound characters with their components, e.g. translating "ä" into "ae" and "ß" into "sz", and then comparing them the "old-fashioned" way.
    That would work for most cases. My German isn't too good, but I am aware that "ß" sometimes should become "sz" and sometimes "ss".
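
A minimal Python sketch of this "expand, then compare" idea might look as follows. The mapping table, the `eszett` parameter, and the function names `expand` and `follows` are assumptions made for illustration; they are not taken from any existing collation routine.

```python
# Sketch of "replace compound characters, then compare the old-fashioned way".
# EXPANSIONS and both function names are hypothetical.
EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue",
              "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}


def expand(text, eszett="ss"):
    """Replace umlauts and sharp s with plain-letter components."""
    table = dict(EXPANSIONS)
    table["ß"] = eszett            # "ss" or "sz", depending on the convention
    table["ẞ"] = eszett.upper()
    return text.translate(str.maketrans(table))


def follows(a, b):
    """True if a sorts after b once both are expanded (compare M's ] operator)."""
    return expand(a) > expand(b)


print(expand("Öhman"), follows("Öhman", "Pfaff"))
```

After expansion, "Öhman" becomes "Oehman" and no longer follows "Pfaff". Passing `eszett="sz"` switches the sharp-s rule without touching the rest of the table.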

    Hope this works as a starting point,
    Ed

  • From K.S. Bhaskar@21:1/5 to ed de moel on Mon Nov 22 12:39:33 2021
    Thanks Ed. That's a good suggestion, but for performance reasons, the database engine doesn't quite work that way. It requires a forward transformation, which should be fairly straightforward (e.g., ä→ae), but the reverse transformation is not always clear (e.g., should all occurrences of ae in subscripts be converted to ä?). But this gives me something to think about.
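
The difficulty with the reverse direction can be shown with a small Python sketch. Everything here (the `FORWARD` table, `forward`, `naive_reverse`) is a hypothetical illustration of the problem, not the interface YottaDB expects from a collation library.

```python
# Why an expanding forward transformation has no clean inverse.
# FORWARD and both functions are illustrative assumptions.
FORWARD = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def forward(text):
    """Expand umlauts and sharp s into plain letters."""
    return text.translate(str.maketrans(FORWARD))


def naive_reverse(key):
    """Blindly turn every ae/oe/ue/ss back into ä/ö/ü/ß."""
    for original, expanded in FORWARD.items():
        key = key.replace(expanded, original)
    return key


# Two different names collapse to the same transformed key ...
collision = forward("Müller") == forward("Mueller")
# ... and a name that never had an umlaut is mangled on the way back.
mangled = naive_reverse(forward("Bauer"))

print(collision, mangled)
```

Because "Müller" and "Mueller" share one key, no reverse function can tell which one was stored, and a blanket replacement turns "Bauer" into "Baür".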

    Regards
    – Bhaskar

  • From Jens@21:1/5 to All on Tue Nov 23 05:13:09 2021
    I'm German, but I wasn't sure about the correct sort order.
    It seems that there are two options:

    1. ä=a, ö=o, ü=u, ß=ss. Example: Bäcker -> Bader -> Bäder -> Busse -> Buße
    2. ä=ae, ö=oe, ü=ue, ß=ss. Example: Bader -> Bäcker -> Bäder -> Busse -> Buße

    Both versions are in use: MS Word uses option 1, while phone books are sorted like option 2.
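
Both options can be expressed as sort keys. The Python sketch below is an illustration under stated assumptions: the table names, the `sort_key` helper, and the tie-break on the original spelling (so that "Bader" precedes "Bäder" when their keys are equal) are choices made for this example. The two options look like what DIN 5007 calls variant 1 and variant 2, but that correspondence is not stated in this thread.

```python
# Two candidate German orderings, written as sort keys.
# OPTION_1, OPTION_2 and sort_key are hypothetical names.
OPTION_1 = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
OPTION_2 = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def sort_key(word, table):
    """Return (transformed word, original word) for sorting."""
    # lower() folds capital umlauts to small ones before translating;
    # the original word breaks ties between equal transformed keys.
    return (word.lower().translate(table), word)


words = ["Buße", "Bäder", "Busse", "Bader", "Bäcker"]
option_1 = sorted(words, key=lambda w: sort_key(w, OPTION_1))
option_2 = sorted(words, key=lambda w: sort_key(w, OPTION_2))

print(option_1)
print(option_2)
```

The two results match the example sequences given for options 1 and 2 above.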

    Hope this helps.

    Jens

  • From K.S. Bhaskar@21:1/5 to Jens on Tue Nov 23 07:16:15 2021
    Thanks Jens. I was trying to help a German friend who uses YottaDB to index some text as a personal, post-retirement project. It seems that things are not as simple as I thought! Out of curiosity, what sorting order do dictionaries use?

    Regards
    – Bhaskar

  • From Jens@21:1/5 to K.S. Bhaskar on Tue Nov 23 07:30:32 2021
    I just looked in a German/English dictionary, and it is sorted like option 1.

    Regards Jens

    PS: If I can help your friend in any way, I will gladly do so. I still like coding in M.
    PPS: I'm just working on the Visual Studio Code extension to check for correct NEWing of M local variables. :-)

  • From K.S. Bhaskar@21:1/5 to Jens on Tue Nov 23 11:48:08 2021
    Jens –

    My friend would be glad of any assistance. Would you please send your e-mail address to me: bhaskar at yottadb dot com? Thank you very much in advance.

    Regards
    – Bhaskar
