libera/#maemo/ Wednesday, 2020-10-14

00:04 <brolin_empey> Maxdamantus: OK, thank you for the informative answer.  I will try to get around to opening your two links.  I only know how to write in languages that use an alphabet so was curious how text written in a language such as Chinese or Japanese that does not use an alphabet is sorted, assuming that it can be sorted.  My friend from Beijing showed me how he uses an IME to write in Chinese on Android but I do not know enough about a language that does not use an
00:04 <brolin_empey> alphabet to write in the language.
00:08 <brolin_empey> He said he thinks the stroke order or number of strokes, do not remember which one he said, is used for sorting text but at least one other person I asked said they do not think text written in Chinese can be sorted.  I kept meaning to try using software that supports Chinese text to sort Chinese text to see what it does but I ran out of time then forgot about it or had other, higher priority things to do.
00:25 <Maxdamantus> It will likely be ad-hoc to the writing system. I'm not sure how Chinese logograms work exactly, but in general I would expect a writing system to be made of a relatively small number of primitive concepts.
00:26 <Maxdamantus> eg, if you look at Hangul, you might have thousands of "characters", but each one is really just a combination of up to three primitive symbols denoting any start/middle/end sounds for a syllable.
00:28 <Maxdamantus> (Japanese kana are similar, but with the exception of "-n", their syllables all consist of one vowel, possibly preceded by a consonant, so only two primitive concepts in each glyph)
00:28 <L29Ah> no?
00:29 <Maxdamantus> and since that combination in Japanese kana only leads to around 50 symbols (5*10), it doesn't need to be as regular as Hangul.
00:29 <Maxdamantus> No what?
00:30 <L29Ah> ah nvm, for the ordering reason it's ok
00:30 <L29Ah> there's ゃ, ゅ and ょ to have a little fun with
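The Hangul point above (each syllable block is a combination of up to three primitive symbols) can be checked arithmetically, because Unicode lays out the precomposed syllables in a regular grid. The following is a minimal sketch, assuming the standard constants for the Hangul Syllables block (start U+AC00, 19 leading consonants, 21 vowels, 28 trailing slots including "none"); the function name `decompose_hangul` is made up for illustration.

```python
import unicodedata

# Constants for the precomposed Hangul Syllables block and the conjoining jamo.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT  # 11172 syllables


def decompose_hangul(syllable):
    """Split one precomposed syllable into its start/middle/end jamo."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    l_index = s_index // (V_COUNT * T_COUNT)
    v_index = (s_index % (V_COUNT * T_COUNT)) // T_COUNT
    t_index = s_index % T_COUNT
    jamo = chr(L_BASE + l_index) + chr(V_BASE + v_index)
    if t_index:  # index 0 means "no trailing consonant"
        jamo += chr(T_BASE + t_index)
    return jamo


# The arithmetic should agree with the library's canonical decomposition.
print(decompose_hangul("한") == unicodedata.normalize("NFD", "한"))
```

Because the grid is ordered by leading consonant, then vowel, then trailing consonant, code point order within this block matches the order of the components, which is the regularity being contrasted with kana.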
00:31 <L29Ah> anyway though i don't see why don't you just grab unicode code points and be done with it
00:43 <Maxdamantus> Because Unicode code point ordering might not follow a well-understood pattern. It just depends on who designed the layout for that script in Unicode.
00:43 <Maxdamantus> Even in Latin-based scripts, you don't have that. An obvious example would be 'ı' in Turkish.
00:44 <Maxdamantus> or simply 'ü' in German.
00:44 <L29Ah> i think it can even change between languages using the same character set
00:45 <Maxdamantus> I imagine there are languages using Latin-based scripts that have orders that are inconsistent with English.
00:46 <Maxdamantus> also, I know that in Arabic there are at least two well-known orderings of letters (one starts with "alef, ba, gim, dal" like in Greek, the other starts with "alef, ba, ta, tha")
00:46 <Maxdamantus> and people use those different Arabic orders in different contexts.
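The 'ü' example can be reproduced with a plain code point sort. The sketch below uses an invented word list and a hypothetical helper, `rough_key`, that strips combining marks before comparing. That helper is only a crude stand-in for real collation, which is language-specific, and it does nothing for a letter like 'ı' that has no decomposition.

```python
import unicodedata

words = ["zebra", "über", "apfel", "ufer"]

# Plain sorted() compares strings code point by code point,
# so 'ü' (U+00FC) lands after 'z' (U+007A).
by_code_point = sorted(words)


def rough_key(word):
    """Hypothetical helper: compare base letters first, original as tie-breaker."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), word)


by_rough_key = sorted(words, key=rough_key)

print(by_code_point)
print(by_rough_key)
```

With the rough key, "über" sorts next to the other words starting with "u", which is closer to what a German reader would expect. It is not a substitute for a tailored collation library.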
00:47 * enyc meows
00:47 <brolin_empey> $ cat /dev/urandom >enyc
15:55 <CcxWrk> You don't sort by codepoints, there's whole Unicode Collation Algorithm: https://www.unicode.org/reports/tr10/
15:56 <L29Ah> > Siniform ideographs — most notably modern CJK (Han) ideographs — and Hangul syllables are not explicitly mentioned in the default table. Ideographs are mapped to collation elements that are derived from their Unicode code point value as described in Section 10.1.3, Implicit Weights.
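The implicit-weight derivation quoted above can be sketched roughly as follows. This is written from memory of the formula in UTS #10 and should be checked against the report itself. The base values (0xFB40 for the core Han blocks, 0xFB80 for the extension blocks) are assumptions taken from that recollection, and picking the right base for a given character is left to the caller here. The function name `implicit_weights` is made up.

```python
def implicit_weights(cp, base=0xFB40):
    """Return the two 16-bit primary weights derived from a code point.

    Assumption: `base` is chosen by the caller according to the
    character's category (0xFB40 is assumed for core CJK ideographs).
    """
    aaaa = base + (cp >> 15)        # high bits of the code point
    bbbb = (cp & 0x7FFF) | 0x8000   # low 15 bits, top bit forced on
    return aaaa, bbbb


# Within one base, ordering by implicit weights equals code point order.
chars = ["中", "一", "日"]
by_weight = sorted(chars, key=lambda ch: implicit_weights(ord(ch)))
print(by_weight == sorted(chars))
```

So for Han characters without a tailoring, the default table does fall back to something equivalent to code point order, which is the point the quote makes.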
16:02 <CcxWrk> Hm, even libicu pages on this seems to be full of TODOs http://site.icu-project.org/design/collation/script-reordering
16:05 <CcxWrk> Heh and the official document on Collation points to … PowerPoint file? :]
16:05 <CcxWrk> But no, we better focus on adding more emoji combinations /s
16:07 <KotCzarny> sticking to those funny chars is like keeping ebcdic around
16:07 <KotCzarny> sure, some legacy code uses it, but whole thing should be deprecated
16:08 <L29Ah> indeed, latin should be deprecated in favour of han
16:10 <KotCzarny> i think you've meant emojis
16:11 <L29Ah> nah emojis are ideographs like han, they're fine
