libera/#maemo/ Wednesday, 2020-10-14

brolin_empey	Maxdamantus: OK, thank you for the informative answer. I will try to get around to opening your two links. I only know how to write in languages that use an alphabet so was curious how text written in a language such as Chinese or Japanese that does not use an alphabet is sorted, assuming that it can be sorted. My friend from Beijing showed me how he uses an IME to write in Chinese on Android but I do not know enough about a language that does not use an	00:04
brolin_empey	alphabet to write in the language.	00:04
brolin_empey	He said he thinks the stroke order or number of strokes, do not remember which one he said, is used for sorting text but at least one other person I asked said they do not think text written in Chinese can be sorted. I kept meaning to try using software that supports Chinese text to sort Chinese text to see what it does but I ran out of time then forgot about it or had other, higher priority things to do.	00:08
Maxdamantus	It will likely be ad-hoc to the writing system. I'm not sure how Chinese logograms work exactly, but in general I would expect a writing system to be made of a relatively small number of primitive concepts.	00:25
Maxdamantus	eg, if you look at Hangul, you might have thousands of "characters", but each one is really just a combination of up to three primitive symbols denoting any start/middle/end sounds for a syllable.	00:26
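A minimal sketch of the Hangul point above, assuming only the precomposed syllable block starting at U+AC00 and its published arithmetic layout (19 leading consonants, 21 vowels, 28 trailing slots including "none"). The function name `decompose_hangul` is made up for illustration; the standard library's NFD normalization performs the same decomposition.

```python
import unicodedata

S_BASE = 0xAC00                      # first precomposed Hangul syllable
L_BASE, V_BASE, T_BASE = 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28   # T_COUNT includes "no trailing consonant"


def decompose_hangul(syllable: str) -> str:
    """Split one precomposed syllable into its two or three jamo."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead = index // (V_COUNT * T_COUNT)
    vowel = (index % (V_COUNT * T_COUNT)) // T_COUNT
    tail = index % T_COUNT
    jamo = chr(L_BASE + lead) + chr(V_BASE + vowel)
    if tail:                         # tail == 0 means the syllable has no final consonant
        jamo += chr(T_BASE + tail)
    return jamo


# Cross-check against the standard library's canonical decomposition.
for s in ("\uac00", "\ud55c"):       # U+AC00 (ga), U+D55C (han)
    assert decompose_hangul(s) == unicodedata.normalize("NFD", s)
```

Because the block is laid out arithmetically, the thousands of syllables are generated from the three small jamo sets rather than listed one by one.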
Maxdamantus	(Japanese kana are similar, but with the exception of "-n", their syllables all consist of one vowel, possibly preceded by a consonant, so only two primitive concepts in each glyph)	00:28
L29Ah	no?	00:28
Maxdamantus	and since that combination in Japanese kana only leads to around 50 symbols (5*10), it doesn't need to be as regular as Hangul.	00:29
Maxdamantus	No what?	00:29
L29Ah	ah nvm, for the ordering reason it's ok	00:30
L29Ah	there's ゃ, ゅ and ょ to have a little fun with	00:30
L29Ah	anyway though i don't see why you don't just grab unicode code points and be done with it	00:31
Maxdamantus	Because Unicode code point ordering might not follow a well-understood pattern. It just depends on who designed the layout for that script in Unicode.	00:43
Maxdamantus	Even in Latin-based scripts, you don't have that. An obvious example would be 'ı' in Turkish.	00:43
Maxdamantus	or simply 'ü' in German.	00:44
L29Ah	i think it can even change between languages using the same character set	00:44
Maxdamantus	I imagine there are languages using Latin-based scripts that have orders that are inconsistent with English.	00:45
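To illustrate the 'ü' example, here is a rough sketch comparing plain code point order with a key that ignores combining marks. The helper names `base_letters` and `rough_key` are hypothetical, and the key is only an approximation of dictionary-style ordering, not the Unicode Collation Algorithm or any locale's official rules.

```python
import unicodedata


def base_letters(word: str) -> str:
    """Casefold, decompose, and drop combining marks (so 'ü' compares as 'u')."""
    decomposed = unicodedata.normalize("NFD", word.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def rough_key(word: str):
    # Primary comparison on base letters; the original word breaks ties.
    return (base_letters(word), word)


words = ["zebra", "\u00fcber", "uhr"]          # "\u00fcber" is "über"

by_code_point = sorted(words)                  # 'ü' (U+00FC) lands after 'z'
by_base_letter = sorted(words, key=rough_key)  # 'ü' is grouped with 'u'

print(by_code_point)
print(by_base_letter)
```

This trick does not help with Turkish 'ı' (U+0131), which has no decomposition and whose position relative to 'i' depends on the language, so real collation needs language-specific tailoring.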
Maxdamantus	also, I know that in Arabic there are at least two well-known orderings of letters (one starts with "alef, ba, gim, dal" like in Greek, the other starts with "alef, ba, ta, tha")	00:46
Maxdamantus	and people use those different Arabic orders in different contexts.	00:46
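A small sketch of the two Arabic orders, restricted to the six letters named above so that nothing beyond the relative order stated in the conversation is assumed. The constant and function names are illustrative only.

```python
# The six letters mentioned above, written as escapes to avoid bidi display issues.
ALEF, BA, TA, THA, JIM, DAL = (
    "\u0627", "\u0628", "\u062a", "\u062b", "\u062c", "\u062f",
)

# Relative order of these letters in each tradition (other letters omitted).
ABJADI = [ALEF, BA, JIM, DAL, TA, THA]   # "alef, ba, gim, dal, ..." order
HIJAI = [ALEF, BA, TA, THA, JIM, DAL]    # "alef, ba, ta, tha, ..." order


def sort_by(order, letters):
    """Sort letters by their position in the given alphabet order."""
    rank = {ch: i for i, ch in enumerate(order)}
    return sorted(letters, key=rank.__getitem__)


sample = [THA, DAL, BA]
print(sort_by(ABJADI, sample))   # ba, dal, tha
print(sort_by(HIJAI, sample))    # ba, tha, dal
```

The same input sorts differently depending on which order the context calls for, which is why a single fixed ordering such as code point order cannot serve both.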
* enyc meows		00:47
brolin_empey	$ cat /dev/urandom >enyc	00:47
CcxWrk	You don't sort by codepoints, there's a whole Unicode Collation Algorithm: https://www.unicode.org/reports/tr10/	15:55
L29Ah	> Siniform ideographs — most notably modern CJK (Han) ideographs — and Hangul syllables are not explicitly mentioned in the default table. Ideographs are mapped to collation elements that are derived from their Unicode code point value as described in Section 10.1.3, Implicit Weights.	15:56
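A hedged sketch of what "derived from their Unicode code point value" means in practice, following the implicit weight scheme in UTS #10 for the core CJK Unified Ideographs block only (U+4E00..U+9FFF, base 0xFB40). Other ranges use different bases and are deliberately left out; the function name is hypothetical.

```python
def implicit_primary_weights(cp: int) -> tuple[int, int]:
    """Return the two implicit primary weights (AAAA, BBBB) for a core Han ideograph.

    Simplification: only the core CJK Unified Ideographs block is handled, and
    the code point is assumed to be an assigned ideograph.
    """
    if not 0x4E00 <= cp <= 0x9FFF:
        raise ValueError("this sketch only covers U+4E00..U+9FFF")
    aaaa = 0xFB40 + (cp >> 15)          # base plus the high bits of the code point
    bbbb = (cp & 0x7FFF) | 0x8000       # low 15 bits with the top bit forced on
    return aaaa, bbbb


# Comparing the weight pairs reproduces code point order within the block.
chars = ["\u9fa5", "\u4e00", "\u6c49"]
by_weights = sorted(chars, key=lambda ch: implicit_primary_weights(ord(ch)))
assert by_weights == sorted(chars)
```

So without a tailoring (for example by stroke count, radical, or pinyin), the default table effectively falls back to code point order for Han characters, which is the case the quoted passage describes.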
CcxWrk	Hm, even the libicu pages on this seem to be full of TODOs http://site.icu-project.org/design/collation/script-reordering	16:02
CcxWrk	Heh and the official document on Collation points to … PowerPoint file? :]	16:05
CcxWrk	But no, we better focus on adding more emoji combinations /s	16:05
KotCzarny	sticking to those funny chars is like keeping ebcdic around	16:07
KotCzarny	sure, some legacy code uses it, but whole thing should be deprecated	16:07
L29Ah	indeed, latin should be deprecated in favour of han	16:08
KotCzarny	i think you meant emojis	16:10
L29Ah	nah emojis are ideographs like han, they're fine	16:11
