Re: System names generation - Italian variant

Posted: Mon Aug 24, 2020 9:17 pm
by WKFO
Could help with Turkish if you are interested.

Re: System names generation - Italian variant

Posted: Wed Aug 26, 2020 2:18 pm
by testadilegno
Great :) so the main steps are:
  • Get a list of cities/towns in your language. For example, I got the Italian and French ones from the websites of the two countries' interior ministries. The list should be reasonably large because we need some statistics; at least about a thousand names would be great.
  • Clean up the list. This is the tricky part: some names contain a geographical qualifier (valley of xxx) or another name (saint xxx in yyy). Have a look at the names and try to figure out patterns for removing these parts (we don't want to do it by hand). You can write the rules down in a file, for example: whenever this word is at the beginning of a name, drop it; or skip everything after this word; and so on. I can then adapt a script to apply them to the full dataset, or if you want I can share my scripts so you can play with them.
  • Split into syllables. This is a pain to do by hand, and hard to program even with a grammar handbook on your desk. I found some online resources with syllabification engines for a few languages; look around to see whether one exists for your language.
  • Then we split the names and make three sets: one with all the first syllables, one with the last ones, and one with the middle ones. Then we pick the top X most common ones and it's done. This part is easy, and I already have all the scripts for it.
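
To make the clean-up and counting steps concrete, here is a minimal Python sketch. It is not the actual scripts mentioned above: the rule lists (DROP_PREFIXES, CUT_AFTER), the function names, and the choice to take the top X per set are all assumptions made for illustration. Syllable splitting itself is left to an external engine, so the counting function expects names that are already split.

```python
from collections import Counter

# Hypothetical clean-up rules; a real list would come from inspecting the data.
DROP_PREFIXES = ("saint ", "valley of ")  # drop these when a name starts with them
CUT_AFTER = (" in ",)                     # drop this marker and everything after it


def clean(name):
    """Apply the prefix and cut-off rules to a single place name."""
    name = name.strip().lower()
    for prefix in DROP_PREFIXES:
        if name.startswith(prefix):
            name = name[len(prefix):]
            break
    for marker in CUT_AFTER:
        head, sep, _tail = name.partition(marker)
        if sep:  # marker was found, keep only what precedes it
            name = head
    return name.strip()


def syllable_sets(split_names, top):
    """Return the `top` most common first, middle and last syllables.

    `split_names` is an iterable of syllable lists, e.g. ["mi", "la", "no"].
    Assumption: one-syllable names are skipped, since they have no
    separate first and last syllable.
    """
    first, middle, last = Counter(), Counter(), Counter()
    for syllables in split_names:
        if len(syllables) < 2:
            continue
        first[syllables[0]] += 1
        last[syllables[-1]] += 1
        middle.update(syllables[1:-1])
    return tuple(
        [syllable for syllable, _count in counter.most_common(top)]
        for counter in (first, middle, last)
    )
```

For example, clean("Saint Martin in Valle") gives "martin", and syllable_sets(names, 50) returns three lists (first, middle, last) with up to 50 syllables each.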
Let me know if there are other questions and how you'd like to proceed.
Thanks :)

Re: System names generation - Italian variant

Posted: Wed Aug 26, 2020 5:48 pm
by WKFO
I assume we don't want special characters such as Ğ, Ş, İ, ı, Ç etc., because they would make searching for systems by name hell for non-natives. If so, I will replace them with G, Sh, I, i, Ch... or whatever is closest.
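
A replacement like this could be sketched in Python with a fixed table. The entries for Ğ, Ş, İ, ı and Ç follow the substitutions listed above; the lowercase variants and the Ö/Ü entries are assumptions added for illustration, and the function name is hypothetical.

```python
import unicodedata

# Keys are written as escapes so each one is guaranteed to be a single code point.
_TURKISH_TO_ASCII = str.maketrans({
    "\u011e": "G",  "\u011f": "g",   # Ğ ğ
    "\u015e": "Sh", "\u015f": "sh",  # Ş ş
    "\u0130": "I",  "\u0131": "i",   # İ ı
    "\u00c7": "Ch", "\u00e7": "ch",  # Ç ç
    # Assumption: not mentioned in the thread, picked as the closest letters.
    "\u00d6": "O",  "\u00f6": "o",   # Ö ö
    "\u00dc": "U",  "\u00fc": "u",   # Ü ü
})


def to_ascii(name):
    """Replace Turkish letters with the closest ASCII spelling."""
    # Compose first, so letters stored as base + combining mark still match.
    composed = unicodedata.normalize("NFC", name)
    return composed.translate(_TURKISH_TO_ASCII)
```

With this table, to_ascii("Şanlıurfa") gives "Shanliurfa".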

Re: System names generation - Italian variant

Posted: Thu Aug 27, 2020 7:32 am
by testadilegno
Honestly, I kept the French accents and the cedilla (ç). If you work with UTF-8 files there should be no problem, so do as you see fit (font rendering code aside). Maybe in 1200 years they will reform orthography... or not :)

Re: System names generation - Italian variant

Posted: Thu Aug 27, 2020 7:49 am
by impaktor
The question is whether our font supports those characters. I guess you'll notice.

Re: System names generation - Italian variant

Posted: Thu Aug 27, 2020 10:58 am
by WKFO
My main concern is that even if our font supports it, at some point someone with a standard US English keyboard will try to make a trade run to a system whose name starts with Ş, for example, and they won't be able to search for it.

Maybe a drop-down list of special characters under the search bar? Could we use this if this turns out to be a rare case? It would take up nearly no space when closed.

Re: System names generation - Italian variant

Posted: Thu Aug 27, 2020 1:31 pm
by nozmajner
How complex would it be to make the search function treat certain kinds of letters as equivalent? For example, it would treat Ğ as G. Hungarian names would benefit from that as well (áéíóöőűú are pretty common).

Re: System names generation - Italian variant

Posted: Thu Aug 27, 2020 1:54 pm
by WKFO
nozmajner's idea sounds more plausible

Re: System names generation - Italian variant

Posted: Thu Aug 27, 2020 6:47 pm
by impaktor
This depends on what is being discussed. I've assumed this would be a mod, e.g. for Italians, or <x> with a proper keyboard layout.

Anything for inclusion in master would have to be typeable and pronounceable on a standard UK/US keyboard.

Re: System names generation - Italian variant

Posted: Sat Aug 29, 2020 3:05 pm
by testadilegno
I understand @impaktor's stance. However, the naming system requires an overhaul. If the dev team feels that the little things we're working on move in the desired direction, it might be worth putting in a little effort to get this into master.

I have little experience on the C++ side, but maybe the solution discussed here:

https://www.codeproject.com/tips/131667 ... ode-string

would be helpful.

If I understand it correctly, it would convert UTF-8 encoded accented letters to their plain English counterparts. This transliteration could then be used to compare the system name with the string typed in on a standard en-US keyboard. This way, the modifications to the search function look minimal.

Let me know what you think about this. Regards :)
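
To illustrate the idea (in Python rather than C++, and not the code from the linked article), here is a minimal sketch of such a comparison using Unicode decomposition. The function names and the prefix-match behaviour are assumptions; the game's actual search function may work differently. Note that decomposition alone does not cover every letter: the Turkish dotless ı has no decomposition, so it needs an explicit entry.

```python
import unicodedata

# Letters with no Unicode decomposition need an explicit fallback.
# Assumption: only the dotless i is listed here; other alphabets may need more.
_NO_DECOMPOSITION = str.maketrans({"\u0131": "i"})  # ı


def fold(text):
    """Reduce text to unaccented, case-insensitive form for comparison."""
    text = text.translate(_NO_DECOMPOSITION)
    # NFKD splits an accented letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def search_systems(query, names):
    """Return the names whose folded form starts with the folded query."""
    key = fold(query)
    return [name for name in names if fold(name).startswith(key)]
```

Typing "sir" on an en-US keyboard would then match a system called "Şırnak", while the name itself is still stored and displayed with its original letters.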