aliens

Author Topic: Refactoring translation  (Read 11844 times)

Offline karvanit

  • Captain
  • ***
  • Posts: 94
    • View Profile
Refactoring translation
« on: November 20, 2012, 02:55:54 am »
I want to change the translation support but before I go through, I think it's better to ask for opinions and the general approval of the developers.
So, I am thinking of doing the following changes:
  • Create a Translator singleton. Translation is needed everywhere, no need pass an extra pointer around.
  • Allow parameters in the translated string  (see below for sample code). This will make translating whole sentences with information easier, and avoid concatenating parts in code (which is not correct form in many languages).
  • Allow translation of plural cases. This will require extra translation text in the language files.

Code comparison:
Code: (Simple case) [Select]
https:// Current code
obj->setText(_game->getLanguage()->getText("STR_SAMPLE"));

https:// New version
obj->setText(Translator::get().getText("STR_SAMPLE"));
Code: (Argument substitution) [Select]
https:// Current code
std::wsostream sstr;
std::wstring part1 = _game->getLanguage()->getText("STR_PART1");
std::wstring part2 = _game->getLanguage()->getText("STR_PART2");
sstr << part1 << someValue << part2 << someOtherValue;
obj->setText(sstr.str());

https:// New version
obj->setText(Translator::get().getText("STR_FULL_SENTENCE").arg(someValue).arg(someOtherValue));
https:// STR_FULL_SENTENCE contains %1 and %2 which get replaced.

Code: (Plural form) [Select]
https:// Current code
std::wsostream sstr;
std::wstring one = _game->getLanguage()->getText("STR_CASE_ONE");
std::wstring many = _game->getLanguage()->getText("STR_CASE_MANY");
if (someValue == 1)
  sstr << one;
else
  sstr << many;
sstr << someValue;
obj->setText(sstr.str());

https:// New version
obj->setText(Translator::get().getText("STR_SENTENCE", someValue));
https:// Language specific variants of STR_SENTENCE are searched.
https:// They contain %n which is replaced by someValue.

So, what do you think? Is this something you are willing to merge into OpenXcom?
PS: I will obviously handle all changes to existing code, and place proper strings in the English language files.
PS2: I can only guess at the proper argument placement for other languages though.

Offline kkmic

  • Commander
  • *****
  • Posts: 582
  • Undefined
    • View Profile
Re: Refactoring translation
« Reply #1 on: November 20, 2012, 12:39:46 pm »
OpenTTD has a quite ingenious system hooked up with a web translator interface that makes translating stuff quite easily.

What's interesting is the system used for plurals.

Basically, a translator translates a text like this:
Code: [Select]
This clip has {VALUE} shot{P 0 "" s} remaining
The {VALUE} gets replaced with the actual number.

The {P 0 "" s} functions as following:
  • P is a marker for "plural" (I am not sure if there are any other types of markers, I don't remember encountering any other)
  • 0 is the index of the value you want to consider when deciding if the displayed string should use plural or not (indexes are calculated from left to right, counting from 0 only the {VALUE} keys)
  • "" is a blank string (in this case, the "shot" word is already singular, so there is nothing to add there)
  • s is the text to be displayed for plural

Regarding the double quotes, the text above can be written as:
Code: [Select]
This clip has {VALUE} {P 0 shot shots} remaining
Both options will display the same text:
Code: [Select]
This clip has 1 shot remainingor
Code: [Select]
This clip has 7 shots remaining
However the first variant is preferred amongst the translators.

Now that the indications are over :), is it worth implementing such a system (no necessarily the one described by me) or something similar for OXC? I meant... there are not that many strings there...

Offline karvanit

  • Captain
  • ***
  • Posts: 94
    • View Profile
Re: Refactoring translation
« Reply #2 on: November 20, 2012, 01:37:25 pm »
The translation system I propose is based upon the API for Qt translation, although modified to use IDs instead of default strings. So, although the translation code will change, the language file can be kept with minor changes.

I now have code that compiles but is never used. Once I have replaced the old API in a few places, I'll post an update, so people can review the new system.

Offline michal

  • Commander
  • *****
  • Posts: 629
    • View Profile
Re: Refactoring translation
« Reply #3 on: November 20, 2012, 02:19:05 pm »
Code comparison:
Code: (Simple case) [Select]
https:// Current code
obj->setText(_game->getLanguage()->getText("STR_SAMPLE"));

https:// New version
obj->setText(Translator::get().getText("STR_SAMPLE"));

What about providing simple macro to make code even shorter and more readable? For example:

Code: (Simple case) [Select]
https:// Current code
obj->setText(_game->getLanguage()->getText("STR_SAMPLE"));

https:// New version
obj->setText(Translator::get().getText("STR_SAMPLE"));

https:// New version with macro
obj->setText(_T("STR_SAMPLE"));

Offline karvanit

  • Captain
  • ***
  • Posts: 94
    • View Profile
Re: Refactoring translation
« Reply #4 on: November 20, 2012, 02:28:30 pm »
I decided against doing a singleton, and modifying Language instead, so no nice macros, until Supsuper allows globals!
On the other hand, most of the code will remain the same, only the cases with arguments will (eventually) change.

Offline michal

  • Commander
  • *****
  • Posts: 629
    • View Profile
Re: Refactoring translation
« Reply #5 on: November 20, 2012, 02:41:52 pm »
You can always write _T method in State class ;) At least it will work in states :P

Offline SupSuper

  • Lazy Developer
  • Administrator
  • Commander
  • *****
  • Posts: 2160
    • View Profile
Re: Refactoring translation
« Reply #6 on: November 21, 2012, 12:22:47 am »
What about providing simple macro to make code even shorter and more readable? For example:

Code: (Simple case) [Select]
https:// Current code
obj->setText(_game->getLanguage()->getText("STR_SAMPLE"));

https:// New version
obj->setText(Translator::get().getText("STR_SAMPLE"));

https:// New version with macro
obj->setText(_T("STR_SAMPLE"));
I presume you got the idea off gettext, but using such "simple" macros can be pretty dangerous. For example _T already has a special meaning in Windows compilers: https://msdn.microsoft.com/en-us/library/dybsewaf(v=vs.80).aspx

Offline karvanit

  • Captain
  • ***
  • Posts: 94
    • View Profile
Re: Refactoring translation
« Reply #7 on: November 21, 2012, 01:01:55 pm »
Ok, I have a few questions to ask the main developers:
  • Is it acceptable to just move the translation stuff over to gettext? I mean, if I do the work, is it ever going to get merged
  • In Language::loadLng(), why are we not using std::getline(stream, string) and we read character by character? Was there a proper reason or just inertia?
  • I'm assuming Language::toHtml() is just old and rusted code, and safe to remove. No one is calling it. Do we want to keep it?
  • How much leeway is there in the Language file format? For now I have specific suffixes (_0 ... _3) and extra tags ({ALT}, {1} ... {N}).

Obviously, if gettext is acceptable, then (2) and (4) become moot.

Offline karvanit

  • Captain
  • ***
  • Posts: 94
    • View Profile
Re: Refactoring translation
« Reply #8 on: November 21, 2012, 06:50:48 pm »
I have turned it into a regular pull request, awaiting review and approval.

Most of the code is trivial changes:
  • Changed _game->getLanguage()->getString to tr if called in a State-derived class.
  • String addition is not supported, use streaming instead.

Most of the real changes are in LocalizedText, which avoids string copy as long as possible.
And I have also changed the code to output 'N day(s)' and 'Not enough _ammo_ to equip HWP.' as example / test of the new LocalizedText API.
« Last Edit: November 22, 2012, 03:21:55 pm by karvanit »

Offline SupSuper

  • Lazy Developer
  • Administrator
  • Commander
  • *****
  • Posts: 2160
    • View Profile
Re: Refactoring translation
« Reply #9 on: November 22, 2012, 08:19:57 pm »
Ok, I have a few questions to ask the main developers:
  • Is it acceptable to just move the translation stuff over to gettext? I mean, if I do the work, is it ever going to get merged
  • In Language::loadLng(), why are we not using std::getline(stream, string) and we read character by character? Was there a proper reason or just inertia?
  • I'm assuming Language::toHtml() is just old and rusted code, and safe to remove. No one is calling it. Do we want to keep it?
  • How much leeway is there in the Language file format? For now I have specific suffixes (_0 ... _3) and extra tags ({ALT}, {1} ... {N}).

Obviously, if gettext is acceptable, then (2) and (4) become moot.

1. The only reason I didn't use gettext is because it's not ID-based, which is important for OpenXcom's design.
2. Probably historical reasons, a lot of the load mechanisms are byte-based becuase X-Com formats often have specific meanings for specific bytes. I think the proper way is to just load the whole file at once (and let STL worry about buffering) and then process it.
3. It's used to generate this quick reference: https://openxcom.org/docs/language_id.html
4. As long as you don't break it. :P The format is kinda designed as we go.

Offline Hythlodaeus

  • Colonel
  • ****
  • Posts: 276
    • View Profile
Re: Refactoring translation
« Reply #10 on: November 23, 2012, 01:18:05 am »
As someone who's (still) working on a translation for OpenXcom, could you maybe explain the advantages of such system in layman's terms?

Offline karvanit

  • Captain
  • ***
  • Posts: 94
    • View Profile
Re: Refactoring translation
« Reply #11 on: November 23, 2012, 07:25:23 am »
Right now a lot of the messages are created by stringing pieces together. Using complete sentences with markers means that the translator has more control over the sentence structure, and is also able to see the context better.
eg: Instead of 'STR_NOT_ENOUGH' and 'STR_TO_ARM_HWP' we will now have 'STR_NOT_ENOUGH_ammotype_TO_ARM_HWP', with a placeholder for the ammo type text.
This means that the translator can change the whole sentence, so it can appear more natural.

Using the new system for plural forms means that we are not confined to the English rules any more, each language uses their proper forms, and, as a bonus, we get special treatment for 0. Again, more context for the translator.

In addition, the new system requires less work from the programmer, since the code does not have to string together sentences any more, or use conditions on variable values to decide dictionary keys.

Offline karvanit

  • Captain
  • ***
  • Posts: 94
    • View Profile
Re: Refactoring translation
« Reply #12 on: December 02, 2012, 10:51:53 am »
Ok, this is just a request for information, not me being pushy.
What would it take for the PR to be accepted? Are there problems / suggestions for doing things differently or not doing something?

Offline SupSuper

  • Lazy Developer
  • Administrator
  • Commander
  • *****
  • Posts: 2160
    • View Profile
Re: Refactoring translation
« Reply #13 on: December 03, 2012, 02:53:37 am »
To be honest, once I see over 3000 changes for just a small feature, I don't really have the stomach to thoroughly review it. It's probably fine, though I imagine hardcoding the syntactical differences for each language will get unwieldy, and the real work will be reviewing and changing all the language strings to actually use your new functionality.

Offline karvanit

  • Captain
  • ***
  • Posts: 94
    • View Profile
Re: Refactoring translation
« Reply #14 on: December 03, 2012, 08:42:30 am »
Indeed, the real work will be changing the way the code handles sentences, But this can incremental, once the mechanisms are in.

Most of these 3k lines are changing '_game->getLanguage()->getString' with 'tr'!