OpenXcom Forum

Contributions => Programming => Topic started by: karvanit on November 20, 2012, 02:55:54 am

Title: Refactoring translation
Post by: karvanit on November 20, 2012, 02:55:54 am
I want to change the translation support but before I go through, I think it's better to ask for opinions and the general approval of the developers.
So, I am thinking of doing the following changes:

Code comparison:
Code: (Simple case) [Select]
https:// Current code
obj->setText(_game->getLanguage()->getText("STR_SAMPLE"));

https:// New version
obj->setText(Translator::get().getText("STR_SAMPLE"));
Code: (Argument substitution) [Select]
https:// Current code
std::wsostream sstr;
std::wstring part1 = _game->getLanguage()->getText("STR_PART1");
std::wstring part2 = _game->getLanguage()->getText("STR_PART2");
sstr << part1 << someValue << part2 << someOtherValue;
obj->setText(sstr.str());

https:// New version
obj->setText(Translator::get().getText("STR_FULL_SENTENCE").arg(someValue).arg(someOtherValue));
https:// STR_FULL_SENTENCE contains %1 and %2 which get replaced.

Code: (Plural form) [Select]
https:// Current code
std::wsostream sstr;
std::wstring one = _game->getLanguage()->getText("STR_CASE_ONE");
std::wstring many = _game->getLanguage()->getText("STR_CASE_MANY");
if (someValue == 1)
  sstr << one;
else
  sstr << many;
sstr << someValue;
obj->setText(sstr.str());

https:// New version
obj->setText(Translator::get().getText("STR_SENTENCE", someValue));
https:// Language specific variants of STR_SENTENCE are searched.
https:// They contain %n which is replaced by someValue.

So, what do you think? Is this something you are willing to merge into OpenXcom?
PS: I will obviously handle all changes to existing code, and place proper strings in the English language files.
PS2: I can only guess at the proper argument placement for other languages though.
Title: Re: Refactoring translation
Post by: kkmic on November 20, 2012, 12:39:46 pm
OpenTTD has a quite ingenious system hooked up with a web translator interface that makes translating stuff quite easily.

What's interesting is the system used for plurals.

Basically, a translator translates a text like this:
Code: [Select]
This clip has {VALUE} shot{P 0 "" s} remaining
The {VALUE} gets replaced with the actual number.

The {P 0 "" s} functions as following:

Regarding the double quotes, the text above can be written as:
Code: [Select]
This clip has {VALUE} {P 0 shot shots} remaining
Both options will display the same text:
Code: [Select]
This clip has 1 shot remainingor
Code: [Select]
This clip has 7 shots remaining
However the first variant is preferred amongst the translators.

Now that the indications are over :), is it worth implementing such a system (no necessarily the one described by me) or something similar for OXC? I meant... there are not that many strings there...
Title: Re: Refactoring translation
Post by: karvanit on November 20, 2012, 01:37:25 pm
The translation system I propose is based upon the API for Qt translation, although modified to use IDs instead of default strings. So, although the translation code will change, the language file can be kept with minor changes.

I now have code that compiles but is never used. Once I have replaced the old API in a few places, I'll post an update, so people can review the new system.
Title: Re: Refactoring translation
Post by: michal on November 20, 2012, 02:19:05 pm
Code comparison:
Code: (Simple case) [Select]
https:// Current code
obj->setText(_game->getLanguage()->getText("STR_SAMPLE"));

https:// New version
obj->setText(Translator::get().getText("STR_SAMPLE"));

What about providing simple macro to make code even shorter and more readable? For example:

Code: (Simple case) [Select]
https:// Current code
obj->setText(_game->getLanguage()->getText("STR_SAMPLE"));

https:// New version
obj->setText(Translator::get().getText("STR_SAMPLE"));

https:// New version with macro
obj->setText(_T("STR_SAMPLE"));
Title: Re: Refactoring translation
Post by: karvanit on November 20, 2012, 02:28:30 pm
I decided against doing a singleton, and modifying Language instead, so no nice macros, until Supsuper allows globals!
On the other hand, most of the code will remain the same, only the cases with arguments will (eventually) change.
Title: Re: Refactoring translation
Post by: michal on November 20, 2012, 02:41:52 pm
You can always write _T method in State class ;) At least it will work in states :P
Title: Re: Refactoring translation
Post by: SupSuper on November 21, 2012, 12:22:47 am
What about providing simple macro to make code even shorter and more readable? For example:

Code: (Simple case) [Select]
https:// Current code
obj->setText(_game->getLanguage()->getText("STR_SAMPLE"));

https:// New version
obj->setText(Translator::get().getText("STR_SAMPLE"));

https:// New version with macro
obj->setText(_T("STR_SAMPLE"));
I presume you got the idea off gettext, but using such "simple" macros can be pretty dangerous. For example _T already has a special meaning in Windows compilers: https://msdn.microsoft.com/en-us/library/dybsewaf(v=vs.80).aspx
Title: Re: Refactoring translation
Post by: karvanit on November 21, 2012, 01:01:55 pm
Ok, I have a few questions to ask the main developers:

Obviously, if gettext is acceptable, then (2) and (4) become moot.
Title: Re: Refactoring translation
Post by: karvanit on November 21, 2012, 06:50:48 pm
I have turned it into a regular pull request (https://github.com/SupSuper/OpenXcom/pull/209), awaiting review and approval.

Most of the code is trivial changes:

Most of the real changes are in LocalizedText, which avoids string copy as long as possible.
And I have also changed the code to output 'N day(s)' and 'Not enough _ammo_ to equip HWP.' as example / test of the new LocalizedText API.
Title: Re: Refactoring translation
Post by: SupSuper on November 22, 2012, 08:19:57 pm
Ok, I have a few questions to ask the main developers:
  • Is it acceptable to just move the translation stuff over to gettext? I mean, if I do the work, is it ever going to get merged
  • In Language::loadLng(), why are we not using std::getline(stream, string) and we read character by character? Was there a proper reason or just inertia?
  • I'm assuming Language::toHtml() is just old and rusted code, and safe to remove. No one is calling it. Do we want to keep it?
  • How much leeway is there in the Language file format? For now I have specific suffixes (_0 ... _3) and extra tags ({ALT}, {1} ... {N}).

Obviously, if gettext is acceptable, then (2) and (4) become moot.

1. The only reason I didn't use gettext is because it's not ID-based, which is important for OpenXcom's design.
2. Probably historical reasons, a lot of the load mechanisms are byte-based becuase X-Com formats often have specific meanings for specific bytes. I think the proper way is to just load the whole file at once (and let STL worry about buffering) and then process it.
3. It's used to generate this quick reference: https://openxcom.org/docs/language_id.html
4. As long as you don't break it. :P The format is kinda designed as we go.
Title: Re: Refactoring translation
Post by: Hythlodaeus on November 23, 2012, 01:18:05 am
As someone who's (still) working on a translation for OpenXcom, could you maybe explain the advantages of such system in layman's terms?
Title: Re: Refactoring translation
Post by: karvanit on November 23, 2012, 07:25:23 am
Right now a lot of the messages are created by stringing pieces together. Using complete sentences with markers means that the translator has more control over the sentence structure, and is also able to see the context better.
eg: Instead of 'STR_NOT_ENOUGH' and 'STR_TO_ARM_HWP' we will now have 'STR_NOT_ENOUGH_ammotype_TO_ARM_HWP', with a placeholder for the ammo type text.
This means that the translator can change the whole sentence, so it can appear more natural.

Using the new system for plural forms means that we are not confined to the English rules any more, each language uses their proper forms, and, as a bonus, we get special treatment for 0. Again, more context for the translator.

In addition, the new system requires less work from the programmer, since the code does not have to string together sentences any more, or use conditions on variable values to decide dictionary keys.
Title: Re: Refactoring translation
Post by: karvanit on December 02, 2012, 10:51:53 am
Ok, this is just a request for information, not me being pushy.
What would it take for the PR to be accepted? Are there problems / suggestions for doing things differently or not doing something?
Title: Re: Refactoring translation
Post by: SupSuper on December 03, 2012, 02:53:37 am
To be honest, once I see over 3000 changes for just a small feature, I don't really have the stomach to thoroughly review it. It's probably fine, though I imagine hardcoding the syntactical differences for each language will get unwieldy, and the real work will be reviewing and changing all the language strings to actually use your new functionality.
Title: Re: Refactoring translation
Post by: karvanit on December 03, 2012, 08:42:30 am
Indeed, the real work will be changing the way the code handles sentences, But this can incremental, once the mechanisms are in.

Most of these 3k lines are changing '_game->getLanguage()->getString' with 'tr'!
Title: Re: Refactoring translation
Post by: karvanit on December 03, 2012, 12:50:35 pm
I have now added a new PR, containing just the minimum code changes, without changing all the states to use the 'tr()' functions.
Title: Re: Refactoring translation
Post by: SupSuper on December 04, 2012, 10:57:15 pm
Ok... I'm still really confused. :? Could you walk me through your thought process behind this. For example:
- Why did you encapsulate strings inside a LocalizedText?
- What's OX_REQUIRED_RESULT?
- Why should the program crash on a blank string?
- Should PluralityRules really be a programmer concern instead of a translator concern? It's gonna have to be maintained everytime a new language comes in, all this for one plural word.
Title: Re: Refactoring translation
Post by: karvanit on December 04, 2012, 11:50:28 pm
- Why did you encapsulate strings inside a LocalizedText?

Because this allows me to add the 'arg()' function and allow call chaining for argument substitution. It also makes it clear what is translated and what isn't. To take full advantage of this we should make all user-visible functions accept LocalizedText arguments so that translator-unfriendly code would trigger compilation errors. But that's a very big change for a small gain in the way the game is written right now.
The proposed code has no overhead and allows readable translation code, without changing too much. It just stops string concatenation with '+', requiring wstringstream instead. Which is good, since this indicates that proper translation-aware code should be substituted there (using LocalizedText::arg for most cases).

- What's OX_REQUIRED_RESULT?
It is hides compiler specific code to warn about unused results of "expensive" functions, eg code like
Code: [Select]
_game->getLanguage()->getString("STR_ID").arg("Example"); https://Nothing is done with this.

VS

setText(_game->getLanguage()->getString("STR_ID").arg("Example")); https:// This is ok, we are using the result.
- Why should the program crash on a blank string?
Ok, crashing is kind of overkill, but that assert only triggered when someone wanted a translation for the empty ID, so it is a logical error in the code, or in the rule set.

- Should PluralityRules really be a programmer concern instead of a translator concern? It's gonna have to be maintained everytime a new language comes in, all this for one plural word.
PluralityRules is used for the generic code to support all different types of plural forms (not just 2 as in English).
It allows the programmers to call a single function and have the translators provide the proper forms.
And the implementation is such that for most languages only a single line needs to be added (if they are the same as another supported language). Only a language that has different rules needs more code, and even then it's a simple enough class that can be written by anyone, if he/she understands the language in question (or has the rules provided by someone who does).

PS: I agree that the current game code does not allow the new translation code to show it's true potential, and it may seem overkill. But I think that using it will allow better translations as the rest of the game eventually gets re-written to take advantage of it. If we don't have the translation code, these possibilities will never be taken, and the translations will always be sub-par, no matter how hard the translators try.
Title: Re: Refactoring translation
Post by: SupSuper on December 05, 2012, 03:47:51 am
I guess that makes sense. I've always been more of a fan of simplicity over robustness, probably would've just written something like this for args and leave it at that (pseudo-code):
Code: [Select]
Language::getString(id, ...)
{
s = _strings.find(id)
va_list args;
va_start(args);
for (int i = 0; i < count; i++)
{
stringstream key << "{" << i << "}";
stringstream arg << args[i];
replace(s, key, arg);
}
va_end(args);
return s;
}

As for the empty string dilemma, I think there's cases where you just wanna add an empty string, since that "if" was added later. Like imagine you're looping through an array of IDs and adding corresponding strings to a list, and you just wanna introduce some blank lines in there, or something.