OpenXcom Forum
Contributions => Programming => Topic started by: karvanit on November 20, 2012, 02:55:54 am
-
I want to change the translation support but before I go through, I think it's better to ask for opinions and the general approval of the developers.
So, I am thinking of doing the following changes:
- Create a Translator singleton. Translation is needed everywhere, no need pass an extra pointer around.
- Allow parameters in the translated string (see below for sample code). This will make translating whole sentences with information easier, and avoid concatenating parts in code (which is not correct form in many languages).
- Allow translation of plural cases. This will require extra translation text in the language files.
Code comparison:
https:// Current code
obj->setText(_game->getLanguage()->getText("STR_SAMPLE"));
https:// New version
obj->setText(Translator::get().getText("STR_SAMPLE"));
https:// Current code
std::wsostream sstr;
std::wstring part1 = _game->getLanguage()->getText("STR_PART1");
std::wstring part2 = _game->getLanguage()->getText("STR_PART2");
sstr << part1 << someValue << part2 << someOtherValue;
obj->setText(sstr.str());
https:// New version
obj->setText(Translator::get().getText("STR_FULL_SENTENCE").arg(someValue).arg(someOtherValue));
https:// STR_FULL_SENTENCE contains %1 and %2 which get replaced.
https:// Current code
std::wsostream sstr;
std::wstring one = _game->getLanguage()->getText("STR_CASE_ONE");
std::wstring many = _game->getLanguage()->getText("STR_CASE_MANY");
if (someValue == 1)
sstr << one;
else
sstr << many;
sstr << someValue;
obj->setText(sstr.str());
https:// New version
obj->setText(Translator::get().getText("STR_SENTENCE", someValue));
https:// Language specific variants of STR_SENTENCE are searched.
https:// They contain %n which is replaced by someValue.
So, what do you think? Is this something you are willing to merge into OpenXcom?
PS: I will obviously handle all changes to existing code, and place proper strings in the English language files.
PS2: I can only guess at the proper argument placement for other languages though.
-
OpenTTD has a quite ingenious system hooked up with a web translator interface that makes translating stuff quite easily.
What's interesting is the system used for plurals.
Basically, a translator translates a text like this:
This clip has {VALUE} shot{P 0 "" s} remaining
The {VALUE} gets replaced with the actual number.
The {P 0 "" s} functions as following:
- P is a marker for "plural" (I am not sure if there are any other types of markers, I don't remember encountering any other)
- 0 is the index of the value you want to consider when deciding if the displayed string should use plural or not (indexes are calculated from left to right, counting from 0 only the {VALUE} keys)
- "" is a blank string (in this case, the "shot" word is already singular, so there is nothing to add there)
- s is the text to be displayed for plural
Regarding the double quotes, the text above can be written as:
This clip has {VALUE} {P 0 shot shots} remaining
Both options will display the same text:
This clip has 1 shot remaining
or
This clip has 7 shots remaining
However the first variant is preferred amongst the translators.
Now that the indications are over :), is it worth implementing such a system (no necessarily the one described by me) or something similar for OXC? I meant... there are not that many strings there...
-
The translation system I propose is based upon the API for Qt translation, although modified to use IDs instead of default strings. So, although the translation code will change, the language file can be kept with minor changes.
I now have code that compiles but is never used. Once I have replaced the old API in a few places, I'll post an update, so people can review the new system.
-
Code comparison:
https:// Current code
obj->setText(_game->getLanguage()->getText("STR_SAMPLE"));
https:// New version
obj->setText(Translator::get().getText("STR_SAMPLE"));
What about providing simple macro to make code even shorter and more readable? For example:
https:// Current code
obj->setText(_game->getLanguage()->getText("STR_SAMPLE"));
https:// New version
obj->setText(Translator::get().getText("STR_SAMPLE"));
https:// New version with macro
obj->setText(_T("STR_SAMPLE"));
-
I decided against doing a singleton, and modifying Language instead, so no nice macros, until Supsuper allows globals!
On the other hand, most of the code will remain the same, only the cases with arguments will (eventually) change.
-
You can always write _T method in State class ;) At least it will work in states :P
-
What about providing simple macro to make code even shorter and more readable? For example:
https:// Current code
obj->setText(_game->getLanguage()->getText("STR_SAMPLE"));
https:// New version
obj->setText(Translator::get().getText("STR_SAMPLE"));
https:// New version with macro
obj->setText(_T("STR_SAMPLE"));
I presume you got the idea off gettext, but using such "simple" macros can be pretty dangerous. For example _T already has a special meaning in Windows compilers: https://msdn.microsoft.com/en-us/library/dybsewaf(v=vs.80).aspx
-
Ok, I have a few questions to ask the main developers:
- Is it acceptable to just move the translation stuff over to gettext? I mean, if I do the work, is it ever going to get merged
- In Language::loadLng(), why are we not using std::getline(stream, string) and we read character by character? Was there a proper reason or just inertia?
- I'm assuming Language::toHtml() is just old and rusted code, and safe to remove. No one is calling it. Do we want to keep it?
- How much leeway is there in the Language file format? For now I have specific suffixes (_0 ... _3) and extra tags ({ALT}, {1} ... {N}).
Obviously, if gettext is acceptable, then (2) and (4) become moot.
-
I have turned it into a regular pull request (https://github.com/SupSuper/OpenXcom/pull/209), awaiting review and approval.
Most of the code is trivial changes:
- Changed _game->getLanguage()->getString to tr if called in a State-derived class.
- String addition is not supported, use streaming instead.
Most of the real changes are in LocalizedText, which avoids string copy as long as possible.
And I have also changed the code to output 'N day(s)' and 'Not enough _ammo_ to equip HWP.' as example / test of the new LocalizedText API.
-
Ok, I have a few questions to ask the main developers:
- Is it acceptable to just move the translation stuff over to gettext? I mean, if I do the work, is it ever going to get merged
- In Language::loadLng(), why are we not using std::getline(stream, string) and we read character by character? Was there a proper reason or just inertia?
- I'm assuming Language::toHtml() is just old and rusted code, and safe to remove. No one is calling it. Do we want to keep it?
- How much leeway is there in the Language file format? For now I have specific suffixes (_0 ... _3) and extra tags ({ALT}, {1} ... {N}).
Obviously, if gettext is acceptable, then (2) and (4) become moot.
1. The only reason I didn't use gettext is because it's not ID-based, which is important for OpenXcom's design.
2. Probably historical reasons, a lot of the load mechanisms are byte-based becuase X-Com formats often have specific meanings for specific bytes. I think the proper way is to just load the whole file at once (and let STL worry about buffering) and then process it.
3. It's used to generate this quick reference: https://openxcom.org/docs/language_id.html
4. As long as you don't break it. :P The format is kinda designed as we go.
-
As someone who's (still) working on a translation for OpenXcom, could you maybe explain the advantages of such system in layman's terms?
-
Right now a lot of the messages are created by stringing pieces together. Using complete sentences with markers means that the translator has more control over the sentence structure, and is also able to see the context better.
eg: Instead of 'STR_NOT_ENOUGH' and 'STR_TO_ARM_HWP' we will now have 'STR_NOT_ENOUGH_ammotype_TO_ARM_HWP', with a placeholder for the ammo type text.
This means that the translator can change the whole sentence, so it can appear more natural.
Using the new system for plural forms means that we are not confined to the English rules any more, each language uses their proper forms, and, as a bonus, we get special treatment for 0. Again, more context for the translator.
In addition, the new system requires less work from the programmer, since the code does not have to string together sentences any more, or use conditions on variable values to decide dictionary keys.
-
Ok, this is just a request for information, not me being pushy.
What would it take for the PR to be accepted? Are there problems / suggestions for doing things differently or not doing something?
-
To be honest, once I see over 3000 changes for just a small feature, I don't really have the stomach to thoroughly review it. It's probably fine, though I imagine hardcoding the syntactical differences for each language will get unwieldy, and the real work will be reviewing and changing all the language strings to actually use your new functionality.
-
Indeed, the real work will be changing the way the code handles sentences, But this can incremental, once the mechanisms are in.
Most of these 3k lines are changing '_game->getLanguage()->getString' with 'tr'!
-
I have now added a new PR, containing just the minimum code changes, without changing all the states to use the 'tr()' functions.
-
Ok... I'm still really confused. :? Could you walk me through your thought process behind this. For example:
- Why did you encapsulate strings inside a LocalizedText?
- What's OX_REQUIRED_RESULT?
- Why should the program crash on a blank string?
- Should PluralityRules really be a programmer concern instead of a translator concern? It's gonna have to be maintained everytime a new language comes in, all this for one plural word.
-
- Why did you encapsulate strings inside a LocalizedText?
Because this allows me to add the 'arg()' function and allow call chaining for argument substitution. It also makes it clear what is translated and what isn't. To take full advantage of this we should make all user-visible functions accept LocalizedText arguments so that translator-unfriendly code would trigger compilation errors. But that's a very big change for a small gain in the way the game is written right now.
The proposed code has no overhead and allows readable translation code, without changing too much. It just stops string concatenation with '+', requiring wstringstream instead. Which is good, since this indicates that proper translation-aware code should be substituted there (using LocalizedText::arg for most cases).
- What's OX_REQUIRED_RESULT?
It is hides compiler specific code to warn about unused results of "expensive" functions, eg code like
_game->getLanguage()->getString("STR_ID").arg("Example"); https://Nothing is done with this.
VS
setText(_game->getLanguage()->getString("STR_ID").arg("Example")); https:// This is ok, we are using the result.
- Why should the program crash on a blank string?
Ok, crashing is kind of overkill, but that assert only triggered when someone wanted a translation for the empty ID, so it is a logical error in the code, or in the rule set.
- Should PluralityRules really be a programmer concern instead of a translator concern? It's gonna have to be maintained everytime a new language comes in, all this for one plural word.
PluralityRules is used for the generic code to support all different types of plural forms (not just 2 as in English).
It allows the programmers to call a single function and have the translators provide the proper forms.
And the implementation is such that for most languages only a single line needs to be added (if they are the same as another supported language). Only a language that has different rules needs more code, and even then it's a simple enough class that can be written by anyone, if he/she understands the language in question (or has the rules provided by someone who does).
PS: I agree that the current game code does not allow the new translation code to show it's true potential, and it may seem overkill. But I think that using it will allow better translations as the rest of the game eventually gets re-written to take advantage of it. If we don't have the translation code, these possibilities will never be taken, and the translations will always be sub-par, no matter how hard the translators try.
-
I guess that makes sense. I've always been more of a fan of simplicity over robustness, probably would've just written something like this for args and leave it at that (pseudo-code):
Language::getString(id, ...)
{
s = _strings.find(id)
va_list args;
va_start(args);
for (int i = 0; i < count; i++)
{
stringstream key << "{" << i << "}";
stringstream arg << args[i];
replace(s, key, arg);
}
va_end(args);
return s;
}
As for the empty string dilemma, I think there's cases where you just wanna add an empty string, since that "if" was added later. Like imagine you're looping through an array of IDs and adding corresponding strings to a list, and you just wanna introduce some blank lines in there, or something.