aliens

Author Topic: Converting OXCE to rapidyaml  (Read 3079 times)

Offline Delian

  • Colonel
  • ****
  • Posts: 470
    • View Profile
Converting OXCE to rapidyaml
« on: October 21, 2024, 09:41:19 pm »
Everyone knows that saving and loading games in OXC and OXCE gets progressively slow as the game goes on. Well, it's not much of a problem in OXC, because vanilla games aren't large and don't last that long, so the issue doesn't become super noticeable. Since this isn't an OXC issue, it doesn't make much sense to fix it there.

Anyway, the main culprit is the yaml-cpp library that the game uses. The simplest solution is to ditch that library and use a faster one. So, I've decided to take on a monumental task of converting OXCE from yaml-cpp to rapidyaml.

I'll use this thread to document some of the observations and pitfalls I've encountered during the coding.

Offline Delian

  • Colonel
  • ****
  • Posts: 470
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #1 on: October 21, 2024, 09:41:40 pm »
Let's start with a simple parsing speed demo of a 5.8MB .rul file 4 times in debug mode
With yaml-cpp:
Code: [Select]
auto start = high_resolution_clock::now();
for (size_t i = 0; i < 4; i++)
{
auto stream = CrossPlatform::readFile("C:\\Dioxine_XPiratez\\user\\mods\\Piratez\\Ruleset\\Piratez.rul");
YAML::Node doc = YAML::Load(*stream);
}
auto stop = high_resolution_clock::now();
auto duration = duration_cast<microseconds>(stop - start);
duration: 216.2 seconds (216,277,714 microseconds)

With rapidyaml:
Code: [Select]
auto start = high_resolution_clock::now();
for (size_t i = 0; i < 4; i++)
{
size_t size;
char* data = CrossPlatform::ReadFileRaw("C:\\Dioxine_XPiratez\\user\\mods\\Piratez\\Ruleset\\Piratez.rul", &size);
ryml::Tree tree = ryml::parse_in_arena(ryml::csubstr(data, size)); //copy data to internal data buffer (arena) and parse it
free(data);
}
auto stop = high_resolution_clock::now();
auto duration = duration_cast<microseconds>(stop - start);
duration: 3.2 seconds (3,246,784 microseconds)

67 times faster, not bad.
« Last Edit: October 21, 2024, 09:46:12 pm by Delian »

Offline Yankes

  • Moderator
  • Commander
  • ***
  • Posts: 3331
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #2 on: October 21, 2024, 10:15:49 pm »
I would like to speed up load & save but yaml-cpp is bit entrenched in OXC.
Loading some rules sets in theory would be easy to swap to different "provider" but there some classes
that have custom converters. Another is that rule sets some times copy yaml-cpp nodes and keep them longer
after rule loading. It could be required lot of work to keep current sematic of yaml-cpp when using faster parser.

Another question, do `ryml` load all nodes? or do it lazy? only when you request data from it?
yaml-cpp is eager in converting file into nodes in memory.

Offline Delian

  • Colonel
  • ****
  • Posts: 470
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #3 on: October 21, 2024, 10:33:33 pm »
I would like to speed up load & save but yaml-cpp is bit entrenched in OXC.
Loading some rules sets in theory would be easy to swap to different "provider" but there some classes
that have custom converters. Another is that rule sets some times copy yaml-cpp nodes and keep them longer
after rule loading. It could be required lot of work to keep current sematic of yaml-cpp when using faster parser.

Another question, do `ryml` load all nodes? or do it lazy? only when you request data from it?
yaml-cpp is eager in converting file into nodes in memory.
>>keeping the same semantics
I've completely abandoned the thought of keeping the yaml-cpp semantics, I'm just going to replace all the code that interacts with yaml objects. There's just too many differences in the APIs. Most yaml-cpp function return a Node object, but most rapidyaml do not, and instead require passing objects as pointers through function calls.

>>custom converters
For converters, here's an example:
Before:
Code: [Select]
template<>
struct convert<OpenXcom::UnitStats>
{
static bool decode(const Node& node, OpenXcom::UnitStats& rhs)
{
if (!node.IsMap())
return false;
rhs.tu = node["tu"].as<int>(rhs.tu);
rhs.stamina = node["stamina"].as<int>(rhs.stamina);
rhs.health = node["health"].as<int>(rhs.health);
rhs.bravery = node["bravery"].as<int>(rhs.bravery);
rhs.reactions = node["reactions"].as<int>(rhs.reactions);
rhs.firing = node["firing"].as<int>(rhs.firing);
rhs.throwing = node["throwing"].as<int>(rhs.throwing);
rhs.strength = node["strength"].as<int>(rhs.strength);
rhs.psiStrength = node["psiStrength"].as<int>(rhs.psiStrength);
rhs.psiSkill = node["psiSkill"].as<int>(rhs.psiSkill);
rhs.melee = node["melee"].as<int>(rhs.melee);
rhs.mana = node["mana"].as<int>(rhs.mana);
return true;
}
static Node encode(const OpenXcom::UnitStats& rhs)
{
Node node;
node["tu"] = rhs.tu;
node["stamina"] = rhs.stamina;
node["health"] = rhs.health;
node["bravery"] = rhs.bravery;
node["reactions"] = rhs.reactions;
node["firing"] = rhs.firing;
node["throwing"] = rhs.throwing;
node["strength"] = rhs.strength;
node["psiStrength"] = rhs.psiStrength;
node["psiSkill"] = rhs.psiSkill;
node["melee"] = rhs.melee;
node["mana"] = rhs.mana;
return node;
}
};

After:
Code: [Select]
//overload for parsing UnitStats
bool read(ryml::ConstNodeRef const& n, UnitStats* val)
{
// because the order is always the same, we can safely use next_subling, which is fast
auto node = n.first_child(); node >> val->tu;
node = node.next_sibling(); node >> val->stamina;
node = node.next_sibling(); node >> val->health;
node = node.next_sibling(); node >> val->bravery;
node = node.next_sibling(); node >> val->reactions;
node = node.next_sibling(); node >> val->firing;
node = node.next_sibling(); node >> val->throwing;
node = node.next_sibling(); node >> val->strength;
node = node.next_sibling(); node >> val->psiStrength;
node = node.next_sibling(); node >> val->psiSkill;
node = node.next_sibling(); node >> val->melee;
node = node.next_sibling(); node >> val->mana;
return true;
}

void write(ryml::NodeRef* n, UnitStats const& val)
{
*n |= ryml::MAP;
n->append_child() << ryml::key("tu") << val.tu;
n->append_child() << ryml::key("stamina") << val.stamina;
n->append_child() << ryml::key("health") << val.health;
n->append_child() << ryml::key("bravery") << val.bravery;
n->append_child() << ryml::key("reactions") << val.reactions;
n->append_child() << ryml::key("firing") << val.firing;
n->append_child() << ryml::key("throwing") << val.throwing;
n->append_child() << ryml::key("strength") << val.strength;
n->append_child() << ryml::key("psiStrength") << val.psiStrength;
n->append_child() << ryml::key("psiSkill") << val.psiSkill;
n->append_child() << ryml::key("melee") << val.melee;
n->append_child() << ryml::key("mana") << val.mana;
}

>>copy yaml-cpp nodes and keep them longer
That's a problem, because nodes in rapidyaml are just pointers to tree data. You can't copy them or move them around because they don't hold any data. If you tried to copy anything, you'd have to copy all the data - the whole tree hehe. Don't worry, if it becomes a problem I'll fix it!

>do `ryml` load all nodes
The file load functions that we use already load all the data from hard drive into memory, the whole file, so the streams that we use aren't lazy-loaded. If you mean parsing the said string data, it's lazy-loaded. Converting from string to int only happens on demand, like in the above example, with the >> operator.

Btw, rapidyaml uses two different types of objects, depending if you want to just read, or if you want to write. For reading, it's the read-only node called ryml::ConstNodeRef, but for writing it's ryml::NodeRef. You can't convert them from one to another, so it makes spotting errors a lot easier. However, it's incompatible with yaml-cpp, because yaml-cpp doesn't distinct between read-only and read-write nodes.

Hmm, anyway.
If I want to add a wrapper a library, where to put it? How to name it? For now, instead of #including rapidyaml directly, I'm including #rapidyamlwrapper.h (which I put in /src) so that I can put some defines and pragmas in there, but I wonder what's the normal practice...

In the CrossPlatform.cpp I had to move most of the code from readFile to a new readFileRaw function. That's because I don't want an istream to the yaml data, I just want the data, and it's slow converting from stream back to data. So readFile calls readFileRaw and wraps it into stream. But for parsing yaml I'll just be calling readFileRaw directly.
« Last Edit: October 21, 2024, 10:38:10 pm by Delian »

Offline Delian

  • Colonel
  • ****
  • Posts: 470
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #4 on: October 21, 2024, 11:06:04 pm »
Some common differences between yaml-cpp and rapidyaml:

yaml-cpp:
YAML::Node childNode = node["key"];
if (childNode) {
    doStuff(childNode);
}


rapidyaml:
// ryml::ConstNodeRef childNode = node["key"]; //crashes if key doesn't exist!
ryml::ConstNodeRef childNode = node.find_child("key"); //doesn't crash if node doesn't exist, instead returns an invalid node
if (!childNode.invalid()) { //you also can't do "if (node)"
    doStuff(childNode);
}


or
if (node.has_child("key")) { //this is a slower option because you search for the key twice, and searching children keys is slow in large objects - O(n)
    doStuff(node["key"]);
}
« Last Edit: November 09, 2024, 09:02:16 pm by Delian »

Offline Yankes

  • Moderator
  • Commander
  • ***
  • Posts: 3331
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #5 on: October 21, 2024, 11:23:05 pm »
converting
Code: [Select]
rhs.tu = node["tu"].as<int>(rhs.tu);
into
Code: [Select]
node = node.next_sibling(); node >> val->stamina;
is nonstarter. Nowhere we guarantee order or existence of nodes.
Many save objects save optional values only if they do not have default values.
Even if current code work fine, you need consider different versions (past and future).
If we make loading too fast then changing format become hard as is in binary files.

For code, "shape" its irreverent, I many times use helpers like `mod->load*` to override yaml-cpp behavior.
If you could replace all with `load(node, "name", value);` I would be for it.

Best case could be your pull request: https://github.com/MeridianOXC/OpenXcom/pull/132
instead of using `to_string` each time you want avoid slow yaml-cpp load we should add dedicated loan & save functions
that use most optimal code (like `_vec = node["list"].as<std::vector<int>>(_vec)` copy all data each time is called, I try avoid code like this)
This will improve current loading speed and allow future changes.

One place where I did things like this, was `LoadYaml.h` and `Mod.h`. Probably better is first case as more general.

I think probably best way I would see to start all this is add new layer of indirection.
Create `OpenXcom::Yaml` and it will now store `YAML::Node` but then it could be easy `#ifdef` to store `ryml::NodeRef` or `ryml::ConstNodeRef`
(we could make dedicated objects for save and load to better match ryml behavior) .



Btw there are other yaml libs, could have less performance as rapidyaml but could have closer semantic to yaml and easier to swich

Offline Yankes

  • Moderator
  • Commander
  • ***
  • Posts: 3331
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #6 on: October 22, 2024, 12:29:56 am »
Some common differences between yaml-cpp and rapidyaml:

yaml-cpp:
YAML::Node childNode = node["key"];
if (childNode) {
    doStuff(childNode);
}


rapidyaml:
// ryml::ConstNodeRef childNode = node["key"]; //crashes if key doesn't exist!
ryml::ConstNodeRef childNode = node.find_child("key"); //doesn't crash if node doesn't exist, instead returns an invalid node
if (!childNode.invalid()) { //you also can't do "if (node)" because it's a struct, not a class
    doStuff(childNode);
}


or
if (node.has_child("key")) { //this is a slower option because you search for the key twice, and searching children keys is slow in large objects - O(n)
    doStuff(node["key"]);
}


I think lot better would be wrapper `OpenXcom::Yaml`.
In case of rapid it could store first, last and current nodes. each time you check for `"X"` it check if its current node, if not it  loop `next_sibling()` to next one and check if its same, this loop until it reach end, when its reach end, it start from beginning unit reach previous current node.
This reduce cost to one `O(n)` and in "best" case it could have cost `O(1)` if loading order is same as saving.
For rulesets we could make "random" version that allocate flat buffer that sort all nodes based on it.

Offline Delian

  • Colonel
  • ****
  • Posts: 470
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #7 on: October 22, 2024, 01:43:10 am »
rapidyaml has difficulties with Enums.

You can do
node << (int)myEnumVar;

but you can't do:
node >> (int)myEnumVar;

Or maybe I'm stupid and I just don't know how lol... only thing I figured out is:
int temp;
node >> temp;
myEnumVar = (MyEnum)temp;


So for now I'm just specifying a custom read overload:
bool read(ryml::ConstNodeRef const& n, MyEnum* val){ int temp; n >> temp; *val = (MyEnum)temp; return true; }
and then "node >> myEnumVar"; works fine

rapidyaml already has a read function defined for all integral types, also for vector and map, so parsing most types is easy. But for custom types like enums, you gotta specify your own read(), and maybe write overload.

>next_sibling() is nonstarter
Sorry, this was one of the exceptional places where I used next_sibling(), I don't normally use it. Forget you saw that! I'll change it to use keys instead.

>>Many save objects save optional values only if they do not have default values.
Yeah, don't worry about it. I'm careful, which is why converting to rapidyaml is taking me so long. I've already been working on it for two weeks!

BattleUnit::load()
Before:
Code: [Select]
void BattleUnit::load(const YAML::Node &node, const Mod *mod, const ScriptGlobal *shared)
{
_id = node["id"].as<int>(_id);
_faction = (UnitFaction)node["faction"].as<int>(_faction);
_status = (UnitStatus)node["status"].as<int>(_status);
_wantsToSurrender = node["wantsToSurrender"].as<bool>(_wantsToSurrender);
_isSurrendering = node["isSurrendering"].as<bool>(_isSurrendering);
_pos = node["position"].as<Position>(_pos);
_direction = _toDirection = node["direction"].as<int>(_direction);
_directionTurret = _toDirectionTurret = node["directionTurret"].as<int>(_directionTurret);
_tu = node["tu"].as<int>(_tu);
_health = node["health"].as<int>(_health);
_mana = node["mana"].as<int>(_mana);
        ...
After:
Code: [Select]
void BattleUnit::load(const ryml::ConstNodeRef &node, const Mod *mod, const ScriptGlobal *shared)
{
const ryml::Tree tree = *node.tree();
std::unordered_map<ryml::csubstr, ryml::id_type> index(node.num_children()); //build and use an index to avoid [] operator's O(n) complexity
for (const ryml::ConstNodeRef& childNode : node.children())
index[childNode.key()] = childNode.id();

tree.cref(index["id"]) >> _id;
tree.cref(index["faction"]) >> _faction;
tree.cref(index["status"]) >> _status;
if (index.count("wantsToSurrender"))
tree.cref(index["wantsToSurrender"]) >> _wantsToSurrender;
if (index.count("isSurrendering"))
tree.cref(index["isSurrendering"]) >> _isSurrendering;
tree.cref(index["position"]) >> _pos;
tree.cref(index["direction"]) >> _direction;
_toDirection = _direction;
tree.cref(index["directionTurret"]) >> _directionTurret;
_toDirectionTurret = _directionTurret;
tree.cref(index["tu"]) >> _tu;
tree.cref(index["health"]) >> _health;
tree.cref(index["mana"]) >> _mana;
        ...
I checked out BattleUnit::save() to find out which keys are only saved if not default value, so I know which keys I have to be careful about. It's a lot of work...

>Even if current code work fine, you need consider different versions (past and future).
I'm being as careful as I can be~
The real test will be after I'm finished. I'll make sure that all the mods and saved games serialize and deserialize correctly, so they produce identical files.

>If we make loading too fast then changing format become hard as is in binary files.
Haha, I know what you mean. While I am trying to be optimize things here and there, I'm not going to sacrifice readability (ignore that next_sibling case). The code won't look worse than it is after I'm done, I promise!

>If you could replace all with `load(node, "name", value);` I would be for it.
I would be for it also, but, right now there is no abstraction layer. Creating an abstraction layer would be a lot of work, but then it would be even more work to convert all the code to use the abstraction layer. Basically, converting to abstraction layer is more work than direct conversion to rapidyaml.

>Best case could be your pull request: https://github.com/MeridianOXC/OpenXcom/pull/132
I've closed that pull request because it won't matter after I'm done with this conversion.

>that use most optimal code
Even the most optimal code with yaml-cpp is very slow, so any possible improvements are marginal at best. If you'd seen the internal code of rapidyaml, you'd know what I'm talking about. For instance, ryml::to_chars() is faster than even std::to_chars()

>I think probably best way I would see to start all this is add new layer of indirection.
I agree with you, that conversion to an common API would be the best but, that would be a lot of extra work.
Also, I'm already over half way done! In the beginning I was only trying out stuff to see how compatible the libraries were, but I got pretty absorbed into the coding. So the best I can do is finish a direct conversion before I lose the will to code.

>Btw there are other yaml libs, could have less performance as rapidyaml but could have closer semantic to yaml and easier to swich
I couldn't find any. Either they were similar to rapidyaml, or they had bad performance, or no performance data. It seems like writing a yaml library is something people did as a high school project. If the library works, you got a good grade in school!

I think lot better would be wrapper `OpenXcom::Yaml`.
In case of rapid it could store first, last and current nodes. each time you check for `"X"` it check if its current node, if not it  loop `next_sibling()` to next one and check if its same, this loop until it reach end, when its reach end, it start from beginning unit reach previous current node.
This reduce cost to one `O(n)` and in "best" case it could have cost `O(1)` if loading order is same as saving.
For rulesets we could make "random" version that allocate flat buffer that sort all nodes based on it.

That's a pretty nifty algorithm. But what if we can guarantee O(1)? In the above code example I loop through children to build an index, and then all use it to get O(1). This is what rapidyaml author wrote we should do when we want fast access. I only build index for large objects tho due to overhead. In general rapidyaml has a lot of options to speed up parsing, reusing parsers, reusing trees and buffers... stuff that is specific to rapidyaml which can't be used with an abstraction layer.

Offline Delian

  • Colonel
  • ****
  • Posts: 470
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #8 on: October 22, 2024, 02:06:25 am »
Some more yaml-cpp differences:

rapidyaml works on the basis of "no extra luggage".
Features that would slow down parsing or serializing are disabled by default, and you only enable them if you need them. Of course, enabling them slows down parsing, which is why they're best left disabled until necessary.

One of such features is position-in-document.
In yaml-cpp:
Log(LOG_ERROR) << "Error in line " << node.Mark().line;
In rapidyaml:
? ? ?
Nodes don't keep locations of where a specific node is in a document. Which line and column certain node is at. That's parser's job. But we don't have the parser anymore, and even if we had it, it parsed the original yaml without location logging enabled to speed up the parsing. So the solutions is:
Code: [Select]
// reparse the original yaml but with locations data enabled
ryml::csubstr yaml = node.tree()->arena();
ryml::EventHandlerTree handler = {};
ryml::Parser parser(&handler, ryml::ParserOptions().locations(true));
const ryml::Tree newTree = ryml::parse_in_arena(&parser, yaml);
ryml::Location loc = parser.location(newTree, node.id());
Log(LOG_ERROR) << "Error in line " << loc.line;
The good news is that errors are exceptional behavior, and after the error the game will close anyway, so it doesn't matter if we spend a bit of time parsing the yaml to find out the offending location. Other than that, we generally don't need to know location of nodes in yaml document.

Offline Yankes

  • Moderator
  • Commander
  • ***
  • Posts: 3331
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #9 on: October 22, 2024, 11:55:46 am »
Some more yaml-cpp differences:

rapidyaml works on the basis of "no extra luggage".
Features that would slow down parsing or serializing are disabled by default, and you only enable them if you need them. Of course, enabling them slows down parsing, which is why they're best left disabled until necessary.

One of such features is position-in-document.
In yaml-cpp:
Log(LOG_ERROR) << "Error in line " << node.Mark().line;
In rapidyaml:
? ? ?
Nodes don't keep locations of where a specific node is in a document. Which line and column certain node is at. That's parser's job. But we don't have the parser anymore, and even if we had it, it parsed the original yaml without location logging enabled to speed up the parsing. So the solutions is:
Code: [Select]
// reparse the original yaml but with locations data enabled
ryml::csubstr yaml = node.tree()->arena();
ryml::EventHandlerTree handler = {};
ryml::Parser parser(&handler, ryml::ParserOptions().locations(true));
const ryml::Tree newTree = ryml::parse_in_arena(&parser, yaml);
ryml::Location loc = parser.location(newTree, node.id());
Log(LOG_ERROR) << "Error in line " << loc.line;
The good news is that errors are exceptional behavior, and after the error the game will close anyway, so it doesn't matter if we spend a bit of time parsing the yaml to find out the offending location. Other than that, we generally don't need to know location of nodes in yaml document.
Problem is we need know it, maybe not in save file but in rulefile is critical, without it any error reporting is pointless, consider mod like XPZ and now get errors from file that have 10000 lines, can you say what `prop:` break mod?

Code: [Select]
I agree with you, that conversion to an common API would be the best but, that would be a lot of extra work.
Also, I'm already over half way done! In the beginning I was only trying out stuff to see how compatible the libraries were, but I got pretty absorbed into the coding. So the best I can do is finish a direct conversion before I lose the will to code.
Yes, this will take more time but in long run it will take less time.
You can not even try make common API, only hide most of boilerplate needed by rapid, and more impartiality hide complexity and error handling.

Best example is your code:
Code: [Select]

const ryml::Tree tree = *node.tree();
std::unordered_map<ryml::csubstr, ryml::id_type> index(node.num_children()); //build and use an index to avoid [] operator's O(n) complexity
for (const ryml::ConstNodeRef& childNode : node.children())
index[childNode.key()] = childNode.id();

tree.cref(index["id"]) >> _id;
tree.cref(index["faction"]) >> _faction;
tree.cref(index["status"]) >> _status;
if (index.count("wantsToSurrender"))
tree.cref(index["wantsToSurrender"]) >> _wantsToSurrender;
if (index.count("isSurrendering"))
tree.cref(index["isSurrendering"]) >> _isSurrendering;
tree.cref(index["position"]) >> _pos;
tree.cref(index["direction"]) >> _direction;
_toDirection = _direction;
tree.cref(index["directionTurret"]) >> _directionTurret;
_toDirectionTurret = _directionTurret;
tree.cref(index["tu"]) >> _tu;
tree.cref(index["health"]) >> _health;
tree.cref(index["mana"]) >> _mana;

when you try load save from OXC then line like `tree.cref(index["mana"])` how will behave? probably you should use `if (index.count("wantsToSurrender"))` every where to be 100% sure we do not crash here.
This make whole code mess and harder to maintain, but if you add new class `YamlReader` that hide all this complexity you could speed up lot of changes in code:

Code: [Select]
void BattleUnit::load(const ryml::ConstNodeRef &nodeRyml, const Mod *mod, const ScriptGlobal *shared)
{
    YamlReader node{ nodeRyml }; //make all allocations and other complex operations

    node.load("id", _id);
    node.load("faction", _faction);
    node.load("status", _status);
    node.load("wantsToSurrender", _wantsToSurrender);
    ...
}
less code to type, and better error handling (like someone put `"abc"` instead of `123`).
This will require less typing, you could use regexp to mass update files or some CodePilot or some thing like this to speed up updates.
This is because you do not need to think what is required or not (this could change too in future!), yes this will cost more but
if we ditch yaml-cpp and reduce time by 99% then increase it by 5 times for error handling and comparability, then will still have net grain as it will be overall 95% reduction.

Overall I look forward to see overall results, how much better OXCE will behave after your change.

Offline Delian

  • Colonel
  • ****
  • Posts: 470
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #10 on: October 22, 2024, 01:46:43 pm »
Ok. In what file should I put YamlReader class, and other yaml stuff? \Engine\Yaml.cpp ?

Offline Yankes

  • Moderator
  • Commander
  • ***
  • Posts: 3331
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #11 on: October 22, 2024, 01:48:56 pm »
Yes, this is good place for it.

[ps]

I asked Meridian what his option about changing yaml parser, and he said he do not opposite it as long it will work correctly on all supported platforms.
« Last Edit: October 22, 2024, 01:52:14 pm by Yankes »

Offline Delian

  • Colonel
  • ****
  • Posts: 470
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #12 on: October 23, 2024, 05:35:51 pm »
I learned a bit more of C++ and I was able to find a solution to the enums problem, so now I don't need to add overload for every Enum we have, yay
Code: [Select]
// Deserialization template for enums. We have to overload from_chars() instead of read(), because read() already has a template for all types which leads to template conflict
template <typename EnumType>
typename std::enable_if<std::is_enum<EnumType>::value, bool>::type inline from_chars(ryml::csubstr buf, EnumType* v) noexcept
{
return ryml::atoi(buf, (int*)v);
}

>keeping yaml nodes longer
For the problem with copying nodes and keeping them longer, I've decided to convert those nodes back to strings. So for instance, Unit._spawnedSoldier is now std::string instead of YAML::Node. So when we want to use the spawnedSoldierTemplate, the string is parsed to nodes as necessary.
If you want, I can keep ryml::Tree (generated from the node we want to keep) instead of std::string, so performance is better because the string doesn't need to be parsed whenever the spawned soldier template is used. However, I think that those templates are rarely used, and parsing with ryml is fast, so keeping strings is safer.

>errors from file that have 10000 lines
Hmm, the largest .rul file, Piratez.rul has 230k lines so reparsing it could take a long time...
You wanna know how long it takes? 40ms in release build.
Parsing with line logging enabled takes 50ms. So in the worst case, if someone crashes because of an error in Piratez.rul, they will have to wait 50 milliseconds extra due to reparsing.
I think the reparsing is an ok solution because people normally don't crash due to errors in .rul files. Mod authors don't provide users with faulty mod files~

>YamlReader
For now, the normal code usage looks like this:
Code: [Select]
void BattleUnit::load(const ryml::ConstNodeRef &node, const Mod *mod, const ScriptGlobal *shared)
{
YAML::YamlNodeReader reader(node, true); //useIndex = true
reader.tryRead("id", _id);
reader.tryRead("faction", _faction);
reader.tryRead("status", _status);
reader.tryRead("wantsToSurrender", _wantsToSurrender);
...
std::string spawnUnitType;
reader.tryRead("spawnUnit", spawnUnitType);
if (_spawnUnit = mod->getUnit(spawnUnitType, false)) // ignore bugged types
{
reader.tryRead("respawn", _respawn);
reader.tryRead("spawnUnitFaction", _spawnUnitFaction);
}
reader.tryRead("motionPoints", _motionPoints);
reader.tryRead("customMarker", _customMarker);
reader.tryRead("alreadyRespawned", _alreadyRespawned);
reader.tryRead("activeHand", _activeHand);
reader.tryRead("preferredHandForReactions", _preferredHandForReactions);
reader.tryRead("reactionsDisabledForLeftHand", _reactionsDisabledForLeftHand);
reader.tryRead("reactionsDisabledForRightHand", _reactionsDisabledForRightHand);
if (reader.hasChild("tempUnitStatistics"))
_statistics->load(reader.getChild("tempUnitStatistics"));
reader.tryRead("murdererId", _murdererId);
...

I decided to go with read. The english word load doesn't fit well.
- load is best used when bringing data into memory from an external source
- read is best used when processing some sort of input and may or may not involve interpreting data
- parse is best used when interpreting raw data
What we're doing is, first we load file data into a string, we parse the raw data string into a node tree, and then process individual nodes, which involves reading the data from nodes, parsing it to the correct type (could also say "converting", but convert is used in C#, not in C++), and saving it into individual variables. We could use the term "parse" both for parsing into node tree and for parsing nodes, but using the same term for two things is confusing. Anyway, based on what the function is doing, the precise function name would be... FindChildNodeAndReadItsValueAndParseItIntoOutputVariable, but that would be too long hehe, so I think shortening it to "read" is the best option.
Hmm. Names like loadBool are confusing...

I made read and tryRead. tryRead fits well because it signifies that reading isn't guaranteed. The read version guarantees read, or crash. Sometimes we want to crash if a value is missing. But I also added an overload that takes a default value. If you provide a default value, then it doesn't crash.

I have a question. Which code is easier to use?
1. reader.tryRead("faction", _faction); - Simple, but not super intuitive due to passing output as a function parameter
2. _faction = reader.read("reaction", _faction); - Read guarantees reading if we provide a default value, but coding is annoying because _faction is duplicate
3. _faction = reader.tryRead("faction"); - Looks nice, doesn't it? It works the same as the 1st option. However, it requires tag dispatching.
« Last Edit: October 24, 2024, 04:22:03 am by Delian »

Offline Yankes

  • Moderator
  • Commander
  • ***
  • Posts: 3331
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #13 on: October 24, 2024, 11:30:57 am »
Ok, look good, for `reader.tryRead("faction", _faction);` is probably best as could easy avoid allocations when `_faction` is bigger object like vectors.
Last version could need lot of "magic" to avoid overriding values if node do not exists. I not sure if its even possible as assignment operator need be class member
and we can't add it to `int` or `std::vector`.

For `read` vs `load` this is bikeshedding, I go with `load` but not as english word but more a "keyword" in OXC codebase. But you should not concern yourself with it as it take 1s to change it in whole code base, you could even make `foo` and `bar`.

For replacing `YAML::Node` with `std::string`, yes and no, we can store it as string but it should not be string, it should be dedicated struct that contains this data and probably nothing else. Consider:
Code: [Select]
std::string foo;
YamlCopy bar;
can you explain what is `foo` and what is `bar`? in first case you can't aside to say its some text, in second you see in type itself that this is yaml data.


For speed, if base time take 40ms and "full" take 40+50ms I would enable it by default. At least for rulesets but saves could be useful for Meridian as he need debug broken saves by manual edits sometimes. We already order of magnitude faster and we should not trade all "easy to use" and "code simplicity" to be even faster as loading is only small fraction of whole user playtime.

Offline Delian

  • Colonel
  • ****
  • Posts: 470
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #14 on: October 24, 2024, 12:42:14 pm »
>you see in type itself that this is yaml data
If I understand correctly, you're saying I should define:
typedef std::string YamlString;

That's possible. Well, normally the variable name itself should be self-explanatory enough about what it is. What is better then?
std::string _spawnedSoldierYaml; or
YamlString _spawnedSoldier; ?