aliens

Author Topic: Converting OXCE to rapidyaml  (Read 4978 times)

Offline Meridian

  • Global Moderator
  • Commander
  • ***
  • Posts: 9078
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #75 on: November 11, 2024, 11:25:12 am »
Anything else I should work on as far as this migration goes?

Next steps (not necessarily in this order):

0. Delian: test the ziploader, and the embedded loader (=common/standard resources embedded in the executable directly)... not sure if embedded can be tested on Windows or only on Android (as far as I know nobody is using this, but we should test it anyway)

1. Meridian(+Delian): test compilation on all supported platforms -- I may need help merging code and/or updating compilation scripts

2. Meridian: style review (just some OCD stuff, nothing too important)

3. Meridian: manual completeness review (manually check source code if something was forgotten during refactoring, etc.)

4. Meridian: very quick load/save/startup tests on all supported platforms

5. Meridian: create a documentation post with incompatibilities and migration guide for modders

6. Meridian+Yankes: decide, which incompatibilities we want to fix and which not (I will tend to do as few RYML code changes as possible)

7. Yankes: actual functional and architectural review (I am just a hobby-programmer these days, so I can't do that), Yankes needs to review, because he will be the one supporting it in the future

8. Meridian: create a new HowToCompileOXCE thread with updated instructions for OXCE 8.0+

9. Meridian: prepare a preview build for modders to update their mods to OXCE 8.0, potentially help with PRs to FMP, XCF and other mods on github

10. all: merry Xmas

Offline Yankes

  • Global Moderator
  • Commander
  • ***
  • Posts: 3347
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #76 on: November 11, 2024, 03:30:12 pm »
@Delian
I see you go with local `libs` approach in distributing rapid, question is if this "pristine" version of rapid?
I recall that you mentions some corner cases that was not handled correctly by it, did you fix in its source code?
Another question, what exact version of rapid was used?

Right now I have plan to tweak your commits a bit.
First I will split main commit in two, one that add lib itself, and second that update game logic.
If there was some fix to lib, I would add it in separated commit to not mix "pristine" version with our fixed one.

Online Delian

  • Colonel
  • ****
  • Posts: 492
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #77 on: November 11, 2024, 07:24:01 pm »
The rapidyaml source files in my commit are the original unmodified files from github, the latest 0.7.2 version. After Meridian said to avoid custom fixes, I didn't use modified rapidyaml anymore, so all the testing I did on the mods was with the original src files. As mentioned before, I copied them from 3 places:
rapidyaml/src/
rapidyaml/ext/ -> c4core/src/c4/
rapidyaml/ext/ -> c4code/src/c4/ext/ -> debugbreak/debugbreak.h

You can modify the file structure, function names, commits, whatever you want hehe. Hmm, in the VS project, it would probably be better if the rapidyaml folder was inside the Engine folder.

Oh, there's a couple of old unrelated commits in my repository which you must NOT include. The changes to ‎src/Geoscape/ConfirmDestinationState.cpp and ‎src/Geoscape/CraftNotEnoughPilotsState.cpp . You should discard those.

Btw, guess what I've been doing all day long!
« Last Edit: November 12, 2024, 12:35:41 am by Delian »

Online Delian

  • Colonel
  • ****
  • Posts: 492
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #78 on: November 13, 2024, 01:22:37 am »
So what I did was, I used WinDbg to debug OXCE, break at a certain point in the Mod.cpp code (before AfterLoad calls, to avoid too much recursion), and then use WinDbg's command line to dump Mod object. I did a 7-level recursive traverse (with Natvis, because std::map and std::vector's data structures are otherwise too deep to dump), output to a file, which generated a 1.8GB 18M line text file. I did this both with the yaml-cpp and rapidyaml, which took a couple of hours each. I then edited the output files, censored pointer addresses and structure offsets, and compared the results. I found the following differences:

_mod->_scriptGlobal->_refList[xx].value.data
_Val and _Pad are different. Are these just pointers that randomly change on each startup?

_mod->_scriptGlobal->_events[xx][yy]._proc
Some bytes change from one run to the other. Is this because the compiled scripts contain pointers, and these changed bytes are just pointers that always change on each startup?

_mod->_globe._textures[xx].second._terrain[xx]
Some values (of type double) are different... well, this is complicated.
As we know, our rules allow mods to add on top of one another. Two mods can define the same rule, where properties of the 2nd definition add to, or overwrite properties of the 1st definition.
rapidyaml, coincidentally, happens to match behavior. If we try to deserialize a yaml sequence into an existing std::vector, each yaml sequence element will be deserialized into an existing vector element, updating the fields.
yaml-cpp however, clears the std::vector before trying to deserialize into it.
So this difference is (probably a bug) caused by:
- Dio not deleting existing globe textures before updating them with his own.
- rapidyaml in-place deseralizing into existing std::vector

I'm not sure which behavior is correct. For now I've disabled rapidyaml's support for std::map and std::vector and copied the implementation into our Yaml.h, where I modified it to clear vector/map before deserializing into them. This replicates the yaml-cpp's behavior, and while it is now backwards-compatible, one could argue that the behavior is incorrect.

_mod->_transparencies[xx].second[yy]
This one was my fault. In old Mod.cpp line 2543 the for loop was increasing two counters at the same time and I missed the 2nd one. Fixed, tested, and now it works.

_mod->_invs[xx].second._costs[xx].second
Similar problem like with the std::vector, except this time it's a std::map. But it's worse. If rapidyaml tries to deserialize into an existing std::map, which isn't empty, the deserialization for the elements with matching keys quietly fails. The previous solution of clearing the map before deserializing into it solved the problem. I've also reported the inconsistency issue between vector and map to library author.

_mod->_armors[xx].second._moveCostFlyWalk.TimePercent
This one was also my fault. In the ArmorMoveCost::load() I put in UInt8 as deserialize type. I don't know how that happened. Changed it to int and problem fixed.

YAML::Node changed to YAML::YamlString
As expected, this causes some changes in the memory layout.

There was also another problem related to deserializing into map, but I fixed it in the custom map deserialize method.

I pushed the fixes to the remote repo.
« Last Edit: November 13, 2024, 10:56:24 am by Delian »

Offline Yankes

  • Global Moderator
  • Commander
  • ***
  • Posts: 3347
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #79 on: November 13, 2024, 02:26:13 am »
_Val and _Pad are different. Are these just pointers that randomly change on each startup?
I do not know what `_Val` or `_Pad` is. As name suggest this is c++ lib members, ask them what its means.

But for whole `_scriptGlobal`, yes it contains lot of self referencing pointers that are different on each load.

Quote
yaml-cpp however, clears the std::vector before trying to deserialize into it.
You need preserve this behavior, this is how near all list/maps work as default.
This is why I created all `loadNames` and other similar functions to override this behavior.
We can't change as it will break near every mod.

If you add custom function that clear vector before load, add `!add` and `!info` tag support and allow
preserving original vector when someone use `!add` tag on this node.

Quote
I don't know how that happened. Changed it to int and problem fixed.
`char` happens, this is not numeric type but characters, I would even force all `char` or `unsigned char` to load as `int`.


btw I added some comments to your repo do you look on them?

Online Delian

  • Colonel
  • ****
  • Posts: 492
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #80 on: November 13, 2024, 08:43:46 pm »
Yeah, some ScriptRawMemory = std::aligned_storage internals. Scripts work fine, so it's probably nothing.

>add `!add` and `!info` tag support
That would be new functionality that isn't a part of this migration hehe

I've looked through the comments and answered them all. Fixed what was recommended. I also found and fixed another bug which I missed yesterday. Dumping the whole Mod object was a really good idea haha.

Next I'll test the ziploader. And I'll see if I can get embedded loader to work.

Offline Yankes

  • Global Moderator
  • Commander
  • ***
  • Posts: 3347
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #81 on: November 13, 2024, 10:44:15 pm »
Yeah, some ScriptRawMemory = std::aligned_storage internals. Scripts work fine, so it's probably nothing.

ok, this is "container" that sometime have text pointers or `int` values
 
>add `!add` and `!info` tag support
That would be new functionality that isn't a part of this migration hehe

ok

I've looked through the comments and answered them all. Fixed what was recommended. I also found and fixed another bug which I missed yesterday. Dumping the whole Mod object was a really good idea haha.

Next I'll test the ziploader. And I'll see if I can get embedded loader to work.

I added one critical response about `untilLastIf` and some less critical for `std::vector` vs `[]`.

And probably in this week if I will have more time I will look again on this code.

Online Delian

  • Colonel
  • ****
  • Posts: 492
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #82 on: November 15, 2024, 12:42:29 am »
some less critical for `std::vector` vs `[]`.

Have you measured how many nanoseconds this saved? :D

I have 10 cores, but they're all lazy drunks :)

I think I figured out why your build times are so slow.

When I build the project (24 core threads), it takes 45s


But if I tab out during the building, it takes over 2min


What's causing the throttling to 33% of total CPU when VS window isn't active?
It's Windows. And the solution for me (Windows 11) was to go to Settings -> System -> Power, and changing the "Power Mode" from "Balanced" to "Best Performance". Now building remains fast for me even if I tab out.

Edit: If you do that, also make sure that in VS you go to Tools > Options > Projects and Solutions > Build and Run and turn on "Run build at low process priority" option, that way foreground tasks won't stutter even if building in the background is using 100% of CPU.
« Last Edit: November 15, 2024, 11:31:34 pm by Delian »

Offline Yankes

  • Global Moderator
  • Commander
  • ***
  • Posts: 3347
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #83 on: November 15, 2024, 02:35:29 am »
Have you measured how many nanoseconds this saved? :D
I do not know if one allocation can be measured but after year of changes like this game will be
faster by 0.5% :) Effective opposite of "Death by a Thousand Cuts". This is more a discipline that
direct optimization. Evert time I see some allocation I ask myself "do I need it?".
Author of `yaml-cpp` did not ask this question and result is apparent.
Another example of lack of "disciple" was my use of `std::stringstream` as "other parts of game use this too and this will be fine",
but as you show it was no fine when it take 4% of whole time.

Online Delian

  • Colonel
  • ****
  • Posts: 492
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #84 on: November 15, 2024, 12:28:54 pm »
Oh, that's minor if we compare it to the time you're wasting by calling std::sort() 42785 times, in the ModScript constructor.



Talk about a death by a thousand cuts hehe
(that makes up about 15-20% of the rules loading time)
« Last Edit: November 15, 2024, 01:09:34 pm by Delian »

Offline Yankes

  • Global Moderator
  • Commander
  • ***
  • Posts: 3347
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #85 on: November 15, 2024, 02:37:22 pm »
ok, but this is not allocation :D
And its not "wasting" as each operation add new element to "flat map", but indeed as this is "hot" path I can improve this.

Overall its impossible to replace this `std::vector` by `std::unordered_multimap` as I many times need rage of values using partial search by prefix.
Right now I can try use `insert` & `partition_point` to avoid sorting whole vector and move only elements only when I find correct spot.
Another more invasive fix is move sorting after all elements where added, but this would need update script "framework" to catch place where is safe
to add elements.

Online Delian

  • Colonel
  • ****
  • Posts: 492
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #86 on: November 15, 2024, 11:29:29 pm »
I many times need range of values using partial search by prefix.

Sounds like you should be using an Adaptive Radix Tree (an advanced form of Trie - data structure specifically designed for string prefix searches)). Too bad it's not among standard containers hehe.
Alternatively, you could use a hashmap of hashmaps, where the key is prefix, and value is a hashmap of entries that fall under that prefix.

But yeah, normally the sorting should only happen after all the mods and rules are loaded, so sorting only happens once.

0. Delian: test the ziploader, and the embedded loader (=common/standard resources embedded in the executable directly)

For now I've tested the ziploader, or at least I think that was the ziploader.
UFO.zip and TFTD.zip in the exe folder or user folder works fine.
common.zip and standard.zip in exe folder works.
common.zip and standard.zip in user folder doesn't work.
common.zip in user folder and standard.zip in user/mods folder works.
piratez.zip in user/mods folder works.
The only problem was that, if I used piratez.zip, the process would consume 1000MB of RAM instead of 600MB, which could be bad.

As for the embedded loader, I'm still trying to figure it out.

Offline Yankes

  • Global Moderator
  • Commander
  • ***
  • Posts: 3347
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #87 on: November 16, 2024, 12:05:00 am »
Sounds like you should be using an Adaptive Radix Tree (an advanced form of Trie - data structure specifically designed for string prefix searches)). Too bad it's not among standard containers hehe.
Alternatively, you could use a hashmap of hashmaps, where the key is prefix, and value is a hashmap of entries that fall under that prefix.

But yeah, normally the sorting should only happen after all the mods and rules are loaded, so sorting only happens once.
I did check my simple fix and it look it work fine, its reduce time around by 90%. Technically this was only change `O(n * log n)` to `O(n)` but cache
friendlies make most of work probably.
You can check how big improvements it was:
https://github.com/Yankes/OpenXcom/commit/02604ecb9b7fb07472ff50facd5a6068f6559b25
https://github.com/Yankes/OpenXcom/commit/784af7587fc9c7a3accf8f5ab86e1ce2b3662361 (cleanup commit)
https://github.com/Yankes/OpenXcom/commit/e490a469dcfdf14b81f8c5edd3df9b69e27d1492


Online Delian

  • Colonel
  • ****
  • Posts: 492
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #88 on: November 16, 2024, 03:14:27 am »
>O(n * log n)` to `O(n)
I think partition_point uses binary search, therefore it's O(n*log(n)) to O(log(n))

But yeah, it's much faster now. I can confirm that not calling std::sort() 42785 times makes a big difference! Another 15% faster startup time.

At this point, I don't see any real way to improve the startup anymore. In Release build, half of the startup time is consumed by Sound::load(); sdl_mixer.dll issues I suppose.
In Debug build, another 20% (>2s) speedup would be possible if RuleStatBonus::load() wasn't attaching a script to everything. Basically, 4000 items and armors, each attaching 6 scripts for the stat multipliers... it's a lot of script parsing.
« Last Edit: November 16, 2024, 10:03:25 am by Delian »

Offline Yankes

  • Global Moderator
  • Commander
  • ***
  • Posts: 3347
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #89 on: November 16, 2024, 01:11:16 pm »
>O(n * log n)` to `O(n)
I think partition_point uses binary search, therefore it's O(n*log(n)) to O(log(n))

But yeah, it's much faster now. I can confirm that not calling std::sort() 42785 times makes a big difference! Another 15% faster startup time.

At this point, I don't see any real way to improve the startup anymore. In Release build, half of the startup time is consumed by Sound::load(); sdl_mixer.dll issues I suppose.
In Debug build, another 20% (>2s) speedup would be possible if RuleStatBonus::load() wasn't attaching a script to everything. Basically, 4000 items and armors, each attaching 6 scripts for the stat multipliers... it's a lot of script parsing.
Is `O(n)` because I do `inseter` and this dwarf search complexity (as I could try push objects in reverse order in worst case).

This `RuleStatBonus` is simply cost of "robustness", we need pay somewhere for this, like make code lot more complex, drop some functionalists etc.
Only thing I see that is bit redundant is conversion of old node format to text representation and then parsing it but it would need new script representation.
Alterative all default scripts could share data in some way.