Author Topic: Converting OXCE to rapidyaml (Read 11065 times)

Yankes · « **Reply #165 on:** December 03, 2024, 12:56:24 am »

`mutable`?

[ps]

btw, for now we should keep all speedups until after deploy. Stash this change for later.
For now we need to weed out all bugs that could linger in the new version.

I planned to prepare a new RC today, but I had a very busy day and did not find enough time to sit down and prepare all the changes.

Delian · « **Reply #166 on:** December 03, 2024, 12:57:07 pm »

That's it! I didn't know that keyword existed lol. And it's a perfect fit for this use case. Well, I should've made _index mutable as well, so I wouldn't need to make a copy of the reader every time I needed to make one. Ok, I'll leave that change for later then, together with script caching, which I put on a separate branch. Although, doesn't it make sense to try to cram in as many changes as possible before everything is retested? I mean, if everything has to be retested anyways...
Oh, I also merged everything except for the alias stuff from the rc0 branch to mine.
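
For readers unfamiliar with the keyword, here is a minimal sketch of how `mutable` lets a `const` lookup build its index lazily. The `NodeReader` class, its members, and `isIndexed()` are hypothetical stand-ins for illustration, not the actual OXCE code.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a node reader; not the real OXCE class.
class NodeReader
{
public:
    explicit NodeReader(std::vector<std::pair<std::string, int>> children)
        : _children(std::move(children)) {}

    // Logically const: the lookup does not change what the reader represents,
    // it only fills a cache the first time it is called.
    const int* find(const std::string& key) const
    {
        if (!_indexBuilt)
        {
            for (std::size_t i = 0; i < _children.size(); ++i)
                _index.emplace(_children[i].first, i);
            _indexBuilt = true;
        }
        auto it = _index.find(key);
        return it == _index.end() ? nullptr : &_children[it->second].second;
    }

    bool isIndexed() const { return _indexBuilt; }

private:
    std::vector<std::pair<std::string, int>> _children;
    // `mutable` allows const member functions to modify these two members.
    mutable std::unordered_map<std::string, std::size_t> _index;
    mutable bool _indexBuilt = false;
};
```

Because the cache lives inside the object, a `const NodeReader&` can be passed around and still be indexed on first use, without copying the reader just to get a non-const instance.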

Yankes · « **Reply #167 on:** December 03, 2024, 01:30:39 pm »

If too many changes are made at once, multiple bugs can creep in at once. In the worst case everything appears to "work", but when you try to fix some corner cases, all the bugs surface again in other places.
Once we include the biggest change and test it fully, small optimizations can be tested locally without worrying that the big change could interfere with them.

btw I do not think merging is a good strategy in this case, as I rebase your changes, and the next RC1 will be another rebase (I will push changes on RC0 too, but without a rebase, so you can see what I changed). I do this mainly because Meridian and I try to keep a linear history and atomic commits, as this allows easy bisection of bugs (we can find exactly which commit breaks some mods).

Yankes · « **Reply #168 on:** December 04, 2024, 02:27:31 am »

New version for testing: https://github.com/Yankes/OpenXcom/tree/stash/rapidyaml-rc1
I stashed all speed improvements and kept only the basic yaml changes + warning cleanup.

@Delian
For now I renamed `sansRoot` to `toBase`, as your name is not clear, at least to me.
But the more I think about this, do we need it for anything? Slicing is only dangerous
when we have polymorphic types, but that is not the case here: sliced or not, the behavior is the same.

Do you have any case where an implicit slice would break our code?

Delian · « **Reply #169 on:** December 04, 2024, 11:41:00 am »

sans = without. For instance, Sans-serif Font is a font without serifs.
YamlRootNodeReader without Root = YamlNodeReader. Get it? You're just removing the word "Root".
I know it's not a standard programming term, but I thought it was funny...

I don't think slicing causes any crashes, but Visual Studio is throwing warnings whenever it happens. So, it's mostly just to make the compiler/intellisense happy.
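
To make the slicing question concrete, here is a small sketch with simplified, hypothetical types (the real readers carry more state). It assumes the derived type only adds data and has no virtual functions, which is the situation described above; `idOf` is an invented helper.

```cpp
#include <string>

// Hypothetical simplified base type; not the real OXCE class.
struct NodeReader
{
    std::string key;
    int nodeId;
};

// Non-polymorphic derived type: no virtual functions, only extra data.
struct RootNodeReader : NodeReader
{
    std::string fileName;

    // Explicit, named conversion to the base part. The copy is the same one
    // an implicit slice would make, but the intent is visible at the call site.
    NodeReader toBase() const { return static_cast<const NodeReader&>(*this); }
};

// Takes the base by value, so passing a RootNodeReader slices it implicitly.
inline int idOf(NodeReader reader) { return reader.nodeId; }
```

Both `NodeReader copy = root;` and `NodeReader copy = root.toBase();` produce the same base object here; the second form only spells out the conversion, which is the kind of thing that can keep static analysis quiet.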

I tested rc1 and no issues for me. I added some comments to the rc0/rc1 commits tho.

Yankes · « **Reply #170 on:** December 06, 2024, 01:10:21 pm »

@Delian

Are you fine with the current state of RC1? If yes, then Meridian will start preparing a new 8.0 release with it.
All speedups and other improvements will go to nightly versions like 8.0.1, 8.0.2, ... and when everything has been tested, 8.1 will be published.

Delian · « **Reply #171 on:** December 06, 2024, 01:35:30 pm »

Yeah. I'm not finding any issues that would be related to the migration itself. In other words, the parsing and round trips work, so it's fine and you can start.

Yankes · « **Reply #172 on:** December 19, 2024, 10:31:03 pm »

After deeply analyzing the code I found a couple of places where overall performance could be improved: some places iterate three times to allocate a vector twice,
and in other places iteration is `O(n^2)` (not large lists, but 10 vs 100 is still a bit). Thanks to the blazing fast and efficient rapidyaml, these places are hard to notice.
But if we add some helper structures, I guess overall performance could improve by a factor of two (e.g. by reducing some operations to `O(n)` and avoiding cache misses).

This will be done after the main change gets into OXCE.

The change will look like this:

To `YamlRootNodeReader` add two vectors:
a `std::vector` of structs that have `childsStartsPtr` and `childsSize`,
and another
`std::vector` of structs with `nodeKey` and `nodeId`.
The size of both vectors will be equal to the size of the whole node list.
The first vector points to sub-lists of nodes in the second vector; we gather all of each node's children and put them in the second vector.
Now when we ask for children, we check both of these vectors instead of navigating the nodes themselves.
This should effectively change the current `list<node>` into a `vector<node*>`,
and it should have better cache characteristics.
We could also sort map keys to get `O(log(n))` lookups instead of the current `O(n)` (unless we use the index).
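
A rough sketch of the two-vector layout described above. Everything here is an assumption for illustration: `TreeNode` is a hypothetical first-child/next-sibling tree rather than rapidyaml's node type, the start of each sub-list is stored as an offset instead of a pointer, and keys are copied into `std::string` where a real version would more likely keep views into the parsed buffer.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical input tree: each node links to its first child and next sibling
// by index, with -1 meaning "none".
struct TreeNode { std::string key; int firstChild; int nextSibling; };

struct ChildRange { std::size_t start; std::size_t size; }; // first vector
struct ChildEntry { std::string key; int nodeId; };         // second vector

struct ChildIndex
{
    std::vector<ChildRange> ranges;  // one entry per node
    std::vector<ChildEntry> entries; // all children, grouped by parent
};

// Walks every sibling chain once and stores each parent's children contiguously.
inline ChildIndex buildChildIndex(const std::vector<TreeNode>& nodes)
{
    ChildIndex index;
    index.ranges.resize(nodes.size());
    index.entries.reserve(nodes.size());
    for (std::size_t parent = 0; parent < nodes.size(); ++parent)
    {
        const std::size_t start = index.entries.size();
        for (int child = nodes[parent].firstChild; child != -1;
             child = nodes[static_cast<std::size_t>(child)].nextSibling)
        {
            index.entries.push_back({nodes[static_cast<std::size_t>(child)].key, child});
        }
        index.ranges[parent] = {start, index.entries.size() - start};
    }
    return index;
}

// Linear scan over a dense sub-list; returns the child's node id or -1.
inline int findChild(const ChildIndex& index, std::size_t parent, const std::string& key)
{
    const ChildRange& range = index.ranges[parent];
    for (std::size_t i = 0; i < range.size; ++i)
    {
        const ChildEntry& entry = index.entries[range.start + i];
        if (entry.key == key)
            return entry.nodeId;
    }
    return -1;
}
```

Whether building this index pays for itself depends on how often each node's children are searched, so it would need to be measured against the existing lookup paths.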

I need to test whether my idea has real benefits, but I expect good gains from an approach like this.

Delian · « **Reply #173 on:** December 20, 2024, 04:04:31 am »

Rapidyaml doesn't use list<node>. Internally it already uses vector<NodeData> (It's array*, same thing; NodeData = 72 byte struct), and "NodeId" is just offset in this array.

The cache-locality of this array is already great. It would be very difficult to improve it. You can try, but I think constructing your vector would take more time than the time gained from theoretically improved cache-locality. Sorting... that's worse than hashing, isn't it? That's why std::map never beats std::unordered_map.

I can give you some current timing data. Let's say game startup (debug build with XPZ; Options::mute = true) takes 5000ms
Of these 5000ms, operations related to rapidyaml take:
- 1300ms to parse raw yaml into node trees
- 333ms searching for node keys
- 263ms deserializing node values
- 87+35ms building+destroying unordered_map node key indexes
- 32+13ms building+destroying YamlNodeReader.children() vectors
- 20ms building+destroying YamlNodeReader objects
- other negligible stuff

Of the 333ms of node key searching there is:
- 220ms (10ms in release build) of indexed searching
- 72ms (4ms in release build) of unindexed searching (with stored child iterator optimization)

What about game loading? In the release build, loading a 12MB sav file, all the unindexed searching together takes <2ms.

If your proposed optimization is targeting unindexed key searching, well, there's very little left to gain there. The majority of search calls (3 to 1) are already indexed. At least for game startup. For game-loading, the majority is unindexed, however that's mostly (80%) inside BattleUnitKills::load(), where the stored child iterator optimization gives it O(1).
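
The "stored child iterator" idea can be sketched roughly as follows. This is only one plausible shape for such an optimization, not the actual OXCE implementation: `SequentialReader`, its members, and the comparison counter are invented for illustration. The reader remembers where the previous hit was and starts the next scan right after it, so keys requested in file order are found on the first comparison.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader over an ordered list of (key, value) children.
class SequentialReader
{
public:
    explicit SequentialReader(std::vector<std::pair<std::string, int>> children)
        : _children(std::move(children)) {}

    // Starts scanning just after the previous hit and wraps around, so
    // in-order lookups cost one comparison; a miss still costs a full pass.
    const int* find(const std::string& key)
    {
        const std::size_t count = _children.size();
        for (std::size_t step = 0; step < count; ++step)
        {
            const std::size_t pos = (_next + step) % count;
            ++_comparisons;
            if (_children[pos].first == key)
            {
                _next = (pos + 1) % count;
                return &_children[pos].second;
            }
        }
        return nullptr;
    }

    // Exposed only so the effect can be observed in a test.
    std::size_t comparisons() const { return _comparisons; }

private:
    std::vector<std::pair<std::string, int>> _children;
    std::size_t _next = 0;
    std::size_t _comparisons = 0;
};
```

With children `a`, `b`, `c` looked up in that order, each call needs a single comparison, which is the O(1) behavior mentioned for loaders that read keys in the order they were saved.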

Btw, what are the places that you say you found where there's too much iteration?

Yankes · « **Reply #174 on:** December 20, 2024, 02:24:22 pm »

Quote from: Delian on December 20, 2024, 04:04:31 am

Rapidyaml doesn't use list<node>. Internally it already uses vector<NodeData> (It's array*, same thing; NodeData = 72 byte struct), and "NodeId" is just offset in this array.

The cache-locality of this array is already great. It would be very difficult to improve it. You can try, but I think constructing your vector would take more time than the time gained from theoretically improved cache-locality. Sorting... that's worse than hashing, isn't it? That's why std::map never beats std::unordered_map.

I can give you some current timing data. Let's say game startup (debug build with XPZ; Options::mute = true) takes 5000ms
Of these 5000ms, operations related to rapidyaml take:
- 1300ms to parse raw yaml into node trees
- 333ms searching for node keys
- 263ms deserializing node values
- 87+35ms building+destroying unordered_map node key indexes
- 32+13ms building+destroying YamlNodeReader.children() vectors
- 20ms building+destroying YamlNodeReader objects
- other negligible stuff

Of the 333ms of node key searching there is:
- 220ms (10ms in release build) of indexed searching
- 72ms (4ms in release build) of unindexed searching (with stored child iterator optimization)

What about game loading? In the release build, loading a 12MB sav file, all the unindexed searching together takes <2ms.

If your proposed optimization is targeting unindexed key searching, well, there's very little left to gain there. The majority of search calls (3 to 1) are already indexed. At least for game startup. For game-loading, the majority is unindexed, however that's mostly (80%) inside BattleUnitKills::load(), where the stored child iterator optimization gives it O(1).

It is a "list" in the sense that nodes are not sequential in memory: they all exist in one big vector, but you need to jump between elements, and there could be a couple of cache lines between siblings. The CPU can't preload the sibling after next, as it first needs to know the value stored in the next sibling node. With a more "dense" layout, the CPU can preload an arbitrary sibling.

> That's why std::map never beats std::unordered_map

`std::flat_map`, not `std::map`. Even though we already have the unordered one, some people still needed it and it was added to the standard:
https://en.cppreference.com/w/cpp/container/flat_map
There are use cases where it could be better.
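
Since `std::flat_map` is a C++23 addition, the same idea can be approximated with a sorted `std::vector` and `std::lower_bound`. This is a sketch of the technique, not of `std::flat_map` itself; the `Entry` alias and both function names are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>; // (key, node id), illustrative only

// Sort once after the entries have been collected.
inline void sortByKey(std::vector<Entry>& entries)
{
    std::sort(entries.begin(), entries.end(),
        [](const Entry& a, const Entry& b) { return a.first < b.first; });
}

// Binary search over contiguous storage: O(log n) comparisons, no per-node
// allocations and no hashing. Requires `entries` to be sorted by key.
inline const int* findSorted(const std::vector<Entry>& entries, const std::string& key)
{
    auto it = std::lower_bound(entries.begin(), entries.end(), key,
        [](const Entry& entry, const std::string& k) { return entry.first < k; });
    return (it != entries.end() && it->first == key) ? &it->second : nullptr;
}
```

For small maps the contiguous layout can offset the extra comparisons, but whether it beats a hash index for these workloads is something only a benchmark can answer.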

Overall, thanks for the detailed timings; at the least I need to reevaluate my initial optimistic predictions.
Still, I think we can gain "more" time than could be concluded from the "yaml"-only usage.
As the memory cache is a shared resource, improving one part can improve other unrelated parts too.

Delian · « **Reply #175 on:** December 20, 2024, 10:00:55 pm »

I did a small test. It takes 100ms just to visit all the nodes of all the trees (no vector building). So, good luck~
