Author Topic: [Rejected] Game startup cache  (Read 4887 times)

Online Delian

  • Colonel
  • ****
  • Posts: 499
    • View Profile
[Rejected] Game startup cache
« on: August 02, 2022, 12:22:04 am »
Rejection reason: this is beyond/outside the scope of OXCE

Quote
If the YAML files don't change, why parse them every time you run the game?

Currently, when you start up the game, the majority of loading time comes from loading the YAML files and from using that data to produce Rule objects.

I suggest this be optimized using a simple cache. Here's the process for the preliminary idea:

1. Start up the game
2. Produce a hash from the YAML files that the game intends to load. The hash is created from the YAML file names, file sizes, and modification dates, plus the exe file info
3. Compare the hash to the cache hash
4a. If the hashes don't match (YAML files changed):
- Clear the cache
- Start up the game normally. Parse the YAML files and load the Rule objects
- Create a new cache. Serialize all the Rule objects into a binary format cache file, and save the hash as a new cache hash
4b. If the hashes match (YAML files didn't change):
- Deserialize all the Rule objects from the cache file

For de/serialization, I suggest using FlatBuffers due to its deserialization speed.

Depending on the mods used, this approach could reduce the game startup time by up to 90% (+5% if a similar cache is used for the language files).
« Last Edit: February 19, 2023, 02:16:17 pm by Meridian »

Offline Yankes

  • Global Moderator
  • Commander
  • *****
  • Posts: 3350
    • View Profile
Re: [Suggestion] Game Startup Cache
« Reply #1 on: August 02, 2022, 02:35:32 am »
How do you plan to cache pointers? Or deeply nested structures?
And how fast exactly is this deserialization? It will not be a simple `memcpy` from a file,
as the game uses strings that can have arbitrary length.

I see this as near impossible without rewriting every rule object from scratch to be easily serializable.

Online Delian

Re: [Suggestion] Game Startup Cache
« Reply #2 on: August 02, 2022, 11:36:24 am »
Quote
How do you plan to cache pointers?

A two-pass algorithm that uses an object-ID map.
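
A rough sketch of what such a two-pass scheme could look like. `Rule`, `Record`, `toRecords` and `fromRecords` are hypothetical names for illustration, not OXCE types, and the ID is simply the object's index in the list.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical rule type holding a raw pointer to another rule.
struct Rule
{
    std::string name;
    const Rule* linked = nullptr;
};

// Pointer-free form that could be written to a cache file.
struct Record
{
    std::string name;
    std::int32_t linkedId; // -1 means "no link"
};

std::vector<Record> toRecords(const std::vector<std::unique_ptr<Rule>>& rules)
{
    // Pass 1: assign every object an ID (its index).
    std::unordered_map<const Rule*, std::int32_t> ids;
    for (std::size_t i = 0; i < rules.size(); ++i)
        ids[rules[i].get()] = static_cast<std::int32_t>(i);

    // Pass 2: emit records with pointers replaced by IDs.
    std::vector<Record> out;
    for (const auto& r : rules)
    {
        const std::int32_t id = r->linked ? ids.at(r->linked) : -1;
        out.push_back(Record{r->name, id});
    }
    return out;
}

std::vector<std::unique_ptr<Rule>> fromRecords(const std::vector<Record>& recs)
{
    // Pass 1: create all objects so that every ID has an address.
    std::vector<std::unique_ptr<Rule>> rules;
    for (const auto& rec : recs)
    {
        auto r = std::make_unique<Rule>();
        r->name = rec.name;
        rules.push_back(std::move(r));
    }

    // Pass 2: turn IDs back into pointers.
    for (std::size_t i = 0; i < recs.size(); ++i)
        if (recs[i].linkedId >= 0)
            rules[i]->linked = rules[static_cast<std::size_t>(recs[i].linkedId)].get();
    return rules;
}
```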

Quote
Or deeply nested structures?

You write a schema. It's all flat in memory. Unless you're talking about pointers.

Quote
And how fast exactly is this deserialization? It will not be a simple `memcpy` from a file,
as the game uses strings that can have arbitrary length.

It's very fast~
Strings are just arrays/vectors of bytes. Either you store the string length, or it's null-terminated.
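
For illustration, a hand-rolled length-prefixed string encoding might look like the sketch below. This is not FlatBuffers code, and `writeString`/`readString` are hypothetical helpers. Note that reading into a `std::string` still copies the bytes, so it is not a single `memcpy` of the whole file.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Appends a 32-bit length followed by the raw bytes of `s`.
void writeString(std::vector<unsigned char>& buf, const std::string& s)
{
    const std::uint32_t len = static_cast<std::uint32_t>(s.size());
    unsigned char raw[sizeof len];
    std::memcpy(raw, &len, sizeof len);
    buf.insert(buf.end(), raw, raw + sizeof len);
    buf.insert(buf.end(), s.begin(), s.end());
}

// Reads one string starting at `pos` and advances `pos` past it.
// Returns false (leaving `pos` untouched) if the buffer is truncated.
bool readString(const std::vector<unsigned char>& buf, std::size_t& pos, std::string& out)
{
    std::uint32_t len = 0;
    if (pos > buf.size() || buf.size() - pos < sizeof len)
        return false;
    std::memcpy(&len, buf.data() + pos, sizeof len);
    if (buf.size() - pos - sizeof len < len)
        return false;
    out.assign(reinterpret_cast<const char*>(buf.data() + pos + sizeof len), len);
    pos += sizeof len + len;
    return true;
}
```

Storing the length rather than relying on a terminator also keeps strings with embedded NUL bytes intact.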

Quote
I see this as near impossible without rewriting every rule object from scratch to be easily serializable.

It's neither impossible, nor is it so dramatic that it would require a full rewrite. Writing schemas can take a while, though.

Offline Meridian

  • Global Moderator
  • Commander
  • *****
  • Posts: 9088
    • View Profile
Re: [Suggestion] Game Startup Cache
« Reply #3 on: August 02, 2022, 11:49:22 am »
Can you make a working prototype for review?

Online Delian

Re: [Suggestion] Game Startup Cache
« Reply #4 on: August 02, 2022, 12:09:27 pm »
My C++ experience is rather poor, so it may take a while.

If it works as intended, you'd accept this feature?

Also, do you have any other comments or tips for implementation? Anything to watch out for?
« Last Edit: August 02, 2022, 12:12:12 pm by Delian »

Offline Yankes

Re: [Suggestion] Game Startup Cache
« Reply #5 on: August 02, 2022, 12:45:47 pm »
And versioning? I could create a cache, then install a new version of OXCE, and then what? Will it load incompatible byte data?

Besides, I still hold my stance that if you do not refactor the whole loading process, you will not achieve your goal.
For a start, I can say you will not succeed, as one thing you would need is to serialize script code, which can have embedded pointers.
Have fun reinterpreting a byte array to search for them.

I think a better way would be simply to find what takes the most time now and try to mitigate it. What exactly takes most of the time during load?
Maybe there is simply a performance bug that does some stupid thing that makes loading take so long?
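
One low-effort way to answer that question is to wrap each loading phase in a scoped timer and compare the totals. This is a sketch only: `ScopedTimer` and `PhaseTimes` are hypothetical helpers, and a real profiler would give finer detail.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Accumulated wall-clock time per named loading phase.
using PhaseTimes = std::map<std::string, std::chrono::steady_clock::duration>;

// Adds the lifetime of the object to the entry for `phase` on destruction.
class ScopedTimer
{
public:
    ScopedTimer(PhaseTimes& times, std::string phase)
        : _times(times), _phase(std::move(phase)), _start(std::chrono::steady_clock::now())
    {
    }
    ~ScopedTimer()
    {
        _times[_phase] += std::chrono::steady_clock::now() - _start;
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    PhaseTimes& _times;
    std::string _phase;
    std::chrono::steady_clock::time_point _start;
};
```

Usage would be along the lines of `{ ScopedTimer t(times, "yaml"); /* parse one file */ }` around each phase, then printing `times` once loading has finished.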

Offline Meridian

Re: [Suggestion] Game Startup Cache
« Reply #6 on: August 02, 2022, 12:56:27 pm »
I do understand all the high-level concepts you mentioned, but I have little idea how to actually physically implement them myself within the openxcom framework.

I have no reason to doubt Yankes... for years now, everything he said turned out to be 100% true.
So, I am on the skeptical side atm.

(Offtopic: I also have personal experience of helping to implement and use a serializer/deserializer commercially at work, which turned out to be an unmaintainable, overgrown and unreliable monster... of course the scope of that was much bigger than what openxcom would require... but still, it's the only tangible experience I have).

That's why I give an option to show me more, so that I can form a more informed opinion.
Can't promise anything though.

If the change is not too invasive, it can be considered.
If we'd need to "rewrite" all the Rule* classes, then it's a no go.

Offline krautbernd

  • Commander
  • *****
  • Posts: 1108
    • View Profile
Re: [Suggestion] Game Startup Cache
« Reply #7 on: August 02, 2022, 01:56:51 pm »
Quote
I think a better way would be simply to find what takes the most time now and try to mitigate it. What exactly takes most of the time during load?
Maybe there is simply a performance bug that does some stupid thing that makes loading take so long?

I think this would be a far better and saner approach.

Online Delian

Re: [Suggestion] Game Startup Cache
« Reply #8 on: August 02, 2022, 05:26:01 pm »
Quote
And versioning? I could create a cache, then install a new version of OXCE, and then what? Will it load incompatible byte data?

You're right. OXCE exe file info would also need to be included in the hash. So if any Rule object changes, the exe file changes, the hash changes, and that invalidates the cache.

Quote
Besides, I still hold my stance that if you do not refactor the whole loading process, you will not achieve your goal.
For a start, I can say you will not succeed, as one thing you would need is to serialize script code, which can have embedded pointers.
Have fun reinterpreting a byte array to search for them.

Hmm, that does sound like a problem. I guess the solution for that would be to postpone script compiling until after the Rule objects are loaded. So what would be serialized instead would be the script text.
You know, after checking the code, the pointers might be less of a problem than I thought, because all the linking (pointer setting) is already happening in the afterLoad() functions of the objects. But yeah, a little bit of refactoring would be necessary to move script compilation from load() to afterLoad(). I might need your help with that!
For instance, in the RuleStatBonus::load(), first the script text is generated, and then _container.load(parentName, script, parser); is called to compile the script. How would I change this function so that the script text is saved, so that I'm able to compile the script code later?
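
The general shape of that idea, as a toy sketch: `StatBonusSketch` and `CompiledScriptSketch` are made-up stand-ins, not the real RuleStatBonus or script container, and the "compile" step is only a flag here.

```cpp
#include <string>

// Stand-in for a compiled script; the real container is far more involved.
struct CompiledScriptSketch
{
    std::string source;
    bool compiled = false;
};

class StatBonusSketch
{
public:
    // load(): only remember the generated script text. Plain text like this
    // is what a cache would store instead of compiled code.
    void load(const std::string& scriptText)
    {
        _scriptText = scriptText;
    }

    // afterLoad(): compile once every rule has been loaded.
    void afterLoad()
    {
        _script.source = _scriptText;
        _script.compiled = true; // a real implementation would invoke the parser here
    }

    const std::string& scriptText() const { return _scriptText; }
    bool isCompiled() const { return _script.compiled; }

private:
    std::string _scriptText;
    CompiledScriptSketch _script;
};
```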

Btw, I think it's a bug that script compilation is already happening in load() instead of afterLoad()... the load() function was supposed to only extract data from YAML.

Quote
What exactly takes most of the time during load?

Inside the loadMods(), 70% of the CPU time is consumed by YAML::load() - just YAML parsing, and 30% by Rule generation. At least in large mods. So a much easier solution would be to de/serialize the parsed YAML::Node objects instead of the Rule objects. But then the end result would be only a 70% reduction of loading time...

Quote
I have personal experience of helping to implement and use a serializer/deserializer commercially at work

So do I, but that was a different language, different format, and much smaller in scale.

Quote
If the change is not too invasive, it can be considered.
If we'd need to "rewrite" all the Rule* classes, then it's a no go.

That doesn't sound very reassuring.

Quote
I think this would be a far better and saner approach.

But but... only 70%!

Offline Yankes

Re: [Suggestion] Game Startup Cache
« Reply #9 on: August 02, 2022, 05:51:37 pm »
Quote
Btw, I think it's a bug that script compilation is already happening in load() instead of afterLoad()... the load() function was supposed to only extract data from YAML.

Why a bug? Where is it said that `load` can only load data? It already does other processing of the loaded data.
`afterLoad` was added only because some data is not available when you load one file, so you can't link objects to each other yet.

Quote
Inside the loadMods(), 70% of the CPU time is consumed by YAML::load() - just YAML parsing, and 30% by Rule generation. At least in large mods. So a much easier solution would be to de/serialize the parsed YAML::Node objects instead of the Rule objects. But then the end result would be only a 70% reduction of loading time...

The question is whether we could simply change the YAML parser. This could all simply be the fault of yaml-cpp, not of YAML itself.
And this could have another benefit of improving the speed of saving/loading.
This would be a very invasive change, but what if the gain is a reduction from 70% to 10%?

Offline Yankes

Re: [Suggestion] Game Startup Cache
« Reply #10 on: August 02, 2022, 06:00:56 pm »
https://github.com/biojppm/rapidyaml
This page claims that loading is 70 times faster than with yaml-cpp, which would mean the 70% is reduced to 1%.

Offline R1dO

  • Colonel
  • ****
  • Posts: 442
    • View Profile
Re: [Suggestion] Game Startup Cache
« Reply #11 on: August 02, 2022, 07:16:10 pm »
Just in case you are seriously considering changing the yaml parser.

It is probably a good idea to keep an eye on the following bug: https://github.com/biojppm/rapidyaml/issues/289
That one might not play well with flow-style yaml ("{  element1,  element2, element3 }").

Online Delian

Re: [Suggestion] Game Startup Cache
« Reply #12 on: August 02, 2022, 07:27:30 pm »
Quote
`afterLoad` was added only because some data is not available when you load one file, so you can't link objects to each other yet.

Exactly. What if compiling a script needs data that isn't available yet? Even if it's not a problem right now, it could be in the future.

Quote
And this could have another benefit of improving the speed of saving/loading.
This would be a very invasive change, but what if the gain is a reduction from 70% to 10%?

I like it, but Meridian won't haha... because there are 900 places in the code that would have to be changed to use the faster library. And it would probably have to be done in OXC as well.

Offline Yankes

Re: [Suggestion] Game Startup Cache
« Reply #13 on: August 02, 2022, 07:56:15 pm »
Quote
Just in case you are seriously considering changing the yaml parser.

It is probably a good idea to keep an eye on the following bug: https://github.com/biojppm/rapidyaml/issues/289
That one might not play well with flow-style yaml ("{  element1,  element2, element3 }").

I would prefer changing the parser to adding a brand new serializer/deserializer.

But yes, if we switched, we would need to check that it works better than the old one,
and there could be some current use cases that will be hard to recreate in the new parser.

Another problem is a new dependency that might not be available on all systems.

Quote
Exactly. What if compiling a script needs data that isn't available yet? Even if it's not a problem right now, it could be in the future.

I simply try to avoid cases like this, because delaying it only makes a mess in the code,
because you need to propagate partial results through a couple of program layers.
Doable, but it needs a lot of work to look good and work well.

Quote
I like it, but Meridian won't haha... because there are 900 places in the code that would have to be changed to use the faster library. And it would probably have to be done in OXC as well.

Yes, but this change is less fundamental than trying to add a cache; some of the 900 lines could be solved by a global regex.
I would start all this by adding `OXCE_YAML`, a namespace that replaces `YAML`, and re-implementing all the API we use there.
Its implementation will be as simple as forwarding to yaml-cpp. When that is done, I will put everything in a big `#if` and add an `#else`
that uses a different lib as the YAML backend. This could be more complex, as some features could be lacking in the new lib (probably `as<std::vector<int>>()` will be a problem), but nothing that couldn't be solved in a couple of days.
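
A toy sketch of that layering, so the shape is visible. Both "backends" here are plain standard containers standing in for yaml-cpp and for an alternative library, and `OXCE_YAML_ALT_BACKEND`, `Storage`, `lookup` and this `Node` class are all hypothetical names; the real API surface would be much larger.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

namespace OXCE_YAML
{
#if !defined(OXCE_YAML_ALT_BACKEND)
    // Default backend: a std::map, standing in for yaml-cpp.
    using Storage = std::map<std::string, std::string>;
    inline const std::string* lookup(const Storage& s, const std::string& key)
    {
        auto it = s.find(key);
        return it == s.end() ? nullptr : &it->second;
    }
#else
    // Alternative backend: a flat vector of pairs, standing in for another library.
    using Storage = std::vector<std::pair<std::string, std::string>>;
    inline const std::string* lookup(const Storage& s, const std::string& key)
    {
        for (const auto& kv : s)
            if (kv.first == key)
                return &kv.second;
        return nullptr;
    }
#endif

    // The only API game code would see; identical for both backends.
    class Node
    {
    public:
        explicit Node(Storage data) : _data(std::move(data)) {}

        // Returns the value stored under `key` converted to T,
        // or `fallback` if the key is missing or the conversion fails.
        template <typename T>
        T as(const std::string& key, const T& fallback) const
        {
            const std::string* raw = lookup(_data, key);
            if (!raw)
                return fallback;
            std::istringstream in(*raw);
            T value{};
            return (in >> value) ? value : fallback;
        }

    private:
        Storage _data;
    };
}
```

Calling code such as `node.as<int>("cost", 0)` stays the same whichever branch of the `#if` is compiled, which is the point of the wrapper.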

Online Delian

Re: [Suggestion] Game Startup Cache
« Reply #14 on: August 03, 2022, 10:02:23 am »
Actually I like your solution better than mine. I think the decrease in saving/loading time is a selling point. But I also think it would be the hardest to implement, because most of the API is different. It's definitely too much for me.