Author Topic: Converting OXCE to rapidyaml  (Read 13959 times)

Online Meridian

  • Global Moderator
  • Commander
  • ***
  • Posts: 9235
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #180 on: January 09, 2025, 03:41:18 pm »
When trying out rosigma with the latest version, an assert in debug build fails. The release build works fine tho.

Fixed.

Merged.

OK, I've finished rebasing my branch onto the 8.0 version.

The rest will be done by Yankes later.

Online Meridian

Re: Converting OXCE to rapidyaml
« Reply #181 on: January 12, 2025, 01:54:09 pm »
I noticed some differences in the save files, see attachments.

Are these equivalent?
I.e. do they work both ways?

And even if yes, don't we want to use the old way?
(It was more readable.)

Would it have a performance impact?
« Last Edit: January 12, 2025, 02:04:57 pm by Meridian »

Online Delian

  • Commander
  • *****
  • Posts: 589
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #182 on: January 12, 2025, 02:27:05 pm »
I noticed some differences in the save files, see attachments.

Lastly, the yaml standard says that certain non-printable characters should be escaped, as in, ascii code 00 should produce "\x00" in the yaml. It also states that it's not mandatory. Unlike yaml-cpp, rapidyaml does not respect this suggestion hehe. It simply outputs a control code 00 character raw. Subsequent parsing of these non-escaped characters works just fine however. It even works fine in yaml-cpp, so it's perfectly backwards compatible. As long as a string is in double quotes, everything goes.

Yes, they are equivalent. I'm not actually sure how to force rapidyaml to escape non-printable characters when serializing data.
It would have a noticeable impact on serialization speed if we did it for each and every node, but if it's done only in a few places, there would be no impact. I'll see what I can do.

Online Meridian

Re: Converting OXCE to rapidyaml
« Reply #183 on: January 12, 2025, 08:43:41 pm »
Good to know it's equivalent.
(I missed that older post.)

In that case, it's low prio, and should be done only for a few selected nodes, if at all.

Online Delian

Re: Converting OXCE to rapidyaml
« Reply #184 on: January 13, 2025, 03:01:34 am »
It looks like, sadly, rapidyaml doesn't implement any detection of non-printable characters, and I found out why. I spent a few hours trying to implement something myself, but it quickly got too complicated.
The first problem is that, if you wanna do the detection correctly, then you need to implement a UTF-8 variable-width character iterator and iterate through the string that way.
The second problem is converting UTF-8 into Unicode code points. For instance, the character with the UTF-8 byte sequence 0xC2 0x85, when converted and escaped, becomes "\u0085", so it's not a trivial thing.
And then there's detecting whether a character is printable or not. That wasn't hard to do, because the character ranges are well defined in the yaml standard.
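To make the first two problems concrete, here is a minimal sketch of a UTF-8 decoding step plus a `\u`-style escape. The names `decodeUtf8` and `yamlEscape` are hypothetical (they are not part of rapidyaml or OXCE), and the sketch deliberately skips overlong-encoding and surrogate checks, so treat it as an illustration rather than a full validator.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Decodes one UTF-8 sequence starting at s[pos] and advances pos past it.
// Precondition: pos < s.size().
// Returns the code point, or U+FFFD (replacement character) on malformed input.
char32_t decodeUtf8(const std::string& s, std::size_t& pos)
{
    const auto b0 = static_cast<unsigned char>(s[pos]);
    std::size_t len = 1;
    char32_t cp = b0;
    if (b0 < 0x80)                { len = 1; cp = b0; }        // 0xxxxxxx
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; } // 110xxxxx
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; } // 1110xxxx
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; } // 11110xxx
    else { ++pos; return 0xFFFD; } // stray continuation or invalid lead byte

    if (pos + len > s.size()) { pos = s.size(); return 0xFFFD; } // truncated
    for (std::size_t i = 1; i < len; ++i)
    {
        const auto b = static_cast<unsigned char>(s[pos + i]);
        if ((b & 0xC0) != 0x80) { pos += i; return 0xFFFD; } // not 10xxxxxx
        cp = (cp << 6) | (b & 0x3F); // append 6 payload bits
    }
    pos += len;
    return cp;
}

// Formats a code point as a YAML double-quoted escape:
// \uXXXX up to U+FFFF, \UXXXXXXXX above that.
std::string yamlEscape(char32_t cp)
{
    char buf[16];
    if (cp <= 0xFFFF)
        std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(cp));
    else
        std::snprintf(buf, sizeof buf, "\\U%08X", static_cast<unsigned>(cp));
    return buf;
}
```

Feeding the two bytes 0xC2 0x85 through `decodeUtf8` gives code point 0x85, which `yamlEscape` renders as `\u0085`, matching the example above.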

Btw, the "tileIndexSize", "tileFireSize" etc nodes... why exactly are these nodes being serialized as strings? The values are Uint8 type, so one would expect them to be saved as numbers and not strings.

Offline Yankes

  • Global Moderator
  • Commander
  • ***
  • Posts: 3396
    • View Profile
Re: Converting OXCE to rapidyaml
« Reply #185 on: January 13, 2025, 03:33:56 am »
Not as a string but as a `char` (remember, `Uint8` lies to you). yaml-cpp uses single quotes like `'\x01'` to somehow match C/C++ code.

Btw, why support full Unicode when the problem is only with ASCII?

Online Meridian

Re: Converting OXCE to rapidyaml
« Reply #186 on: January 13, 2025, 11:03:29 am »
We really use only 2 non-printable characters in strings.

Code:
Unicode::replace(s, "{SMALLLINE}", "\x02"); // Unicode::TOK_NL_SMALL
Unicode::replace(s, "{ALT}", "\x01"); // Unicode::TOK_COLOR_FLIP

And from all fields in the save file, only 1 or 2 can actually contain them.

I wanted to manually replace \x01 with \\x01 but saving it was a bit confusing.
The same string gets saved differently depending on whether quotes are used or not.
See below, why is that the case?

Code:
std::string converted = "abc\\x01def";
headerWriter.write("craftOrBase", converted).setAsQuoted();
headerWriter.write("craftOrBase2", converted);

craftOrBase: "abc\\x01def"
craftOrBase2: abc\x01def

I would like to save it as

Code:
craftOrBase: "abc\x01def"

but I could not find a way to do that.

Online Meridian

Re: Converting OXCE to rapidyaml
« Reply #187 on: January 13, 2025, 11:05:40 am »
Btw, the "tileIndexSize", "tileFireSize" etc nodes... why exactly are these nodes being serialized as strings? The values are Uint8 type, so one would expect them to be saved as numbers and not strings.

If I remember correctly, it's because certain yaml-cpp versions can/could not serialize/deserialize `Uint8` correctly and the game crashed.
So a workaround was made to save them as `char`.
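A small standard-library illustration of why an 8-bit integer type can end up serialized as a character. It uses `std::uint8_t` as a stand-in for SDL's `Uint8`, on the assumption that both are aliases for `unsigned char` (true on common platforms). It shows stream behavior only and is not yaml-cpp's actual code; the helper names are made up for the example.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Streams the value unchanged. Because std::uint8_t is a character type here,
// operator<< picks the character overload and writes one raw byte.
std::string streamAsIs(std::uint8_t value)
{
    std::ostringstream os;
    os << value;
    return os.str();
}

// Widens the value first, so operator<< picks the integer overload
// and writes decimal digits.
std::string streamAsNumber(std::uint8_t value)
{
    std::ostringstream os;
    os << static_cast<unsigned>(value);
    return os.str();
}
```

With these helpers, `streamAsIs(1)` yields a one-byte string holding the control character 0x01, while `streamAsNumber(1)` yields the text `1`.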

Online Delian

Re: Converting OXCE to rapidyaml
« Reply #188 on: January 13, 2025, 02:42:13 pm »
Btw, why support full Unicode when the problem is only with ASCII?

craftOrBase: "CRAFT> \x01AIRBUS*058-1" becomes
craftOrBase: "기체> \x01에어버스*058-1" in Korean localization.
Those are not ASCII characters hehe, so we can't be moving through the string 1 byte at a time.

why is that the case?

Well, inside double quotes, certain characters need to always be escaped. When you give it a backslash, it thinks you want a literal backslash in there.

I did find a way to hack it. Instead of setting the scalar style to be "double-quoted", you instead force a "plain" scalar style, and manually wrap it in double quotes. Then it won't automatically escape backslashes during saving, while correctly resolving the escaped hex code during the next load.

So a workaround was made to save them as `char`.

...why not simply cast it to Uint32 then?
Anyway, is it possible for us to change those strings to numbers now to avoid having to escape those non-printable characters? Or are there backwards-compatibility issues?
« Last Edit: January 13, 2025, 02:44:04 pm by Delian »

Offline Yankes

Re: Converting OXCE to rapidyaml
« Reply #189 on: January 13, 2025, 03:56:35 pm »
craftOrBase: "CRAFT> \x01AIRBUS*058-1" becomes
craftOrBase: "기체> \x01에어버스*058-1" in Korean localization.
Those are not ASCII characters hehe, so we can't be moving through the string 1 byte at a time.
But the whole point of UTF-8 is that it is "compatible" with ASCII. You will never find a `\x01` byte anywhere except as the ASCII char with value `1`.
Every byte of a multi-byte char has its highest bit (128) set. Only UTF-16 and UTF-32 could be confused, as they can have zeros or other low bytes inside a char.

This means we can scan only for ASCII bytes with value 1 or 2 and hack those; everything else can stay as is.
If we have some Unicode "unprintable" characters, they will stay as is, as this is orders of magnitude harder to handle.
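The byte-level argument can be sketched as a plain scan with no UTF-8 decoding at all. `findControlTokens` is a hypothetical name chosen for this example; the two byte values are the tokens mentioned earlier in the thread (`\x01` for `{ALT}`, `\x02` for `{SMALLLINE}`).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the byte offsets of the control tokens 0x01 and 0x02.
// Scanning byte by byte is safe on UTF-8: the lead and continuation bytes of
// every multi-byte sequence are >= 0x80, so they can never equal 0x01 or 0x02.
std::vector<std::size_t> findControlTokens(const std::string& utf8)
{
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < utf8.size(); ++i)
    {
        const auto byte = static_cast<unsigned char>(utf8[i]);
        if (byte == 0x01 || byte == 0x02)
            hits.push_back(i);
    }
    return hits;
}
```

On a string that mixes Korean text with a `\x01` token, the scan reports only the offset of the token itself and never a position inside a multi-byte character.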

Online Meridian

Re: Converting OXCE to rapidyaml
« Reply #190 on: January 13, 2025, 04:07:20 pm »
...why not simply cast it to Uint32 then?
Anyway, is it possible for us to change those strings to numbers now to avoid having to escape those non-printable characters? Or are there backwards-compatibility issues?

We want "new" OXCE saves to work with older OXCE versions... and also with OXC... so far we were able to maintain that compatibility (which is almost unbelievable).

If we can't find (yet another) workaround, then I would prefer keeping the saves as they are now.

Online Delian

Re: Converting OXCE to rapidyaml
« Reply #191 on: January 13, 2025, 09:51:51 pm »
OK. I rewrote the last solution then. When setAsQuoted() is called, it checks whether the node's scalar contains any non-printable ASCII characters, and if it does, it replaces the scalar with a new one in which the non-printable and special characters are escaped. All 2+ byte UTF-8 characters are assumed to be printable.
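A rough sketch of what such an escaping step could look like. This is not the actual OXCE code: `escapeForDoubleQuoted` is a hypothetical helper, and the exact set of "special characters" handled here (backslash, double quote, newline, tab) is an assumption made for the example.

```cpp
#include <cstdio>
#include <string>

// Returns the text to place between the double quotes of a YAML scalar.
// Non-printable ASCII bytes become \xNN; backslash and double quote are
// escaped; bytes >= 0x80 (parts of multi-byte UTF-8 characters) pass through
// untouched, i.e. they are assumed to be printable.
std::string escapeForDoubleQuoted(const std::string& utf8)
{
    std::string out;
    out.reserve(utf8.size());
    for (const char c : utf8)
    {
        const auto byte = static_cast<unsigned char>(c);
        if (byte == '\\')      out += "\\\\";
        else if (byte == '"')  out += "\\\"";
        else if (byte == '\n') out += "\\n";
        else if (byte == '\t') out += "\\t";
        else if (byte < 0x20 || byte == 0x7F)
        {
            char buf[8];
            std::snprintf(buf, sizeof buf, "\\x%02X", static_cast<unsigned>(byte));
            out += buf;
        }
        else out += c; // printable ASCII, or a byte of a UTF-8 sequence
    }
    return out;
}
```

For the craft name from the earlier example, the raw byte 0x01 comes out as the four characters `\x01`, and the Korean variant keeps its multi-byte characters unchanged.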
« Last Edit: January 14, 2025, 04:48:07 pm by Delian »