Show Posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.


Messages - hmaon

Pages: 1 [2]
16
Fan-Stuff / Re: what happened to "The Two Sides" art work?
« on: February 26, 2013, 06:46:21 pm »

17
Programming / Re: HQX and scale2x filters edit: Now with OpenGL shaders
« on: February 17, 2013, 12:27:37 am »
Only 5 attachments per post?!?!?!? This is an outrage!


18
Programming / Re: HQX and scale2x filters edit: Now with OpenGL shaders
« on: February 17, 2013, 12:25:33 am »
I've copied the OpenGL output code from bsnes and jammed it into OXC. It's faster than scaling the image on the CPU and it trivially allows the use of a bunch of shaders from other authors. I'm attaching some screenshots. Some are nice and subtle, like Scale4xHQ, while others are ugly as hell (bloom!)

19
Programming / Re: Profiling OXC (and optimizing some code a little)
« on: February 17, 2013, 12:15:50 am »
Thanks, _michal; that sounds handy. Also, I'm going to need more changes to the gitbuilder makefile at some point, please! Namely, -msse2 when compiling and -lopengl32 when linking.

Yankes, I tried to vectorize StandartShade to at least read 64 bits at a time but I didn't see any performance gains. You're probably right! Thanks for your insight.

What I've done instead has been to copy the OpenGL output code from bsnes and jam it into OXC. The result is 400 fps at 1280x800 while filtering the image using hardware shaders. Code is here: https://github.com/hmaon/OpenXcom/tree/opengl Screenshots of shaders in other thread.

20
Programming / Re: Battlescape development
« on: February 10, 2013, 11:33:39 pm »
I made the binary saves work. Part of the problem was that YAML doesn't seem to actually implement base64 decoding. I had to rip some code out of libb64. Another problem was some confusion when saving 32-bit -1 to an unsigned 16-bit int and then reading it back as 65535. Good times.

Here's the code: https://github.com/hmaon/OpenXcom/tree/binary_saves

I've made a pull request.

21
Programming / Re: HQX and scale2x filters
« on: February 09, 2013, 07:30:49 pm »
*shrug*

Yes, the filters are already controlled by options in the config file.

22
Programming / Re: HQX and scale2x filters
« on: February 09, 2013, 06:24:32 pm »
That's about the response I expected.  ;) At least a couple of people like it though so the relatively small effort was worth it to me.

As someone requested on IRC, I'll try to add another option -- hq2x at 4x scale. Then also maybe 2xSAI.

23
Programming / HQX and scale2x filters edit: Now with OpenGL shaders
« on: February 09, 2013, 03:53:24 am »
After getting all the FPS I could for now out of the game, I've taken steps to make it slower instead. That is, HQX and Scale2x filters are now an option with this code: https://github.com/hmaon/OpenXcom/tree/remix

The results are a little weird but kind of nice at times. Screenshots attached!

edit: OpenGL stuff in a reply below somewhere

24
Work In Progress / Capture aliens and brainwash them to join the XCom cause
« on: February 05, 2013, 07:32:37 am »
This code is kind of in its infancy. Also, yes, I know it's a stupid idea. I like stupid ideas.

Here's the branch: https://github.com/hmaon/OpenXcom/tree/recruit_aliens

Right now you just have to have an alien mind-controlled at the end of the mission and he's yours. Future plans include some kind of special brainwashing facility. Then perhaps a cloning facility. After that, chryssalids break out and try to overrun your base, I guess.

TODO: Perhaps don't let the player take off the alien's skin and replace it with XCom armor via the armor menu? Perhaps mutons can still wear armor, though? What's the point of having aliens on your side? Maybe there should be some new special mission requiring an alien infiltrator? I dunno, it's kinda dumb, like I said.

This branch is also the testing ground for on-demand loading of .SPK files. I'm using it to load inventory sprites for aliens. (Also works if they're just regularly mind controlled!) I got the sprites from another mod, the inventory screens here: https://openxcommods.weebly.com/downloads4.html So, thanks for that!

A couple of screenshots are attached.


25
Programming / Profiling OXC (and optimizing some code a little)
« on: February 05, 2013, 07:21:03 am »
Hellope. I did some profiling on OpenXcom and I've been working on trying to speed up some critical sections.

I used callgrind (a valgrind tool) to do the profiling. It seems the easiest approach even though the actual game runs excruciatingly slowly under valgrind's emulation. Expect single-digit FPS. Look at kcachegrind's pretty output, though:

This is a human turn plus an AI turn of a base assault: https://bumba.net/~hmaon/OXC_callgrind_kcachegrind_one_turn_base_assault.png
The method individually using the most CPU in the battlescape is obviously the shader. Then, curiously, there's SavedBattleGame::getTile() and then _zoomSurfaceY().

I bet there's something that could be done to speed up the shader code but I actually don't understand it yet. I moved on to the other functions.

getTile() seemed to cry out to be inlined so that's what I did. I then inlined getTileIndex() along with it so the whole procedure can avoid a call. As you can see, getTile() gets called a lot. In this run, it was called over 68 million times. _zoomSurfaceY() gets called once per frame, I think; that makes 68513755 / 4031.0 = 16996.7 getTile() calls per frame on average. It's hard to say whether that's actually a lot; it's 1/4 of the pixels of a 320x200 window, though? It's not quite 5% of the CPU load. Then again, 5% CPU time in a single getter function, really?

Anyway, next I looked at _zoomSurfaceY(). It's responsible for stretching the 320x200 native resolution window to the display resolution (e.g., 640x400 or my preference of 1280x800). It's written to be a very general function to scale the image correctly to any arbitrary resolution given any pixel format. That allows for a lot of optimization in the special cases of x2 or x4 scale at 8bpp, which seem like the most common use cases. I wrote two rescaling functions to read data as 64-bit ints and write it back as 64-bit ints (and then 32-bit versions of the same). The results seem to have been an FPS increase anywhere from +10% to +100%. At 1280x800 on my particular laptop, the game went from ~70 fps to ~140 fps. Incidentally, the 32-bit versions of the zoom function are only slower by a couple of FPS. I'm not sure why -- write combining maybe? Could be the register spill I'm noticing in the assembly output on the 64-bit version? -- if anyone has some experience in this sort of analysis, please take a look.

Incidentally, there's probably some opportunity to insert other filter functions here, perhaps copied from any of the many console emulators out there.

Finally, here's the profiler's output after my changes: https://bumba.net/~hmaon/optimized_zoom_function_profile.png
As you can see, getTile() is gone from the results and its most frequent callers from the TileEngine are the next in line. Also, _zoomSurfaceY() has fallen below TileEngine code in CPU use! From 4.68% CPU to 3.07% CPU seems like a nice change.

Of course, those figures are hardly scientific. I made hardly any effort to keep the two runs identical. There's also no demo that I could run the game through to help me repeat similar runs. I have to actually play the game at ~0 fps in valgrind's virtual CPU.

Oh yeah, _michal asked on IRC for a write-up of my profiling and optimization attempts.

The branch with my optimizations is here: https://github.com/hmaon/OpenXcom/tree/optimization_attempts
I've submitted a pull request for whenever SupSuper is done working on actual important stuff.

Suggested points for discussion:
1) What is up with the Shader code? How does it work? Anyone? How can it be sped up?
2) What's the deal with my coding style? Why is it such a mess?
3) How about some optimizations that I missed?
4) Can those TileEngine methods be improved somehow?
5) Shouldn't we just use OpenGL to scale and filter the output? (Perhaps?)
6) Does ANYONE have a working PowerPC Mac? I bet my code is broken on big-endian systems right now but I have no computer to test on!

tl;dr: I made the FPS number go up a little; maybe someone porting to really underpowered hardware (or running debug builds) will care.

26
Programming / Re: Battlescape development
« on: January 10, 2013, 11:03:48 pm »
thanks for the hint.
So if you save a file and transfer it to a system with a different endianness and try to load it there, that won't work with that casting/pointer thing. So I prefer the one with the shifts, so you convert using the same endianness?

Yeah. Just make sure you're writing the integers for saving also using shifts and in the same order, of course.

Another way to make code endianness-agnostic is with the htonl() and ntohl() functions, which may be faster and easier to read. Network byte order is big-endian, but the byte swap can be implemented with a single bswap instruction on x86 instead of four shifts. Also, one 32-bit read from memory is likely to be faster than reading four bytes individually.

So,
Code: [Select]
i = ntohl(*((int*) (buffer+ptr)));
and the write operation:
Code: [Select]
*((int*) (buffer+ptr)) = htonl(i);

It's not a big deal though, I guess.

27
Programming / Re: Battlescape development
« on: January 10, 2013, 08:43:45 pm »
try this
Code: [Select]
buffer[ptr] + (buffer[ptr+1] << 8) + (buffer[ptr+2] << 16) + (buffer[ptr+3] << 24)

try also
Code: [Select]
*((int*) (buffer+ptr))
or
Code: [Select]
*((int*) &(buffer[ptr]))
That is, assuming the number is always in native endianness. The code with the shifts and adds assumes a little-endian integer.

edit: Also, if you use the shifts and adds code, make sure buffer is declared unsigned char*. Otherwise you will have trouble.

Test code:
Code: [Select]
#include<stdio.h>
#include<stdlib.h>

int main()
{
    unsigned char buffer[] = {0xff, 0x7f, 0, 0};
    char caveat[] = {0xff, 0x7f, 0, 0};
    int ptr = 0;

    printf("%d\n", *((int*) (buffer+ptr)));
    printf("%d\n", buffer[ptr] + (buffer[ptr+1] << 8) + (buffer[ptr+2] << 16) + (buffer[ptr+3] << 24));
    printf("%d\n", caveat[ptr] + (caveat[ptr+1] << 8) + (caveat[ptr+2] << 16) + (caveat[ptr+3] << 24));
}

output:
Code: [Select]
32767
32767
32511
