Sorry if I came off as naive; I'm fairly sure that, from a technical perspective, the 8-to-32 and 32-to-8 colourspace conversions are easy on the GPU.
What I mean by RGB-to-CLUT is this:
As far as I can tell, an RGB-to-CLUT fragment shader is reasonably simple. It's a reverse lookup, which is extra work, but GPU time is cheap.
Pass the colour lookup table into the shader as a uniform array of 256 vec entries. You need to supersample the framebuffer to get a colour area around each pixel; you could use a fixed pattern or random sampling depending on how you want to dither. Then, for each fragment, you iterate over the palette and pick the nearest indexed colour; again, there are a couple of popular options here. The result is a 32-bit source image rendered to an 8-bit framebuffer with an 8-bit palette. OpenGL looks like it has a few 8-bit pixel formats, and the 8-bit unsigned one should be the same as the SDL_Surface pixel data.
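As a sketch of what I mean (untested, and all the names here are placeholders rather than anything that exists in OpenXcom), the reverse lookup could be as simple as a brute-force nearest-match loop:

```
#version 120
// Untested sketch: convert one pixel of a 32-bit rendered frame back to
// a palette index. "frameTexture" and "palette" are hypothetical names.
uniform sampler2D frameTexture; // the 32-bit rendered frame
uniform vec3 palette[256];      // the CLUT uploaded as a uniform array

varying vec2 texCoord;

void main()
{
    vec3 src = texture2D(frameTexture, texCoord).rgb;

    // Brute-force reverse lookup: pick the palette entry with the
    // smallest squared distance to the source colour.
    float bestDist = 1e10;
    int bestIndex = 0;
    for (int i = 0; i < 256; ++i) {
        vec3 d = src - palette[i];
        float dist = dot(d, d);
        if (dist < bestDist) {
            bestDist = dist;
            bestIndex = i;
        }
    }

    // Written to an 8-bit unsigned target this is the palette index.
    gl_FragColor = vec4(float(bestIndex) / 255.0);
}
```

The supersampling and dithering described above would wrap around that inner loop. On older hardware a 256-entry uniform array may also hit uniform-count limits, in which case the palette could live in a small texture instead.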
You can pass in a second uniform parameter containing the output palette if you want to support palette cycling. It will likely produce some artifacts if you don't mask out the non-cycling areas of the original palette and are doing complex shading, but I wouldn't expect nearest-neighbour sampling to have any problems, which makes it a good starting point and retains compatibility with all the existing assets.
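The two-palette variant might look something like this (again untested, with hypothetical names): match against the palette the assets were authored with, but emit the colour from the cycled output palette:

```
#version 120
// Untested sketch of the second-palette idea: nearest-match against the
// base palette, output the corresponding entry from a cycled palette.
uniform sampler2D frameTexture;
uniform vec3 basePalette[256];   // palette the assets were authored against
uniform vec3 outputPalette[256]; // same table with the cycling entries rotated

varying vec2 texCoord;

void main()
{
    vec3 src = texture2D(frameTexture, texCoord).rgb;
    float bestDist = 1e10;
    int bestIndex = 0;
    for (int i = 0; i < 256; ++i) {
        vec3 d = src - basePalette[i];
        float dist = dot(d, d);
        if (dist < bestDist) { bestDist = dist; bestIndex = i; }
    }
    // Non-cycling entries are identical in both tables, so they pass
    // through unchanged; only the cycled range actually moves.
    gl_FragColor = vec4(outputPalette[bestIndex], 1.0);
}
```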
I used RGB-CLUT as shorthand for the operation performed as a post-process in the fragment shader; sorry if I didn't make that clear.
I've not written the code yet, so I've not worked through all the problems, but that process can be used to output a 32-bit render pipeline onto an 8-bit render target, with the option of reading 8-bit or 32-bit source assets, which is good for content creators.
It's the same process Photoshop goes through to palettize an image, except you can do it in milliseconds on the GPU. I'm sorry if this comes across as having no idea what I'm talking about; I am still getting to grips with the SDL and OpenGL terminology.
8-bit to RGB
Likewise, going the other way is trivial since you can use the lookup table directly: rendering an 8-bit source to a 32-bit framebuffer just needs the same CLUT data. You'd create a texture buffer with an 8-bit unsigned pixel format and pass the CLUT as a uniform vec array to the fragment shader, outputting to a 32-bit RGBA target.
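In shader terms that direction is basically a one-line lookup. A minimal sketch (untested, hypothetical names again):

```
#version 120
// Untested sketch: draw an 8-bit indexed texture to a 32-bit target.
uniform sampler2D indexedTexture; // 8-bit indices, nearest-filtered
uniform vec4 palette[256];        // CLUT, with alpha for transparent entries

varying vec2 texCoord;

void main()
{
    // The 8-bit value arrives normalised to [0,1]; scale back to 0..255.
    // (Older hardware may prefer the palette in a texture rather than a
    // dynamically indexed uniform array.)
    int index = int(texture2D(indexedTexture, texCoord).r * 255.0 + 0.5);
    gl_FragColor = palette[index];
}
```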
You could convert every asset as it loads, saving it in a 32-bit surface or, more likely, just pushing it to VRAM to use later. (More on this below: you'd be using the GL EXT methods from SDL_opengl.h, so I'd want to check what the compatibility is on those.)
After that you can perform all render steps in 32-bit, and present back to an indexed palette using the post-process fragment shader above.
The shading differences between RGB and CLUT are irrelevant if you maintain 8-bit source images and render to an 8-bit palette. With nearest-neighbour filtering, the first iteration would look pixel-perfect against the original.
But doing so opens up the option of additional effects, e.g. scale/rotate/shear, that just aren't possible (read: impractical and slow) with 8-bit indexed surfaces. My example is the baseview, where a 32-bit render pipeline would allow easy scroll/scale to support extended base sizes and configurations.
It's also likely to be faster to draw in 32-bit on a lot of modern platforms. Speed isn't an issue, as OpenXcom isn't really bottlenecked, but keeping things fast is always good.
I'm aware of the colour cycling that XCom uses, but it's not too difficult to emulate the effects in native 32-bit. XCom seems to rarely, if ever, rely on complex cycling patterns; there are a few flashes and simple effects that are as trivial in 32-bit as they are with a colour look-up table.
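As an illustration (purely a sketch, not any specific existing effect), a simple flash in native 32-bit is just a brightness modulation driven by a time uniform:

```
#version 120
// Illustrative only: a simple "flash" done natively in 32-bit by
// modulating brightness over time instead of rotating palette entries.
uniform sampler2D frameTexture;
uniform float timeSeconds; // hypothetical uniform fed from the game loop

varying vec2 texCoord;

void main()
{
    vec3 src = texture2D(frameTexture, texCoord).rgb;
    // A 2 Hz pulse between 75% and 100% brightness, roughly what a
    // short palette-cycle flash achieves through the look-up table.
    float pulse = 0.875 + 0.125 * sin(timeSeconds * 4.0 * 3.14159265);
    gl_FragColor = vec4(src * pulse, 1.0);
}
```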
And because the 32-bit render pipeline still presents through a colour look-up table, palette cycling could still be supported. My opening gambit is the baseview, where complex palette cycling isn't required, and that will provide a platform to iron out the problems.
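To make that concrete, here's a speculative sketch of a present pass that folds cycling in as an index remap; cycleStart/cycleLength/cycleOffset are made-up uniform names that some game-state tick would advance:

```
#version 120
// Untested sketch: present an 8-bit index buffer through the CLUT while
// rotating a subrange of indices to emulate palette cycling.
uniform sampler2D indexedFrame; // 8-bit render target holding indices
uniform vec4 palette[256];      // output CLUT
uniform int cycleStart;         // first entry of the cycling range
uniform int cycleLength;        // number of entries that rotate
uniform int cycleOffset;        // current rotation, advanced per tick

varying vec2 texCoord;

void main()
{
    int index = int(texture2D(indexedFrame, texCoord).r * 255.0 + 0.5);

    // Rotate indices inside the cycling range; the rest of the palette
    // (the masked-out, non-cycling part) passes through untouched.
    if (index >= cycleStart && index < cycleStart + cycleLength)
        index = cycleStart + int(mod(float(index - cycleStart + cycleOffset),
                                     float(cycleLength)));

    gl_FragColor = palette[index];
}
```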
Given memory capacity has increased by several orders of magnitude since the mid-90s, the old trick of making one image look different on different screens is likely to be irrelevant: you can just keep a duplicate of each image customised for each context.
But given that I think I can still colour-cycle with a 32-bit render pipeline during the CLUT post-process step, we would have the option of multiple assets, or of single assets with cook-time, load-time or runtime recolourisation. You've got the option of using the original source assets, but could load new 32-bit assets for new content without breaking the pipeline.
Again, this is all speculative - I've not written the code yet.
Given you can do palette cycling, and a number of other effects, in the fragment shader, and we have essentially no memory limit (compared to the mid-90s), there doesn't seem to be a (technical) advantage to the 8-bit render pipeline, and there are some reasons not to keep it.
32-bit rendering opens up more options in the baseview, you could improve the geoscape a lot, and there are loads of battlescape options (smoke/fog, for example, and the ability to have a lot more assets on screen); zoom levels could also be tried.
Thanks for the link to the shading discussion topic. I think there are a couple of good points raised there, and while they are mostly concerned with the battlescape, a lot of the same things apply. I'll consider those and see if they can be addressed.
The other problem I can see is hardware compatibility. The SDL_opengl.h header looks like some of the framebuffer stuff uses the EXT methods, which are old. You might need two versions of the renderer to support both generations of hardware, although TBH I suspect there isn't a target platform that doesn't support all the features we'd need; I just don't know.
I'm happy to have a discussion on any level about the render pipeline changes and requirements, and I have a little experience with both indexed palettes and modern GPU programming.