That is called a Z-buffer, something 3D hardware implements very efficiently.
But in a tile-based isometric game with oversized sprites, you have to be very careful with the Z-buffer so that objects won't clip through walls. I.e. you want to render your giant units with the Z-test turned off, just like you render the HUD in FPS games to keep it from clashing with objects on screen. Then there are also transparent surfaces, which still require tricky sorting, unless you do ray tracing, where actual light rays travel through the scene.
That said, games like Warcraft III turned the clipping bug into a feature by giving buildings a foundation that gets unearthed on uneven terrain.
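For anyone following along, here is a minimal sketch of the per-pixel depth test being discussed, including the "Z-test turned off" case for overlays. The `DepthFramebuffer` type, its `plot` method, and the "smaller z is closer" convention are all assumptions made for illustration, not code from any engine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical software framebuffer that stores a depth value per pixel.
struct DepthFramebuffer {
    int width;
    int height;
    std::vector<std::uint32_t> color;
    std::vector<float> depth;  // assumed convention: smaller z = closer to the camera

    DepthFramebuffer(int w, int h)
        : width(w), height(h),
          color(static_cast<std::size_t>(w) * h, 0u),
          depth(static_cast<std::size_t>(w) * h,
                std::numeric_limits<float>::infinity()) {
        assert(w > 0 && h > 0);
    }

    // Draws one pixel. With zTest on, the pixel is written only if it is
    // closer than what is already stored, and the stored depth is updated.
    // With zTest off (HUD-style overlay), the color is always written and
    // the stored depth is left untouched.
    // Returns true when the color was written.
    bool plot(int x, int y, float z, std::uint32_t rgba, bool zTest = true) {
        if (x < 0 || y < 0 || x >= width || y >= height) return false;
        const std::size_t i = static_cast<std::size_t>(y) * width + x;
        if (zTest) {
            if (z >= depth[i]) return false;  // hidden behind an earlier pixel
            depth[i] = z;
        }
        color[i] = rgba;
        return true;
    }
};
```

With this, draw order stops mattering for opaque pixels: a wall drawn before or after a unit still hides the parts of the unit that are farther away.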
This is correct: if done poorly, big objects will be a problem. But if we use tiles, then on load we can check that no pixel "escapes" the 3D box it is supposed to stay inside.
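A rough sketch of what such an on-load check could look like. Big assumption: the tile's 3D box is approximated here by a plain screen-space rectangle (a real isometric box projects to a hexagon), and `Sprite`, `Rect` and `spriteFitsBox` are made-up names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sprite: only the alpha mask matters for this check.
struct Sprite {
    int width;
    int height;
    std::vector<std::uint8_t> alpha;  // row-major, 0 = fully transparent
};

// Half-open rectangle [x0, x1) x [y0, y1) in sprite-local pixel coordinates.
// Assumption: stands in for the screen-space footprint of the tile's 3D box.
struct Rect {
    int x0, y0, x1, y1;
};

// Returns true if every non-transparent pixel lies inside the allowed area.
bool spriteFitsBox(const Sprite& s, const Rect& allowed) {
    assert(s.alpha.size() == static_cast<std::size_t>(s.width) * s.height);
    for (int y = 0; y < s.height; ++y) {
        for (int x = 0; x < s.width; ++x) {
            const bool opaque =
                s.alpha[static_cast<std::size_t>(y) * s.width + x] != 0;
            const bool inside = x >= allowed.x0 && x < allowed.x1 &&
                                y >= allowed.y0 && y < allowed.y1;
            if (opaque && !inside) return false;  // pixel escapes the box
        }
    }
    return true;
}
```

Running this once per sprite at load time costs nothing at draw time, and a failing sprite can be logged or flagged instead of silently clipping later.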
And if it clips, then it should clip. The WC3 example is exactly why I think this is a very good feature and why this approach is worth trying. In some special cases I would actually want a unit to clip through other objects: water, high grass, mud, waterfalls, ivy, bushes, magic portals, etc.
Transparency is indeed a problem, but I have thought of a workaround. Instead of multiplicative transparency, use additive; with it, the order in which the surfaces are drawn becomes irrelevant. First you draw all non-transparent objects, and after that the transparent ones.
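A small sketch of why the order stops mattering. It assumes "multiplicative" means the usual alpha ("over") blend and uses 8-bit channels with clamping; `Rgb`, `blendAdditive` and `blendOver` are illustrative names only.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

using Rgb = std::array<std::uint8_t, 3>;

// Additive blend: dst + src per channel, clamped to 255.
// Clamped addition of non-negative values gives the same result
// regardless of the order in which the layers are applied.
Rgb blendAdditive(Rgb dst, Rgb src) {
    Rgb out{};
    for (std::size_t c = 0; c < 3; ++c) {
        out[c] = static_cast<std::uint8_t>(
            std::min(255, int(dst[c]) + int(src[c])));
    }
    return out;
}

// Conventional "over" blend: dst * (1 - a) + src * a, with alpha in 0..255.
// Each layer scales what is already there, so the result depends on order.
Rgb blendOver(Rgb dst, Rgb src, std::uint8_t alpha) {
    Rgb out{};
    for (std::size_t c = 0; c < 3; ++c) {
        out[c] = static_cast<std::uint8_t>(
            (int(dst[c]) * (255 - alpha) + int(src[c]) * alpha) / 255);
    }
    return out;
}
```

Blending two layers onto a background with `blendAdditive` gives the same color in either order, while `blendOver` generally does not. That is why only the opaque pass needs the Z-buffer to write depth; the additive pass just needs to test against it.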
To get reasonable speed, this will probably require manual use of SSE4 intrinsics, because compilers sometimes miss some transformations (the same OXCE code shows a 3x speed difference depending on the GCC version).