I’m doing early work on my next game, a completely new IP. I’ll be announcing it in a few months. Anyway… it involves drawing a big world with a LOT of objects in it. tens of thousands on screen probably. Drawing 10,000 objects in 2D is not as simple as you think.
If you are a non coder, or someone who only ever uses middleware, you might think ‘the new video cards can draw 10,000,000 polys per frame, what’s the problem? and indeed there is *no problem* if you want to set a single texture and then splat 5 million sprites on the screen that show it. Thats one (well…probably several) draw call, one render state, one texture. Even really old video cards like that.
The problem is when you have a lot of different textures and want to swap between them, because for engine-related reasons, you need to draw stuff in a specific order. In a 3D world, you can use a Z-buffer and draw out of order, but with 2D objects with some soft aliased edges, that looks BAD. The good old fashioned painters-algorithm is your friend. The problem is, if you draw back to front and the sprite textures needed go A B A B A B, you are kinda fucked…that means a lot of texture changes, and in directx9 (which I use for compatibility reasons), texture changes mean separate draw calls, which stalls the video cards, and is sloooowwwww.
Relevant video from GSB2:
So what are the workarounds?
Texture atlases. This is the obvious one. Stick A & B in the same texture, and you are laughing, suddenly stuff is a LOT quicker. This only solves the texture issue, not drawing to different render targets, but you can defer those draws anyway and do them separately (GSB 2 does this). Texture atlases are an obvious ‘win’ even if they only halve the texture changes. The problems here are that you either need to know what textures will follow each other and pre-compile texture atlases (something I’m trying right now), or you need to dynamically create texture atlases based on recent drawing, and effectively use an off-screen render target as a texture ‘cache’. I tried this recently…and it was actually slower :(
Dirty-rects. Basically draw the whole scene once, and save it in an offscreen buffer, and use it as your background, a single quad blaps the whole screen, and you only draw stuff that has changed / is animating. This, as I recall was used by sim city 4. The only problem is that scrolling really causes hell.
Intelligent grouping. The painters algorithm is only really needed where stuff overlaps. if I draw a tile, then draw a sprite on top of it, I need the tile first, but there is no reason why I can’t draw all the tiles first, then the contents. That means I can then sort the tiles by texture and draw them in a handful of calls (or one, if the tiles all fit into an atlas). You can do this at pretty much any level, effectively drawing ‘out-of-order’ but with caveats. Again, GSB2 does this, with various objects, especially debris and asteroids. In fact it goes one stage further by scanning ahead with each draw call to see if some non conflicting later objects could be ‘collapsed’ into the current draw call.
Multi-threading and other speed boosts. If you have too many draw calls and things are too slow, then you can expand on the time available to make draw calls. Essentially you have two threads, one which prepares everything to be drawn, and the draw-call thread, which makes all your directx calls. This way they both run in parallel (also note that the directx runtime will be another thread, and the video card driver another one, so you have 2 threads less than you think. With a hyperthreaded 4 core chip, you have 8 truly simultaneous threads, so you give away 2, have 1 core thread, 1 render thread and 2 extra ‘worker threads’ spare. Because of my own disorganisation, I tend to have directx called from my main thread, which means I do the inverse. GSB2 did this, with all of the transformation stuff for the asteroids, debris and other bits and pieces handed to a bunch of threads while I was busy with other stuff, then returning to the main thread to present the draw calls. Less efficient, but way better than single threading.
Hybrids. All of the above techniques seem valid to me. Although I am currently fixated on pre-compiled texture atlases, I’ll definitely use multithreading and probably some of the others. With some parts of a game, a specific optimisation system may work well, and with others it could be useless. It really is specific to what you draw, and is why I prefer a hand crafted engine to a generic one.
My basic problem is that (without explaining what the game is), I have a small number of ‘components’ that make up a tiny scene on a tile. There will be a lot of components per tile, as I want a fairly ‘busy’ look, but rendering them all individually may be ‘too much’. What I may end up doing is pre-rendering each conceivable tile as an offline-step, to reduce the number of calls. I’d like that to be a last minute thing though, so I can keep editing what the scenes look like. IU also want sections of each tile to animate, or be editable and customizable, which means there is less scope to pre-render them.
It will make a lot more sense once I announce the game :D