Here's the other ‘big thing’ that sometimes causes 2D games like mine to slow down: not drawing too many pixels, but changing which image is being drawn from too often. Video cards are great at drawing tons and tons of triangles using the same image to copy the pixels from. Changing which image is ‘active’ will cause a major stall in the video card's rendering operation and waste time. Bizarrely, AFAIK you can only have ONE texture as the active texture at any specific moment (without doing multiple textures in different ‘stages’). So regardless of how many pipelines your card has, you can’t be drawing from two different textures to two different polys at once. (Is that right?)
Worst case scenario:
You render 200 sprites on screen. Half have texture “ogre.bmp”, half have “elf.bmp”. They have to be drawn from back to front, so that nearer ones obscure further ones. They are positioned (in distance from you) like this:
Elf / Ogre / Elf / Ogre / Elf….
Etc. This is hell, because it means changing the texture for nearly every sprite, roughly 200 times. Ideally what you would do is get the video card to sort this stuff out. You would use Z values (distance from the viewer), send all your sprites to the card with the right Z values, and let the card sort it out.
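To put a number on that worst case, here is a rough sketch in Python (chosen only for brevity; nothing here is engine code). It counts one texture set per run of consecutive sprites that share a texture. The function name `count_texture_binds` is made up for illustration, and the "grouped" order is only legal if something other than draw order resolves which sprite is in front.

```python
from itertools import groupby

def count_texture_binds(draw_order):
    """Count how many times the active texture must be set when sprites
    are drawn in the given order: one set per run of identical textures."""
    return sum(1 for _ in groupby(draw_order))

# Worst case: 200 sprites, alternating textures from back to front.
interleaved = ["elf.bmp", "ogre.bmp"] * 100

# The same 200 sprites grouped by texture (assumes depth is handled elsewhere).
grouped = sorted(interleaved)

print(count_texture_binds(interleaved))  # 200
print(count_texture_binds(grouped))      # 2
```

Same sprites, same pixels, but the second ordering only touches the active texture twice.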
That's easy if you have a nice modular engine where every single rendering call goes through some nice sorted system, and everything drawn on the screen is essentially some offshoot of the same base object. If you only ever draw sprites, then just send a huge bunch of Z-positioned sprites to the card and click go.
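A minimal sketch of that sprites-only case might look like the following. It assumes the sprites are opaque (or alpha-tested) so that a depth test on the card can resolve overlap from Z alone; blended, semi-transparent sprites would generally still need back-to-front order. The `Sprite` type and `batch_by_texture` function are hypothetical names, not part of any real API.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Sprite:
    texture: str  # hypothetical texture name
    x: float
    y: float
    z: float      # distance from the viewer; larger means further away

def batch_by_texture(sprites):
    """Group sprites by texture so each texture only needs to be set once.
    Assumes the depth test resolves occlusion from z, so submission order
    no longer has to be back to front."""
    batches = defaultdict(list)
    for sprite in sprites:
        batches[sprite.texture].append(sprite)
    return dict(batches)

# 200 sprites alternating elf / ogre by distance, as in the worst case.
sprites = [
    Sprite("elf.bmp" if i % 2 == 0 else "ogre.bmp", i * 4.0, 0.0, float(i))
    for i in range(200)
]

for texture, group in batch_by_texture(sprites).items():
    # A real engine would set the texture once here, then draw the whole group.
    print(texture, len(group))
```

Each sprite keeps its own Z value, so the card still has what it needs to decide which one ends up on top.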
Unfortunately few engines work that simply, because there is no ‘one size fits all’ drawing object. Text is often composed of thousands of 2-poly characters, best sent as a vertex buffer. Some lines and primitives are drawn using direct 3dDevice rendering calls. And some are sprites with different render states. This is where it gets horribly messy. I’m slowly, with each game, getting closer to a system where I am not just blindly rendering over the top of myself and hoping for the best. I can at least now bunch up a load of sprites called from different places, and have them drawn with a single call. What I don’t have is a perfect system that auto Z-sorts my sprites by position and texture, and makes the most efficient calls. Maybe some 2D games have such a system, but I’m assuming most of them just don’t do enough fancy drawing for it to be an issue (or they ignore backwards compatibility with slower cards).
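For what it's worth, the sprite part of such a system could be sketched roughly like this. It is only an illustration under stated assumptions, not how any particular engine does it: `SpriteBatcher`, `add` and `flush` are invented names, larger Z is assumed to mean further away, and text, lines and special render states are ignored entirely. Sprites are queued from anywhere, then sorted back to front with texture as the tie-breaker, and consecutive sprites sharing a texture are merged into one draw call.

```python
from itertools import groupby

class SpriteBatcher:
    """Hypothetical batcher: queue sprites from anywhere, flush once per frame."""

    def __init__(self):
        self._queue = []

    def add(self, texture, x, y, z):
        # Nothing is drawn yet; the sprite is only recorded.
        self._queue.append((texture, x, y, z))

    def flush(self):
        """Return a list of (texture, sprites) draw calls in back-to-front order.
        Sprites at the same z are ordered by texture so they can share a call."""
        ordered = sorted(self._queue, key=lambda s: (-s[3], s[0]))
        calls = [(texture, list(run))
                 for texture, run in groupby(ordered, key=lambda s: s[0])]
        self._queue.clear()
        return calls

# Two depth layers, each with elves and ogres submitted in a jumbled order.
batcher = SpriteBatcher()
for i in range(100):
    batcher.add("elf.bmp" if i % 2 == 0 else "ogre.bmp", float(i), 0.0, 2.0)
    batcher.add("ogre.bmp" if i % 2 == 0 else "elf.bmp", float(i), 0.0, 1.0)

calls = batcher.flush()
print(len(calls))  # 4
```

This only helps when many sprites share a Z value (layers, for instance). If every sprite has its own distinct Z and the textures alternate, the sort has nothing to merge and you are back to one call per sprite.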
If I was a bigger company with a dedicated graphics programmer who just worked on the engine, I’d have a better system, but I’m still a one-man show doing everything, and there just isn’t time :(