Yeah, you read that right, I’m reading back from the card. Yes, I feel kinda dirty. What am I talking about? (skip this if you are a gfx coder…)
***Generally speaking, games create ‘textures’ in memory on the graphics card, so the data is actually stored there. We write data *to* the card, and then we forget about it. We tell the card to draw chunks of that data to the screen, and it does so. What you never do is read back *from* that same data. In other words, you draw stuff to the screen, but have no way of actually looking *at* the screen from back where you generally are in CPU / RAM land. The reason for this is that everyone understands reading back to be slow, and there are very few reasons to do it.***
I have a technique, the details of which I won’t bore you with, which requires me to draw the scene in a certain way, then blur that scene, and then check the color value of specific pixels. I cannot find any way to do this without reading back from the video card. I should say this is for Gratuitous Space Battles 2.
Theoretically, I could maintain a system-memory-only version of the scene, render to it there, blur it, and read from it without ever touching the GPU or the card’s RAM. This would mean no sneaky use of the video card bus to do any data transfer. The problem is, I suspect this would be slower. The GPU is good at blurring and rendering, and in fact all of the data I draw into the scene is in textures already loaded into the card’s RAM. Make no bones about it: I have to compose this scene on the card, in card RAM. And if I want to access specific pixel colors, I need to get that data back.
So what I’m doing now is a call to GetRenderTargetData() to grab the data and stick it into a system-memory texture I created earlier specifically for this purpose. BTW, did I mention I have to do this every frame? Once the data is there, I call LockRect() on the whole texture, quickly zip through my list of points, then UnlockRect() as soon as I can. So what happens?
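For the curious, here’s a minimal sketch of that pattern, assuming an A8R8G8B8 render target and a hypothetical point list. I use an offscreen plain surface in D3DPOOL_SYSTEMMEM here, which behaves the same as the surface of a system-memory texture for this purpose; the names are illustrative, not the actual GSB2 code:

```cpp
#include <d3d9.h>
#include <vector>

struct SamplePoint { int x, y; };

IDirect3DSurface9* gSysMemSurface = NULL; // created once, reused every frame

bool CreateReadbackSurface(IDirect3DDevice9* dev, UINT w, UINT h)
{
    // Must match the render target's dimensions and format, and must live
    // in D3DPOOL_SYSTEMMEM for GetRenderTargetData to accept it.
    return SUCCEEDED(dev->CreateOffscreenPlainSurface(
        w, h, D3DFMT_A8R8G8B8, D3DPOOL_SYSTEMMEM, &gSysMemSurface, NULL));
}

void ReadPoints(IDirect3DDevice9* dev, IDirect3DSurface9* renderTarget,
                const std::vector<SamplePoint>& points,
                std::vector<D3DCOLOR>& out)
{
    // Copy the GPU render target into system memory. This is the stalling
    // part: the CPU ends up waiting for the GPU to finish the target first.
    if (FAILED(dev->GetRenderTargetData(renderTarget, gSysMemSurface)))
        return;

    D3DLOCKED_RECT lr;
    if (FAILED(gSysMemSurface->LockRect(&lr, NULL, D3DLOCK_READONLY)))
        return;

    // Zip through the point list quickly, then unlock as soon as possible.
    for (size_t i = 0; i < points.size(); ++i)
    {
        const BYTE* row = (const BYTE*)lr.pBits + points[i].y * lr.Pitch;
        out.push_back(((const D3DCOLOR*)row)[points[i].x]);
    }
    gSysMemSurface->UnlockRect();
}
```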
Well, if I look at the contention analysis in Visual Studio, it shows me that this call causes a lot of thread blocking. It’s pretty much all of the thread blocking. This is clearly sub-optimal. But if I look at the actual game running at 1920×1200 in FRAPS, the whole thing runs at a consistent 59-60 FPS. My video card is an Nvidia GeForce GTX 670. In other words, it really isn’t a problem. Am I over-reacting to what was once a taboo and now is not? Are people calling LockRect() on textures just for giggles these days? Is my engine sufficiently meek that it leaves plenty of spare room in each frame to put up with this clunky technique?
I’ve also considered that I may be screwing up by doing this close to the end of a frame (sadly this is a requirement of my engine, unless I let a certain effect *lag* a frame). If it happened mid-frame, I suspect the thread blocking that delays the end-of-frame Present() wouldn’t be so bad. Sadly I can’t move it.
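For what it’s worth, here’s a sketch of how that lag-a-frame alternative is usually structured in D3D9 (an assumption about the approach, not my actual code): render into two targets in rotation, and read back the one the GPU finished last frame, so GetRenderTargetData rarely has to wait on in-flight work:

```cpp
const int NUM_BUFFERS = 2;
IDirect3DSurface9* gTargets[NUM_BUFFERS];  // D3DPOOL_DEFAULT render targets
IDirect3DSurface9* gReadback[NUM_BUFFERS]; // matching SYSTEMMEM surfaces
int gFrame = 0;

void EndOfFrameReadback(IDirect3DDevice9* dev)
{
    // This frame's scene was rendered into gTargets[gFrame % NUM_BUFFERS]
    // earlier. We read back the *previous* frame's target instead.
    int prev = (gFrame + NUM_BUFFERS - 1) % NUM_BUFFERS;

    if (gFrame > 0)
    {
        // The GPU has almost certainly finished last frame's target by now,
        // so this copy shouldn't force the CPU to sit and wait.
        dev->GetRenderTargetData(gTargets[prev], gReadback[prev]);

        D3DLOCKED_RECT lr;
        if (SUCCEEDED(gReadback[prev]->LockRect(&lr, NULL, D3DLOCK_READONLY)))
        {
            // ...sample pixel colors here, one frame late...
            gReadback[prev]->UnlockRect();
        }
    }
    ++gFrame;
}
```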
I’ve also wondered if a series of smaller LockRect() calls that don’t cover the whole screen might be quicker, but I doubt it; I think it’s the mere lockiness, not the area of memory, that matters. I can easily allow the effect to be toggled under options, BTW, so if it is a frame-rate killer for some people, they can just turn it off.
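In case anyone wants to test that hunch, the sub-rect version would look roughly like this, assuming one 1×1 lock per sampled point. (Note that GetRenderTargetData still copies the entire surface either way; only the lock itself shrinks.)

```cpp
// Lock only a 1x1 RECT around one point instead of the whole surface.
D3DCOLOR SampleOnePixel(IDirect3DSurface9* sysMemSurface, int x, int y)
{
    RECT r = { x, y, x + 1, y + 1 }; // left, top, right, bottom (exclusive)
    D3DLOCKED_RECT lr;
    D3DCOLOR color = 0;
    if (SUCCEEDED(sysMemSurface->LockRect(&lr, &r, D3DLOCK_READONLY)))
    {
        // pBits points at the top-left of the locked sub-rectangle.
        color = *(const D3DCOLOR*)lr.pBits;
        sysMemSurface->UnlockRect();
    }
    return color;
}
```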