A small tool to measure draw calls

Discussion in 'Track Modding' started by K Szczech, Aug 31, 2012.

  1. K Szczech

    K Szczech Registered

    Joined:
    Oct 5, 2010
    Messages:
    1,720
    Likes Received:
    45
    Here's a small d3d9.dll file that I've been using to measure draw calls: d3d9.zip

    Put it in the Core directory (where rFactor2.exe is) and when you launch rFactor 2 it will produce a d3d9log.txt file.
    This version only reports the number of draw calls per frame.

    What's it for?



    When rendering a scene, the CPU selects the objects that are in view, prepares textures and shaders for them and hands them to the GPU for rendering.
    Let's assume you have 100k triangles in your scene as a single object. You tell the GPU to render all of them and it takes some time.

    Now you split it into 10 pieces, 10k triangles each. Now you look at that scene from some point and CPU decides that 6 objects are visible. It will handle these 6 objects and ask GPU to render 60k polygons.

    Now you do that again, but you split your scene into 100 pieces, 1k triangles each. It turns out that we can now select the geometry that's in view more precisely. Last time 6 objects were visible, but some of them were only partially in view. With the scene split into 100 objects, it turns out that not 60, but 52 are visible.
    So we need to handle 52 objects now, but only 52k polygons.

    Now we try again, splitting our scene into 1000 pieces, 100 triangles each. It turns out that not 520, but 492 objects are in view. So our CPU prepares textures and shaders for 492 objects and GPU must render 49200 triangles.

    Ultimately, you would split your scene into 100k pieces, 1 triangle each. This time you would get 47213 triangles in view. CPU would have to prepare each one of them individually, and GPU would have to render 47213 triangles.
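
    The arithmetic in the example above can be sketched in a few lines. The visible-object counts are the ones quoted in this post; the function name and the assumption that every piece holds the same number of triangles are only for illustration.

```python
# Visible-object counts are the ones quoted in the post; triangles per
# object follow from splitting a 100k-triangle scene into equal pieces.
TOTAL_TRIANGLES = 100_000

# number of pieces the scene is split into -> objects the CPU finds in view
visible_objects = {10: 6, 100: 52, 1_000: 492, 100_000: 47_213}


def triangles_sent_to_gpu(pieces: int, visible: int) -> int:
    """Triangles rendered when `visible` out of `pieces` equal-sized objects are in view."""
    return visible * (TOTAL_TRIANGLES // pieces)


for pieces, visible in visible_objects.items():
    tris = triangles_sent_to_gpu(pieces, visible)
    print(f"{pieces:>7} pieces: CPU handles {visible:>6} objects, GPU renders {tris:>6} triangles")
```

    The object count grows by roughly a factor of ten at each step, while the triangle count only shrinks from 60,000 to 47,213.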



    As you can see, the time needed by the CPU to prepare objects for rendering grows linearly with the number of objects:

    [​IMG]

    In the above example, our "fast CPU" needs more or less half the time to handle the same number of objects.

    But let's look at the polygon count in each case - it does drop at first, but then drops slower and slower. That's because only some objects are at the edge of the view, and only these are actually removed from rendering when we increase our "object in view" search precision by splitting geometry into smaller pieces.

    So, the time needed by the GPU depends more or less on the number of triangles, and it falls non-linearly with the number of objects:

    [​IMG]



    Now let's assume our scene is not split into too many objects:

    [​IMG]

    Now how to read this diagram:
    If you want to see how much time is needed to render that many objects on a system with a fast GPU and a fast CPU - check where the yellow vertical line crosses the "fast GPU" and "fast CPU" lines and choose the higher of the two, because this is how much time the system will need (the slower component limits performance).

    As you can see, both the slow and the fast CPU can handle that small number of objects in less time than either GPU needs to render that number of polygons. Everything is fine, because graphics runs as fast as the GPU can handle it.

    So we get the following rendering time:
    Slow GPU, slow CPU = 40 ms (25 FPS)
    Slow GPU, fast CPU = 40 ms (25 FPS)
    Fast GPU, slow CPU = 20 ms (50 FPS)
    Fast GPU, fast CPU = 20 ms (50 FPS)
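
    The rule for reading the diagram (the slower component sets the frame time) can be written as a tiny model. The GPU times below are derived from the 25 and 50 FPS figures quoted above; the CPU times are made-up values, chosen only so that both CPUs stay below both GPUs as in this first diagram.

```python
def fps_to_ms(fps: float) -> float:
    """Frame time in milliseconds for a given framerate."""
    return 1000.0 / fps


def ms_to_fps(ms: float) -> float:
    """Framerate for a given frame time in milliseconds."""
    return 1000.0 / ms


def frame_time_ms(cpu_ms: float, gpu_ms: float) -> float:
    """The slower of the two components limits the frame time."""
    return max(cpu_ms, gpu_ms)


# GPU times follow from the FPS figures in the post.
gpu_times = {"slow GPU": fps_to_ms(25), "fast GPU": fps_to_ms(50)}
# Hypothetical CPU times (not from the post); the fast CPU needs about half the time.
cpu_times = {"slow CPU": 16.0, "fast CPU": 8.0}

for gpu_name, gpu_ms in gpu_times.items():
    for cpu_name, cpu_ms in cpu_times.items():
        total = frame_time_ms(cpu_ms, gpu_ms)
        print(f"{gpu_name}, {cpu_name} = {total:.1f} ms ({ms_to_fps(total):.0f} FPS)")
```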


    As we split our scene into smaller pieces we get this:

    [​IMG]

    We helped the slow GPU a little, gaining about 4 ms, and we are running at more or less 28 FPS (up from 25) with both the slow and the fast CPU.
    With the fast GPU we also gained about 1.5 ms, but only with the fast CPU.
    It turns out the slow CPU cannot handle that number of objects in the time the fast GPU needs to render them. So slow CPU / fast GPU owners will actually see a framerate drop while others see a framerate gain.

    But this is still an acceptable optimization - we gained something on most systems, but lost something in one specific case: a fast GPU running with a slow CPU.


    Now let's split our scene into even smaller objects:

    [​IMG]

    This time, only the fast CPU can keep up with the slow GPU. In all other cases graphics performance is CPU-bound.



    Optimizing your track is like searching for that sweet spot where the "GPU time curve" crosses the "CPU time line".
    The problem is, different systems will have a different GPU-CPU ratio and their "sweet spot" will be at a different number of objects.

    You may spend days optimizing your track and get something that works best on your system, while creating heavy performance issues on other systems.


    Considering that "GPU time" has a slow, non-linear falloff and "CPU time" goes up linearly, it's better to stay on the left - with fewer objects than seems optimal.
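
    To illustrate why the sweet spot moves between systems, here is a toy model. All constants and the shape of the GPU curve (a fixed floor plus a term that shrinks as the scene is split further) are assumptions made for this sketch, not measurements from rF2.

```python
def cpu_ms(objects: int, ms_per_object: float) -> float:
    """CPU time grows linearly with the number of objects."""
    return objects * ms_per_object


def gpu_ms(objects: int, floor_ms: float, extra_ms: float) -> float:
    """GPU time falls off more and more slowly as the scene is split further."""
    return floor_ms + extra_ms / objects


def best_object_count(ms_per_object: float, floor_ms: float, extra_ms: float,
                      max_objects: int = 5000) -> int:
    """Object count that minimises frame time, i.e. max(CPU time, GPU time)."""
    return min(
        range(1, max_objects + 1),
        key=lambda n: max(cpu_ms(n, ms_per_object), gpu_ms(n, floor_ms, extra_ms)),
    )


# Hypothetical systems: same GPU, the fast CPU needs half the time per object.
GPU_FLOOR_MS, GPU_EXTRA_MS = 15.0, 2000.0
slow_cpu_best = best_object_count(0.02, GPU_FLOOR_MS, GPU_EXTRA_MS)
fast_cpu_best = best_object_count(0.01, GPU_FLOOR_MS, GPU_EXTRA_MS)

print("sweet spot with slow CPU:", slow_cpu_best, "objects")
print("sweet spot with fast CPU:", fast_cpu_best, "objects")
```

    In this model a track tuned for the fast CPU's sweet spot lands well to the right of the slow CPU's, which is the case where erring towards fewer objects pays off.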



    Now a few comments:
    - by "CPU" I actually mean processor and memory, but in general it means the processing power.
    - "number of objects" is a rather vague term, but it was only meant to give you an idea without going too much into details (draw calls, render state changes and current rF2 optimizations) - I'm leaving that for a .pdf I'll post later.
    - there are other factors at play here - for example, collision detection algorithms may prefer a bigger number of smaller objects, but I haven't performed any tests yet to see how it's actually done in rF2


    This dll file will report draw calls, so it should give you an idea of how much CPU usage your track causes (always test it with a single car). Enabling reflections and shadows boosts the number of draw calls because your objects need to be drawn into shadowmaps and reflection mappers.
    My suggestion is to try to stay within 2000 draw calls with maxed out settings. ISI tracks are usually between 2000 and 2500 in the worst frames, and they are still pretty much CPU-bound on my system with a single car. With larger grids of cars it only gets worse.


    So the last question: how to decrease the number of draw calls?
    Combine nearby objects into one, apply one material to them and put their textures onto one bigger texture.
    For example - different kinds of trees do not necessarily need to have separate materials. Or perhaps buildings standing next to each other could actually be modelled as one mesh with one material (as if you were modelling one building).
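
    A rough way to estimate the effect before touching the models: count one draw call per (cluster of nearby objects, material) pair. This is a simplification assumed for the sketch, and the object names, material names and cluster labels are invented.

```python
from collections import defaultdict


def merge_by_material(objects):
    """Group objects that sit in the same cluster and share a material.

    `objects` is an iterable of (name, material, cluster) tuples. Each
    resulting group stands for one merged mesh, i.e. roughly one draw call.
    """
    groups = defaultdict(list)
    for name, material, cluster in objects:
        groups[(cluster, material)].append(name)
    return dict(groups)


# Three tree types near turn 1, each with its own material: three draw calls.
separate = [
    ("oak_01", "oak_mat", "turn1"),
    ("pine_01", "pine_mat", "turn1"),
    ("birch_01", "birch_mat", "turn1"),
]
# Same trees after their textures were put onto one bigger texture.
atlased = [(name, "trees_atlas_mat", cluster) for name, _, cluster in separate]

print("before:", len(merge_by_material(separate)), "draw calls")
print("after: ", len(merge_by_material(atlased)), "draw call")
```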



    I'm leaving for the weekend now, so I won't be able to respond, but feel free to ask questions - I'll answer when I get back (probably Monday).
     
    Last edited by a moderator: Aug 31, 2012
  2. feels3

    feels3 Member Staff Member

    Joined:
    Sep 4, 2011
    Messages:
    1,201
    Likes Received:
    142
    Thanks K :)

    I did one lap at Croft and my draw calls range from 700 to 2900.
    It looks like my track isn't optimized very well, but a lot of people get more fps on it than on ISI tracks.
     
  3. JJStrack

    JJStrack Registered

    Joined:
    Dec 23, 2011
    Messages:
    469
    Likes Received:
    9
    Hey feels,
    is there an exceptional number of draw calls when you come out of tower-bend? That is always the spot on Croft circuit where I get the worst fps performance. The fps always drops massively (from approx. 80 to 50) while I am between the apex and the curbs at the end of the corner.
    To K Szczech: I think it is awesome how you try to pass some of your knowledge on to us! Thx a lot!
     
  4. feels3

    feels3 Member Staff Member

    Joined:
    Sep 4, 2011
    Messages:
    1,201
    Likes Received:
    142
    At tower-bend I have ~2000-2100.

    The worst places at Croft are the last three corners and the first three. I have almost 3000 draw calls there.
     
  5. MJP

    MJP Registered

    Joined:
    Oct 5, 2010
    Messages:
    988
    Likes Received:
    21
    Have you got some kind of onscreen display? How do I tell which numbers correspond to which part of the track, or even which numbers are from when I'm actually on the track, as opposed to rF2 loading, sitting in the garage before going to the track, rF2 exiting, etc.?
     
  6. Jka

    Jka Member Staff Member

    Joined:
    Jan 31, 2011
    Messages:
    954
    Likes Received:
    213
    Wow!

    Thank you, K! :cool:

    This is a very, very useful utility.

    Cheers!
     
  7. feels3

    feels3 Member Staff Member

    Joined:
    Sep 4, 2011
    Messages:
    1,201
    Likes Received:
    142
    I read the values from the d3d9log.txt file. If you want to check one part of the track, take a car, drive to the place you want, stop the car and press Esc. The last values in the txt file will be from that place.
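
    Reading the file by hand gets tedious, so a small script can help. The log format is not shown anywhere in this thread, so this sketch simply assumes that each per-frame line contains the draw-call count as its last integer and skips lines without numbers; adjust the parsing to whatever the file really contains.

```python
import re
from pathlib import Path


def draw_call_counts(lines):
    """Extract one count per line (ASSUMED format: the last integer on the line)."""
    counts = []
    for line in lines:
        numbers = re.findall(r"\d+", line)
        if numbers:
            counts.append(int(numbers[-1]))
    return counts


def summarize(counts):
    """Min, max and most recent draw-call count; `counts` must not be empty."""
    return {
        "frames": len(counts),
        "min": min(counts),
        "max": max(counts),
        "last": counts[-1],  # the value from where the car was stopped
    }


if __name__ == "__main__":
    log = Path("d3d9log.txt")  # written next to rFactor2.exe according to the first post
    if log.exists():
        counts = draw_call_counts(log.read_text(errors="replace").splitlines())
        if counts:
            print(summarize(counts))
```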
     
  8. MJP

    MJP Registered

    Joined:
    Oct 5, 2010
    Messages:
    988
    Likes Received:
    21
    Been playing around with it a bit more. I think I can tell when I'm actually on the track; this is what I get for 1 lap of Croft:
    Draw Calls Croft.gif
     
  9. anthing

    anthing Registered

    Joined:
    Jan 12, 2011
    Messages:
    45
    Likes Received:
    0
    Could we get a realtime onscreen display for this? :) I think it would make it even more useful for optimizing tracks and everything.
     
  10. K Szczech

    K Szczech Registered

    Joined:
    Oct 5, 2010
    Messages:
    1,720
    Likes Received:
    45
    Hehe, guys - don't get too serious about this tool - like I said, it only reports the number of draw calls.
    The time needed by the CPU to perform a draw call may vary a lot, and rF2 does optimize for render state changes.

    For example - let's assume your .scn file contains 10 GMT files with materials A and B, and 10 GMT files with materials C and D, placed in random order.
    This means we have 10 portions of geometry with material A, 10 with B, 10 with C and 10 with D.

    If you render them in the order they come, you may find yourself performing 40 draw calls, switching textures and shaders all the time. The set of currently selected textures, shaders, blending settings and other stuff is called the render state.
    Now, if you change your textures before each of these 40 draw calls, you will end up with 40 render state changes, each followed by a draw call.
    Each render state change is far more costly than a draw call, so this will be slow.

    But rF2 will sort these portions of geometry and only change render state 4 times, rendering 10 portions of geometry for each render state change.

    You will still have 40 draw calls, but only 4 render state changes. So you could say that the first draw call out of every ten will be slow and the following 9 draw calls will be handled quickly by the CPU.
    Garage doors are a good example - that may be 20-30 draw calls, but all without changing the render state, so it may actually require as little CPU as 2-3 draw calls with render state changes.
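
    The effect of sorting can be checked with a short simulation of the 40 portions from the example above. This only counts material switches; it is an illustration of the idea, not of how rF2 implements its sorting.

```python
import random


def count_state_changes(materials):
    """Count how often the material differs from the previously drawn one."""
    changes = 0
    current = None
    for material in materials:
        if material != current:
            changes += 1
            current = material
    return changes


# 10 portions of geometry for each of the materials A, B, C and D.
portions = ["A"] * 10 + ["B"] * 10 + ["C"] * 10 + ["D"] * 10

shuffled = portions[:]
random.seed(0)  # fixed seed so the run is repeatable
random.shuffle(shuffled)

unsorted_changes = count_state_changes(shuffled)
sorted_changes = count_state_changes(sorted(shuffled))

print("draw calls:", len(portions))
print("state changes, random order:", unsorted_changes)
print("state changes, sorted by material:", sorted_changes)
```

    The number of draw calls stays at 40 either way; only the number of state changes drops.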

    So like I said - draw calls are not the best measure, but they give you a general idea.

    Changing a render state and issuing a draw call can cost the CPU about as much time as the GPU needs to render a few thousand textured triangles, but issuing another draw call with the same render state may only take the time equivalent of rendering a few dozen textured triangles.
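
    To get a feel for the difference, here is a back-of-the-envelope cost model in "triangle equivalents". The two constants are placeholders picked to stand for "a few thousand" and "a few dozen"; they are assumptions, not measured values.

```python
# Placeholder costs in "textured triangles the GPU could render in the same time".
STATE_CHANGE_PLUS_DRAW = 3000  # assumed stand-in for "a few thousand"
DRAW_ONLY = 36                 # assumed stand-in for "a few dozen"


def cpu_cost(draw_calls: int, state_changes: int) -> int:
    """Rough CPU cost when `state_changes` of the draw calls also switch render state."""
    if state_changes > draw_calls:
        raise ValueError("cannot have more state changes than draw calls")
    return state_changes * STATE_CHANGE_PLUS_DRAW + (draw_calls - state_changes) * DRAW_ONLY


unsorted_cost = cpu_cost(draw_calls=40, state_changes=40)
sorted_cost = cpu_cost(draw_calls=40, state_changes=4)

print("40 draw calls, 40 state changes:", unsorted_cost)
print("40 draw calls,  4 state changes:", sorted_cost)
```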

    Yea, that's the whole trick - splitting tracks into smaller pieces slightly offloads the GPU while increasing CPU load. As long as the CPU has some reserves on a given system, it will increase performance (GPU curve on my diagrams). Unfortunately, at the same time it may cause noticeable performance drops (CPU line on my diagrams) on systems where graphics was already CPU-bound.

    Since Croft is a pretty flat area with not many buildings, it will not require a lot of GPU power (a smaller area on screen needs to be covered with polygons). So as long as your CPU keeps up you can have better performance.
    Tracks like Spa, Algarve, Monaco - these have many hills, trees or buildings that occupy most of the screen.

    If your system is GPU-bound, these will be the key factors determining framerate. If you cross that sweet spot with the number of objects, you start to get CPU-bound and it doesn't matter if you render big and complex objects or small and simple ones - you're underperforming anyway.

    I think it's always good to leave some reserves on the CPU - the disproportion between GPUs and CPUs will only grow in the future and, like I said, being on the left side of the "sweet spot" is being on the safe side.
    Another reason for this is that cars are usually small objects on screen (one or two nearest cars may occupy some noticeable space, but others will be relatively small). Modern GPUs can chew through that number of polygons quite easily, and the small space occupied on screen by each car will not require a lot of memory throughput. But each of these cars will still have many separate parts - brake lights, tyres, suspension, driver, helmet, wings - you name it. It means that the GPU will handle it well, but the CPU will need to issue some draw calls for each car.

    So rendering a lot of cars shifts the performance balance more towards being CPU-bound. A grid of 26 Formula ISI cars with maximum graphics settings can cost approximately 5000 draw calls. So it's nice to have some CPU reserves for that :)
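
    As a rough budget check, the figures from this thread (about 5000 draw calls for 26 cars, and the suggested 2000 for the track itself) can be combined. Spreading the grid cost evenly over the cars is an assumption of this sketch; in practice the cost per car will depend on distance and visibility.

```python
GRID_CARS = 26           # grid size quoted in the post
GRID_DRAW_CALLS = 5000   # approximate cost of that grid, from the post
TRACK_BUDGET = 2000      # suggested track budget from the first post


def calls_per_car() -> float:
    """Average draw calls per car, ASSUMING the grid cost is spread evenly."""
    return GRID_DRAW_CALLS / GRID_CARS


def total_draw_calls(track_calls: int, cars: int) -> int:
    """Estimated draw calls per frame for a track plus `cars` cars."""
    return round(track_calls + cars * calls_per_car())


print(f"about {calls_per_car():.0f} draw calls per car")
for cars in (1, 10, 26):
    print(f"{cars:>2} cars on a {TRACK_BUDGET}-call track: ~{total_draw_calls(TRACK_BUDGET, cars)} draw calls")
```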


    Well, it would make sense if my tool took render states into account and displayed them as well, but maybe I'll find some time for this later.
    I only made this tool to perform some tests on rF2 - putting different content into it and watching how rF2 handles it.
    The reason I shared it with the modding community was more about making modders aware that content design can have a huge impact on CPU usage and, in turn, encouraging them to pay more attention to a good CPU-GPU balance.

    So this tool is kind of a motivator - "hey, ISI's tracks can do it under 2500 draw calls, let's aim for that as well" ;)

    Otherwise we may end up with mods that run horribly on the system specs given by ISI as minimum.
     
    Last edited by a moderator: Sep 3, 2012
  11. osella

    osella Registered

    Joined:
    Jan 11, 2012
    Messages:
    864
    Likes Received:
    26
    Very interesting. I do not intend to make tracks, but it's always great to learn about something that has an impact on performance.

    On a related note, I read an interesting discussion about optimization of software in general nowadays (it was on a Czech forum, forum.pctuning.cz; despite the name it's not about casemodding, there are good serious articles). The reality is that today very little time is spent on optimizing anything. Developers are forced by their management to deliver the required features as fast as possible and not to care much about optimizing, simply because it costs extra money that wouldn't generate many more sales.
    The reason is: hardware is cheap today. 15 years ago sw HAD to be optimized well, as otherwise only people with $4000 hw would be able to run it.
    While today it is estimated that all the hw-demanding games like Crysis 2, Metro etc. could easily have DOUBLE the fps on most systems if a few more weeks were spent on optimizing. Imagine that, having 60 fps instead of 30 on ultra details just because of "small" changes in code.
    On the other hand, one could argue that if today's sw ran too well, GPU developers wouldn't be forced to deliver faster chips, so we would be stuck with slower GPUs. Hard to say what would be better.
     
  12. K Szczech

    K Szczech Registered

    Joined:
    Oct 5, 2010
    Messages:
    1,720
    Likes Received:
    45
    If it's about money then it's more cost-efficient to buy an engine like CryEngine or Unreal Engine.
    But yes - it's often true that game developers are just required to provide a product for sale and it only needs to make a good first impression. With hardware being developed so fast, no one can tell whether a game is squeezing 30% or 60% out of that hardware. When people see a game running at some framerate, they will just think that that's the way it is.

    There are of course some studios where it's not just about pushing a product to the market and development is entrusted to programmers. That's why these companies are world leaders and sell their technologies to other game developers.

    I'm willing to bet that this estimate was made by a person or people with little in-depth knowledge.

    I can agree that engines used in many RPG games like Skyrim or Risen could probably run twice as fast, but at the same time I don't think that it could be achieved with "a few weeks more". Same goes for rF2.
    But engines like CryEngine, Unreal, id Tech or Frostbite? No, I seriously doubt you could squeeze more than 5-10% more out of these, and it wouldn't be easy and might even require some compromise on quality.

    To optimize a game properly you must work on multiple levels. There are some things that you can just add to the engine, but they're usually "low level" optimizations and many games have some of these already. Improvements in that area can be quick but only have a very limited impact on performance.
    The more high-level optimizations you want to introduce, the more changes to the game engine design, content design, and content creation tools you will have to make. The impact on performance can be pretty big in this case, but it definitely requires more than "a few weeks" of work.

    High level optimizations are all about "doing things the smart way" - this is my favourite area of research nowadays. The idea of modelling two nearby buildings as one mesh is an example, but this is a much bigger topic (instancing, "uber shaders", imposters, re-usable lighting, etc.)

    We owe progress in the GPU market mostly to competition between hardware vendors. For the same reason graphics drivers can eat up all unused cores on multi-core CPUs even when running older or poorly optimized games.

    Customers don't care where the difference comes from - if a poorly optimized game runs faster on your competitor's GPU just because they did the "multi-core trick" in their drivers, then customers are going to buy that GPU instead of yours. You cannot put the blame on the game's design even if it's the truth. Customers don't want the truth - they want results.

    So I'd rather say that games force graphics card manufacturers to optimize their drivers rather than their hardware. The chase after better hardware goes on anyway.
     
    Last edited by a moderator: Sep 4, 2012
  13. marcatore

    marcatore Registered

    Joined:
    Sep 24, 2011
    Messages:
    51
    Likes Received:
    0
    @K Szczech
    Your info is very, very interesting.
    I hope you'll be able to make one or more sticky threads with some tips to help us create better mods (cars or tracks) on the graphics side.
    I think it should be done by ISI, but I'm not too sure that they will... so any tips would be very much appreciated.

    Thanks again for sharing your knowledge.
     
