Recurrent NVIDIA Driver Crashes and Erratic GPU Usage

I managed to fix the issue (so far).

I realized there's a difference in how pCars, Assetto Corsa and other games "talk" to the Nvidia driver compared with rFactor 2. Please have a look at the table below:

[Image: NwuttY.png (table of GPU usage vs. GPU power for pCars and rFactor 2)]

It seems rFactor 2 is "asking" for less GPU power even under heavy GPU usage. For pCars the average delta between GPU usage and GPU power demand is 0.85%, whereas rFactor 2 shows an average difference of -10.62% between power and usage. That's quite a gap.
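For anyone who wants to reproduce that average from their own monitoring log, here is a minimal sketch. It assumes the delta is power % minus usage % (which matches the negative sign for rF2); the readings are made-up placeholders, not the values from my table.

```python
from statistics import mean


def mean_power_usage_delta(samples):
    """Average of (GPU power % - GPU usage %) over (usage, power) pairs."""
    return mean(power - usage for usage, power in samples)


# Hypothetical (usage %, power %) readings, for illustration only.
pcars_like = [(90.0, 91.0), (92.0, 92.5), (88.0, 89.0)]
rf2_like = [(90.0, 74.0), (97.0, 85.0), (95.0, 88.0)]

print(f"pCars-like mean delta: {mean_power_usage_delta(pcars_like):+.2f}")
print(f"rF2-like mean delta:   {mean_power_usage_delta(rf2_like):+.2f}")
```

A mean close to zero means power follows usage; a clearly negative mean is the pattern I see with rF2.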

The correlation matrices are useless in both cases since the pCars raw data is very homogeneous, but we can spot moments where pCars requires 90% GPU usage and GPU power stays almost steady, always around 90% as well. In contrast, rFactor 2 has records with 90% GPU usage and less than 75% GPU power demand!
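A quick way to pick such records out of a log, as a rough sketch: the 90% and 75% thresholds come from the paragraph above, while the function name and the sample readings are hypothetical.

```python
def starved_samples(samples, min_usage=90.0, max_power=75.0):
    """Return the (usage %, power %) pairs with high usage but low power."""
    return [(usage, power) for usage, power in samples
            if usage >= min_usage and power < max_power]


# Hypothetical (usage %, power %) readings, for illustration only.
log = [(90.0, 74.0), (91.0, 90.0), (60.0, 55.0), (97.0, 72.5)]

for usage, power in starved_samples(log):
    print(f"usage {usage:.0f}% but power only {power:.1f}%")
```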

This is a typical sample; every time I have a crash the tables show this behavior. So I concluded that, for some reason, the rF2 graphics engine is not requesting enough GPU power from the PSU and the graphics board runs cooler than it could. When the game switches to an external camera (for example) with cars close by (detailed LODs), there's no power to meet the sudden demand, and the result is a GPU crash. It's as if the GPU is "lazy" to wake up and hesitates to demand more power, which leads to a hardware reboot (black screen), although the sound engine and CPU processing stay alive.

So, what did I do? I downloaded EVGA Precision X16 (http://www.evga.com/precision/), which should be able to manage boards of any brand, not only EVGA, and set it up as follows:

- Increased power target to 111% and set it to priority.
- Increased temperature target to 80 °C, which should be OK according to the card manufacturer.
- Forced the fans to run faster with a custom fan curve (maybe not necessary).
- Increased GPU voltage by 6 mV (minimum step).

This increased the GPU power demand from my PSU and there have been no more crashes in rF2 so far. When the game asks for 99 - 100% GPU usage, there is now enough power to keep the software running. Please note this is not an overclocking procedure at all. It only draws more electrical power from the PSU to your board.

When I reverted to the default EVGA settings, I had crashes again. With the increased power target, no crashes. So I believe it may work for others as well.

I cannot guarantee it will work for you but I think it's worth a try. It's worth mentioning that I have a good Corsair 1200W unit, so if you are using a cheap PSU you may run into trouble.

Add: both pCars and rF2 @ 1080p, no Vsync.
 
Those are very interesting graphs; perhaps someone from ISI can comment on them. Unless I'm totally wrong, though, a game doesn't ask a graphics card for power; it would be a crazy design philosophy to let games decide how much power the card draws. This is handled by the GPU driver and hardware together. But perhaps the fact that rF2 is a DX9 game and those other games are DX11 has something to do with it; I know that DX9 handles a few things less efficiently.
 
Yep, and something even stranger: this fix works for my Bugatti Circuit, forcing GPU power higher, but I can now confirm it's not working for Le Mans 1991. On this track GPU usage doesn't drop below 97% but GPU power barely reaches 85% whatever I do. Result: I can race on my track now but I can't do a single lap at Le Mans 1991 V (literally: tried 10 times and not a single lap finished).

In short, GPU power doesn't match GPU usage on the LM track and the video driver crashes.

Why I can force higher GPU power on certain tracks and not on others is a mystery so far.
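If someone wants to log the same mismatch without EVGA Precision, one option is `nvidia-smi --query-gpu=utilization.gpu,power.draw,power.limit --format=csv,noheader,nounits`. Below is a rough sketch of parsing one line of that output. The sample line is made up, and I'm assuming "GPU power %" means draw divided by the current power limit, which may not be exactly what Precision displays.

```python
def parse_smi_line(line):
    """Parse 'usage, draw_watts, limit_watts' into (usage %, power %, delta)."""
    usage, draw, limit = (float(field) for field in line.split(","))
    power_pct = 100.0 * draw / limit
    return usage, power_pct, power_pct - usage


# Hypothetical output line, not a real measurement.
usage, power_pct, delta = parse_smi_line("97, 140.25, 165.00")
print(f"usage {usage:.0f}%, power {power_pct:.1f}%, delta {delta:+.1f}")
```

Some cards report `[N/A]` for the power fields, in which case the `float()` conversion fails and the line has to be skipped.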
 
Did you try limiting the frame rate in order to curb GPU usage? I was experiencing high GPU load until I limited the frame rate to the refresh rate, which dropped it down to 60% usage. GTX 980 Ti Classified. Obviously there is something going on that needs fixing, but this worked for me for the time being.

 
Thanks for sharing this information ECARS_Tracks! I know there are quite a few people struggling with the same problem you had so it will be interesting to see if this helps them too. In any case I am glad it resolves your problem. :)
 
Thanks for the investigation and the information.

I tried it and could run 20 laps (AI mode) without a crash: stock GPU/MEM clocks, power target at max and added voltage, with 35 AIs in total at Le Mans 1991. I will try driving myself later.

Edit:

Whatever I did, it still crashed. I tried:

- Reducing Mem / GPU clocks to minimum
- Switching PhysX to CPU
- Limiting to 65 fps
- Reducing graphics settings to low

Even with the usage of both GPUs at ~80%, it still crashed after about 5-6 laps.

In addition, disabling TDR means I need to reset the PC myself, as it crashes without returning control of the PC.
 
Yes, I tried to cap my FPS through the PLR file and/or Nvidia Inspector but it did not help.

I have two 980s; I tried switching the primary GPU via BIOS and at Le Mans the issue persists. It's hard to believe I bought two defective boards.
 
Nvidia driver 353.30 plus Wagnard TDR Manipulator is the only way to get maximum performance in rFactor 2: no crashes, no stutters, etc.
 
I'll look into it.

Just found one variable that changed between the tests that passed with EVGA Precision and the newer ones that crashed: "Prefer maximum performance" in the Nvidia control panel. Running more tests.

Just to clarify, my understanding is that GPU power is shown as a percentage of the maximum power draw from the PSU (165 W in the case of the GTX 980).
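Under that reading, the percentages translate to watts as sketched below. The 165 W figure is the one quoted above; a factory-overclocked card may use a different base limit, so treat the results as rough.

```python
GTX_980_MAX_POWER_W = 165.0  # figure quoted in this thread; other boards may differ


def power_percent_to_watts(percent, max_power_watts=GTX_980_MAX_POWER_W):
    """Convert a GPU power percentage to watts for a given 100% limit."""
    return max_power_watts * percent / 100.0


for pct in (75, 85, 100, 111):
    print(f"{pct:>3}% -> {power_percent_to_watts(pct):.2f} W")
```

So the 85% reading at Le Mans would be roughly 140 W, and the 111% power target allows roughly 183 W.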
 
Try TDR Manipulator with your current driver first; it works fine for me, but the problem with the latest drivers is stutters and other hiccups.

Uninstall EVGA Precision x16 and use Display Driver Uninstaller (DDU).
Nvidia Profile:
Maximum pre-rendered frames =1
Power management mode = Prefer Maximum Performance.

Download link Wagnard TDR manipulator: http://www.wagnardmobile.com/DDU/download/tdr Manipulator v1.1.zip
Open TDR Manipulator > (TdrLevel) > Disabled > Apply > restart your PC. Don't change the other values.
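For reference, Windows reads a `TdrLevel` DWORD under the `GraphicsDrivers` registry key, where 0 turns timeout detection off and 3 (the default) recovers on timeout. I'm assuming, without having checked, that the "Disabled" option in the tool writes this value. The sketch below only builds the equivalent `reg add` command so you can see what would change; it does not run it.

```python
TDR_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"

# TdrLevel values as documented by Microsoft for WDDM.
TDR_LEVELS = {0: "detection off", 1: "bug check on timeout",
              2: "recover to VGA (not implemented)", 3: "recover on timeout (default)"}


def tdr_level_command(level):
    """Build (but do not execute) the reg.exe command that sets TdrLevel."""
    if level not in TDR_LEVELS:
        raise ValueError(f"unsupported TdrLevel: {level}")
    return ["reg", "add", TDR_KEY, "/v", "TdrLevel",
            "/t", "REG_DWORD", "/d", str(level), "/f"]


print(" ".join(tdr_level_command(0)))  # would need an elevated prompt and a reboot
```

With detection off, a real GPU hang no longer recovers by itself, so expect to reset the PC by hand if it happens.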
 
Limiting FPS was covered in posts 10&11 of this thread. The point is the potential 'fix' is related to power, not processing load (in isolation).
Sorry, must have missed it.

 
I'd just like you all to know that this is not only an Nvidia issue; AMD/ATI cards also have this problem, and you can find threads like this on every game forum. These errors are triggered by a Windows feature called 'Timeout Detection and Recovery' (TDR). TDR is part of the WDDM driver model, first introduced with Windows Vista. Timeout Detection and Recovery errors are nothing new.
If you are having problems with rFactor 2 you should start with Wagnard TDR Manipulator as I describe in my post #53. If TDR Manipulator doesn't solve your problems, run a stress test tool like HeavyLoad on your PC; if your PC doesn't pass the test, you know what the problem is :( https://www.jam-software.com/heavyload/
You can find more help here: https://forums.geforce.com/default/...hat-is-it-an-fyi-for-those-seeing-this-issue/

:)
 
It is clear there's a pattern for the 900 series. Nvidia had loads of RMAs in the early RTM phase because of this problem (timeouts, GPU reboots, driver crashes)...
I had similar problems and finally RMA'd my GTX 980 (bought 1.5 years ago) this week. Within 3 days EVGA sent back a completely new GTX 980 (not refurbished). Together with ECAR_Tracks's suggestions, it's now running completely flawlessly. I tested it with SaoPaulo and the rMegane WT Mod, 25 AI, max settings: most of the time above 200/220 FPS, dropping to ~160 FPS only at s/f. GPU load was between 85% and 96%, GPU temp ~60°, max. CPU was 62%.
 
Could you also report back on whether you can do "1 single lap in Le Mans 1991"? (Post #44)
 
What to do if one has an ATI?
 