Sporadic display crashing on new GPU


Ciel R

I’ve been having a bizarre problem that began a few days after I installed the Gigabyte Gaming OC Pro 3060 Ti I managed to get hold of earlier this month. One possibility, of course, is that the GPU itself has a defect that is causing all this, but I’m hoping that’s not the case because of how hard these cards are to find. Given the ongoing supply issues, I’m not sure Newegg could even get me a replacement in any reasonable length of time, so if I return it, they may just say “here’s a full refund; if you want to buy a new one to replace it, good luck!” Apologies for the length of the post, but I wanted to include as much detail as possible. All attempts to help are much appreciated!

TL;DR version: the computer ran fine with the new GPU for a few days, then my displays crashed (all 3 monitors went black) 6-7 times over a few days. Windows Reliability Monitor just says a “Hardware Error” caused Windows to stop working. The error codes I’ve seen are 119, 117, and 141. I also had one instance of the “Desktop Window Manager” crashing. I can’t do anything to MAKE these errors occur (many of them happened when the GPU was idle and nothing resource-intensive, such as a game, was running). After a small virtual memory tweak suggested by a friend, the problem seemed to go away, but 7 days later both kinds of crash occurred again (last night and the night before). The crashes have occurred on multiple recent Nvidia driver versions (457.51, 460.79, 460.89), and wiping the drivers with DDU and reinstalling them didn’t change anything. Benchmarking tools gave me good results and no errors of any kind.

The major problem is that these crashes are sporadic and impossible to reproduce, and they can simply disappear for a full week and then suddenly return, which makes troubleshooting extremely difficult. Regardless, I am going to put my old GPU back in for a day or two and see what happens. Beyond that, I’m thinking about wiping everything and starting over with a full reinstall of Windows, which in theory will either solve it (if it’s a gremlin somewhere in the system) or show me that the new GPU likely IS at fault (if I reinstall everything and it starts crashing again). I’m looking for any insights, anything I might have missed, or other ideas for what to do next.


Detailed breakdown:

-The 3060 Ti was installed on December 9th. For a few days everything was fine, but then I had a display crash. The form this takes is: everything freezes (can’t even move mouse) for about 5 seconds, then all 3 monitors go black, which lasts another 5 seconds or so. Then they come back, and the computer acts pretty normal after that. The one other thing that happens is in Chrome: a strange visual bug with the way tab names are displayed occurs after the crash, and persists until Chrome is restarted. This doesn’t affect browser performance, and effectively does nothing beyond serving as an indicator that a crash happened.

-Over the course of four days, it crashed in this same way repeatedly (6-7 times in total). In Windows Reliability Monitor, these crashes all appear as “Hardware Errors” under Critical Events. In the details, they are listed as “LiveKernelEvent”, and I’ve seen three different error codes: 117, 119, and 141. Information on exactly what these mean is annoyingly hard to come by, but it seems all three can indicate display driver issues.

-ONE of these crashes caused the computer to restart right after the screens went black. This added a “Shut Down Unexpectedly” to the critical events list (typed as “BlueScreen” but still with error code 119). This particular crash happened soon after I completely nuked my display drivers with DDU and reinstalled them (Nvidia driver version 460.79).

-A friend found a thread posted by someone with a similar problem who said they fixed it by raising the amount of virtual/paged memory in Windows to 8GB. My virtual memory was already set higher than 8GB, but on the chance it could help, I changed it to 8GB. The next day, I got a different kind of crash: this time it was the “Desktop Window Manager” that “stopped working”. This had a different effect: it didn’t cause all monitors to go black, but it did kill the game that was running, and Discord. The Chrome visual bug was also present afterward.

-After this, the problems went away for 7 days. I almost wondered if the virtual memory tweak had fixed the original display crash and the one Desktop Window Manager crash was just a poorly timed fluke, but I now know that’s not the case.

-Tuesday night (8 days since the last full black screen crash), I got the crash again. Same as almost every time: 5 second freeze, all 3 screens go black for 5 seconds, then the computer works normally except for the Chrome visual bug. Wednesday night – 8 days since the one instance of the Desktop Window Manager crash – I got THAT crash again. This, too, was the same as its previous manifestation: running game died, Discord died, no other ill effects that I could detect, and then the Chrome visual bug.

-Drivers: I got crashes on three different drivers: 457.51, 460.79, and (as of last night) 460.89. So it doesn’t seem as if any one of those drivers being faulty can be blamed. I would assume the likelihood of ALL of them being faulty in the same way is very low, especially since I’ve seen no widespread reports of these kinds of issues. I did get the most crashes on 460.79, but I don’t think that means much.

-I used both the FurMark and Heaven benchmarking tools to test the card within the last couple of days; there were zero issues in either one. FurMark seems to just run forever without problems, and in Heaven I used the “benchmarking” mode (high settings, 1440p full screen), which ran for several minutes (image attached showing the results). The "min FPS" is VERY low (9), but as I have never used these kinds of programs before, I don't know if that's normal (e.g. if the FPS starts out super low for a moment). After the first couple of seconds of the benchmark, it never dipped below 120 FPS. Other than that, there were no problems of any kind; the temperature never went above 62C and everything looked right in GPU-Z.

-The big problem: that 7/8-day gap presents a massive impediment to further troubleshooting. This issue occurs randomly and sporadically, and as evidenced by the gap, it can go away for days and then suddenly reappear. In addition, there is no pattern or trigger that I’ve been able to determine. Out of all these crashes of both types (around 10 in total), only three happened while gaming (which of course shut down the game). The rest happened while the GPU was idle or near-idle (i.e. when just browsing, or when doing almost nothing at all). One of them did occur while watching a video (a local file played in MPC-HC, not YouTube), but that still wasn’t using the GPU more than a little bit. So there is no way to force the error to occur or reproduce it. The second error specifically (the Desktop Window Manager one) HAS only occurred while a game was running, so it may be linked to the higher GPU load of running a game… but that could also be coincidence, since that type of crash has only happened twice (as opposed to the other type, which has happened 7-8 times now). Plus, even if it is “caused” by running games, I have still been playing one game or another nearly every day during this entire period, and that error has occurred only twice.
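On the point that the error codes are hard to look up: the “Code” field on a LiveKernelEvent appears to be a Windows bug check code written in hex without the “0x” prefix, which Microsoft documents in its bug check code reference. A minimal lookup sketch, assuming that interpretation (the helper name is hypothetical), suggests all three codes point at the graphics subsystem (driver timeouts and the GPU scheduler) rather than at something unrelated like storage or RAM:

```python
# Hypothetical helper for decoding Reliability Monitor LiveKernelEvent codes.
# Assumption: the "Code" field is a Windows bug check code written in hex
# without the "0x" prefix (so "117" means 0x117).
BUG_CHECK_NAMES = {
    0x117: "VIDEO_TDR_TIMEOUT_DETECTED",
    0x119: "VIDEO_SCHEDULER_INTERNAL_ERROR",
    0x141: "VIDEO_ENGINE_TIMEOUT_DETECTED",
}


def describe_live_kernel_event(code: str) -> str:
    """Return the bug check name for a code string such as '117'."""
    value = int(code, 16)  # parse as hex: '117' -> 0x117
    return BUG_CHECK_NAMES.get(value, f"unknown (0x{value:X})")


if __name__ == "__main__":
    # The three codes reported in this thread.
    for code in ("117", "119", "141"):
        print(code, describe_live_kernel_event(code))
```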

So all of this means that something like “go back to my old GPU (1660 Super) and see if it crashes” may not be worth anything. If I put the old GPU back in and it crashes within a day or two, then okay, I’ve learned that the 3060 Ti is not the problem. But if I put the old GPU back in and it DOESN’T crash… I learn nothing, because this problem can apparently just stay away for 7-8 days (very likely could stay away for MORE than that in theory) and then suddenly come back with no rhyme or reason. So the lack of crashing could mean the 3060 Ti was at fault, or it could just be coincidence that the problem stayed away while the old GPU was back in, and I’d have no way of knowing.
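The reasoning above can be made a little more concrete with a rough sketch. If crashes were assumed to arrive independently at a steady average rate (a Poisson process, which is an assumption, not something the logs show), the chance of seeing zero crashes in a test window of d days at r crashes per day is exp(-r·d). The rate below is only an illustrative guess from the numbers in this post (about 10 crashes in roughly three weeks), and the function names are hypothetical:

```python
import math


def prob_no_crash(crashes_per_day: float, test_days: float) -> float:
    """P(zero crashes in test_days), assuming crashes form a Poisson process."""
    return math.exp(-crashes_per_day * test_days)


def days_needed(crashes_per_day: float, miss_prob: float = 0.05) -> float:
    """Crash-free days needed before 'no crash by luck' drops to miss_prob."""
    if crashes_per_day <= 0:
        raise ValueError("crashes_per_day must be positive")
    return -math.log(miss_prob) / crashes_per_day


if __name__ == "__main__":
    # Illustrative assumption: ~10 crashes over ~21 days.
    rate = 10 / 21
    print(f"P(no crash in 2 days): {prob_no_crash(rate, 2):.2f}")
    print(f"Days for a 5% fluke chance: {days_needed(rate):.1f}")
```

Under these assumptions, a crash-free day or two with the old card would still happen by luck more than a third of the time, and roughly six crash-free days would be needed before that drops to about 5%. Because the real crashes seem to cluster, with 7-8 day quiet gaps, a steady-rate model likely understates the test length needed, which supports the concern that a short swap test can prove little on its own.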

Despite this, as I said in the TL;DR, my next move is going to be to swap back to the old GPU anyway, at least for a few days (it’s worth a shot at least). After that, I’m probably going to just reinstall Windows 10 entirely, given the difficulty of tracking this down.

System info:

-On current Windows 10 version (20H2)

-i5 9600k

-32 GB (2x16) Corsair Vengeance LPX DDR4 memory

-ASUS TUF-Z390 Gaming + WiFi motherboard

-Seasonic Focus 80 Plus Gold 650 watt PSU

-Gigabyte Gaming OC Pro 3060 Ti

NOTHING is overclocked by me at all. Even the RAM, which is rated for 3200 MHz, is running only at 2133 and I haven’t attempted to change that.
