Kernel-Power Event 41 (63) Caused by Erroneous Windows CPU Critical and Hot Temperatures

K.C.Jen

I finished my build last night (System Builder + 3 Noctua case fans that I didn't put on the list) and installed the AMD and Gigabyte drivers. It ran fine under light gaming load (Destiny 2 at 1440p with CPU and GPU just under 65C, well within safe ranges). I powered it down for the night.

This morning I booted it up to watch some YouTube, and about 3 hours in it just shut itself off, with no BSOD. As a first step, I removed my NVMe drive, formatted a spare SATA III SSD, and installed Windows on it from USB recovery media. The install was repeatedly interrupted by automatic restarts.

I looked up some guides on how to fix this and tried removing one stick of RAM, to no avail. I also checked my power supply cables to make sure they were all seated properly, and ran sfc /scannow and chkdsk /f /r, which did not help either.

The shutdowns continued, but only in Windows, and usually only when I was running (slightly) intensive tasks, like installing drivers from a USB drive. My computer could sit in BIOS for more than an hour without issue, and it only left BIOS when I exited it myself (either by shutting down or by booting to Windows).

Eventually I decided to skim through events leading up to Kernel-Power Event 41 (63), finding several problems:

WHEA-LOGGER 19: Cache Hierarchy Error - I don't believe this is the actual cause of the shutdowns since this does not occur before every Event 41, but it could contribute to the issue.

Kernel-Power Event 185: ACPI thermal zone \_TZ.TZ10 has been enumerated.
_PSV = 290K
_TC1 = 0
_TC2 = 0
_TSP = 1000ms
_AC0 = 0K
_AC1 = 0K
...
_AC8 = 0K
_AC9 = 0K
_CRT = 294K
_HOT = 293K
minimum_throttle = 0
_CR3 = 0K
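
To make the dump easier to read, here is a minimal Python sketch that converts the non-zero Kelvin values above to Celsius. The helper names (parse_trip_points, kelvin_to_celsius) are made up for illustration, and the text is pasted in by hand rather than read from the event log.

```python
import re

# Non-zero values copied by hand from the Event 185 dump above (whole Kelvin).
EVENT_TEXT = """
_PSV = 290K
_CRT = 294K
_HOT = 293K
"""

def kelvin_to_celsius(kelvin):
    """Convert a temperature in Kelvin to degrees Celsius."""
    return kelvin - 273.15

def parse_trip_points(text):
    """Return {name: degrees Celsius} for each 'NAME = <n>K' line with n > 0."""
    points = {}
    for name, kelvin in re.findall(r"(_\w+)\s*=\s*(\d+)K", text):
        if int(kelvin) > 0:  # 0K entries are unset, so skip them
            points[name] = round(kelvin_to_celsius(int(kelvin)), 2)
    return points

trip_points = parse_trip_points(EVENT_TEXT)
for name, celsius in trip_points.items():
    print(f"{name}: {celsius:.2f} C")
```

All three trip points come out below ordinary room temperature, which is what makes them look wrong.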

About halfway through the Microsoft thermal design document (Design Guide), there is a table (Ctrl+F for "The following table lists") defining what each value is. I believe there are several errors in my ACPI thermal zone values:

_TC1 and _TC2 are both 0. This prevents any sort of thermal throttling.

The bigger issue, and what I believe is causing the Kernel-Power 41 (63) shutdowns, is _CRT (the CPU critical temperature at which the "operating system initiates critical shutdown"). It is set to 294K, or about 21C. This would explain why the machine shut itself off more frequently while it was reinstalling Windows in the afternoon (SoCal, ambient reached about 75F, 23.9C), and why it shut itself off even at idle after the install. At night (9:30-11:15) it seemed significantly more stable, and I was able to get the motherboard drivers installed without it crashing immediately. It did crash about 5 minutes afterwards, though.
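
The arithmetic behind that comparison, as a small Python sketch. The constant names are hypothetical; the numbers are the 294K _CRT from the event and the roughly 75F afternoon ambient mentioned above.

```python
def fahrenheit_to_celsius(fahrenheit):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9

def kelvin_to_celsius(kelvin):
    """Convert Kelvin to degrees Celsius."""
    return kelvin - 273.15

CRT_KELVIN = 294   # _CRT as reported in Event 185
AMBIENT_F = 75     # approximate afternoon room temperature

crt_c = kelvin_to_celsius(CRT_KELVIN)
ambient_c = fahrenheit_to_celsius(AMBIENT_F)

# If the room alone is warmer than the critical trip point, any CPU reading
# at or above ambient would already be past _CRT.
ambient_exceeds_crt = ambient_c > crt_c

print(f"_CRT    = {crt_c:.2f} C")
print(f"ambient = {ambient_c:.2f} C")
print(f"ambient above _CRT: {ambient_exceeds_crt}")
```

With these inputs the afternoon room temperature is about 3C above the critical threshold.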

Is anyone familiar with how I can manually change these values and what values they should be set at? I know _CRT should be just below 100C, and I would prefer it to be around 90C, and _HOT should be just below _CRT. I also know that _TC1 and _TC2 should most definitely not be 0.
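
For reference, a rough Python sketch of what those target values look like in the units involved. This is only a unit conversion and changes nothing on the system. The event log above prints whole Kelvin, while the ACPI specification defines _CRT and _HOT as integers in tenths of a Kelvin. The 85C figure for _HOT is just an example of "just below _CRT", and the 2732 offset (273.15 K rounded to tenths) is an assumed rounding convention, not something taken from my firmware.

```python
# Assumption: 273.15 K expressed in tenths of a Kelvin and rounded to an integer.
KELVIN_OFFSET_TENTHS = 2732

def celsius_to_acpi_tenths(celsius):
    """Convert whole degrees Celsius to tenths of a Kelvin."""
    return celsius * 10 + KELVIN_OFFSET_TENTHS

def tenths_to_whole_kelvin(tenths):
    """Round tenths of a Kelvin to whole Kelvin, as the event log displays them."""
    return (tenths + 5) // 10

TARGET_CRT_C = 90  # preferred critical temperature from the post
TARGET_HOT_C = 85  # hypothetical example of "just below _CRT"

targets = {}
for name, celsius in (("_CRT", TARGET_CRT_C), ("_HOT", TARGET_HOT_C)):
    tenths = celsius_to_acpi_tenths(celsius)
    targets[name] = (tenths, tenths_to_whole_kelvin(tenths))
    print(f"{name}: {celsius} C = {tenths} tenths of K (about {targets[name][1]}K)")
```

So a sane _CRT would show up in the event log as roughly 363K rather than 294K.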

I am quite desperate, since I spent a lot of money on this build. Note: my PSU is a Corsair SFX 600W Platinum. The lot number is 2016x, which is just outside the 1944x to 2011x bad PSU range. I am not sure whether the PSU could be a factor, but the Windows thermal management values are most likely incorrect.
