TodddHunter
Here is the latest in our problems with S2D. There is a bit of history, so read on.
We have a 3-node cluster with 1 pool and 4 VDs/SOFS. If I am reading it correctly, 1 of the 4 disks appears to have been repairing for 168 days (5 1/2 months) and is stuck. Another 2 VDs have been repairing for 120 days and 13 days.
This is a new development. When I checked the VDs and storage jobs last month, and again this morning, everything showed Healthy and no storage jobs were running. So how we now have a repair that has been going on for 5 months is a mystery.
Back in October 2017, one of the 3 Windows Server 2016 nodes failed to complete Windows Updates. The other 2 nodes were updated in December 2017, but we stopped updating them further for fear of getting too far out of sync on the OS and updates. Today we were finally able to fix Windows Update, and SANHA01 is now updated to July 2018; SANHA02 and SANHA03 are still on December 2017.
Before running updates and restarting SANHA01, I checked both the VDs and the storage jobs, and everything was normal.
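For reference, a check of this kind can be sketched as below. The post does not say which commands were actually used, so the column selection here is an assumption; only the standard Storage module cmdlets `Get-VirtualDisk` and `Get-StorageJob` are relied on.

```powershell
# Minimal sketch of a pre-maintenance check (assumed, not taken from the post).
# List every virtual disk with its health and operational state.
Get-VirtualDisk |
    Sort-Object FriendlyName |
    Format-Table FriendlyName, HealthStatus, OperationalStatus

# List any storage jobs (repair, rebalance, optimize) that are currently known.
Get-StorageJob
```

A healthy cluster would be expected to show every virtual disk as Healthy and return no running jobs from the second command.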
After Windows Updates completed, I checked again and the VDs were repairing. This seemed normal until I looked at the repair jobs from PowerShell and found this:
PS C:\> Get-StorageJob
Name      IsBackgroundTask ElapsedTime  JobState PercentComplete BytesProcessed BytesTotal
----      ---------------- -----------  -------- --------------- -------------- ----------
Repair    False            01:58:23     Running  93
Repair    False            01:59:08     Running  87
Repair    False            01:58:47     Running  86
Optimize  False            06:17:55     Running  0
Repair    False            01:58:29     Running  91
Rebalance True             00:00:00     Running  0               268435456      1123402383360
Repair    True             120.01:01:00 Running  93              1000101380096  1066898817024
Repair    True             126.14:10:16 Running  91              1662903123968  1819966177280
Repair    True             13.14:02:40  Running  87              1393240834048  1595104296960
Repair    True             168.17:42:34 Running  86              1168200695808  1356105515008
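One way to sanity-check the ElapsedTime column is to treat each value as a .NET `TimeSpan` string in `d.hh:mm:ss` form. Whether `Get-StorageJob` really reports the values in that format is an assumption here, not something the output confirms. The sketch below only parses the four strings copied from the background repair rows above.

```powershell
# ElapsedTime strings copied from the four background repair jobs above.
$elapsedValues = '120.01:01:00', '126.14:10:16', '13.14:02:40', '168.17:42:34'

foreach ($text in $elapsedValues) {
    # Assumption: the value is a TimeSpan in d.hh:mm:ss form.
    $span = [TimeSpan]::Parse($text, [cultureinfo]::InvariantCulture)

    # Show the day and hour components, plus the implied start time
    # relative to the moment this script runs.
    [pscustomobject]@{
        ElapsedTime  = $text
        Days         = $span.Days
        Hours        = $span.Hours
        ImpliedStart = (Get-Date) - $span
    }
}
```

Under that reading, the leading number before the first dot is a count of days, which is how the 168-day and 120-day figures above were derived.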
During the day, 1 of the VD repairs completed, and 2 others are progressing slowly.
This is approximately 8 hours later:
PS C:\> Get-StorageJob
Name      IsBackgroundTask ElapsedTime  JobState PercentComplete BytesProcessed BytesTotal
----      ---------------- -----------  -------- --------------- -------------- ----------
Repair    False            07:56:37     Running  92
Repair    False            07:56:20     Running  86
Optimize  False            12:15:27     Running  67
Repair    False            07:56:02     Running  97
Rebalance True             00:00:00     Running  65              268167020544   406679715840
Repair    True             126.20:07:45 Running  97              1771052728320  1819966177280
Repair    True             13.20:00:10  Running  92              1476060250112  1595104296960
Repair    True             168.23:40:03 Running  86              1168200695808  1356105515008
The last VD has made no progress: its BytesProcessed value is unchanged between the two snapshots, so the job appears to be stuck.
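The difference between the two snapshots can be turned into a rough repair rate per job. The sketch below is only arithmetic on numbers copied from the output above. `Get-RepairProgress` is a hypothetical helper written for this illustration, not a Storage module cmdlet, and it assumes ElapsedTime is a `d.hh:mm:ss` TimeSpan string.

```powershell
# Hypothetical helper: compares two snapshots of one storage job.
function Get-RepairProgress {
    param(
        [string]$ElapsedBefore, [long]$BytesBefore,
        [string]$ElapsedAfter,  [long]$BytesAfter,
        [long]$BytesTotal
    )
    $culture = [cultureinfo]::InvariantCulture
    # Wall-clock seconds between the two snapshots, taken from ElapsedTime.
    $seconds = ([TimeSpan]::Parse($ElapsedAfter, $culture) -
                [TimeSpan]::Parse($ElapsedBefore, $culture)).TotalSeconds
    # Average bytes repaired per second over that interval.
    $rate = ($BytesAfter - $BytesBefore) / $seconds

    [pscustomobject]@{
        MiBPerSecond       = [math]::Round($rate / 1MB, 2)
        Stalled            = ($rate -eq 0)
        # Only meaningful while the job is actually moving.
        EstimatedRemaining = if ($rate -gt 0) {
            [TimeSpan]::FromSeconds(($BytesTotal - $BytesAfter) / $rate)
        } else { $null }
    }
}

# The 126-day job: BytesProcessed advanced between the snapshots.
Get-RepairProgress -ElapsedBefore '126.14:10:16' -BytesBefore 1662903123968 `
                   -ElapsedAfter  '126.20:07:45' -BytesAfter  1771052728320 `
                   -BytesTotal 1819966177280

# The 168-day job: BytesProcessed is identical in both snapshots.
Get-RepairProgress -ElapsedBefore '168.17:42:34' -BytesBefore 1168200695808 `
                   -ElapsedAfter  '168.23:40:03' -BytesAfter  1168200695808 `
                   -BytesTotal 1356105515008
```

The estimate assumes the rate stays constant, which a repair job does not guarantee. The second call reports the job as stalled because its byte count did not move at all.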
I cannot run Windows Update on the remaining 2 nodes and restart until the repair is complete.
Am I reading the dates on these repair jobs correctly?
How can I get the last VD to finish the repair?
Thanks,
Todd
Todd Hunter