Microsoft Windows Cluster Issue

Submission by Mukesh Narvekar

Votes: 655

Banking and Capital Markets

Problem Statement

The environment was a two-node cluster. After all resources were transferred from Node 1 to Node 2, Node 1 dropped off the network and became unresponsive.

Solution Description

I analyzed the OS event logs and the cluster logs but couldn’t find the exact cause. So I opened a console session on the problem node and tested the cluster resource movement. The node became unresponsive because the network subnet configured on it was automatically removed after the cluster resources were transferred. When I connected to the working node and transferred the resources back to the problem node, the cluster resources moved, but the problem node still did not respond on the network. When I manually configured the subnet IP on the problem node, it came back on the network.

I then checked the ESX, datastore, and virtual network configuration, but without any luck. Finally, I called Microsoft and opened a case. Based on their troubleshooting (collected event logs and cluster logs, online utility, network scanning), they suspected the issue was on the ESX side. They suggested that I keep both virtual machines on a single ESX host and test the resource movement again, but it didn’t work. Microsoft then suggested checking the virtual network side again. I did, but the issue persisted.

So I started troubleshooting on my own, based on my previous IT experience, and obtained downtime for the problem node from the client. It occurred to me to check the network configuration in Device Manager. I shut down the node, started it in safe mode, and checked for hidden network devices. These were the same steps I had used at my previous company to fix a network issue on a server: set devmgr_show_nonpresent_devices=1 in the registry and restart the server in safe mode. After removing all hidden devices, I reconfigured the cluster for the second node. I brought the server up and tested the MS Cluster resource movement, and it was successful.

I documented everything, updated the client, and received his appreciation. I also called Microsoft, explained everything, and asked them to close the case from their side.

Challenges Faced

When I tried to set the value from the Command Prompt with the command “set devmgr_show_nonpresent_devices=1”, it did not help. So I had to set it manually at the proper registry path. I will request Microsoft to mention the path below in the relevant KB article.

Registry Settings
System Key: [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Environment]
Value Name: DEVMGR_SHOW_NONPRESENT_DEVICES
Data Type: REG_SZ (String Value)
Value Data: (1 = show all hidden devices)
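
A likely reason the “set” command did not help is that it only changes the environment of that one Command Prompt session, so Device Manager sees the variable only if it is launched from the same window. Writing the value under the registry path above makes it a system-wide environment variable instead. As a minimal sketch, the registry settings listed above could be applied with the built-in reg.exe tool. The helper names build_reg_add_command and apply_setting are hypothetical, and the sketch assumes an elevated (Administrator) session on the affected node.

```python
import subprocess
import sys

# Key and value name are taken from the registry settings listed above.
# "HKLM" is reg.exe's short form of HKEY_LOCAL_MACHINE.
KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment"
VALUE_NAME = "DEVMGR_SHOW_NONPRESENT_DEVICES"


def build_reg_add_command(data="1"):
    """Return the argument list for a `reg add` call that writes the value.

    Hypothetical helper for illustration; it only builds the command.
    """
    return [
        "reg", "add", KEY,
        "/v", VALUE_NAME,   # value name
        "/t", "REG_SZ",     # data type: string value
        "/d", data,         # value data (1 = show all hidden devices)
        "/f",               # overwrite an existing value without prompting
    ]


def apply_setting():
    """Run the command. Assumes Windows and Administrator rights."""
    if sys.platform != "win32":
        raise RuntimeError("reg.exe is only available on Windows")
    subprocess.run(build_reg_add_command(), check=True)


if __name__ == "__main__":
    # Print the command for review instead of running it.
    print(subprocess.list2cmdline(build_reg_add_command()))
```

Running the script only prints the command, so it can be reviewed before anything is changed. Calling apply_setting() would write the value, and the restart described in the solution gives Device Manager a chance to pick it up.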
