Many organisations run operational tests on infrastructure before making a system live. In a high availability RAC environment with dual switches and network cards for resilience, one such test is to fail network cards in the server and to fail switches. This test is performed to ensure that the secondary network cards and switches carry on without interruption to service.
In an active/active bond configuration or dual interconnect situation the test would prove that the cards and switches can cope with the additional load. In an active/standby bond configuration the test would prove that the additional network card and switch can pick up where the failed card or switch left off without any interruption to service.
With this in mind it is critical to get the configuration of the network cards and switches right. This article details an interesting switch configuration issue.
- A three node RAC 11g configuration with two network cards per server.
- The two network cards in each server were plumbed into two separate Cisco switches on a dedicated VLAN
- A Red Hat active/standby bond (a virtual NIC in bonding mode 1, active-backup) was created across the two network cards in each server
- The RAC interconnect was configured to use the Red Hat bond
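A Red Hat mode 1 (active-backup) bond of this kind is typically defined through the network-scripts configuration files. A minimal sketch follows; the device names, address and options shown here are illustrative assumptions, not the configuration from the environment described above:

```
# /etc/sysconfig/network-scripts/ifcfg-bond0
# mode=1 is active-backup; miimon=100 polls link state every 100 ms
DEVICE=bond0
IPADDR=192.168.1.10
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="mode=1 miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-eth4 (repeat for the second card, e.g. eth5)
DEVICE=eth4
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
```

With mode 1 only one slave carries traffic at a time, which is exactly the active/standby behaviour the failover tests below are designed to exercise.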
A test was devised to prove the loss of a switch or network card would not cause any issues with the RAC cluster. The test was relatively simple.
Initially, the bond on all servers was pointing to the network cards plumbed into the primary switch. A network card failure was simulated by running ifdown on the network card plumbed into the primary switch on each server (in this case eth4). The expected behaviour was for the bond to fail over to the backup network card plumbed into the backup switch.
The next test was to simulate a failure of the backup switch by issuing a reload. The expected behaviour in this instance was for the bond to fail back to the original network cards plumbed into the primary switch.
Once that was complete, the final test was to bounce the primary switch to simulate a primary switch failure, once again ensuring all bonds failed over to the network cards plumbed into the backup switch.
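At each step, which network card the bond is actually using can be confirmed from /proc/net/bonding/bond0. A minimal sketch of the check, run here against illustrative captured output rather than a live bond (on a real server, read /proc/net/bonding/bond0 directly):

```shell
# Illustrative /proc/net/bonding/bond0 contents after failover to eth5
cat > /tmp/bond0.sample <<'EOF'
Ethernet Channel Bonding Driver: v3.4.0
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth5
MII Status: up
Slave Interface: eth4
MII Status: down
Slave Interface: eth5
MII Status: up
EOF

# Report which slave is currently carrying the interconnect traffic
awk -F': ' '/Currently Active Slave/ {print $2}' /tmp/bond0.sample
```

If the active slave has not moved to the backup card after ifdown, the bond itself is misconfigured and the switch tests are not yet meaningful.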
Switch Configuration Issue
In the test described above, the failure of the network cards was handled successfully, as was the reload of the backup switch. However, the reload of the primary switch caused issues with RAC. In particular, as the primary switch started back up, it caused network interference on the interconnect.
At this point, questions may be raised about the configuration of the bond. Has it been configured properly? Has it failed across to the backup switch? In this instance the bond had been configured correctly and it had failed across to the backup switch.
So what was happening? It is worth being very clear here: the problem occurred only when the primary switch came back up. The primary switch going down was fine, and no issues were experienced with the backup switch. Why, then, was the primary switch starting up causing network outages on an interconnect that was currently plumbed into the backup switch?
The answer lies in the configuration of the port channels and a slight change in 11g. The port channels need to be configured as edge ports. Any non-edge port channel on the Cisco switch will have its network connectivity interrupted for approximately 30 to 40 seconds when the primary switch starts up, due to spanning tree: when the switch rejoins the network it triggers a spanning-tree topology change, and while the topology reconverges, non-edge ports are blocked from forwarding traffic. Edge ports bypass the spanning-tree listening and learning states and so keep forwarding throughout.
On 10g this issue may not cause any problems because the misscount value (the number of seconds RAC can survive without its interconnect before node eviction takes place) defaults to 60 seconds. On 11g, the default misscount value has been reduced to 30 seconds.
Get your network folk to check the port configurations on the switches for the NICs used by the 11g interconnect and ensure that they are configured as edge ports. They need to look something like this:
switchport access vlan 322
spanning-tree port type edge
spanning-tree bpduguard enable
vpc orphan-port suspend
Alternatively, the misscount value in RAC can be changed to a value greater than the network outage, say 45 seconds.
WARNING - The misscount value should only be changed in agreement with Oracle Support
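With that caveat in mind, the current value can be inspected, and changed, with crsctl. The exact behaviour varies slightly across Clusterware releases, so treat this as a sketch and confirm the syntax against your version's documentation:

```
# Check the current CSS misscount (run as root on a cluster node)
crsctl get css misscount

# Raise it above the expected spanning-tree outage, e.g. 45 seconds
crsctl set css misscount 45
```

Fixing the edge-port configuration on the switches remains the better solution; raising misscount only widens the window RAC will tolerate the outage, it does not remove the outage itself.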