Multi-subnet Failover Clusters
I want to take a closer look at one of the new features in SQL Server Denali. While everyone has been drooling over HADRON, another HA feature has gone mostly unnoticed. I haven’t heard anyone talking about it yet, so this should be an introduction to this new feature, Multi-subnet Failover Clusters.
Geographically Dispersed Failover Clusters
Multi-subnet failover clustering is an enhancement to an existing technology, geographically dispersed failover clusters. SQL Server 2008 and Windows Server 2008 introduced geographically dispersed failover clusters, often called stretch clusters or geo-clusters. A geo-cluster lets you stretch a cluster across multiple data centers to provide a greater level of availability, and it also provides protection at the storage level by keeping more than one copy of the data.
The downside of a geo-cluster is that you cannot simply pick two data centers and set one up. Implementing a geo-cluster requires a stretch VLAN so that all nodes are in the same subnet, because SQL Server failover clustering does not currently support multiple subnets. A geo-cluster allows quick failover between all nodes of the cluster; however, failover between nodes in the same site is measurably faster.
* Note: the above image is also downloadable as a Visio Drawing: GeoCluster.vsd (172 K)
Multi-subnet Failover Clusters
SQL Server Denali (SQL 11) takes geo-clustering a step further and enables you to set up a failover cluster using nodes in different subnets. This is available for both geo-clusters and local clusters. This eliminates the requirement for a stretch VLAN.
When a failover cluster is configured with multiple IP addresses for nodes in different subnets, the network name's dependency on those IP addresses is set to an OR dependency instead of an AND dependency. This allows the cluster to come online as long as it is able to bind to any one of the IP addresses. In current versions, we only have the option to set AND dependencies on the IP addresses for the cluster network name, which requires that all IP addresses be online for the cluster to come online.
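To make the difference between the two dependency types concrete, here is a minimal sketch in Python. It is only a model of the logic described above, not anything the cluster service exposes; the function name `network_name_online` and the IP addresses are made up for illustration.

```python
from typing import Dict


def network_name_online(ip_states: Dict[str, bool], mode: str) -> bool:
    """Model whether the network name can come online.

    ip_states maps each IP address resource to True (online) or False.
    mode is "AND" (every IP must be online) or "OR" (any one IP is enough).
    This is an illustrative model, not a real clustering API.
    """
    if mode == "AND":
        return all(ip_states.values())
    if mode == "OR":
        return any(ip_states.values())
    raise ValueError("mode must be 'AND' or 'OR'")


# Hypothetical addresses: one per subnet. After a failover to the second
# site, only the IP address in that site's subnet can be brought online.
ips_after_failover = {"10.1.1.50": False, "10.2.1.50": True}

print(network_name_online(ips_after_failover, "AND"))  # False
print(network_name_online(ips_after_failover, "OR"))   # True
```

With an AND dependency the network name could never come online in this situation, because the IP address from the other subnet cannot be bound at the surviving site. The OR dependency is what makes the multi-subnet configuration workable.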
When a failover occurs from one subnet to another, Windows clustering updates the DNS entry for the cluster virtual name to direct users to the other subnet. As a result, connections to the cluster will fail until the DNS update propagates and clients get the new IP address. A cross-subnet failover will therefore take noticeably longer than a failover within a local cluster, and a failover across sites will take longer than a single-subnet cross-site failover, because it depends on updating the DNS record for the network name.
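A rough way to reason about the extra delay is sketched below in Python. This is a simplified model of my own, not a measurement: it assumes a client can reconnect once the SQL Server resources are online and its cached DNS record has expired, and every number in it is hypothetical.

```python
def worst_case_client_outage(resource_online_s: float,
                             dns_update_s: float,
                             dns_ttl_s: float) -> float:
    """Estimate the worst-case time (seconds) before a client can reconnect.

    resource_online_s: time for the clustered resources to come online
    dns_update_s:      time for the new record to reach the client's DNS server
    dns_ttl_s:         how long a client may keep a stale cached record

    Assumed worst case: the client caches the old address just before its
    DNS server receives the update, so it waits dns_update_s + dns_ttl_s.
    """
    return max(resource_online_s, dns_update_s + dns_ttl_s)


# Hypothetical numbers for illustration only.
local_failover = worst_case_client_outage(30, 0, 0)       # same subnet, IP unchanged
subnet_failover = worst_case_client_outage(30, 60, 300)   # DNS record must change

print(local_failover)   # 30
print(subnet_failover)  # 360
```

In this model the DNS terms dominate as soon as the IP address changes, which is why the TTL on the network name's DNS record matters so much for a multi-subnet cluster.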
This new functionality allows for new failover cluster configurations that support multiple levels of failover and even provides disaster recovery functionality. You can mix local clusters and multi-subnet clusters. The local nodes can be used as the primary failover and the subnet failover as a secondary failover. This would provide quick failover (High Availability) for most issues and subnet failover to protect against site outage or SAN outage (Disaster Recovery).
This is a huge step forward in ease of setup and supportability of a geo-cluster, but due to the longer failover time for cross-site failovers, it won’t be the best choice for all situations that call for multi-site protection.
* Note: the above image is also downloadable as a Visio Drawing: MultiSubnetCluster.vsd (176 K)