Implementing Failovers to Prevent Database Disasters

Manish Kumar
Manish Kumar
15 min read
Posted on November 03, 2022
Implementing Failovers to Prevent Database Disasters

A Brief Introduction to Failovers

A failover is a critical component to disaster recovery that allows a standby database acting as a backup to transition to a primary role once the primary database malfunctions.

Now let’s check the prerequisites for implementing a failover:

  • A secondary SQL DB server

  • A failover group in the primary SQL DB server

  • The server login and firewall settings for the secondary server must match with your primary server

We must have a database backup and an automated failover group to facilitate high availability. The failover group automatically switches across primary and secondary database if anyone of them goes down.

This article provides a step-by-step approach to implement a failover group.


How to Implement a Failover

Here we have opted the Azure portal approach instead of using PowerShell for convenience.

STEP 1: Create a Secondary SQL DB Server

For starters let’s create a secondary database server.

Observe the database server creation carefully.

The Resource group should be the same as the primary database.

The Location must be different from that of the primary database.

Keep the Server admin login and Password same as that of the primary database.

Once the secondary server is successfully deployed the following screen appears.

Since the secondary server has a different location, we need a separate virtual network and subnet for the region. Once you have created the virtual network, navigate to the Networking tab of the secondary server and add it to the virtual network rule. Check whether you need any Firewall rules for your server.

Add a server-to-server peering (primary and secondary) inside your newly created virtual network. Check whether any other peering is needed there. Absence of peering can cause database connection issues while switching to the secondary database.



 

STEP 2: Create a Failover Group in the Primary SQL DB Server

Let’s create a failover group in the primary database server.

Navigate to Failover groups of your primary server and click Add group.

Select the Server from the dropdown list. This should be your secondary SQL database server. Choose the database you wish to add to your failover group. Here we are opting for the primary database. Adding the database to the failover group will automatically start the geo-replication process.

Configure your database server.

Once you have selected the database click the Create button to implement a failover group.


STEP 3: Test the Failover

Let’s now test the newly created failover.

Click Failover groups from the navigation panel to the left and choose your failover group.

Verify your primary and secondary servers.

Configure a failover group between these two servers by selecting Failover from the task pane.

A warning message notifies that the TDS sessions will be disconnected. Select Yes to confirm.

Review your primary and secondary servers. If the failover succeeds the servers should have swapped roles.

Reselect Failover to switch back the servers to their original roles.

In case you need to delete the secondary database, first remove it from the failover group to avoid any unpredictable behavior.


STEP 4: Locate Listener Endpoint

After configuring the failover group update the connection string for your application to the listener endpoint. This will keep your application connected to the failover group listener, rather than the primary database, elastic pool, or the instance database. So, there is no need to manually update the connection string every time your database entity fails over, and the traffic is always routed to the existing primary entity.

The listener endpoint is visible on the Azure portal while viewing the failover group.

Whenever a failover group is deleted, the DNS records for the listener endpoints are also deleted. Here, there’s a probable chance of someone else creating a failover group or a server DNS alias bearing the same name. Since, failover groups and DNS alias must be unique globally, this will prevent you from reusing the same name. So, it’s a good practice to avoid using generic failover group names.

Removing a failover group for a single or pooled database doesn't stop replication, and it doesn't delete the replicated database. If you wish to add a single or pooled database back to a failover group after it has been deleted, manually stop the geo-replication and delete the database from the secondary server. Failing to do either of the above may be result in errors while attempting to add the database to the failover group.


STEP 5: DB Failover Group Orphaned Users

Orphaned users occur when the user object in the database has an SID that deviates from what the login suggests. Orphaned users are not authenticated by Active Directory and are related to SQL logins.

While restoring an on-premises database from one server to another you may need to manually adjust a database user to fix an incorrect SID. Orphaned users can also occur in Azure SQL database and failover groups. The login is created on the primary server and subsequently a user is created in the database.

-- While in Master
CREATE LOGIN [User2] WITH PASSWORD = 'ThisIsNotThePassword1234'
GO
-- While in the user database
CREATE USER [User2] FROM LOGIN [USER2]
GO

Once a database user is created the command is sent to the secondary replicas, but the login information is omitted. As a result, the SID of the database user doesn’t match any corresponding login on the secondary server. When you try to authenticate your user the following error prompt might appear:

 

This happens because the SIDs of the two logins are different. So, the user object of the secondary server is unable to authenticate it. Since it is a secondary replica, the database is read-only, and nothing can be done with the user object. The only way out is to drop and recreate the user login on the secondary server with an appropriate SID. To obtain the correct SID check the following locations on the primary server:

Database level

Sys.database_principals

Database level

Sys.sysusers

Master level

Sys.sql_logins


In context of the master on the secondary server perform the following:

DROP LOGIN [User2]  
GO  
CREATE LOGIN [User2] WITH PASSWORD = 'ThisIsNotThePassword1234', SID = 0x01060000000000640000000000000000CA6EAC7C69DC024DBB850F80F8E595E6  
GO  

Now the database user SID matches with the login SID. The user can provide authentication to the server and access the database.

This occurs only with SQL logins, but the resolution is fairly straightforward once you figure out the underlying cause. Whenever you need to create user accounts in a failover group always use the right SID from the start to avoid unnecessary hassles.


Citations

  1. Configure an auto-failover group - Azure SQL Database | Microsoft Docs

  2. Azure SQL DB Failover Group Orphaned Users (c-sharpcorner.com)