In case you’re not familiar with T-SQL Tuesday, let me enlighten you. This blog entry is participating in T-SQL Tuesday #40, hosted this month by the Midnight DBA Jen McCown (Blog|@MidnightDBA|@JenniferMcCown). You are invited to visit his blog to join the blog party. You are welcome to write your own participating blog post for the party or just to read more blogs participating in this month’s theme: File and Filegroup Wisdom.
If you missed any of the earlier posts in my DR series, you can check them out here:
- 31 Days of disaster Recovery
- Does DBCC Automatically Use Existing Snapshot?
- Protection From Restoring a Backup of a Contained Database
- Determining Files to Restore Database
- Back That Thang Up
- Dealing With Corruption in a Nonclustered Index
- Dealing With Corruption in Allocation Pages
- Writing SLAs for Disaster Recover
- Resolutions for All DBAs
- Use All the Checksums
- Monitoring for Corruption Errors
- Converting LSN Formats
- Extreme Disaster Recovery Training
- Standard Backup Scripts
- Fixing a Corrupt Tempdb
- Running DBCC CheckTable in Parallel Jobs
- Disaster Recovery Gems From Around The Net
- When are Checksums Written to a Page
- How to CHECKDB Like a Boss
- How Much Log Can a Backup Log
- The Case of the Backups That Wouldn’t Restore
- Who Deleted That Data?
- Which DBCC CHECK Commands Update Last Known Good DBCC
- Restoring Differential Backups With New Files
- Handling Corruption in a Clustered Index
- Improving Performance of Backups and Restores
- The Mysterious Case of the Long Backup
- Restoring Part of a Database
- Recovering SQL if the Tempdb Drive Dies
- Using Database Snapshots to Restore Replicated Databases in Test
Replicating a Partial Database
Let’s assume a scenario where you have a very large database (VLDB) and you want to replicate only part of the database, but you don’t want to use a snapshot to initialize the subscription and you don’t want to copy the whole database backup to the subscriber and restore it. For example, if you have a 5 TB database with mostly historical, unchanging data and you only want to replicate a small percentage of the database. For this example, let’s assume that 3 TB of data is old data and 2 TB is current data and you only want to replicate the current data. How would you go about doing it in the fastest, easiest method?
Initialize from snapshot? Snapshots are slow and performance impacting. There is limited support for compression in snapshots, but it uses Windows CAB compression that can be CPU intensive and has limitations such as a 2 GB file limit. Maybe your production database can take that sort of performance hit, maybe not.
Initialize from backup? The trouble with this option is the documented way to do this is to back up the entire database, copy the entire database over, and restore the entire database. For 5 TB of data, this can take a really long time. Additionally, the entire database gets restored and if you don’t want the extra 3 TB of data in the database, you have to delete it yourself. Even more time and trouble to set up, not to mention the performance hit on the subscriber.
From Books Online:
A backup contains an entire database; therefore each subscription database will contain a complete copy of the publication database when it is initialized.
It is the responsibility of the administrator or application to remove any unwanted objects or data after the backup has been restored.
But there’s a better way. You can take the one-time hit of separating your tables to different filegroups (assuming they’re not already that way, and they’re probably not) in order to be able to set up replication in a much quicker and simpler way. Once you have the files segregated, you can use partial database availability to restore only the filegroups that contain the tables you want to replicate.
One of the caveats for doing this is that you have to backup the primary filegroup with any other filegroups you want to restore. The primary filegroup always has to be restored as it contains all of the system objects that define everything in the database, and the primary filegroups and any other filegroups must be based on the same recovery point, so they must be backed up together. You can perform a filegroup backup of filegroups by name or just the read/write filegroups which include all filegroups not marked as read-only. On the other hand, the restore can actually be performed from a full backup by specifying which filegroups to restore.
Any filegroups not restored on the subscriber will still be listed as part of the database, but the files in those filegroups will be listed as RECOVERY_PENDING. Any attempts to query tables in those filegroups will return an error stating that the file is not online. Then it’s a matter of setting up replication subscription to initialize from backup.
You may be wondering, “Any downsides to this process?“. Well, partial database availability is an Enterprise Only feature. so, yeah, you must be on Enterprise Edition (or Enterprise or Datacenter Edition if on SQL 2008 R2).
Steps to Set Up and Demo Code
I’ve done presentations on this process and written a whitepaper on it. I’m not going to repeat all of the details that I’ve presented elsewhere. I recommend downloading my whitepaper Initializing Replication from Backup and my slide deck and demo code from SQL Saturday #107 in Houston, TX.