The comments that followed brought up some good points about scenarios where you might choose not to back up a very large database. I performed operations for a BI application at Microsoft whose database was 7 TB. At the time I joined the operations team, I was told that they didn't know how to back up a database that large, so they just didn't do it. I told them they had to start, and one of the first things I helped them do was to get multiple dedicated LUNs set up so they could speed up the backup by striping it across multiple drives (see Day 25 of 31 Days of Disaster Recovery: Improving Performance of Backups and Restores) so that the total backup time was within a reasonable window. Unfortunately, their database wasn't designed with multiple filegroups, so our options were limited.
The source system only kept inactive cases (a particular set of data that is no longer in use) for 90 days. This meant that they could reload lost data for up to 90 days, and the system allowed for a long enough downtime (RTO — see Day 7 of 31 Days of Disaster Recovery: Writing SLAs for Disaster Recovery) that they could reload a large amount of data. We opted for a schedule of backing up the whole database once per month AND immediately after any schema upgrade.
Because we opted to perform only full backups and were not doing log backups, we needed to ensure that the schema was being preserved, or we would need to be able to identify scripts we could run to re-apply schema changes to the database. We set this up on SQL Server 2005; SQL Server 2008 wasn't released yet. Though not an obvious feature, SQL Server 2008 did introduce a new technology that could be used to back up and restore just the schema of a database. More on that in a minute.
One argument I’ve heard for not doing backups of very large BI systems is that the data can be rebuilt from the source systems. My first question is always, what about the schema of the database? Where will you get that from? The easy answer … from the dacpac.
What’s a Dacpac?
A dacpac is a "data-tier application package". Yes, I know the acronym makes no sense. True story (I $#!+ you not), they chose the acronym dac for data-tier application because "the acronym dta was already taken". Apparently, the great minds that chose dac had never heard of the dedicated admin connection (the real dac) added in SQL Server 2005. Dac is generally thought of as a way to simplify deployment of database applications by encasing the application in a package (dacpac) that can dynamically upgrade an existing system to make it match the new schema. But there's another use for dacpacs that maybe you haven't considered: schema-only backups and restores.
Schema-only Backups

If you right-click on a database and highlight Tasks in SQL Server 2008 and newer, you will see several options for the dac. In this post, we're not interested in most of those options. We're only interested in "Extract Data-tier Application". This task will extract the data-tier application with no data. It's basically just the database as an empty shell. Extracting the dacpac is very simple, and there aren't really any options for it other than where to save it.
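Extraction can also be scripted rather than clicked through. As a sketch (the later SqlPackage.exe command-line utility from the DAC Framework supports an Extract action; the server, database, and file path below are made-up placeholders, not from the environment described in this post), here's one way to assemble that invocation:

```python
# Sketch: building a SqlPackage.exe command line to extract a schema-only
# dacpac. SqlPackage.exe is assumed to be on the PATH; all names below are
# illustrative placeholders.

def build_extract_command(server, database, target_file,
                          sqlpackage="SqlPackage.exe"):
    """Return the argument list for extracting a dacpac (schema, no data)."""
    return [
        sqlpackage,
        "/Action:Extract",
        f"/SourceServerName:{server}",
        f"/SourceDatabaseName:{database}",
        f"/TargetFile:{target_file}",
    ]

cmd = build_extract_command("PRODSQL01", "BigBIWarehouse",
                            r"D:\dacpacs\BigBIWarehouse.dacpac")
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building the argument list separately from executing it makes the command easy to log or dry-run before pointing it at a production server.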
Alternatively, there is an option for exporting a data-tier application which has lots of options and allows you to export only part of a database and includes the data. That’s a discussion for another day.
Restoring a Dacpac
So let's suppose you had a very large database and planned to re-import the data from source systems if you lose the database. Now, let's assume the database is completely lost, and you want to start over with an empty version of the database restored from the dacpac. Restoring a dacpac is actually very easy as well. The steps to restore the dacpac would be:
- Right-click on the Databases node
- Click on Deploy Data-tier Application… (not import)
- Browse for and select the dacpac you want to deploy
- Click Next
- If a pre-existing version of the database exists, you have to delete it or give the new database a different name to be able to deploy it
- Click Next
- View the Summary and click Next to start the deployment
- Click Finish if the deployment completed successfully.
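The GUI steps above can also be driven from the command line. As a sketch (SqlPackage.exe's Publish action deploys a dacpac to a target server; every name and path here is a placeholder I've invented for illustration):

```python
# Sketch: deploying a dacpac from the command line instead of the SSMS
# wizard, using SqlPackage.exe's Publish action. All names below are
# illustrative placeholders.

def build_publish_command(dacpac_file, server, database,
                          sqlpackage="SqlPackage.exe"):
    """Return the argument list for deploying a dacpac as a database."""
    return [
        sqlpackage,
        "/Action:Publish",
        f"/SourceFile:{dacpac_file}",
        f"/TargetServerName:{server}",
        f"/TargetDatabaseName:{database}",
    ]

cmd = build_publish_command(r"D:\dacpacs\BigBIWarehouse.dacpac",
                            "PRODSQL01", "BigBIWarehouse")
print(" ".join(cmd))
# To execute: subprocess.run(cmd, check=True)
```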
If you completed the above steps with no errors, you now have an empty copy of your database. Some things will be different: the database file settings will be missing, and extended properties for the database are not included. You can also use the dacpac to "Upgrade the Data-tier Application" and generate a change script if the database still exists. For example, if you have an older full backup from before a schema change was made, you can then upgrade the database from the dacpac, and it will generate a change script to make the schema match the dacpac.
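The generate-a-change-script case can be scripted too. As a sketch (SqlPackage.exe's Script action compares a dacpac against a live database and writes the upgrade T-SQL to a file instead of applying it; all names and paths here are illustrative assumptions):

```python
# Sketch: producing an upgrade script, rather than applying changes
# directly, with SqlPackage.exe's Script action. Useful when an older full
# backup predates a schema change and you want the T-SQL needed to bring
# it up to date. Names and paths are illustrative placeholders.

def build_script_command(dacpac_file, server, database, output_sql,
                         sqlpackage="SqlPackage.exe"):
    """Return the argument list for generating an upgrade script."""
    return [
        sqlpackage,
        "/Action:Script",
        f"/SourceFile:{dacpac_file}",
        f"/TargetServerName:{server}",
        f"/TargetDatabaseName:{database}",
        f"/OutputPath:{output_sql}",
    ]

cmd = build_script_command(r"D:\dacpacs\BigBIWarehouse.dacpac",
                           "PRODSQL01", "BigBIWarehouse",
                           r"D:\dacpacs\upgrade_BigBIWarehouse.sql")
print(" ".join(cmd))
```

Writing the script to a file first lets a DBA review exactly what will change before anything touches the restored database.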
In terms of automation, it would be simple to programmatically export dacpacs for regular backups of the schema using PowerShell. I won’t go into the PowerShell commands here. You can check out this post on the official Data Platform Blog about the PowerShell commands for dac: DAC Powershell Samples.
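To give a flavor of what such automation looks like without reproducing the DAC PowerShell cmdlets from that post, here is a hedged sketch in Python: it builds a SqlPackage.exe extract command whose target file carries today's date, so scheduled runs don't overwrite each other. Every server, database, and directory name here is a placeholder.

```python
# Sketch: a scheduled schema "backup" that extracts a dacpac with a dated
# file name. The linked post uses the DAC PowerShell cmdlets; this wrapper
# around SqlPackage.exe is an illustrative alternative, and every name
# below is a placeholder.
from datetime import date

def scheduled_extract_command(server, database, backup_dir,
                              sqlpackage="SqlPackage.exe"):
    """Build an extract command whose target file carries today's date."""
    stamp = date.today().strftime("%Y%m%d")
    target = rf"{backup_dir}\{database}_{stamp}.dacpac"
    return [
        sqlpackage,
        "/Action:Extract",
        f"/SourceServerName:{server}",
        f"/SourceDatabaseName:{database}",
        f"/TargetFile:{target}",
    ]

cmd = scheduled_extract_command("PRODSQL01", "BigBIWarehouse", r"D:\dacpacs")
print(" ".join(cmd))
# Schedule this (Task Scheduler or a SQL Agent job) to run monthly and
# immediately after any schema upgrade, matching the backup cadence above.
```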
Long story short, a dacpac may be a much better alternative to generating very large backups of systems where you do not need to preserve the data but want to protect the schema. It's not exactly the same as a backup with no data included, but it is a great way to easily preserve the schema of a very large database.