SQLU VLDB Week – Backups
It’s day three of VLDB week at SQL University. Today’s lesson is on Backups. Very large databases have very large full backups. Very large backups have certain inherent problems. Disk space, disk performance, log file management, and time to complete a full backup all challenges faced with backing up a very large database. I’m going to give you some tips for dealing with very large backups and help you speed up the back up process.
Backups and Disk Performance
The obvious challenge is that very large backups use a lot of disk space. Most VLDBs are on SANs these days and disk space is usually easy to get, but even SANs have limits on the amount of space available on them. In addition to disk space is the impact on disk performance. Even if the backups are on dedicated LUNs on a SAN, throughput to the SAN can be flooded causing delays in all disk operations to and from the SAN.
This becomes more evident on a shared SAN there are multiple servers sharing the same SAN. One of the VLDBs that I worked with in the past used a shared SAN, and even though our backups didn’t run at midnight, you could see disk performance dip at midnight when everybody else on the SAN ran their backups at the exact same time. We scheduled our backups to run at a different time from everyone else so that we were not competing with them for throughput to the SAN.
Fast disks are important for performance of database backups, but they’re not the most important piece of it. Whether you’re using a SAN or DAS or NAS, you are probably not being bottlenecked at the disk level. Your bottleneck is most likely throughput to the storage. The shared SAN we used for the previously mentioned VLDB was all RAID 6. When I started working with the application, we were experiencing disk perofrmance issues. We did a lot of investigation to try to prove that we needed a faster storage solution. Turns out, we were wrong.
We worked closely with the Storage Unit Operations team and were able to show that we couldn’t push enough data over the fiber to stress the disks. The storage guys increased our queue depth and the number of paths to the SAN. We saw minimal improvements with the increase to queue depth, but massive improvements with the increase in number of SAN paths. It is very likely that we weren’t able to see the improvements from queue depth because we were still bottlenecked on throughput.
Our production SAN connection consists of 12 LUNs with 4 paths per LUN, 24 paths per HBA, for a total of 48 paths. Here’s a glimpse into our SAN pathing:
Log File Considerations
Since full backups take longer to run for a VLDB, you may have to provide more space for the log file and for log backups. When performing a database backup, the log file is not truncated until after the database backup completes. If the log file runs out of free space during the database backup, the log file will grow. The best setting is to simply pre-size the log file so that it doesn’t need to grow during a full backup, but you should configure the growth settings in case it does. You want to make sure you have the growth settings set to a constant value rather than a percentage value. When the log file grows, it has to be locked which blocks all activity in the database. Log files must be zero initialized and instant file initialization will not help you here. I generally use a setting of 512 MB for growth of the log file.
Speeding Up Backups
One of the biggest things you can do to speed up your backups is to back up to multiple files on multiple dedicated drives. When writing the backup, a backup thread is spawned for each LUN or mount point. For example, if you are backing up to 3 different drives, you will get three backup threads. The drives should be laid out intelligently so that they’re not all hitting the same disks. Ideally, they would be on different arrays in the SAN. I have seen a 6 hour backup to a single drive turn into a 2 hour backup to 3 dedicated drives.
On the server that hosts the VLDB that I’ve talked about throughout this post, we have 5 dedicated backup drives. There are two databases on the server. One database is 500 GB and the other is 1.8 TB. We are also doing full backups once per week and differential backups daily the rest of the week. The layout for the backup drives are as follows:
E: – Backups for database 1, backup file #1 for database 2
F: – Log backups for database 1
G: – Log backups for database 2
R: – Backup file #2 for database 2
S: – Backup file #3 for database 3
Here are some additional tips for improving backup speed:
- Use fast disks to maximize write speed
- Use dedicated drives to avoid I/O contention with user databases or other processes
- Maximize the storage throughput
- Increase queue depth
- Increase the number of paths
- Use differential backups to reduce the frequency that you run full backups
- Perform partial backups to spread out the backup load
- Back up read-only filegroups infrequently and read-write filegroups frequently
- Spread backups throughout the week by backing up specific files or filegroups each night
- Use backup compression to reduce the amount of I/O that has to be written to disk
Using the advice mentioned here, we reduced that backup time of the largest database from over 6 hours to 1.5 hours. When we upgraded to SQL Server 2008 and started using native compression, we got another boost of about 30 minutes to the backup speed. We were transitioning from using SQL LiteSpeed for backup compression and were hoping to maintain the same performance. We were delighted to attain an actual improvement. Backup performance and speed was a big issue at one time. It’s very much a non-issue now.