• SQL Server
  • Log Shipping Tricks Demo
  • SQLCruise Alaska 2012 Pics
SQLSoldier News From the Frontlines

Day 9 of 31 Days of Disaster Recovery: Use All the Checksums

January 9, 2013 8:40 am / 21 Comments / SQLSoldier

31 Days of Disaster Recovery

31 Days of Disaster Recovery

Welcome to day 9 of my 31 Days of Disaster Recovery series. Today, I want to talk about the three ways you can use CHECKSUM to protect yourself from and identify corruption. Checksum is the default page verification option in SQL Server 2005+ and helps identify corruption to data pages. The other two uses of Checksum are options for the BACKUP and RESTORE commands. When we are done here, hopefully you will be convinced to use all 3 CHECKSUM options.

If you missed any of the earlier posts in my DR series, you can check them out here:

    31 Days of disaster Recovery

  1. Does DBCC Automatically Use Existing Snapshot?
  2. Protection From Restoring a Backup of a Contained Database
  3. Determining Files to Restore Database
  4. Back That Thang Up
  5. Dealing With Corruption in a Nonclustered Index
  6. Dealing With Corruption in Allocation Pages
  7. Writing SLAs for Disaster Recover
  8. Resolutions for All DBAs

CHECKSUM Page Verification

There are 3 page verification options in SQL Server 2005 and newer. Checksum is the current default setting and the most comprehensive. Torn page detection was the default in SQL Server 2000. Torn page detection is less comprehensive because the way it works is to write a 2-bit pattern to the header of the page and then an alternating 2-bit pattern every 512 byte sector. This means it is possible for corruption to occur in a very small area of the page and not be detected. Checksum on the other hand generates a different value if even one bit of data is different. Any amount of corruption will result in a different value and the page will be flagged as corrupted.

Since the default page verification option changed, new databases are protected, but if an older database is upgraded, the page verification option is not changed. If you have really old databases, they may not be fully protected. You should check the page verification option in sys.databases and update those databases.

Here’s the tricky part. If you change the page verification option, you are not automatically protected. It doesn’t automatically rewrite every page to have checksum values when you change it. The next time a page is written to disk, the checksum value is written to the page. My recommendation when changing to checksum page verification is during the next index maintenance period, simply rebuild every index or heap (SQL Server 2008+) rather than rebuilding selectively. This will cause every table and index to be rewritten to disk and ensure that all data structures are protected by checksums.

The third page verification option is NONE. This has NEVER been the default page verification option and if you have databases using this option then it is time for someone to find a new job. This option is only set if someone changes it and whether the change was done maliciously or due to not knowing better, that person should not be touching SQL Server.

Backup and Restore With Checksum

The final 2 ways to use checksum is with the backup and restore commands. When you use WITH CHECKSUM with the backup and restore commands, it performs extra checks. If checksums exist on a page (see previous section), it will recalculate the checksum values and ensure that they are still corruption free. This does NOT replace the need to run regular integrity checks with DBCC CHECKDB, but it gives you an extra opportunity to catch corruption. Furthermore, if you backup a database that has become corrupted without using this option, it may complete successfully with no warnings or errors. This often leads to corruption getting backed up and restored to other servers or environments without anyone realizing it. Additionally, when the backup completes, it will generate a checksum for the entire backup.

When restoring a backup using WITH CHECKSUM, it performs the same checks. The command does require that the backup was created using the checksum option, but it ensures that you don’t restore a corrupted database without being aware you are doing so. Also, if the backup became corrupt after it was created, then the restore is able to detect that and fail the restore almost immediately rather than waiting until it hits the corrupt page. If the backup file became corrupt, the checksum for the backup will be different and for a very large database, this can save you hours of time.

If you did hit corruption when restoring or backing up with the checksum option, you can complete the backup or restore using the CONTINUE_AFTER_ERROR option. Creating a backup of a corrupt database using this option will mark the database as being damaged and will ensure that the backup cannot be restored unless the continue after error option is also used for the restore. This will prevent someone from unwittingly restoring a backup of a corrupt database without realizing it.

EDIT: recommendation from Paul Randal (blog|@PaulRandal)

A great way to check a backup for corruption is to run RESTORE VERIFYONLY … WITH CHECKSUM; to perform the same checks (check the checksum of every page with checksum values and check the checksum of the backup file) without actually restoring it.

See it in Action

I put together a demo so you can prove to yourself the value of using all 3 checksum options. Rather than explain the demo in the blog post here, the demo code is heavily commented. It uses the two sample databases I’ve used in 2 earlier posts in this series. You may already have them downloaded, or you can download them again. I have provided the demo script and the 2 sample corrupt databases in separate zip files so you can download only what you need.

Sample corrupt databases: SampleCorruptDBs.zip (12.33 MB)
Demo code: Demo_BackupWithCheckSum.zip (2 KB)

-- Check page verify setting
-- Note that both databases have Checksums enable
Select name, page_verify_option_desc
From sys.databases
Where name in ('PFSCorruption', 'AdventureWorksDW2012');

-- Check to see if the databases are corrupted
-- Note that both databases are
DBCC CheckDB(AdventureWorksDW2012)
    With No_InfoMsgs, All_ErrorMsgs, TableResults;

DBCC CheckDB(PFSCorruption)
    With No_InfoMsgs, All_ErrorMsgs, TableResults;

-- Backup PFSCorruption with Checksum
-- Note that it suceeds even though we know it is corrupt
-- Pages have not had checksum values written yet
Backup Database PFSCorruption
    To Disk = 'C:\bak\PFSCorruption_checksum.bak'
    With Init, Checksum;

-- Backup AdventureWorksDW2012 with Checksum
-- Note that this backup fails and notifies us that it is corrupt
-- Pages have had the checksum values written to them
Backup Database AdventureWorksDW2012
    To Disk = 'C:\bak\AdventureWorksDW2012_checksum.bak'
    With Init, Checksum;

-- Backup AdventureWorksDW2012 without Checksum
-- Suceeds with no warning or errors
Backup Database AdventureWorksDW2012 To Disk = 'C:\bak\AdventureWorksDW2012_nochecksum.bak'
    With Init;

-- Use Continue_After_Error option to get backup of corrupt database
Backup Database AdventureWorksDW2012
    To Disk = 'C:\bak\AdventureWorksDW2012_checksum.bak'
    With Init, Checksum, Continue_After_Error;

-- What happens if we restore AdventureWorksDW2012_nochecksum.bak?
-- Database restores with no errors. Another lost opportunity to catch corruption.
-- We have potentially spread corruption to different server/environment
Restore Database CorruptDB_nochecksum
    From Disk = 'C:\bak\AdventureWorksDW2012_nochecksum.bak'
    With Move 'AdventureWorksDW2008R2_Data' To 'c:\bak\AdventureWorksDW2008R2_Data.mdf',
        Move 'AdventureWorksDW2008R2_Log' To 'c:\bak\AdventureWorksDW2008R2_Log.ldf';

-- Let's try it again with the checksum option
If DB_ID('CorruptDB_nochecksum') Is Not Null
    Drop Database CorruptDB_nochecksum;

-- Restore fails because we cannot use checksum for restore unless
-- we used it for the backup
Restore Database CorruptDB_nochecksum
    From Disk = 'C:\bak\AdventureWorksDW2012_nochecksum.bak'
    With Checksum,
        Move 'AdventureWorksDW2008R2_Data' To 'c:\bak\AdventureWorksDW2008R2_Data.mdf',
        Move 'AdventureWorksDW2008R2_Log' To 'c:\bak\AdventureWorksDW2008R2_Log.ldf';

-- What happens if we restore AdventureWorksDW2012_checksum.bak
-- Restore fails because it was marked as a corrupt database
Restore Database CorruptDB_checksum
    From Disk = 'C:\bak\AdventureWorksDW2012_checksum.bak'
    With Move 'AdventureWorksDW2008R2_Data' To 'c:\bak\AdventureWorksDW2008R2_Data.mdf',
        Move 'AdventureWorksDW2008R2_Log' To 'c:\bak\AdventureWorksDW2008R2_Log.ldf';

-- Let's try it again with the checksum option
-- Restore fails because it was marked as a corrupt database
-- Requires using the continue after error option
Restore Database CorruptDB_checksum
    From Disk = 'C:\bak\AdventureWorksDW2012_checksum.bak'
    With Checksum,
        Move 'AdventureWorksDW2008R2_Data' To 'c:\bak\AdventureWorksDW2008R2_Data.mdf',
        Move 'AdventureWorksDW2008R2_Log' To 'c:\bak\AdventureWorksDW2008R2_Log.ldf';

-- Let's try it again with the checksum and Continue_After_Error options
-- Suceeds with a low level warning telling us that the database
-- was damaged and should be checked out
Restore Database CorruptDB_checksum
    From Disk = 'C:\bak\AdventureWorksDW2012_checksum.bak'
    With Checksum,
        Move 'AdventureWorksDW2008R2_Data' To 'c:\bak\AdventureWorksDW2008R2_Data.mdf',
        Move 'AdventureWorksDW2008R2_Log' To 'c:\bak\AdventureWorksDW2008R2_Log.ldf',
        Continue_After_Error;

-- Cleanup extra database(s)
If DB_ID('CorruptDB_checksum') Is Not Null
    Drop Database CorruptDB_checksum;

If DB_ID('CorruptDB_nochecksum') Is Not Null
    Drop Database CorruptDB_nochecksum;

Summary

You can see that the checksum options I’ve described in this post are very useful for protecting the integrity of your data and for increasing the chances that you will find corruption early as well as being the quickest way to detect a corrupt backup file. I highly encourage you to step through the demo code and see how it protects you and can save you a lot of hardship down the line.

Posted in: SQL Server / Tagged: 31 Days of Disaster Recovery, Disaster Recovery, Internals

21 Thoughts on “Day 9 of 31 Days of Disaster Recovery: Use All the Checksums”

  1. SQLSoldier on January 9, 2013 at 10:57 am said:

    Thanks to Paul Randal again for pointing out that the 2-bit pattern used for torn page detection uses an alternating pattern rather than repeating the same pattern every 512 bytes. Also, thanks to Paul for the recommendation of using RESTORE VERIFYONLY … WITH CHECKSUM;.

    See edits in post.

    Reply↓
  2. Sqlchow on January 9, 2013 at 11:29 am said:

    For the torn page detection, How does it determine what pattern to write?

    Reply↓
    • SQLSoldier on January 9, 2013 at 12:21 pm said:

      As far as I know, the details of the bit pattern are not documented. Paul Randal may know, but I’ve never seen that info made public.

      Reply↓
      • Paul Randal on January 10, 2013 at 3:36 pm said:

        Starts with 10, 01, 10, 01 alternating down the blocks, then flips them next time it writes the page.

        Reply↓
        • SQLSoldier on January 10, 2013 at 4:19 pm said:

          Thanks again Paul!

          Reply↓
  3. Angie on January 10, 2013 at 6:41 am said:

    I just scripted out the restore code from the GUI and it does not use CheckSum. That would have been nice if they coded for it.

    Reply↓
    • SQLSoldier on January 10, 2013 at 2:02 pm said:

      Agreed. Sadly, the GUI doesn’t support a lot of good features.

      Reply↓
      • Martin Catherall on April 12, 2013 at 1:12 pm said:

        It seems amazing that SSMS uses the checksum option differently when scripting backups if you choose compression verses np compression.
        Using compression I get checksums scripted using SSMS 2008 / 2012 and when compression is left off the backup then checksums are not scripted – I checked BOL and it stats this also. Yet another reason to go the TSQL route.
        Great series Robert.

        Reply↓
        • SQLSoldier on April 12, 2013 at 1:43 pm said:

          Thanks Martin! I wish they would make checksum the default option for backups and restores, but that would probably break backwards compatibility rules.

          Reply↓
  4. Pingback: Day 10 of 31 Days of Disaster Recovery: Monitoring for Corruption Errors | SQLSoldier

  5. Pingback: Day 11 of 31 Days of Disaster: Converting LSN Formats | SQLSoldier

  6. David on January 14, 2013 at 2:16 pm said:

    Hi, i have a vendor DB that is set to torn page detection. We are running on SQL 2008 R2, I am guessing that if the default is with checksum the vendor must have changed the setting to torn page detection when they setup the DB. Are there any possible reason they would do this? Are there any risks to changing it to checksum?

    Reply↓
    • SQLSoldier on January 14, 2013 at 3:57 pm said:

      Hi David. The vendor’s DB was probably developed initially on SQL 2000 or earlier version when torn page detection was the default and then they generated scripts that hard-coded the torn page option.

      There is no risk to changing it to checksum, but I would first verify that this would not break your support agreement with the vendor.

      Reply↓
    • SQLSoldier on January 14, 2013 at 8:08 pm said:

      Sorry David. I saw you posted a reply, but it accidentally got deleted. Can you repost it?

      Reply↓
  7. David on January 14, 2013 at 9:01 pm said:

    No worries, I will follow up with the Vendor regarding checksum. My question was around the need to do a complete index rebuild after turning on checksum. Our vendor provide all the DB maint jobs that they say need to be run against the DB. Can i just turn on checksum and leave SQL to just create the checksums as the data is accessed or? I understand that the pages will not have a checksum created until the are changed but this would be better then just staying on torn_page_detection.

    Reply↓
    • SQLSoldier on January 14, 2013 at 9:19 pm said:

      I generally recommend doing a full rebuild of all indexes during your next index maintenance period. Otherwise, There could be areas of your database not properly protected. I guess you need to weigh the risks and benefits and decide if you think it is okay to wait.

      Reply↓
  8. Pingback: Day 15 of 31 Days of Disaster Recovery: Running DBCC CheckTable in Parallel Jobs | SQLSoldier

  9. Pingback: Day 19 of 31 Days of Disaster Recovery: How Much Log Can a Backup Log | SQLSoldier

  10. Pingback: Day 24 of 31 Days of Disaster Recovery: Handling Corruption in a Clustered Index | SQLSoldier

  11. Shaun on February 25, 2013 at 5:35 am said:

    Is there any way to detect the number of pages or percentage of pages which do not have a checksum for a database which has been changed from torn page detection (if the maintenance window didn;t allow for a full index rebuild for example)?

    Reply↓
    • SQLSoldier on February 25, 2013 at 8:26 am said:

      Unfortunately, that isn’t tracked anywhere as an aggregate. You would have to dump every page using DBCC PAGE() to see if it had a checksum value. The closest you could come in estimating this without dumping every page would be to dump just the differential change map (DCM) pages and try to estimate how much of the database had changed since the last full backup. This would still be inaccurate because the DCMs track changed extents, not pages. So an extent could be changed but have anywhere between 1 and 8 pages in the extent that have been changed.

      It may be better to figure out a way to do the index rebuild rather than trying to calculate how much of it is protected.

      Reply↓

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Post Navigation

← Previous Post
Next Post →
<

Remote DBA Services
- serious SQL Server expertise for less than a full-time DBA
My Articles
 
My Book
Check out my interview on

Extreme Data Recovery (with Argenis Fernandez)
10 Things all BI System Administrators Should Know
Upcoming Events
    All events shown in Pacific Time

    No events to show

RSS My SQL Server Magazine Articles

  • Database Mirroring for Disaster Recovery September 16, 2011
  • Comparative Review: Database Schema Comparison Tools August 24, 2011
  • 3 Log Shipping Techniques June 22, 2011
  • Hardening SQL Server June 20, 2011
  • Review: ScriptLogic Security Explorer for SQL Server February 8, 2011

Tags

31 Days of Disaster Recovery Architecture Automation CDC & Change Tracking Data Architecture VC Database Mirroring DBCC Denali Disaster Recovery Dynamic Management Views Extended Events Gamers & Geeks General Discussion High Availability How do I ... ? Humor Idera ACE Program Internals MCM Meme Monday Performance & Optimization PowerShell Professional Development Replication Security SQLBits SQL PASS SQL PASS Summit SQLRally SQL Saturday SQL Server Magazine SQL University SSAS & BI SSIS SSMS SSRS T-SQL T-SQL Tuesday tempDB Tips & Tricks Travel Troubleshooting Undocumented Stuff Whitepapers XML in SQL

News

Download my Powershell Scripts

The following scripts can be downloaded as text files. You will need to change the file extension to .ps1 in order to execute them.

Backup a database
Restore a database
Scan a server to find a free port
Query DNS to get the FQDN of a server


To see some examples of my other forms of writing, please visit my page on WritersCafe.org. It is almost exclusively horror fiction, but I sometimes throw other things in there too from time to time. There's one science fiction story, a couple of poems, and quite a few humor pieces as well.


Look for me in the SQL Q&A section of the August, 2007 issue of TechNet Magazine.
August issue of TechNet Magazine's SQL Q&A column

Protect our Heroes

© Copyright 2012 - Robert L Davis
Infinity Theme by DesignCoral / WordPress

Twitter Twitter 
LinkedIn LinkedIn 
TLF TLF RSS RSS 
WritersCafe WritersCafe 
SQLPASS SQLPASS 
Facebook Facebook
grab this