• SQL Server
  • Log Shipping Tricks Demo
  • SQLCruise Alaska 2012 Pics
SQLSoldier News From the Frontlines

Day 24 of 31 Days of Disaster Recovery: Handling Corruption in a Clustered Index

January 30, 2013 11:10 pm / 9 Comments / SQLSoldier

31 Days of Disaster Recovery

31 Days of Disaster Recovery

Welcome to day 24 of my 31 Days of Disaster Recovery series. Previously, I’ve talked about several different forms of corruption: Nonclustered Index, Allocation Pages, and Tempdb. these were all fairly simple to fix. Today I’m going to dive into a scenario that is a little more complex, clustered indexes.

Clustered indexes are the base data. This means we have to go to a restore scenario. Hopefully, the corruption isn’t wide spread. if it’s just 1 or a few pages, we can do page level restores. If it’s a lot of pages, it may be faster just to do a full restore. It may come down to a judgement call as to which you think is faster.

If you missed any of the earlier posts in my DR series, you can check them out here:

    31 Days of disaster Recovery

  1. Does DBCC Automatically Use Existing Snapshot?
  2. Protection From Restoring a Backup of a Contained Database
  3. Determining Files to Restore Database
  4. Back That Thang Up
  5. Dealing With Corruption in a Nonclustered Index
  6. Dealing With Corruption in Allocation Pages
  7. Writing SLAs for Disaster Recover
  8. Resolutions for All DBAs
  9. Use All the Checksums
  10. Monitoring for Corruption Errors
  11. Converting LSN Formats
  12. Extreme Disaster Recovery Training
  13. Standard Backup Scripts
  14. Fixing a Corrupt Tempdb
  15. Running DBCC CheckTable in Parallel Jobs
  16. Disaster Recovery Gems From Around The Net
  17. When are Checksums Written to a Page
  18. How to CHECKDB Like a Boss
  19. How Much Log Can a Backup Log
  20. The Case of the Backups That Wouldn’t Restore
  21. Who Deleted That Data?
  22. Which DBCC CHECK Commands Update Last Known Good DBCC
  23. Restoring Differential Backups With New Files

Identify the Corruption

I’ve created a sample database for us to use for this scenario. It has a table in it with a corrupt clustered index. It also has clean backups from before the corruption occurred so we can perform a page or a full restore. In order to facilitate recovery, I had to do something we tell you to NEVER do with a corrupt database. Instead of providing you with a backup of the corrupt database, I detached the database and copied the files. To run through this demo on your own, you’re going to need to attache the files as a new database.

The scenario here is that the database was offline (let’s say the server rebooted), and corruption occurred during that time. The database is online now and seems fine. We insert some data into our table, and everything works fine. Then we query the data, and we discover the corruption.

-- Take database offline and poof, corruption occurs
-- We don't know that yet though
-- Add more data
Insert Into dbo.FactInternetSales
Select Top(500) *
From AdventureWorksDW2012.dbo.FactInternetSales;
Go

-- Everything seems to be fine. Let's query the data
Select *
From dbo.FactInternetSales;
Go

-- Error occurs
Msg 824, Level 24, State 2, Line 2
SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:298; actual 0:0). It occurred during a read of page (1:298) in database ID 8 at offset 0x00000000c14000 in file 'C:\Program Files\Microsoft SQL Server\MSSQL11.SQL12\MSSQL\DATA\CorruptDB.mdf'.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

This is a fatal error, so our query window gets disconnected when the error pops up. We know this is a serious error. We need to determine the extent of the corruption. You may be tempted to run DBCC CHECKTABLE at this point, but the corruption could be more than just a single object. I highly recommend running the DBCC CHECKDB in almost every case.

-- Let's run DBCC CHECKDB (there may be other corrupt tables)
DBCC CHECKDB(CorruptDB) With No_InfoMsgs, All_ErrorMsgs, TableResults;
Go

-- Errors returned for:
-- Database ID 8 (CorruptDB)
-- Object ID 245575913 (dbo.FactInternetSales)
-- Index ID 1 (clustered index)

Okay, CHECKDB shows lots of error messages, but we need to look at the ones with a Severity of 16. It appears that just a single table and a single index is corrupted. At first look, it appears that there are 2 pages that are corrupt. Taking a closer look at the error message for page #299 shows that it is reporting that the pointer to it from page #298 is missing. Page #298 is the only one that is corrupted.

If we take a look at the corrupt page using DBCC PAGE, we can see that about half of the page was overwritten with repeating 0′s. This is a sure sign of disk corruption. You need to follow this up by having the disks checked. Disk corruption may be a sign of a failing disk so it is important to investigate right away.

Decide How to Deal With It

Since it is only a single page that is corrupted, I prefer to do a single page restore here. In order to do that, I have to have certain things already in place:

  • Good backup with a non-corrupted copy of the page in it. This is where having tested your backups previously really pays off. I can’t tell you how many times tables have sat corrupted for months on end before someone finds it and by then they no longer have sufficient backups to support a proper restore or they have to wait for many hours or days for another team to recover backups from tape that is in storage. TEST YOUR BACKUPS.
  • Log backups to bring the page current with the rest of the database. This means that if we’re in simple recovery mode or if we don’t have all the log backups, we’re dead in the water. Our only choice in that case would be to do a full restore and save as much of the data as possible.

In our case, we see that we have a full backup and a log backup. We will need to do a tail log backup as well to really bring it current. for this restore, I’m going to put the database into restricted-user mode, take the tail log backup, and then proceed with the restore process.

Page-level Restore

To process this restore, I’m going to put the database in restricted-user mode to kick the non-admin users out of the database. Then I will do the page-level restore including a tail log backup. I’ll restore the full backup specifying the page I want to restore. Then I will restore the existing log backup and then take a tail log backup I just took. Then I can recover the database and run DBCC CHECKDB again to ensure that the corruption is fixed. If the database is clean, we can let the users back in to the database.

-- Only the one page is corrupt, so let's do a page restore
-- Switch to master to restore the damaged page
USE master;
Go

-- Set the database in restricted user mode to keep average users out
Alter Database CorruptDB Set Restricted_User With Rollback Immediate;
Go

-- Restore the corrupt page from the good full backup
Restore Database CorruptDB
    Page = '1:298'
    From Disk = 'C:\Users\SQLSoldier\Documents\BlogFiles\CorruptDB\Backups\CorruptDB.bak';
Go

-- Restore the 1st pre=existing log backup to bring the page current
-- SQL knows which transations to apply, no need to specify any special commands
Restore Log CorruptDB
    From Disk = 'C:\Users\SQLSoldier\Documents\BlogFiles\CorruptDB\Backups\CorruptDB.trn'
    With NoRecovery;
Go
-- If there were more pre-existing log backups, we would restore them in order

-- Now backup the tail of the log...
Backup Log CorruptDB
    To Disk = 'C:\bak\CorruptDB_LOG_TAIL.trn'
    With init;
Go

-- Restore the tail of the log bringing the page current
Restore Log CorruptDB
    From Disk = 'C:\bak\CorruptDB_LOG_TAIL.trn'
    With NoRecovery;
Go

-- Finally, recover the database to bring it online
Restore Database CorruptDB With Recovery;
Go

-- Recheck the database for corruption again
DBCC CheckDB(CorruptDB) With All_ErrorMsgs, No_InfoMsgs, TableResults;
Go

-- Allow users back in
Alter Database CorruptDB Set Multi_User;
Go

-- Run the original query again
Use CorruptDB;

Select *
From dbo.FactInternetSales;
Go

Summary

At first glance, a page-level restore may seem a little tricky, but once you’ve done it several times, it starts to become old hand. Just remember the steps I recommend taking, and follow them one at a time. It’s not as difficult as it seems.

Download the demo scripts and sample corrupt database in zip format: Demo_ClusteredIndexCorruption.zip (5.03 MB)

Special thanks to Paul Randal (blog|@PaulRandal) for his guidance when I saw something weird happening. Something weird was happening, but not what I thought was happening. Talking to him about it cleared it up for me and allowed me to see my error that I was overlooking.

Posted in: SQL Server / Tagged: 31 Days of Disaster Recovery, DBCC, Disaster Recovery, Internals

9 Thoughts on “Day 24 of 31 Days of Disaster Recovery: Handling Corruption in a Clustered Index”

  1. Sandra on February 5, 2013 at 5:38 pm said:

    Are you missing the WITH NORECOVERY in your example where you execute the Restore Database CorruptDB Page = ’1:298′ From Disk = ‘C:\Users\SQLSoldier\Documents\BlogFiles\CorruptDB\Backups\CorruptDB.bak’;

    Reply↓
    • SQLSoldier on February 6, 2013 at 8:57 am said:

      Hi Sandra. No need to use NORECOVERY there. In fact, now that you bring it up, you don’t need to use NORECOVERY on the log file restores either. I did it out of habit. I’m just so used to using NORECOVERY when restoring multiple files.

      Reply↓
  2. Sandra on February 6, 2013 at 10:05 am said:

    Wow. So, is it because you’re restoring a page that you don’t have to use NORECOVERY on that restore or the subsequent LOG restores or am I missing something more basic here?

    Reply↓
  3. Sandra on February 6, 2013 at 3:05 pm said:

    Is it because you’re restoring a page that you don’t need NORECOVERY or am I missing something basic here.

    Reply↓
    • SQLSoldier on February 6, 2013 at 6:28 pm said:

      Right. When you’re doing a page restore, SQL Server knows exactly what to do. You don’t need to tell it.

      Reply↓
  4. Pingback: Day 27 of 31 Days of Disaster Recovery: Restoring Part of a Database | SQLSoldier

  5. Patrick on March 6, 2013 at 2:12 pm said:

    Hi Robert, I understand tail log backup needs to be taken before restore, right?

    Reply↓
    • SQLSoldier on March 6, 2013 at 9:14 pm said:

      Hi Patrick. Not for a single page restore like this. We are only restoring a single page, and we have to bring that page current with the rest of the database. If there are no transactions occurring at all, you can take it ahead of time, but if there are transactions occurring then you must do it after the restore, otherwise, the page can’t be brought up to date.

      Reply↓
  6. Pingback: Day 31 of 31 Days of Disaster Recovery: Backup and Restore of the Resource Database | SQLSoldier

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Post Navigation

← Previous Post
Next Post →
<

Remote DBA Services
- serious SQL Server expertise for less than a full-time DBA
My Articles
 
My Book
Check out my interview on

Extreme Data Recovery (with Argenis Fernandez)
10 Things all BI System Administrators Should Know
Upcoming Events
    All events shown in Pacific Time

    No events to show

RSS My SQL Server Magazine Articles

  • Database Mirroring for Disaster Recovery September 16, 2011
  • Comparative Review: Database Schema Comparison Tools August 24, 2011
  • 3 Log Shipping Techniques June 22, 2011
  • Hardening SQL Server June 20, 2011
  • Review: ScriptLogic Security Explorer for SQL Server February 8, 2011

Tags

31 Days of Disaster Recovery Architecture Automation CDC & Change Tracking Data Architecture VC Database Mirroring DBCC Denali Disaster Recovery Dynamic Management Views Extended Events Gamers & Geeks General Discussion High Availability How do I ... ? Humor Idera ACE Program Internals MCM Meme Monday Performance & Optimization PowerShell Professional Development Replication Security SQLBits SQL PASS SQL PASS Summit SQLRally SQL Saturday SQL Server Magazine SQL University SSAS & BI SSIS SSMS SSRS T-SQL T-SQL Tuesday tempDB Tips & Tricks Travel Troubleshooting Undocumented Stuff Whitepapers XML in SQL

News

Download my Powershell Scripts

The following scripts can be downloaded as text files. You will need to change the file extension to .ps1 in order to execute them.

Backup a database
Restore a database
Scan a server to find a free port
Query DNS to get the FQDN of a server


To see some examples of my other forms of writing, please visit my page on WritersCafe.org. It is almost exclusively horror fiction, but I sometimes throw other things in there too from time to time. There's one science fiction story, a couple of poems, and quite a few humor pieces as well.


Look for me in the SQL Q&A section of the August, 2007 issue of TechNet Magazine.
August issue of TechNet Magazine's SQL Q&A column

Protect our Heroes

© Copyright 2012 - Robert L Davis
Infinity Theme by DesignCoral / WordPress

Twitter Twitter 
LinkedIn LinkedIn 
TLF TLF RSS RSS 
WritersCafe WritersCafe 
SQLPASS SQLPASS 
Facebook Facebook
grab this