Sunday, April 15, 2012

Data Corruption Part 1: Nonclustered Index Corruption

                In this mini blog series I will be going over the different victims of data corruption: nonclustered indexes (this post), clustered indexes, and system tables.  The first part of this post covers how to corrupt a nonclustered index in a TEST environment in order to learn and practice the resolution to such a problem.  The base database I’ll be using is AdventureWorks2012 (it can be found here).  I then restored a copy of AdventureWorks2012 to corrupt, and called it AdventureWorks2012_CorruptNCI:

restore database AdventureWorks2012_CorruptNCI
from disk = 'C:\DatabaseFiles\AdventureWorks\AdventureWorks2012_base.bak'
with
       move 'AdventureWorks2012_Data'
              to 'c:\MyDataDir\AdventureWorks2012_CorruptNCI_Data.mdf',
       move 'AdventureWorks2012_Log'
              to 'c:\MyLogDir\AdventureWorks2012_CorruptNCI_log.ldf',
       stats = 5
go
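
                As an optional sanity check (a small extra step, not strictly required), it's worth running a quick DBCC CHECKDB against the fresh restore to confirm we're starting from a clean, corruption-free copy:

-- optional: verify the restored copy is clean before we deliberately damage it
dbcc checkdb('AdventureWorks2012_CorruptNCI') with no_infomsgs
go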


Corrupting the Data
                First off, I want to emphasize that we are corrupting data!  In other words, THIS NEEDS TO BE DONE IN AN ISOLATED TEST ENVIRONMENT WITH TEST DATA AND DATABASES.  What we want to do now is corrupt a nonclustered index in our new AdventureWorks2012_CorruptNCI database.  I randomly chose the Person.EmailAddress table, and I want to see all of the indexes on that table:

select
       t.name as table_name,
       i.name as index_name,
       i.index_id,
       i.type_desc
from sys.indexes i
inner join sys.tables t
on i.object_id = t.object_id
where t.name = 'EmailAddress'
order by i.index_id

                After viewing the result set, I see there is a nonclustered index on this table called IX_EmailAddress_EmailAddress (note the index_id of 2).  That will be our victim.  Now we want to see all of the page information for this index.  In order to do this, we’ll utilize the DBCC IND command, which outputs one row per index page showing certain page metadata:

dbcc ind('adventureworks2012_corruptnci', 'person.emailaddress', 2)
go

                This shows us a list of all pages in the nonclustered index, IX_EmailAddress_EmailAddress.  Choose a page, any page.  This page will be the one that we target for data corruption.  I randomly selected an index page with a PagePID of 18250 (your PagePID may vary).  Now we want to get a glimpse of the data contained on that page, and to do that we’ll make use of the DBCC PAGE command (along with trace flag 3604, so the output is returned to the client instead of the error log):

dbcc traceon(3604)  -- so DBCC PAGE output is returned to the client
dbcc page('adventureworks2012_corruptnci', 1, 18250, 3)
go

                Now that we know which page we want to corrupt (PagePID 18250 in my case), I’ll do a quick calculation to get the page’s byte offset within the data file.  A page is 8 KB, or 8192 bytes, so the offset is simply the PagePID multiplied by 8192:

select 18250 * 8192
-- page offset: 149504000

                I get a resulting page offset of 149504000.  In order to actually corrupt the underlying data file, we need to take the database offline so that SQL Server releases its exclusive lock on the file:

use master
go

alter database AdventureWorks2012_CorruptNCI
set offline
with rollback immediate
go

                Now for the fun part:  Corrupting the data.  You can use a hex editor to alter the contents of the data file, and the one I typically use for direct file manipulation is the XVI32 Hex Editor (it’s free and exactly what you need for this).  It can be found here.  The following instructions are specific to this particular application, so if you are not using the XVI32 Hex Editor, the exact steps may differ.

Open up the AdventureWorks2012_CorruptNCI data file that contains the nonclustered index.  Click on Address and then GoTo, and input your page offset (ensure that the decimal format is selected).  In my case, I entered 149504000.  Click OK and XVI32 will take you right to the beginning of that particular page.  On the right you will see the character representation of the actual data on the page (if you’ve opted to use the EmailAddress table as well, you’ll see AdventureWorks email addresses).  On the left is the hex representation of that data.  Click anywhere on that page (I recommend using the right-hand pane to navigate to a character), and then zero out (00) the hex value for that byte.  That is enough!  Our data is now corrupted.  Save and close the data file.

Back in SSMS, set the database online:

use master
go

alter database AdventureWorks2012_CorruptNCI
set online
go

                Now I want to just make sure that we definitely corrupted the index:

use AdventureWorks2012_CorruptNCI
go

select EmailAddress
from Person.EmailAddress

                This should give you an error stating that “SQL Server detected a logical consistency-based I/O error: incorrect checksum…” and so on.
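
                As an additional check, SQL Server also logs pages that hit checksum (823/824) errors in msdb.dbo.suspect_pages, so you can peek there as well:

-- pages that have thrown consistency errors are recorded here by SQL Server
select database_id, file_id, page_id, event_type, error_count, last_update_date
from msdb.dbo.suspect_pages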


Fixing a Corrupted Nonclustered Index
                So the above was just a really slick way to corrupt a nonclustered index in our test database.  We broke it, and now it’s time to fix it.  In reality, though, we initially won’t know exactly what is corrupt.  Say a routine DBCC CHECKDB fails, or a user reports an error, and now it’s time to get to the bottom of it.  Running a DBCC CHECKDB will give you the information you need as to what’s corrupt:

dbcc checkdb with no_infomsgs

                The biggest takeaway here is the partition ID of the corrupt table/index.  In my case, DBCC CHECKDB tells me the issue is with partition number 72057594050445312.  So now I want to find out exactly what table and index that partition belongs to (if you went through the corruption process above, you already know; but for explanation’s sake, run the query and view the results):

select
       p.partition_id,
       o.name as table_name,
       i.name as index_name,
       i.type_desc
from sys.partitions p
inner join sys.objects o
on o.object_id = p.object_id
inner join sys.indexes i
on o.object_id = i.object_id
and p.index_id = i.index_id
where partition_id = 72057594050445312

                Bingo!  This tells me that the corrupt partition is indeed a nonclustered index on Person.EmailAddress, and the faulty index is IX_EmailAddress_EmailAddress.
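
                With the object identified, you can also run a narrower check against just that table; on a large database this is much quicker than a full DBCC CHECKDB while you’re investigating:

-- optional: check only the suspect table (run in the AdventureWorks2012_CorruptNCI context)
dbcc checktable('Person.EmailAddress') with no_infomsgs
go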

                When a nonclustered index is corrupt, the first step is to try a rebuild; because a nonclustered index is built entirely from data in the underlying table, it can simply be regenerated (and it goes without saying, but a little investigation into the cause of the data corruption is also warranted and necessary, otherwise for things like a failing drive it’ll happen again):

alter index IX_EmailAddress_EmailAddress
on Person.EmailAddress
rebuild
go

                If the rebuild isn’t possible, you may need to drop and recreate the index.  The easiest way to do this is through SSMS:  right-click the index and select Script Index as > DROP And CREATE To > New Query Editor Window.
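
                For reference, the generated script will look roughly like the sketch below (an approximation only, to show the shape of it; always use the exact script SSMS generates for your index definition and options):

-- approximate shape of the SSMS-generated DROP and CREATE (illustrative sketch only)
drop index IX_EmailAddress_EmailAddress
on Person.EmailAddress
go

create nonclustered index IX_EmailAddress_EmailAddress
on Person.EmailAddress (EmailAddress asc)
go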

                And, just for good measure I’ll run my DBCC CHECKDB to ensure that all is well.  And in my case, it is:

-- run DBCC CHECKDB to ensure the database is back to normal
dbcc checkdb with no_infomsgs


Summary
                When we run into corrupt data, our first step is to find out exactly what it is that is corrupt.  Far too often we want to run DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS, but that isn’t always the right answer.  What was explained above covers the case where the victim of data corruption is a nonclustered index; there, the first step is to try to rebuild the index if possible.  No harm, no foul.  In the next couple of blog posts I’ll be going over other victims of data corruption and their fixes.  If there are any questions, comments, or issues, please feel free to leave a comment or email me at sqlsalt@gmail.com.

Code
                As always, the corresponding code can be found on my SQLSalt GitHub repository:


Comment
                Thanks to Paul Randal for his insightful comment and correction on my post, regarding rebuilding the corrupt index rather than jumping straight to a drop and create (as well as scripting out the DROP and CREATE through SSMS).  Paul Randal is a leading expert in all things SQL Server.  Thanks, Paul!!  Much appreciated.  Quoted comment from Paul Randal for reference:
“Beware of dropping and recreating the index - if it's enforcing a constraint then the constraint could be violated while the index is dropped, meaning it can't be created again. Better to try rebuilding it first always. I'd also advise using SSMS to script-create the index before dropping it just in case the system tables change in a future release - you don't have to rely on your own code.”
