In the last post (Part
1) I discussed how to approach a nonclustered index data corruption
issue. What I am going to be explaining
here is what to do with data corruption of the clustered index. I want to start out with a copy of my
AdventureWorks2012 database:
restore database AdventureWorks2012_CorruptCI
from disk = 'C:\MyDir\adventureworks2012_base.bak'
with
move 'adventureworks2012_data'
to 'C:\MyDir\AW_Data_CorruptCI.mdf',
move 'adventureworks2012_log'
to 'C:\MyDir\AW_Log_CorruptCI.ldf',
stats = 5
go
Now I want to corrupt a
clustered index. Last week in Part
1 I went into a long explanation and demonstration how to corrupt data
using a Hex Editor. I’m not going to
rehash, as the process is basically the same.
The only difference between last week’s nonclustered index and this
week’s clustered index corruption is that I want to choose the clustered index
of a table, and I have personally chose the Person.Person table
(you’ll see why soon. Hint:
constraints).
So corrupt a clustered index,
any clustered index. Feel free to work
along with me and corrupt the Person.Person clustered index. Now set the database back online and run DBCC CHECKDB:
alter database AdventureWorks2012_CorruptCI
set online
go
dbcc checkdb('AdventureWorks2012_CorruptCI') with no_infomsgs
go
Notice what you see. Very much like last week’s result, we get a
handful of errors. Notice the summary of
the DBCC command:
CHECKDB found 0 allocation errors and 4 consistency
errors in table 'Person.Person' (object ID 1765581328).
CHECKDB found 0 allocation errors and 4 consistency
errors in database 'AdventureWorks2012_CorruptCI'.
repair_allow_data_loss
is the minimum repair level for the errors found by DBCC CHECKDB
(AdventureWorks2012_CorruptCI).
I have bolded the above portion
because I believe it is worth discussing.
SQL Server actually plants the thought in the DBA’s head to use REPAIR_ALLOW_DATA_LOSS
with DBCC CHECKDB as the “minimum”. Now, one may be thinking “well this is the
minimum, and SQL Server said it is okay then I think I’ll go right ahead. It must
be the right answer.” Not always. And we’ll see why now. Let’s say you did run DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS:
alter database AdventureWorks2012_CorruptCI
set single_user
with rollback immediate
go
dbcc checkdb('AdventureWorks2012_CorruptCI', repair_allow_data_loss)
go
Great! It finishes, and all the issues were
fixed. So you grab your car keys, call
your significant other, and tell them you’re on the way home. Easy workday, no late dinner tonight! Or so you thought. Meanwhile, you’re grinning from ear to ear
because you’ve fixed it… but there is something that was missed: Something very important. And then you receive the dreaded phone
call. Here’s what was missed. Take for instance my Person.Person table that
I corrupted. Even after the REPAIR_ALLOW_DATA_LOSS
normal queries like this are successful:
select *
from Person.Person
So what could possibly be the
problem? Here’s what: REPAIR_ALLOW_DATA_LOSS does just
that… it may possibly delete data in order to fix the corruption. But there is still yet another looming issue
here. The answer is in this next DBCC
statement:
use AdventureWorks2012_CorruptCI
go
dbcc checkconstraints
go
On my end that returned a
whopping 25 foreign key relationships that were flagged as culprits:
What looks like happened is that
during my DBCC CHECKDB with the option to REPAIR_ALLOW_DATA_LOSS,
SQL Server deleted five rows of data.
Well, because there were foreign key constraints referencing Person.Person,
that has now become a pretty big problem (not to mention the underlying fact
that we have just deleted data).
The solution? There is a
good chance that when the root data, in this case the clustered index, is
corrupt you will need to utilize a non-corrupt backup that you can restore
side-by-side with the corrupt database and copy data over. This is yet another reason why a great
disaster recovery plan is necessary.
Things happen, data gets corrupted.
And there’s not always a quick and easy command, statement, or query to
get it back to normal. Do the
appropriate investigation as to what exactly
is corrupt, and the best means to fix it.
The code for this test can be
found on my GitHub SQLSalt
repository. A direct link to this
test code is here. NOTE: THIS CODE AND TEST IS MEANT FOR ISOLATED TEST
ENVIRONMENTS ONLY. DO NOT RUN ANYWHERE
NEART PRODUCTION. If there are any
comments, questions, or issues please feel free to leave a comment or email me
at sqlsalt@gmail.com.
Beaware that REPAIR_ALLOW_DATA_LOSS can get you in more complicate corruption scenarios, so it should be considerate like the last resource. If you are able to identify the corrupted pages, you can always restore that page from backups, and avoid the REPAIR_ALLOW_DATA_LOSS, if the issue is located outside the primary Filegroup and you are running on enterprise version of SQL server, this can be done fully online :)
ReplyDeleteRegards
Kenneth