Corruption Bug: Common Criteria and SA.

When I talk about recovering from database corruption, I mention that occasionally corruption can be caused by bugs in SQL Server. Such things are very rare, but they do happen. Case in point – MS just recently updated an article stating such. KB2888996 details a specific set of circumstances where actually running CHECKDB can cause corruption.

Scary, huh?

I thought it would be an interesting exercise to reproduce those results, and then test the patch for myself. Feel free to follow along.

WARNING: Once again, we are causing intentional damage, here. Do this only on a test system where you can afford to simply reinstall SQL Server if everything goes to hell. And before you do anything else, take a backup of your master database. Do it now. I’ll wait. <muzak plays…>

Ready? Ok, here we go…

  • Make sure you’re running an affected version of SQL Server. It should be Enterprise, Standard or Developer edition of SQL Server 2014, down to 2008 R2. See the KB article mentioned above for the specifics.  Note that you’ll need to be running a version earlier than the hotfixes or cumulative updates that they mention in the fix.
  • Enable common compliance criteria on your SQL Server instance using:
EXEC sp_configure 'common criteria compliance enabled',1;
GO
RECONFIGURE WITH OVERRIDE;
GO
  • Install the auditing trace procedures, available on Microsoft’s SQL deployment page.
  • Restart the SQL Server service. Common Criteria is one of those rare cases where just running RECONFIGURE WITH OVERRIDE won’t actually cause the change to take effect.
  • Now log into the server as SA, and run:
DBCC CHECKDB(master) WITH NO_INFOMSGS, DATA_PURITY;

Remember that out of the box, data purity checks on master and model are not enabled. That’s why I’m including it in the command above, even though it’s a default from 2005 going forward.

In my case, when I ran CHECKDB, I got the following:

Msg 2570, Level 16, State 2, Line 1
Page (1:85), slot 26 in object ID 58, index ID 1, partition ID 281474980511744, alloc unit ID 72057594037993472 (type "In-row data"). Column "name" value is out of range for data type "nvarchar".  Update column to a legal value.
CHECKDB found 0 allocation errors and 1 consistency errors in table 'sys.sysbinobjs' (object ID 58).
CHECKDB found 0 allocation errors and 1 consistency errors in database 'master'.

So there is our data purity error. These are kind of a mixed bag. You can’t run a repair statement to fix them, but then again, you don’t risk losing any more data than you already have.  What it means is that SQL Server cannot interpret the data within a column, because the data isn’t valid or within the correct range for the data type of that column. In this case, we have an nvarchar column with a single byte of data in it.

Now you’ve probably guessed by this point, and if you haven’t then I’ll make it clear: the corruption in this case was NOT caused by CHECKDB. It has to do with some peculiarity in the security setup here. Somehow something is being logged in such a way that is bypassing the data type check, and inserting a record into this table that doesn’t fit the data type. Let’s take a look at this using DBCC PAGE.  If you’re not familiar with DBCC PAGE, please see my previous post on it, DBCC PAGE to the Rescue.

DBCC TRACEON(3604); /* Enables output to console */
GO
DBCC PAGE (1,1,85,3); /* dbid 1, file 1, page 85, style 3 (show me everything) */
GO

After bringing up the page, we do a search for slot 26, and find:

Slot 26 Column 4 Offset 0x2f Length 1 Length (physical) 1
name = INVALID COLUMN VALUE

Just for comparison, here’s what a normal, single-character value of that type would look like:

Slot 0 Column 1 Offset 0xb Length 2 Length (physical) 2
col1 = a

See the problem? Because this is an nvarchar type, the minimum length of a value is going to be 2 bytes. One for the data, and one for the unicode data that goes with it. (See here for the details on what I mean regarding the data types.) So, the value stored is too small for SQL Server to attempt to interpret as a valid nvarchar.

In the demos I do, I fix a data purity error with an update statement. Unfortunately, you can’t directly access this particular table. It’s a hidden system table, so there’s no way to even select from it, much less update it.

So what *can* we do? Restore master from a backup. Well, technically we could just ignore the error by running CHECKDB against master with PHYSICAL_ONLY, which would just avoid data purity checking. You can also disable data purity checking on master by enabling trace flag 2566.  Keep in mind, though, that disabling data purity checks means that if any new data purity issues crop up, you’re not going to know. I’m not in favor of this option, so let’s go to that backup we took at the beginning of this exercise.

Restoring master is understandably different than restoring any other database, and here’s how you have to do it. You need to stop the SQL Server service, then start it from the command prompt in single user mode, using the -m switch. In my case, I have multiple instances of SQL running on this particular machine, so I also need to specify which instance using the -s switch. Like so:

X:\<path to sqlservr.exe>\sqlservr.exe -m -s mssqlserver

Note that specifying “mssqlserver” will start the default instance, which is where we’re testing this. In a separate command window, I logged into my instance using sqlcmd.exe and restored master with a plain old restore command:

RESTORE DATABASE [master] FROM DISK = 'X:\<path to file>\master_full.bak' WITH REPLACE;

When the restore of master is complete, SQL Server will automatically shut down. I was already at the command line, so I restarted both SQL and the SQL Server Agent with net start…

Net start mssqlserver
Net start sqlserveragent

Logging in with my user account, (not sa), and running CHECKDB as above, shows that the database is now clean.

So to avoid this, should you fit the criteria at the beginning of this article and on Microsoft’s KB article, please be sure you download and install the appropriate update for your instance of SQL Server.

Keep checking those databases for corruption, folks.

Thanks for reading.

-David.

Advertisements

Please Comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s