Script: Backup Throughput

Recently on one particular server, I noticed that backups were taking much longer than they should have. The databases or backups weren’t getting significantly larger,  so I wanted to know if there was a problem with the backup storage, and to do that I wanted to look at backup throughput over time. Here’s the query I wrote to do that.

The query makes use of two tables in MSDB, namely [backupset] and [backupmediafamily]. From [backupset], we can get the database name, start and finish times, and the size of the backup. To get the file name, we need to go to [backupmediafamily], and join on the media_set_id.  By calculating the difference in seconds between start and finish times, and converting the backup size from bytes to MB, we can get the backup throughput in MB/s.

Also, in this case, I’m filtering for backups that are either full or differential, and are taking longer than 10 minutes. This eliminates backups that only take a few seconds, since those backups will give atrificially low throughput numbers. (If I only backed up 1MB, and it took 1s, then my throughput is 1MB/s even if I could have done more.)

bs.database_name AS DBName
,bs.backup_start_date AS DateStarted
,bs.backup_finish_date AS DateCompleted
,Duration =
DATEDIFF(SECOND, bs.backup_start_date, bs.backup_finish_date)
,bs.backup_size / 1048576.0 AS DataSizeMB
,[MB/sec] = (bs.backup_size / 1048576.0) /
WHEN DATEDIFF(SECOND, bs.backup_start_date, bs.backup_finish_date) > 0
THEN DATEDIFF(SECOND, bs.backup_start_date, bs.backup_finish_date)
,bmf.physical_device_name AS BackupFile
FROM msdb.dbo.backupset AS bs
INNER JOIN msdb.dbo.backupmediafamily AS bmf
ON bs.media_set_id = bmf.media_set_id
WHERE bs.type != 'L'
AND DATEDIFF(SECOND, bs.backup_start_date, bs.backup_finish_date) > 600

That will give me the backup throughput for each Full or Differential backup running longer than 10 minutes that remains in the backup history. If you want to look at averages for all databases over time, you can group by parts of the datetimes. One way to do that would be like this:

WITH BackupThroughput AS (
bs.backup_finish_date AS DateCompleted
,[MB/sec] = (bs.backup_size / 1048576.0) /
WHEN DATEDIFF(SECOND, bs.backup_start_date, bs.backup_finish_date) > 0
THEN DATEDIFF(SECOND, bs.backup_start_date, bs.backup_finish_date)
FROM msdb.dbo.backupset AS bs
INNER JOIN msdb.dbo.backupmediafamily AS bmf
ON bs.media_set_id = bmf.media_set_id
WHERE bs.type != 'L'
AND DATEDIFF(SECOND, bs.backup_start_date, bs.backup_finish_date) > 600
BackupDate = MIN(DateCompleted)
,AverageThroughput = AVG([MB/sec])
FROM BackupThroughput

With that information in hand, I can check to see if there was an obvious drop in backup throughput, and then investigate that.

Hope that helps.

Shrink Is A Four Letter Word

Recently, I’ve been noticing some large-scale log growth on some of my developer SQL servers, which has been causing them to run out of disk space. Though shrinking files is almost universally a bad idea, there is a time and a place for doing so provided the ramifications of shrinking are understood. Needing to shrink a log file indicates that either we’re operating on more data than we expected, or operating inefficiently. Either way, it’s almost always the result of unplanned circumstances.  Under such circumstances, you can’t always use the option you want to. Sometimes you have to do something regrettable.

When *NOT* to shrink:

  • When the growth happens in a Data file. Shrinking is going to rob you of free space required for index maintenance, not to mention fragmenting the hell out of your existing indexes, making them near useless.
  • When the growth is in TempDB.  It can cause corruption. Either add TempDB files, or kill the transaction and recreate tempdb. You’ll need to restart SQL Server for that, and killing said transaction that big may cause a very long rollback. You’ve been warned.
  • As part of regular maintenance.  If your log files are growing repeatedly, then they’re trying to get to the size they need to be in order to handle your workload. Why are you shrinking them? What do you have against them? They have dreams, too…  Fix the workload, not the log files.

Alternatives to shrinking:

When to shrink. I can only think of two situations off the top of my head

  • Under an emergency where other workloads on the same disk are affected and there’s no additional disk elsewhere for another log file.
  • When a database is being archived / retired, and will be set to read-only. Even in that case, you wouldn’t just shrink the DB. See Retiring a Database.

Benefits to shrinking:

  • Free disk space.
  • That’s about it.

Caveats to shrinking:

  • Massive performance hit during re-growth, as all transactions must be suspended.
  • Massive performance hit due to file fragmentation on re-growth.
  • Massive performance hit due to internal fragmentation on shrink.
  • Massive perfor…. You get the idea.

So really, we want to avoid shrinking log files wherever possible. The best way to avoid shrinking is to properly size files to begin with.   Make sure you have enough room for your transactions to run. If you have large transactions or volumes of data to work with, one good way to avoid file growth is to batch large transactions into smaller groups. i.e. Instead of inserting 5M rows, insert 100k rows at a time, 50 times.  This can actually be faster than a single large insert, as you’re not stuck waiting for the log file to grow to accommodate 5M records.

Some growth may still occur since, let’s face it, we have a lot of data to work with. What we want to avoid is large, recurring growth. Remember that when the log file must grow, all transactions are suspended.  If you are only growing slightly, that may not be a problem, but when you have 100GB of growth, that’s going to stop you in your tracks for the duration of the growth.

If you must shrink a log file due to uncontrolled or unexpected growth, you can use the SHRINKFILE command, but with one caveat – I would recommend shrinking the file by half instead of to its smallest size.  This way, future growth may be avoided while queries are tuned to use less log space.   For example, if a log were to grow to 40GB, after averaging a size of 2GB, I would shrink the log to 20GB, like this:

USE <DatabaseName>;
DBCC SHRINKFILE(<file ID>,<target size>);



2 is the ID of the log file for most databases.  You can use sp_helpfile on a database to check that.  20000 is the target size, which is always in MB. Shrinking requires sysadmin or diskadmin permissions, which some developers do have.  I can also assist you with these tasks as needed.  Another advantage to this is that it gives us an idea of how large the production log files will need to be to accommodate new SQL code.

Also, I would like to stress that shrinking is used as a temporary fix, not as a workaround, and *never* as part of regular maintenance. If you have a job on a server that is regularly shrinking log files, then you either need more disk space or smaller transactions. Regular shrinking is a sign that something bad is going on. Also, while shrinking can be considered on log files, it is almost never a good idea to shrink a data file.  Avoid using SHRINKDATABASE.

Thanks for reading.

Dealing With Disk Space Issues.

On a typical day a few months ago, I received an automated warning in my inbox that a particular server was low on disk space. This came along with the requisite “panic” emails I usually get from other areas who are also on that notification list.  I responded that I was looking into it in order to keep the email volume down. :-)

I mapped the volume in question to my local machine and fired up SpaceSniffer to get a quick read on what was taking up all the space. It was a dedicated SQL Server data file drive and the usual suspects were taking up the lion’s share of the space. (SpaceSniffer is a fantastic tool, by the way. You should check it out, if you haven’t already.)

Now, when it comes to disk space issues, there are three basic options:

1. Do nothing.  This should always be considered as an approach, though you may rarely use this as a solution. Sometimes you really don’t need to do anything.

2. Reduce the space required. You can relocate files, clean up un-needed data, etc. You can even shrink files if you understand what the consequences would be and have a plan for dealing with that.

3. Increase the space available. In this case, we were on a SAN drive that had available space left, so we could grow the drive to make more space available.

Another quick check showed that most of the data files were over 90% used, and none of them would be able to grow unless more space was made available. That eliminated option 1. There were no other files on that drive to clean up or delete, i.e., no databases to drop, so we couldn’t reduce the amount of space required for option 2. That left us with option 3: Grow the drive.  This was an online operation handled off-hours by our SAN admin, and was successful.

After resolving any issue, it’s a good idea to go back and do some root-cause analysis. What caused the disk space issue?  One of the first things I do when taking on a new client or job is to set up a monitoring job on every server that keeps a few month’s worth of history on the database files. Using the sys.dm_io_virtual_file_stats DMV I take a snapshot of the disk activity for each data file, as well as it’s size on disk, every 15 minutes.  This is then logged to a table in my administrative database on each server.

With the stats in place, I was able to put together a quick query to see which database had grown the most over the last few months. It turned out that one of the smaller databases had now become the second biggest database, all within the space of the last 8 weeks. Checking with the owner of the application, I found out a lot of new data had recently been imported, causing the database to grow by over 600%.  After that, it didn’t take much more growth on the part of the other data files to push that disk over its threshold, and send an alert.

Here are the scripts I use, in case you’re interested.

  • DDL for FileIOStatsRaw table and vw_FileIOByHour view. – Created in the administrative DB. (In my case, [SQLAdmin].)
  • GetFileIOStats stored procedure – Run this as frequently as you want and output to the FileIOStatsRaw table. – Also created in the admin DB. Will keep 90 days by default, but call it how you want in an agent job.
  • Trend Query:
 CalculatedGrowthInMBPerMonth = SUM(FileGrowthBytes) / 1024.0
FROM dbo.vw_FileIOByHour 
WHERE FileType = 'ROWS' 
GROUP BY DBName, Yr, Mo 
ORDER BY DBName, Yr, Mo;

How do you monitor, measure and alert on disk space usage?

Thanks for reading.


Be A Master of Disaster

Disaster recovery plans are a must have. If you’re not motivated enough by recent natural disasters, or data breaches, or anything else that’s been wreaking digital havoc in this world lately, I’m not sure what will do the trick. If you haven’t made a DR plan, I would take a few minutes to just sketch out a quick, paper napkin-style plan.  What would you do first, second, third, etc… It doesn’t have to be complete, it just has to be a plan.  Go ahead. I’ll wait…

Got it?  Good.  Now. How do you know that will work?

Plans are an important, and great first step. Much like restores, however, they don’t do you any good unless you test them.  It’s not always easy, or even practical to do so. Sometimes even testing DR plans can cause outages.

Further, we may be in the habit of constantly testing our backups, but how often do we test DR plans?  Quarterly? Yearly? I’ve been in more than one DR rehearsal where we found that major changes in other parts of the infrastructure have had catastrophic impacts on our DR processes. How often do we do documentation in the first place, let along update all the related DR documentation when a major system changes?

In my current position, if disaster struck right now, I’m confident we would be fine. But only because we recently went through several very painful DR tests that did not go as planned. Just about everything that could go wrong, did.  However, I don’t see that as a bad thing. In fact, I see it as a very positive thing. We tested, failed, adjusted the plan, tested again, repeated that process a few more times, and off we went. Now we know we have a solid DR plan.

I think that’s worth a few failures. :-)

How do you test your DR plans?

Thanks for reading.


Automation is Power

Recently, I had someone looking over my shoulder as I was setting up a new server. I tend to welcome opportunities like that because I’m all about learning and teaching, and it’s always a positive thing to have an interested audience. :-) One of the many things my friend was surprised by is the number of jobs I have on a default installation of SQL Server. In short, once I’m done with an installation, before I start adding user databases, I create anywhere between 10 and 20 jobs, depending on the purpose of the server. Some of these are just maintenance tasks, like backing up or running CHECKDB on the system databases. Others are monitoring jobs, like logging the output of Adam Machanic’s (b | tsp_WhoIsActive every 5 minutes, or the output of a few of Glenn Berry’s (b | tdiagnostic scripts every 15 minutes. Not to mention keeping running logs of wait statistics, job history analytics, msdb history purging, cycling the error logs… you name it, if it’s a repeatable task, I have a job for it in my post-installation scripts.

Give that server a few months, and there are a whole host of other jobs that enter the picture. Many of my servers have hundreds of jobs on them. Outside of the user or vendor required jobs, or even the basic backup / CHECKDB / index & stats maintenance trifecta, I’ll have jobs in place that do automated restore tests, monitor for specific conditions, etc… In fact, any time someone comes to me with something that *can* be automated, I create a script and a job for it. I do this because, of all the one-time requests I’ve ever received from my users, almost none of them were actually one-time requests. They almost always crop up again. When they do, it’s a simple matter of opening up SSMS, right-clicking on the job, and selecting “Start job at step…”. Alternately, I could even schedule the task for a specific time, so I don’t even have to be there for it.

What kinds of tasks do you automate?

Thanks for reading.

Script: Can My Databases Grow?

Most of us try to proactively manage our available disk space as much as we can, but sometimes you’re not able to predict when a file will grow. We usually leave auto-grow on as a safeguard for this. One of the problems I’ve run into in the past is. having a very large growth setting or lack of disk space create a situation where there isn’t enough space on disk for a file to grow. There may be enough “free space” on the disk for whatever your monitoring threshold is, but maybe not enough for a growth or two.

Here’s a stored procedure you can use to check to see if any of the databases on your server would fail to auto-grow. It requires dbmail be set up. Also note that for this, and most scripts I will usually post, I have a separate administrative database that I put this in. Let me know if you have any feedback on this, or if you get any use out of it.

Thanks for reading.

USE [SQLAdmin];

David M Maxwell

Apr 24 2012

Checks to see if sufficient disk space is available for autogrow. If not, sends a warning.



EXEC dbo.AutogrowWarning
@dbaemail = ''

1. Nothing at this time.

Please send any comments or suggestions to

Date - Author - Change
Apr 24 2012 - David M Maxwell - Initial Version

@dbaemail varchar(500) = NULL -- DBA email address to receive warnings.


/* Test Parameters.  Run everything from here down to test. */
--DECLARE @dbaemail varchar(500), @msgquery nvarchar(4000)
--  @dbaemail = ''

/* Parameter check */
IF (SELECT @dbaemail) IS NULL
('ERROR: Email recipient address must be provided.',16,1)

/* Create temp tables for comparison. */
IF (SELECT object_id('tempdb.dbo.#DiskFree')) IS NOT NULL

DriveLetter char(1) NOT NULL,
FreePages int NOT NULL

IF (SELECT object_id('tempdb.dbo.#NextAutoGrow')) IS NOT NULL
DROP TABLE #NextAutoGrow

CREATE TABLE #NextAutoGrow (
DBName varchar(100) NOT NULL,
LogicalName varchar(100) NOT NULL,
FileType CHAR(10) NOT NULL,
DriveLetter char(1) NOT NULL,
Growth int NOT NULL

IF (SELECT object_id('tempdb.dbo.#AutoGrowFail')) IS NOT NULL
DROP TABLE #AutoGrowFail

CREATE TABLE #AutoGrowFail (
DBName varchar(100) NOT NULL,
LogicalName varchar(100) NOT NULL,
FileType CHAR(10) NOT NULL,
DriveLetter char(1) NOT NULL,
Growth int NOT NULL,
FreePages int NOT NULL

/* Create perm table, if it does not exist. */

IF NOT EXISTS (SELECT name FROM sys.tables WHERE name = 'AutoGrowWarnings')
AutoGrowWarnings (
WarningMessage nvarchar(max)

TRUNCATE TABLE dbo.AutoGrowWarnings;

/* Get the amount of disk space free on each volume. */
EXEC xp_fixeddrives;

/* Convert to pages for easier comparison, since master_files.size
is in pages. */
UPDATE #DiskFree
SET FreePages = FreePages * 128;

/* Get the next autogrow for each DB per drive letter. */
CASE is_percent_growth
WHEN 1 THEN (size / growth)
ELSE growth
END AS Growth
FROM master.sys.master_files
WHERE Growth > 0

/* Compare the next autogrow to the disk free and create
a list of files that autogrow would fail on, due to
insufficient disk space. */
FROM #NextAutoGrow NAG
ON NAG.DriveLetter = DF.DriveLetter
WHERE NAG.Growth > DF.FreePages;

/* If the list is not empty, email it to the DBA. */
IF (SELECT COUNT(*) FROM #AutoGrowFail) > 0

INSERT INTO dbo.AutoGrowWarnings
SELECT 'WARNING: As of ' + CONVERT(varchar(24),GETDATE(),120) + ',' + DBName + '.'
+ LogicalName + ' requires ' + CONVERT(nvarchar(24),growth)
' pages of free space for autogrow and only ' + CONVERT(nvarchar(24),FreePages)
' are available on drive ' + DriveLetter + '.'
FROM #AutoGrowFail

EXEC msdb.dbo.sp_send_dbmail
@recipients = @dbaemail,
@subject = 'WARNING: Possible Auto-Grow Failure',
@query = 'SELECT * FROM SQLAdmin.dbo.AutoGrowWarnings'


Two Approaches to Index Maintenance

Though well-done index maintenance is usually an iterative process, trial and assessment, I have two default modes for how I set up index maintenance tasks for a given database. I refer to them as “Aggressive” and “Relaxed”.

When I set up index maintenance tasks, I set them up to run as often as I can. I’m using Ola Hallengren’s IndexOptimize script, and I usually set it to run nightly, with a time limit dependent on what else goes on within that particular system. For example, on one server I have some backups that kick off at 8PM, and an ETL process that runs around midnight, so I set IndexOptimize to run at 6 PM, with a 90 minute time limit. That gives SQL a half hour to repopulate it’s buffer cache with data needed for queries or complete any running index tasks that may spill over the time limit. This seems to be a good setting for this system. It’s slightly different elsewhere.

The difference between what I consider aggressive, and more relaxed, is how I set the thresholds for fragmentation and index size. For a more aggressive plan, I would set those thresholds lower. For a more relaxed plan, I would set them higher.  Much depends on both the usage and the performance of the system, but as a baseline, this is a decent place to start.

Lower thresholds mean that the indexes are going to be maintained more often, but the average fragmentation levels will remain lower. Maintenance jobs will tend to average run times that are a bit longer and use more available RAM. Higher thresholds mean that indexes would be maintained less often, and fragmentation levels would average a bit higher. However, you’ll have shorter running maintenance tasks on average, and more available RAM left for caching query-related data.

Where to set these thresholds depends on the individual system, of course, but here’s what I use as a rough guideline. Keep in mind for the following thresholds that “page count” refers to the number of 8k pages contained in the index, so a page count of 1500 is around 12 MB of data.

Aggressive: Page Count = 1500, Defrag at 5%, Rebuild at 15%.
Relaxed: Page Count = 4000, Defrag at 15%, Rebuild at 30%

So on a system that has lots of available RAM for cache, or smaller indexes, I may be more relaxed with my index maintenance, since many of the indexes will reside in cache and reading them from disk will be more rare. On a system that would have less RAM or larger indexes, I’ll want to be more aggressive with my index maintenance, since I’ll be reading data more often from disk than from cache. It will take more time and resources, but will pay off in better query performance.

On the topic of frequency, index maintenance should ideally be done as often as you can allow. It hurts nothing to do some maintenance on your databases as long as nobody is using them. However, if you’re trying to schedule index maintenance in a tight window in order to keep the database available to users, or run maintenance along with other resource intensive jobs, then setting the thresholds as I describe may alleviate some contention between index maintenance and other process. You will also find it more efficient than the “rebuild everything every time” approach.

Thanks for reading,

Quickie Script – Create Index Rebuild or Reorg Commands

The scenario here is, you’ve just performed some massively fragmenting operation, like a shrink.  (Don’t look at me like that – it happens, just don’t do it as regular maintenance…) You know you need to get in and do some index maintenance. You could run your nightly index maintenance job, but in this case, you want to do this now, instead of waiting for off hours or running that job in the middle of the day.  Furthermore, you want to see what indexes need maintenance and maybe run some of the commands independently.  Here’s a quick script to make that easy for you.  It outputs the table and index name, page count, and percent fragmentation, as well as a one line command to rebuild or defrag the index, depending on what thresholds you want to use for that.


  Quick script to create index defrag / rebuild commands.
  Useful for when you KNOW you have the work to do. 
  Not so useful for ongoing index maintenance, and should
  not be used for such.  

  The usual cautions about running this in production, 
  fitness for a particular purpose, or guarantee that this
  will fix anything apply. 

  DMMaxwell, October 2012.


   TableName = SCHEMA_NAME(t.schema_id) + '.' + OBJECT_NAME(ips.object_id)
  ,IndexName = 
  ,Pages = ips.page_count 
  ,Fragmentation = ips.avg_fragmentation_in_percent 
  ,ReindexCmd = 'ALTER INDEX [' + + '] ON [' + 
    SCHEMA_NAME(t.schema_id) + '].[' + OBJECT_NAME(ips.object_id) + '] ' + 
      WHEN ips.avg_fragmentation_in_percent > 15 THEN 'REBUILD; ' 
        /* BOL recommendation is 30 - being more aggressive. */
      WHEN ips.avg_fragmentation_in_percent > 5 THEN 'REORGANIZE; '  
        /* BOL recommendation I actually like. */
FROM sys.dm_db_index_physical_stats(
  DB_ID('AdventureWorks'),  /* Your database name here. */
  NULL, /* Specify object ID or NULL for all of them.*/
  NULL, /* Specify index ID or NULL for all of them.*/
  NULL, /* Specify partition ID or NULL for all of them.*/
  'LIMITED' /* Going for speed, rather than depth of info. */
  ) ips
INNER JOIN sys.tables t
  ON ips.object_id = t.object_id
INNER JOIN sys.indexes ix 
  ON ips.object_id = ix.object_id
 AND ips.index_id = ix.index_id
WHERE ips.page_count > 500  
        /* Usually 1000 */
  AND ips.avg_fragmentation_in_percent > 5 
        /* Ignore fragmentation less than 5%. */
        /* These numbers vary, based on a multitude of factors. 
           I'm being a bit aggressive today. */
  AND ips.index_type_desc != 'HEAP' 
        /* Can't defragment a heap... */

Thanks for reading.


My Toolbox

I actually have about 3 toolboxes at home.  One is for the basic stuff like hammers, screwdrivers, wrenches, pliers and the like. Another one is for electrical work and contains my multimeter, spudger, lineman’s pliers and assorted bits like tape, switches, wire, caps… The third one is for woodworking, and contains my router bits, block plane, jigsaw bits, sandpaper and a few other wood-specific tools.  I have my go-to tools that I reach for all the time, and my special purpose implements that don’t see much daily use, but when I need them, boy are they important. Take for example, the basin wrench. I believe I’ve used that tool about 3 times since I bought it several years ago, but each time it was the perfect tool for the job. I used it recently to tighten up a loose faucet mount. Any other tool would have been far too awkward.

That got me to thinking about my professional toolbox and some of thing things that populate it.  Much like my home toolboxes, I have my daily, go-to tools, and those that get infrequent, but important use. I thought I’d share a few with you, and maybe you could get some use out of them, as well.

First, there’s my daily use tools.  These are:

  • SQL Server Management Studio
  • Microsoft Outlook
  • Microsoft OneNote
  • Google Chrome

No surprises there – basically Management Studio, email, note taking and the internet.  However, some of my special purpose tools are way more interesting:

  • SQL Sentry Plan Explorer – Provided by SQL Sentry, this is my go-to tool for examining execution plans and making recommendations to developers for improving their T-SQL. Between the color-coding and the easily sortable lists of operations, It’s very easy to quickly zero in on the most expensive operations, and make big improvements.
  • Qure Workload Analyzer – Metrics are meaningless if you don’t have a baseline to compare them to. Even though profiler traces have been deprecated, I still use them quite a bit and Qure makes it dead simple to compare one trace against another, and note where things have gotten better or worse.
  • ClearTrace – When I’m not comparing traces, I’m looking for patterns in them, and ClearTrace is the tool of choice for this. While I could just dump the trace file into a table and run my own queries on it, ClearTrace has about 95% of what I need, and handles the import to table for me. If I really need custom queries for trace analysis, I can write them, but it’s rare now that I’m using this.
  • Sp_WhoIsActive – sp_who2 is soooo 2005… The new, and better proc is Adam Machanic’s sp_WhoIsActive. I have a job set up on every server I manage that runs this every 5 minutes and captures the output to a table. (30 days of retention.)  When I get that random call that goes, “Hey, do you know why the server was so slow last Wednesday around 4:00 PM?” – now I can have an answer ready.

These free tools, that’s right – free, have become a big part of my weekly, if not daily usage.

What are yours?

Thanks for reading.