Tuesday, May 8, 2012

Delete a Large Amount of Data in Batches


There may be times when you need to delete a large amount of data and you don't have the comfort of a large maintenance window, or you simply can't wait for some down time in production.  Issuing a basic DELETE FROM could cause contention and drastically reduce concurrency for the duration of the delete.  With just a few rows of data this is typically no big deal, and the end users will be none the wiser.  But with a large amount of data to be deleted, it can have a real impact on the users, and on performance in general.  One solution is to delete the data in batches.  As much of a set-based proponent as I am, this is one of those times when we can really benefit from a loop.

I want to create some test data to experiment on (I know this goes without saying, but only run a variation of this code in production after extensive testing in your particular environment).  I'm using a database called TestDB, so feel free to create this database or use your own test database (the script doesn't depend on any existing objects, so a newly created DB works fine).

use TestDB
go

if object_id('dbo.LargeDeleteTest') is not null
       drop table LargeDeleteTest
go

create table LargeDeleteTest
(
       id int identity(1, 1) not null
              primary key clustered,
       SomeData nvarchar(128) not null
              default replicate('a', 128)
)
go

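-- note: go <n> is an SSMS/sqlcmd feature that repeats the preceding
-- batch n times, so this inserts 1250 rows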
insert into LargeDeleteTest
values(default)
go 1250
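
As a quick sanity check, a simple count confirms the load:

select count(*) as row_count
from LargeDeleteTest
go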

So now we have a test table (LargeDeleteTest) with 1250 rows of data in it.  I'm over-simplifying this scenario; in the grand scheme of things, 1250 rows is negligible.  But as we go through this example, apply the same thought process to millions upon millions of rows that need to be deleted.

Let's first demonstrate what it looks like if you don't delete the data in batches.  Execute the code below and notice the result of the sys.dm_tran_locks DMV query:

begin tran
       delete
       from LargeDeleteTest
--commit tran

select
       resource_database_id,
       resource_associated_entity_id,
       request_session_id,
       request_mode,
       request_type,
       request_status,
       resource_type,
       count(*) as resource_count
from sys.dm_tran_locks
group by
       resource_database_id,
       resource_associated_entity_id,
       request_session_id,
       request_mode,
       request_type,
       request_status,
       resource_type

We get an expected 1250 exclusive key locks on the clustered index of our LargeDeleteTest table; that is 100% of the table.  In other words, until this DELETE transaction is committed, there is an exclusive lock on every one of those rows.  (Note: run the commented-out COMMIT TRAN to commit the transaction so it doesn't hinder our experimentation further down this post.)
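
Before you run that COMMIT TRAN, you can see the contention firsthand.  From a second session, try a simple read of the table; assuming the default READ COMMITTED isolation level (and that read committed snapshot isolation has not been enabled on the database), it will block until the delete transaction commits or rolls back:

-- run this from a *second* session while the delete transaction is open;
-- under default read committed this will block on the exclusive key locks
select top (1) *
from LargeDeleteTest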

Here's some code to show you how to delete data in batches to skirt around the above concurrency nightmare:

declare
       @batch_size int,
       @del_rowcount int = 1

set @batch_size = 100

set nocount on;

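-- @del_rowcount is initialized to 1 so the loop body runs at least once;
-- keep looping until a delete affects zero rows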
while @del_rowcount > 0
       begin
              begin tran
                     delete top (@batch_size)
                     from dbo.LargeDeleteTest

                     set @del_rowcount = @@rowcount
                     print 'Delete row count: ' + cast(@del_rowcount as nvarchar(32))
              commit tran
       end

The code is very simple: all it does is loop through and delete the data in batches of 100 (I chose a small batch size to illustrate the point on a smaller scale, but as explained above you should tune this variable for your specific environment).  To continue the lock demonstration from above, execute the code below, which is basically just one iteration of the WHILE loop (after repopulating the table):

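-- repopulate the test table before the single-batch demonstration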
insert into LargeDeleteTest
values(default)
go 1250

begin tran
       delete top (100)
       from LargeDeleteTest
--commit tran

select
       resource_database_id,
       resource_associated_entity_id,
       request_session_id,
       request_mode,
       request_type,
       request_status,
       resource_type,
       count(*) as resource_count
from sys.dm_tran_locks
group by
       resource_database_id,
       resource_associated_entity_id,
       request_session_id,
       request_mode,
       request_type,
       request_status,
       resource_type

Now, in the result set of the sys.dm_tran_locks query, we see that we only have 100 exclusive key locks.  Concurrency has improved because we're deleting the data in batches: the whole clustered index no longer has an exclusive lock on every key, just on the keys in the current batch.

As we've seen here, there will be times when we need to delete a large amount of data, and it may not be the best idea to do it in a single DELETE FROM statement.  Through the use of a loop and batched deletes, we can minimize the impact this will have on our underlying data's accessibility.

The batch delete code can be found in my GitHub SQLSalt repository here:  DML_Delete_DeleteLargeDataByBatches.sql.  If there are any comments, questions, or issues, please feel free to leave a comment below or email me at sqlsalt@gmail.com.
