Wednesday, May 9, 2012

Study Guide Introduction


With Microsoft revamping its certification tracks and retiring the MCTS/MCITP certifications going forward, there is a whole new set of exams for the new MCSA/MCSE certifications.  Because of this, I plan to start drilling down into the MCSE: Data Platform track, which centers on SQL Server 2012.  This blog post is not meant to go in-depth on the certification changes themselves.  For more information on that topic, please see Kendra Little’s blog post on the details.

Microsoft doesn’t just thrust you out into the dark prior to taking the exam.  They graciously publish “study objectives” under a section called Skills Measured: an ordered list that breaks each broad topic down into the relatively specific topics you can expect on the respective exam.  I am a big proponent of trudging through these Skills Measured sections to get a good idea of what’s on the exam, what I need to brush up on, what I don’t know enough about, and what I feel pretty confident about.

                So why am I bringing this up?  Because I’m going to be spending the next couple of months going through all of these study objectives for a couple of exams (mainly the advanced exams for the MCSE:  70-464 and 70-465).  Disclaimer:  These blog posts covering the study objectives for the aforementioned exams are not meant to be the 100% solution to studying for these Microsoft exams.  These posts are simply meant to supplement experience and existing knowledge surrounding the SQL Server data platform.

The structure of the blog posts will be one post per sub-objective.  In other words, looking at the Skills Measured for 70-464, the first blog post will be on the objective Implement Database Objects, Create and alter tables.  The second post will be on Implement Database Objects, Design, implement, and troubleshoot security, and so on and so forth.  I will show examples, give some explanations, and point to more resources for these topics where appropriate.  Some will be easy, some will be more challenging.  As I progress, a backlog of pertinent posts will be linked on the Study Guides page.

So follow along!  If you are planning on achieving your MCSE: Data Platform, then this could be some well-guided learning.  If not, it will hopefully still be valuable information.  If you have any comments, questions, or suggestions, feel free to leave a comment below or email me at sqlsalt@gmail.com.

Tuesday, May 8, 2012

Delete a Large Amount of Data in Batches


There may be times when you need to delete a large amount of data and there isn’t the comfort of a large maintenance window, or you simply can’t wait for some downtime in production.  Issuing a basic DELETE FROM could cause contention and drastically reduce concurrency for the duration of the delete.  With just a few rows of data this is typically no big deal and the end users will be none the wiser, but with a large amount of data to be deleted it could have a significant impact on the users, and on performance in general.  One solution is to delete the data in batches.  Even as a set-based proponent, I’ll admit this is one of those times where we can really benefit from a loop.

First, let’s create some test data to experiment on (it goes without saying, but only run a variation of this code in production after extensive testing in your particular environment).  I’m using a database called TestDB, so feel free to create this database or use your own test database (the script doesn’t depend on any existing objects, so a newly created database is fine).

use TestDB
go

-- drop the table if it exists from a previous run
if object_id('dbo.LargeDeleteTest', 'U') is not null
       drop table LargeDeleteTest
go

create table LargeDeleteTest
(
       id int identity(1, 1) not null
              primary key clustered,
       SomeData nvarchar(128) not null
              default replicate('a', 128)
)
go

-- GO 1250 repeats the batch 1,250 times (an SSMS/sqlcmd feature), seeding our test rows
insert into LargeDeleteTest
values(default)
go 1250

So now we have a test table (LargeDeleteTest) with 1250 rows of data in it.  I am over-simplifying this scenario; in the grand scheme of things, 1250 rows is negligible.  But as we go through this example, apply this thought process to millions upon millions of rows that need to be deleted.
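If you want to sanity-check the setup before moving on, a quick count confirms the load:

-- quick sanity check: should return 1250
select count(*) as row_count
from LargeDeleteTest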

Let’s demonstrate what it looks like if you don’t delete the data in batches.  Execute the code below and notice what the query against the sys.dm_tran_locks DMV returns:

begin tran
       -- delete everything, but leave the transaction open so we can inspect the locks
       delete
       from LargeDeleteTest
--commit tran

select
       resource_database_id,
       resource_associated_entity_id,
       request_session_id,
       request_mode,
       request_type,
       request_status,
       resource_type,
       count(*) as resource_count
from sys.dm_tran_locks
group by
       resource_database_id,
       resource_associated_entity_id,
       request_session_id,
       request_mode,
       request_type,
       request_status,
       resource_type

We get an expected 1250 exclusive key locks on the clustered index of our LargeDeleteTest table.  That is 100% of the table; in other words, until this DELETE transaction is committed, there is an exclusive lock on all of that data.  (Note: once you’ve finished looking at the locks and the blocking demonstration below, run the commented-out COMMIT TRAN so the open transaction doesn’t interfere with our experimentation further down this post.)
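To see the practical impact of those locks before committing, you can open a second session and try to read the table; assuming the default READ COMMITTED isolation level (without read committed snapshot), the read will sit blocked behind the open DELETE.  A third session can then confirm who is blocking whom:

-- session 2: this read will be blocked by the open DELETE transaction
select count(*)
from LargeDeleteTest

-- session 3: identify the blocking chain
select
       session_id,
       blocking_session_id,
       wait_type,
       wait_time
from sys.dm_exec_requests
where blocking_session_id <> 0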

                Here’s some code to show you how to delete data in batches to skirt around the above concurrency nightmare:

declare
       @batch_size int,
       @del_rowcount int = 1

set @batch_size = 100

set nocount on;

while @del_rowcount > 0
       begin
              begin tran
                     delete top (@batch_size)
                     from dbo.LargeDeleteTest

                     set @del_rowcount = @@rowcount
                     print 'Delete row count: ' + cast(@del_rowcount as nvarchar(32))
              commit tran
       end

The code is very simple.  All it does is loop through and delete the data in batches of 100 (I chose a small batch size to illustrate the point on a smaller scale, but as explained above you should tune this variable for your specific environment).  To continue with the approach used above to show the locks, execute the code below, which is basically just one iteration of the WHILE loop:

insert into LargeDeleteTest
values(default)
go 1250

begin tran
       delete top (100)
       from LargeDeleteTest
--commit tran

select
       resource_database_id,
       resource_associated_entity_id,
       request_session_id,
       request_mode,
       request_type,
       request_status,
       resource_type,
       count(*) as resource_count
from sys.dm_tran_locks
group by
       resource_database_id,
       resource_associated_entity_id,
       request_session_id,
       request_mode,
       request_type,
       request_status,
       resource_type

Now the result set from querying sys.dm_tran_locks shows only 100 exclusive key locks.  Concurrency is greatly improved by deleting the data in batches: instead of an exclusive lock on every key in the clustered index, only the keys in the current batch are locked.
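One common variation, shown here only as a sketch to adapt to your environment, is to pause briefly between batches.  On very large purges this gives other sessions (and things like log backups, mirroring, or replication) room to breathe:

declare
       @batch_size int = 100,
       @del_rowcount int = 1

set nocount on;

while @del_rowcount > 0
       begin
              begin tran
                     delete top (@batch_size)
                     from dbo.LargeDeleteTest

                     set @del_rowcount = @@rowcount
              commit tran

              -- brief pause between batches; tune or remove for your environment
              waitfor delay '00:00:01'
       end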

                As we’ve seen here, there are going to be times that we will need to delete a large amount of data, and it may not be the best idea to just do it in a single DELETE FROM statement.  Through the use of a loop and batch deletes, we can minimize the impact this will have on our underlying data’s accessibility.

                The batch delete code can be found on my GitHub SQLSalt repository here:  DML_Delete_DeleteLargeDataByBatches.sql.  If there are any comments, questions, or issues please feel free to leave a comment below or email me at sqlsalt@gmail.com.

Thursday, May 3, 2012

Ease of Perfmon with Logman (and PowerShell)

                It is no mystery that, as data professionals, we are oftentimes put in sticky situations where a lot of quick and effective troubleshooting needs to happen in order to get to the bottom of a specific problem.  One of the most prominent tools for that is Performance Monitor (Perfmon):  A built-in performance counter capturing utility.  Perfmon is extremely effective, yet the setup time can be a bit of an obstacle, as you need to select your counters, configure data collector parameters… so on and so forth.

But Microsoft has given us yet another tool to make this a bit easier and cut down our time: the command-line utility Logman.  This tool allows us to set up and manage performance counters and collector sets.  But let’s take this a step further: let’s automate the use of Logman so that we can run a quick PowerShell script to stand up a data collector set and capture the information we are looking for.

Before I jump right into the PowerShell script, it’s worth noting a few of the Logman optional parameters.  Like most all-encompassing tools, there are a lot of parameters and options I won’t be broaching in this post, but feel free to play with them and leverage their full capability.  Logman can create, start, stop, delete, query, and update performance counter collections.  The significant parameters (at least for our needs right now) are listed below, with a sample invocation after the table:

Switch   Argument                            Description
-v       nnnn | mmddhhmm                     Output versioning
-c       "Counter 1" "Counter 2"             Counter list
-cf      "C:\CounterDir\Counters.txt"        Counter list file
-o       "C:\OutputDir\OutputFile"           The path of the output file
-f       bin | bincirc | csv | tsv | sql     Output file format
-si      [[hh:]mm:]ss                        Sample interval
*Note: these are only a few of many options.  Please refer to TechNet for the full listing.
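To put a few of these switches together before automating anything, creating and starting a one-off collector set by hand might look something like the following (the collector set name and paths here are just examples):

logman create counter MemPressureCollector -cf "C:\CounterDir\Counters.txt" -o "C:\OutputDir\MemPressure_log" -f csv -si 5 -v mmddhhmm
logman start MemPressureCollector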

The above parameters are relatively straightforward, but I want to concentrate on the -cf switch.  This is going to be a file that contains a list of counters (one per line) to load into the new data collector set.  This is more significant than it might seem on the surface, as we can have a little library of text files that contain counters for each frequently used set of performance monitoring metrics.  In other words, you can have a text file called MemoryPressureCounters.txt that’ll include a list of all the counters you’d typically use to troubleshoot memory pressure.
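For example, a MemoryPressureCounters.txt might contain entries along these lines (one counter per line; the exact list is entirely up to you):

\Memory\Available MBytes
\Memory\Pages/sec
\Paging File(_Total)\% Usage
\SQLServer:Buffer Manager\Page life expectancy
\SQLServer:Buffer Manager\Lazy writes/sec
\SQLServer:Memory Manager\Total Server Memory (KB)
\SQLServer:Memory Manager\Target Server Memory (KB)

Keep in mind that for a named instance the SQL Server counter objects are prefixed MSSQL$InstanceName rather than SQLServer.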

Taking this idea of quick Perfmon collector configuration, I’ve created a PowerShell script to make it even more streamlined.  Some options will probably remain relatively constant: maybe the sample interval is typically five seconds, enough to capture sufficient information without polling the counters too often over a mid-length log.  I may also want to always store my log files in a central location with a particular naming convention that will rarely change.  So instead of typing all of this relatively static information in every time, I want to make this process as quick as possible.

param
(
      [string]$mach,
      [string]$inf,
      [string]$logloc,
      [string]$colname
)


# <DEFAULTS>      [alter to environment specs]
$sampleInterval = 5
$defaultLogLocation = "C:\DefaultDir\Default_log"
# </DEFAULTS>


# check to see if input parameters were filled
if ($inf -eq "") {
      Write-Host "ERROR!!!  -inf must be set to counter file"
      exit 1
}

# if the machine name wasn't supplied, use local
if ($mach -eq "") {
      $mach = "localhost"
}

# if log location wasn't supplied, use default
if ($logloc -eq "") {
      $logloc = $defaultLogLocation
}

# if no collector name specified, generate one
if($colname -eq "") {
      $dataCollectorName = "pfs_" + [datetime]::Now.Ticks
}
else {
      $dataCollectorName = $colname
}
Write-Host "Collection Name: " $dataCollectorName
Write-Host

Write-Host "Log File location..."
Write-Host $logloc

# create the collection and start it
Write-Host
Write-Host "Attempting to create collector set..."
logman create counter $dataCollectorName -s $mach -v mmddhhmm -cf $inf -o $logloc -f csv -si $sampleInterval
Write-Host

Write-Host "Attempting to start collector set..."
logman start $dataCollectorName -s $mach

This is a very simple and concise script, but it’s effective in saving some time and eliminating repetitive tasks.  It takes four parameters (a sample invocation follows the list):
-inf (required)
       The path to the file containing the list of counters.
-mach (optional)
       The name of a remote machine.  Omit if working with the local machine.
-logloc (optional)
       The path of the log file.  If unspecified, it will use the default in the script; therefore, when pushing this out to your environment, alter the default accordingly.
-colname (optional)
       The desired name of the new data collector set.  If omitted, it will be automatically generated.
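Putting it all together, a typical run might look something like this (the counter file path and collector set name here are only examples):

.\PerfMonStarter.ps1 -inf "C:\CounterDir\MemoryPressureCounters.txt" -colname "MemPressure01"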

So there you have it.  By setting initial defaults right within the script, it only requires one parameter to get a data collector set up and running.  There are a few things to note: first, the script will start the collector set provided that it was successfully created; also, the collection will run indefinitely and unscheduled, so as it is shipped right now you will have to manually stop the data collection.  Feel free to mold the code to your particular needs and environment.
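For reference, stopping the collection later (and removing the collector set if you no longer need it) is just another couple of Logman calls; add -s for a remote machine, the same as in the script.  Using the example name from above:

logman stop MemPressure01
logman delete MemPressure01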

                The code for this script can be found on my GitHub SQLSalt repository at the following location:  PerfMonStarter.ps1.  Check back frequently for additional “counter libraries” for common troubleshooting scenarios.  If there are any questions or comments regarding this post, please feel free to leave a comment below or email me at sqlsalt@gmail.com.