Tuesday, May 15, 2012

Implement Database Objects: Implement data types


(This is part of the Study Guide series, 70-464)

Microsoft’s Measured Skill description: This objective may include but is not limited to: use appropriate data types; develop a CLR data type; understand the difference between @Table and #table; impact of GUID (newid, newsequentialid) on database performance, indexing and privacy; use spatial data; LOB data types; understand when and how to use column store and sparse columns; implicit and explicit conversions, integer math

What I see:
·         develop a CLR data type
·         temp table vs. table var (discussed here)
·         NEWID() vs. NEWSEQUENTIALID()
·         spatial data
·         LOB data types
·         column store
·         sparse columns
·         implicit/explicit conversions

Develop a CLR data type
                CLR User-Defined Types are created by writing managed code, registering the compiled assembly within SQL Server, and then creating a type based on a class/struct in that assembly.  For a full, working example of how to accomplish this, follow the walk-through on BOL for CLR User-Defined Types.  It is also worth noting the security concepts relating to the PERMISSION_SET option of CREATE ASSEMBLY (SAFE, EXTERNAL_ACCESS, or UNSAFE).
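
A minimal sketch of the registration steps is below (the assembly path, assembly name, and class name are hypothetical; the BOL walk-through has a complete, working type):

Example
use MeasureMySkills;
go

-- register the compiled assembly (path and names are hypothetical)
create assembly PointAssembly
from 'C:\CLR\PointAssembly.dll'
with permission_set = safe;
go

-- create the UDT from the assembly's class/struct
create type dbo.Point
external name PointAssembly.[Point];
go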

NEWID() vs. NEWSEQUENTIALID()
                These are both functions that generate unique identifiers.  There is a very prominent difference between them, though:  NEWID() creates a nonsequential UID, while NEWSEQUENTIALID() creates a sequential one.  When creating the key column of a clustered index, it is advantageous for that column to be ever-increasing.  If NEWID() were used for a clustered index key column, INSERT statements would cause page splits.  With NEWSEQUENTIALID(), the column behaves more like an IDENTITY column, with ever-increasing values.  See the example below to prove the different natures of these functions.  Notice the identity field order in each resulting SELECT query:  one of them is random, and the other is in a nice order:

Example
use MeasureMySkills;
go

create table NewIdExample
(
       id int identity(1, 1) not null,
       NewIdVal uniqueidentifier
              default newid()
);
go

-- insert 100 rows, each taking the NEWID() default
insert into NewIdExample
values(default);
go 100

create table NewSequentialIdExample
(
       id int identity(1, 1) not null,
       NewSequentialIdVal uniqueidentifier
              default newsequentialid()
);
go

-- insert 100 rows, each taking the NEWSEQUENTIALID() default
insert into NewSequentialIdExample
values(default);
go 100

-- ordering by the GUID column: the id order comes back random for NEWID(),
-- but stays in insert order for NEWSEQUENTIALID()
select top 10 *
from NewIdExample
order by NewIdVal;

select top 10 *
from NewSequentialIdExample
order by NewSequentialIdVal;

Spatial data
                SQL Server provides two spatial data types: geometry (for planar, flat-earth data) and geography (for geodetic, round-earth data).  See BOL’s reference on Spatial Data for full information regarding these types.
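
As a quick illustration of the geometry type (a minimal sketch, not tied to any particular schema):

Example
declare @g geometry = geometry::STGeomFromText('POINT (3 4)', 0);
declare @o geometry = geometry::STGeomFromText('POINT (0 0)', 0);

-- the distance from the origin to (3, 4) is 5
select @g.STDistance(@o) as DistanceFromOrigin;
go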

LOB data types
                See this description of LOB data types (it is from 2008, but the types listed there are still current).  Also, Kimberly Tripp’s blog post on Understanding LOB Data is a great read.
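
The LOB types are varchar(max), nvarchar(max), varbinary(max), and xml (plus the deprecated text, ntext, and image).  A simple sketch of a table using them (the table and column names are made up):

Example
use MeasureMySkills;
go

create table MyLobTable
(
       id int identity(1, 1) not null,
       BigText nvarchar(max) null,
       BigBinary varbinary(max) null,
       BigXml xml null
);
go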

Column store
                Columnstore indexes are one of the cool new features of SQL Server 2012.  Traditionally, indexes store data by row, but columnstore indexes store the indexed data by column.  This can yield great performance because the data is smaller (and compressed), so fewer reads are required.  There are a lot of caveats to columnstore indexes, though.  First off, you can’t do DML against a table that has a columnstore index; the workaround is to drop the index before the INSERT/UPDATE/DELETE and then recreate it (a sketch of this follows the example below).  The design of columnstore indexes seems geared more toward data warehousing and fact tables than OLTP systems (yet).  There is no such thing as key columns with a columnstore index, so the 16-column limitation we are typically used to does not apply; likewise, you can’t have an INCLUDE clause in the definition.  There are many more gotchas and things to know with columnstore indexes, so look to BOL’s columnstore index documentation for full coverage.

Example
-- in SQL Server 2012 this creates a nonclustered columnstore index,
-- which makes the underlying table read-only for DML
create columnstore index ICX_Department_Name
on HumanResources.Department(Name);
go
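
Since the table is read-only while the columnstore index exists, the drop-and-recreate workaround mentioned above looks roughly like this (a sketch, continuing the same example):

drop index ICX_Department_Name
       on HumanResources.Department;
go

-- ...perform the INSERT/UPDATE/DELETE here...

create columnstore index ICX_Department_Name
on HumanResources.Department(Name);
go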

Sparse columns
                Sparse columns are a storage optimization that SQL Server provides to save space.  Typically a column value still takes up space when it is NULL, but if a significant proportion of a column’s values are NULL, the SPARSE attribute can be used to save space, as nothing is stored when the value is NULL.  Nothing is free, though:  sparse columns add overhead when storing and retrieving the non-NULL values.

Example
use MeasureMySkills;
go

create table MySparseTable
(
       id int identity(1, 1) not null,
       SomeString nvarchar(128) sparse null
);
go

Implicit/Explicit conversions
                An explicit conversion is evident to all of us, as it can be seen right in the query through functions like CAST() and CONVERT().  Implicit conversions, on the other hand, happen behind the scenes and can be a huge performance issue.  They show up in execution plan predicates (as CONVERT_IMPLICIT), and when the conversion lands on the column side of a predicate it can make the query non-SARGable, causing a scan where a seek is expected.
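
For example (reusing the MySparseTable table from above; this is a sketch of the typical behavior, so verify against the actual plan):

Example
-- explicit conversions, visible right in the query:
select cast('20120515' as date) as CastResult,
       convert(varchar(10), getdate(), 101) as ConvertResult;

-- implicit conversion: comparing the nvarchar column to an int literal forces
-- SQL Server to convert the column side (int has higher data type precedence),
-- which appears as CONVERT_IMPLICIT in the plan and prevents an index seek
select id
from MySparseTable
where SomeString = 12345;
go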

References
·         BOL reference on CREATE TYPE
·         BOL reference on CLR UDTs
·         BOL reference on NEWID()
·         BOL reference on Spatial Data

Version-Controlled Code (GitHub)

If there are any comments, questions, issues, or suggestions, please feel free to leave a comment below or email me at sqlsalt@gmail.com.
