Wednesday, June 13, 2012

Query Data by Using SELECT Statements


(This is part of the Study Guide series, 70-457)

Microsoft’s Measured Skill description: This objective may include but is not limited to: use the ranking function to select top(X) rows for multiple categories in a single query; write and perform queries efficiently using the new code items such as synonyms and joins (except, intersect); implement logic which uses dynamic SQL and system metadata; write efficient, technically complex SQL queries, including all types of joins versus the use of derived tables; determine what code may or may not execute based on the tables provided; given a table with constraints, determine which statement set would load a table; use and understand different data access technologies; CASE versus ISNULL versus COALESCE

What I see:
·         ranking functions
·         except and intersect
·         data access technologies
·         CASE vs. ISNULL vs. COALESCE

Ranking Functions
                There are four ranking functions available in SQL Server:  RANK, DENSE_RANK, ROW_NUMBER, and NTILE.

As per BOL, the RANK function “returns the rank of each row within the partition of a result”:

use MeasureMySkills;
go

create table RankingTable
(
       id int identity(1, 1) not null,
       IntValue int not null
);
go

insert into RankingTable(IntValue)
values
       (16),
       (17),
       (17),
       (19),
       (26),
       (25),
       (26),
       (26),
       (30);

select
       id,
       IntValue,
       rank() over
       (
              order by IntValue
       ) as Rank
from RankingTable;

As you can see from the above result, ties are the same rank but then subsequent rank values reflect the gap caused by the ties.  If you want to fill those gaps (i.e. have a “dense” ranking), use the DENSE_RANK function:

select
       id,
       IntValue,
       dense_rank() over
       (
              order by IntValue
       ) as DenseRank
from RankingTable;

Another common ranking function is ROW_NUMBER.  This ranking function will assign a sequential, non-duplicated “row number” to the result set (very handy for deleting duplicate data through the use of the windowing PARTITION BY clause):

select
       id,
       IntValue,
       row_number() over
       (
              order by IntValue
       ) as RowNumber
from RankingTable;

And the last ranking function is NTILE.  This ranking function breaks the windowed data into the specified number of groups:

select
       id,
       IntValue,
       ntile(3) over
       (
              order by IntValue
       ) as NTile
from RankingTable;

EXCEPT and INTERSECT
                These two operators return the same distinct rows from two queries (INTERSECT) or missing distinct rows from first query that aren’t in the second query (EXCEPT).  Below shows examples of this output:

create table IntersectExceptTbl1
(
       MyInt int not null,
       MyString nvarchar(128) not null
);
go

create table IntersectExceptTbl2
(
       MyInt int not null,
       MyString nvarchar(128) not null
);
go

insert into IntersectExceptTbl1(MyInt, MyString)
values
       (1, 'a'),
       (2, 'b'),
       (3, 'c'),
       (4, 'd'),
       (5, 'e');

insert into IntersectExceptTbl2(MyInt, MyString)
values
       (1, 'a'),
       (6, 'b'),
       (3, 'c'),
       (4, 'f'),
       (5, 'e');

select *
from IntersectExceptTbl1
intersect
select *
from IntersectExceptTbl2;

/* RESULT SET (INTERSECT)
MyInt  MyString
1             a
3             c
5             e             */

select *
from IntersectExceptTbl1
except
select *
from IntersectExceptTbl2;

/* RESULT SET (EXCEPT)
MyInt  MyString
2             b
4             d             */

Data Access Technologies
                For more information, please see this MSDN reference on the Data Access Technologies Road Map.  It is worth familiarizing yourself with the most commonly-used methods.

CASE vs. ISNULL vs. COALESCE
                Choosing between CASE, ISNULL, and COALESCE just comes down to what’s the best tool for the given task.  CASE is typically used for relatively complex conditional logic that COALESCE can’t encompass (for instance, not driven by NULL).  COALESCE is really just a restricted and abbreviated CASE expression.  ISNULL is simply used to return either the first parameter’s value, or if that’s NULL then it’ll return the second.  This is great for ensuring that NULL isn’t passed to aggregating functions, or any other time that NULL is less than desired.

References
·         BOL reference on RANK
·         BOL reference on DENSE_RANK
·         BOL reference on ROW_NUMBER
·         BOL reference on NTILE
·         BOL reference on ISNULL
·         BOL reference on CASE
·         BOL reference on COALESCE

If there are any comments, questions, issues, or suggestions please feel free to leave a comment below or email me at sqlsalt@gmail.com.

No comments:

Post a Comment