COUNTing with the DISTINCT Clause

October 4th, 2010
by The Grateful DBA
One ringy-dingy...

Ok, that’s not the count I’m referring to, but one of the many important tasks we DBAs are charged with from time to time is to convert INSERTS/UPDATES/DELETES (IUDs) into “count scripts” in order to verify the amount of records the DML statement would affect before it goes into production. While this may seem like one of the most basic things we have to do daily, the importance cannot be stressed enough. A proper and thorough series of checks and balances is vital in not only the biggest of organizations, but small companies as well. Nobody wants to run scripts against prod only to find out later that instead of affecting the 100 rows you thought it would, your dev left out the WHERE clause and you just updated every row in the table. Eeek! Now you have to take time away from other important things you could/should be doing, and have to test out your data recovery strategy. People (specially Vice Presidents) get extremely irritated at situations like this. Jobs WILL be on the line. Even if you were not the dev who wrote the script, you were still the DBA who let it slide past unchecked. This can be very hazardous to your career.

And since The Grateful DBA wants you to have a long and prosperous SQL career, I offer this little tip regarding one count script situation you might find yourself in; that of an IUD based off a SELECT DISTINCT statement. It’s a bit trickier than you might suspect, but since my readers are way above average intelligence, I have no doubts it’ll be eazy-peazy for you.

Here’s an example of a typical customer INSERT you may have to convert to a count script:

INSERT INTO Customer
     (ID,
     firstName,
     lastName,
     nickname,
     address1,
     address2,
     city,
     stateID,
     zip,
     entryDate)
SELECT DISTINCT
     1,
     t.firstName,
     t.lastName,
     t.firstName,
     t.address1,
     t.address2,
     @city,
     s.longStateName,
     ISNULL(t.zip,'00000'),
     getdate()
FROM CustomerTemp t
     INNER JOIN States s ON s.ID = t.ID

We have a little of everything here: constants, joined values, repeats, functions, & local variables. Let’s see what it would take to convert this to an accurate count of exactly how many records we’re inserting. Here are all the rules you need to remember…

CONSTANTS – All constants must be removed from the list. This includes things like numbers and string literals for instance. These will naturally return the same value each and every time iterated. In the case of our example script, this means we’ll need to remove the “1”.

REPEATS – Values in the final count list cannot be repeated more than once or else an error will indeed be thrown. In our example, we must remove one of the two “t.firstName”s.

FUNCTIONS – Function calls such as the ISNULL in our example are allowed and valid, but must be aliased. Even if it’s exactly the same as the fieldname itself, an alias is required. As well, deterministic functions such as “getdate()” must be removed.

LOCAL VARIABLES – Similarly to the rule concerning constants, local variables will also naturally insert the same value each time, so must be removed from the script. In our example, the “@city” is a goner.

THE WRAPPER – No, I’m not referring to Jay-Z or Eminem; I mean how the count statement is formed. From my (albeit brief) research, there are apparently several “legal” ways of forming the final statement. The method I’ve always used, and to be honest, have found easiest to read and grasp what’s being accomplished, is to wrap the new, slimmer select in it’s own “SELECT COUNT(*) FROM”. The inside SELECT must be surrounded by parenthesis and must have it’s own unique alias. Even if you have no intention on ever using the alias for anything important, nonetheless it’s required.

This is what our example above turns into after we’ve trimmed it down…

SELECT COUNT(*)
FROM
     (SELECT DISTINCT t.firstName,
          t.lastName,
          t.address1,
          t.address2,
          s.longStateName,
          ISNULL(t.zip,'00000') zip
     FROM CustomerTemp t
          INNER JOIN States s ON s.ID = t.ID) A

I hope this helps when you find yourself faced with a similar task to accomplish. As always, I welcome feedback.

Have a Grateful day…Troy


Tags: , , ,
Posted in T-SQL | Comments (1)

  • […] This post was mentioned on Twitter by Troy Gallant, Troy Gallant. Troy Gallant said: [New Blog Post] "COUNTing with the DISTINCT Clause" http://troygallant.com/?p=618 […]

  • Leave a Reply

    Leave a Reply

    Your email address will not be published. Required fields are marked *

    Get Adobe Flash player