Posts Tagged ‘t-sql’

Which JOIN Should I Use?

February 16th, 2011

A couple of years ago, I purchased Paul Nielsen’s wonderful book “SQL Server 2005 Bible,” published by Wiley. It includes a small sidebar presenting an excellent parable about joins. Paul’s analogy comes to mind almost every time I write a JOIN, and remembering it is enough to refresh my memory on which JOIN does what. It’s easy to remember, makes its point, and is succinct; what more could you ask?

So, I thought I would retell Paul’s example in the hope that it reaches a reader or two who doesn’t have the book. It should help new and old SQL’ites alike.

Consider a very old-style, ultra-conservative church where men are expected to sit on one side of the aisle (the right, for the sake of this example) and women on the other (the left). Married couples can indeed attend this church, but they still must sit on their respective sides. The minister instructs all married individuals whose spouse is on the other side of the aisle to stand up, grab their partner’s hand, and file out of the church. This is our INNER JOIN. Only matched “pieces” from each side are considered in the output.

Now imagine that, instead of addressing only the married individuals, the minister instructs all the men to stand up. Those who are married grab the hand of their spouse on the other side of the aisle, and then all the men and only the married women exit the church. Here is our OUTER JOIN; a RIGHT one in this case. Our output consists of all members of the right group and only the matches from the left.

Conversely, perhaps the minister does the opposite. He instructs all the women to stand and the married ones to grab their husband’s hand, and then all the women and their partners exit the building. This is our LEFT OUTER JOIN. All members of the left group are included, and only the matches from the right group come along.

Or instead, the minister instructs everyone to stand up, all the married folks to grab hands, and then everyone leaves the building.  This is our FULL [OUTER] JOIN!  Everything is included in the output, but matched couples are still sent out together.

That’s it.  Simple, but extremely useful when any bit of doubt creeps into your mind about which JOIN may be appropriate in a specific situation.  Again, I give credit where it is due; and this time it is to Paul Nielsen for teaching me this wonderful memory aid.
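For anyone who wants to see the parable in T-SQL, here is a minimal sketch. The table variables, column names, and sample people are all made up for illustration (they are not from Paul’s book); women play the left table and men the right, matched on a hypothetical spouse column.

```sql
-- Hypothetical congregation: women sit on the left, men on the right.
DECLARE @Women TABLE (name VARCHAR(20), spouse VARCHAR(20) NULL);
DECLARE @Men   TABLE (name VARCHAR(20));

INSERT INTO @Women (name, spouse)
VALUES ('Alice', 'Adam'), ('Beth', 'Bob'), ('Cara', NULL);   -- Cara is unmarried

INSERT INTO @Men (name)
VALUES ('Adam'), ('Bob'), ('Dan');                           -- Dan is unmarried

-- INNER JOIN: only the married couples leave (2 rows).
SELECT w.name AS wife, m.name AS husband
FROM @Women w
     INNER JOIN @Men m ON m.name = w.spouse;

-- RIGHT OUTER JOIN: all the men leave, plus the married women (3 rows; Dan has a NULL wife).
SELECT w.name AS wife, m.name AS husband
FROM @Women w
     RIGHT OUTER JOIN @Men m ON m.name = w.spouse;

-- LEFT OUTER JOIN: all the women leave, plus their husbands (3 rows; Cara has a NULL husband).
SELECT w.name AS wife, m.name AS husband
FROM @Women w
     LEFT OUTER JOIN @Men m ON m.name = w.spouse;

-- FULL OUTER JOIN: everyone leaves, couples still together (4 rows).
SELECT w.name AS wife, m.name AS husband
FROM @Women w
     FULL OUTER JOIN @Men m ON m.name = w.spouse;
```

Note that “left” and “right” simply refer to which table is written first in the FROM clause, just as they refer to sides of the aisle in the parable.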

Another post coming in the next couple days to announce a job change for me that I’m very very excited about.  Look for that by this time next week.

Have a grateful day…Troy

Posted in T-SQL | Comments (0)

Which is Faster; the Chicken or the Egg?

December 2nd, 2010

Due to a company policy at my office, all UPDATEs eventually headed to prod must be keyed solely off a primary key and written out one row at a time. This applies whether or not the PKs are sequential. Yesterday I received a script for approval that contained 14,000 individual UPDATE statements keyed off an identity column and covering 100% of the table. So imagine if you will (said in my best Rod Serling voice) 14,000 of these…

UPDATE Customer SET firstName = 'Foo' WHERE id = 1
UPDATE Customer SET firstName = 'Foo' WHERE id = 2
UPDATE Customer SET firstName = 'Foo' WHERE id = 3

It occurred to me that as long as we have sequential IDs to update off of, it had to be substantially quicker to do a single UPDATE using a BETWEEN clause. So, 14,000 of the above statements turn into just one…

UPDATE Customer SET firstName = 'Foo' WHERE id BETWEEN 1 AND 14000

But just how much faster is the BETWEEN vs. individual UPDATEs? Turns out the answer is "way way way". Unfortunately, "way way way" isn’t extremely scientific. So the geek in me had to see exactly how much faster and wrote some scripts to test it out.

On a local SQL Server 2008 R2 installation (no network interference), I filled two local temp tables with random numbers between 1 and 50. I then simply updated this value to 51 both ways: first one row at a time based off the incrementing PK, and second all at once using BETWEEN. The difference surprised even me. At every sample size I tested, BETWEEN was over 99% faster, and the larger the sample size, the more significant the time savings tended to be. The results…

SAMPLE SIZE   INDIV UPDATES AVG   BETWEEN AVG   RATIO   PCT. FASTER
        500               641.8           4.8   0.75%        99.25%
       5000              6837.8          19.6   0.29%        99.71%
      10000             15742.4          51.6   0.33%        99.67%
      15000             28546.4          63.0   0.22%        99.78%
      20000             63462.6          95.8   0.15%        99.85%
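
The test scripts themselves aren’t reproduced in this post, but a minimal sketch of that kind of comparison might look like the following. The temp table names, the @rows variable, and the WHILE loop standing in for a script of individual UPDATE statements are all assumptions for illustration, and the timings you get will depend entirely on your own machine.

```sql
SET NOCOUNT ON;

DECLARE @rows INT = 5000;            -- sample size (pick your own)
DECLARE @i INT, @start DATETIME2, @loopMs INT, @setMs INT;

-- Two identical temp tables with an incrementing PK (hypothetical names)
CREATE TABLE #loopTest (id INT IDENTITY(1,1) PRIMARY KEY, val INT);
CREATE TABLE #setTest  (id INT IDENTITY(1,1) PRIMARY KEY, val INT);

-- Fill both tables with random numbers between 1 and 50
SET @i = 1;
WHILE @i <= @rows
BEGIN
    INSERT INTO #loopTest (val) VALUES (ABS(CHECKSUM(NEWID()) % 50) + 1);
    INSERT INTO #setTest  (val) VALUES (ABS(CHECKSUM(NEWID()) % 50) + 1);
    SET @i += 1;
END;

-- Method 1: one UPDATE per row, keyed off the PK
SET @start = SYSDATETIME();
SET @i = 1;
WHILE @i <= @rows
BEGIN
    UPDATE #loopTest SET val = 51 WHERE id = @i;
    SET @i += 1;
END;
SET @loopMs = DATEDIFF(MILLISECOND, @start, SYSDATETIME());

-- Method 2: a single UPDATE using BETWEEN
SET @start = SYSDATETIME();
UPDATE #setTest SET val = 51 WHERE id BETWEEN 1 AND @rows;
SET @setMs = DATEDIFF(MILLISECOND, @start, SYSDATETIME());

-- Elapsed milliseconds for each method
SELECT @rows AS sampleSize, @loopMs AS individualUpdatesMs, @setMs AS betweenMs;

DROP TABLE #loopTest, #setTest;
```

Running it several times per sample size and averaging the results helps smooth out noise from whatever else the server is doing.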

Ok, so what conclusions can we make from this data?

  1. Not only is it faster using the BETWEEN operator, it’s significantly faster: over 99% faster at every sample size I used.
  2. The time savings (percentage-wise) tend to increase as the number of UPDATEs increases. Good to know, because this may be something to take into account when doing that pesky 1,000,000 row update, but not so much when touching just a couple of rows.
  3. The back-and-forth, back-and-forth, back-and-forth REALLY adds up over a larger sample size. For example, going from just 500 to 5000 UPDATEs, we’ve increased the sample size 10-fold; the BETWEEN took about 4x as long, yet the individual UPDATEs took over 10x as long!
  4. We bloggers will write about anything SQL related no matter how obvious it may be if it offers even a glimmer of additional insight into our product.

Have a Grateful day…Troy

Posted in Performance Tuning, T-SQL | Comments (3)

Dropping Multiple Tables at Once Via T-SQL

November 26th, 2010

From time to time we have the need to drop multiple tables within a single stored procedure. Were you aware that it can be done in a single DROP TABLE statement? SELECTing multiple columns, constants, variables, etc. in a single SELECT is one of the first things we learn as SQL newbies. Dropping two or more tables at the same time, while slightly less common, is just as convenient.

So, instead of…

DROP TABLE A;
DROP TABLE B;
DROP TABLE C;

we can…

DROP TABLE A, B, C;

Not only can we do this with normal tables, we can also drop temp tables, both local and global, in the same statement. How convenient is that?

This little block of code will create normal, local temp, & global temp tables, fill them in with sample data, show the data’s there, drop the tables using this “new” syntax, then show that all three tables are indeed gone.

 
USE tempdb
SET NOCOUNT ON

-- Create a local temp table
CREATE TABLE #localTempTable (i INT)

-- Fill the local temp table with sample data
INSERT INTO #localTempTable
SELECT id FROM sysobjects

-- Show that the local temp table is filled
SELECT COUNT(i) localTempTable FROM #localTempTable

GO

-- Create a global temp table
CREATE TABLE ##globalTempTable (i INT)

-- Fill the global temp table with sample data
INSERT INTO ##globalTempTable
SELECT id  FROM sysobjects

-- Show that the global temp table is filled
SELECT COUNT(i) globalTempTable FROM ##globalTempTable

GO

-- Create a real table
CREATE TABLE RealTable (i INT)

-- Fill the real table with sample data
INSERT INTO RealTable
SELECT id  FROM sysobjects

-- Show that the real table is filled
SELECT COUNT(i) realTable FROM RealTable

GO

-- Drop the tables using the "new" syntax
DROP TABLE #localTempTable, ##globalTempTable, RealTable

GO

-- Show that the objects are indeed gone
IF OBJECT_ID(N'tempdb..#localTempTable', N'U') IS NULL
  SELECT '#localTempTable is gone'
ELSE
  SELECT '#localTempTable is still there'
  
IF OBJECT_ID(N'tempdb..##globalTempTable', N'U') IS NULL
  SELECT '##globalTempTable is gone'
ELSE
  SELECT '##globalTempTable is still there'
  
IF OBJECT_ID(N'tempdb..RealTable', N'U') IS NULL
  SELECT 'RealTable is gone'
ELSE
  SELECT 'RealTable is still there'

And while these days it seems like we DELETE less and archive a lot more, it’s still nice to know that we do have a couple of options when it comes to performing this most basic but vital operation in SQL Server.

Have a grateful day…Troy

Posted in T-SQL | Comments (1)

COUNTing with the DISTINCT Clause

October 4th, 2010
One ringy-dingy...

Ok, that’s not the count I’m referring to, but one of the many important tasks we DBAs are charged with from time to time is converting INSERTs/UPDATEs/DELETEs (IUDs) into “count scripts” in order to verify the number of records a DML statement would affect before it goes into production. While this may seem like one of the most basic things we have to do daily, its importance cannot be stressed enough. A proper and thorough series of checks and balances is vital not only in the biggest of organizations, but in small companies as well. Nobody wants to run scripts against prod only to find out later that instead of affecting the 100 rows you thought it would, your dev left out the WHERE clause and you just updated every row in the table. Eeek! Now you have to take time away from other important things you could/should be doing and test out your data recovery strategy. People (especially Vice Presidents) get extremely irritated at situations like this. Jobs WILL be on the line. Even if you were not the dev who wrote the script, you were still the DBA who let it slide past unchecked. This can be very hazardous to your career.
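
For the simplest case, a count script is just the DML statement’s FROM and WHERE wrapped in a SELECT COUNT(*). Here is a minimal sketch; the @Customer table variable and its rows are hypothetical stand-ins so the snippet runs on its own.

```sql
-- Hypothetical sample data, for illustration only
DECLARE @Customer TABLE (id INT PRIMARY KEY, firstName VARCHAR(20));

INSERT INTO @Customer (id, firstName)
VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cal'), (4, 'Dee');

-- The DML under review (deliberately left commented out):
-- UPDATE @Customer SET firstName = 'Foo' WHERE id BETWEEN 2 AND 3;

-- The count script: same table, same WHERE clause, nothing modified
SELECT COUNT(*) AS rowsAffected
FROM @Customer
WHERE id BETWEEN 2 AND 3;   -- 2 rows with this sample data
```

If the count comes back as every row in the table when you expected a handful, you’ve caught the missing or wrong WHERE clause before prod did.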

And since The Grateful DBA wants you to have a long and prosperous SQL career, I offer this little tip regarding one count script situation you might find yourself in; that of an IUD based off a SELECT DISTINCT statement. It’s a bit trickier than you might suspect, but since my readers are way above average intelligence, I have no doubts it’ll be eazy-peazy for you.

Here’s an example of a typical customer INSERT you may have to convert to a count script:

INSERT INTO Customer
     (ID,
     firstName,
     lastName,
     nickname,
     address1,
     address2,
     city,
     stateID,
     zip,
     entryDate)
SELECT DISTINCT
     1,
     t.firstName,
     t.lastName,
     t.firstName,
     t.address1,
     t.address2,
     @city,
     s.longStateName,
     ISNULL(t.zip,'00000'),
     getdate()
FROM CustomerTemp t
     INNER JOIN States s ON s.ID = t.ID

We have a little of everything here: constants, joined values, repeats, functions, & local variables. Let’s see what it would take to convert this to an accurate count of exactly how many records we’re inserting. Here are all the rules you need to remember…

CONSTANTS – Constants such as numbers and string literals should be removed from the list. They return the same value for every row, so they have no effect on the distinct count, and an unaliased constant has no column name, which the derived table we’re about to build does not allow. In the case of our example script, this means we’ll need to remove the “1”.

REPEATS – A column name cannot appear more than once in the final count list, or else an error will indeed be thrown. In our example, we must remove one of the two “t.firstName”s.

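FUNCTIONS – Function calls such as the ISNULL in our example are allowed and valid, but must be aliased. Even if the alias is exactly the same as the field name itself, it is required. As well, “getdate()” must be removed. (It is technically a nondeterministic function, but within a single statement it returns the same value for every row, so it behaves like a constant here.)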

LOCAL VARIABLES – Similar to the rule concerning constants, local variables will also naturally insert the same value each time, so they must be removed from the script. In our example, the “@city” is a goner.

THE WRAPPER – No, I’m not referring to Jay-Z or Eminem; I mean how the count statement is formed. From my (albeit brief) research, there are apparently several “legal” ways of forming the final statement. The method I’ve always used, and to be honest have found easiest to read and grasp, is to wrap the new, slimmer SELECT in its own “SELECT COUNT(*) FROM”. The inside SELECT must be surrounded by parentheses and must have its own unique alias. Even if you have no intention of ever using the alias for anything important, it’s required nonetheless.

This is what our example above turns into after we’ve trimmed it down…

SELECT COUNT(*)
FROM
     (SELECT DISTINCT t.firstName,
          t.lastName,
          t.address1,
          t.address2,
          s.longStateName,
          ISNULL(t.zip,'00000') zip
     FROM CustomerTemp t
          INNER JOIN States s ON s.ID = t.ID) A
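
To try the pattern without the real tables, here is a small self-contained sketch. The @CustomerTemp table variable, its columns, and its rows are hypothetical and trimmed down from the example above; the point is only to show DISTINCT collapsing a duplicate before the count is taken.

```sql
-- Hypothetical staging data, for illustration only
DECLARE @CustomerTemp TABLE (firstName VARCHAR(20), lastName VARCHAR(20), zip CHAR(5) NULL);

INSERT INTO @CustomerTemp (firstName, lastName, zip)
VALUES ('Ann', 'Lee',  '30301'),
       ('Ann', 'Lee',  '30301'),   -- exact duplicate, collapsed by DISTINCT
       ('Ben', 'Cole', NULL),
       ('Cal', 'Diaz', NULL);

-- Rows an INSERT ... SELECT DISTINCT would actually add: 3
SELECT COUNT(*) AS distinctRows
FROM
     (SELECT DISTINCT t.firstName,
          t.lastName,
          ISNULL(t.zip,'00000') zip
     FROM @CustomerTemp t) A;

-- For comparison, the raw row count in the staging table: 4
SELECT COUNT(*) AS allRows
FROM @CustomerTemp;
```

The gap between the two numbers is exactly why a plain COUNT(*) against the source table is not a safe substitute for the wrapped DISTINCT version.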

I hope this helps when you find yourself faced with a similar task to accomplish. As always, I welcome feedback.

Have a Grateful day…Troy

Posted in T-SQL | Comments (1)
