Due to a company policy, all UPDATEs at my office eventually headed to prod, are required to be based solely off a primary key and must be written out one line at a time. This pertains if the PKs are sequential or not. Yesterday I received a script for approval that contained 14,000 individual UPDATE statements based off an identity key and comprising 100% of the table. So imagine if you will (said in my best Rod Serling voice) 14,000 of these…
UPDATE Customer SET firstName = ‘Foo’ WHERE id = 1 UPDATE Customer SET firstName = ‘Foo’ WHERE id = 2 UPDATE Customer SET firstName = ‘Foo’ WHERE id = 3
It occurred to me that as long as we have sequential ID’s to update off of, then it had to be substantially quicker to do a single UPDATE using a BETWEEN clause. So, 14,000 of the above statements turn in to just one…
UPDATE Customer SET firstName = ‘Foo’ WHERE id BETWEEN 1 AND 14000
But just how much faster is the BETWEEN vs. individual UPDATEs? Turns out the answer is "way way way". Unfortunately, "way way way" isn’t extremely scientific. So the geek in me had to see exactly how much faster and wrote some scripts to test it out.
On a local SQL Server 2008 R2 installation (no network interference), I filled two local temp tables with random numbers between 1 & 50. I then simply updated this value to ’51′ both ways. First one row at a time based off the incrementing PK, and secondly all at once using BETWEEN. The difference even surprised me. At all levels I tested, it was over 99% faster, and the larger the sample size, the more significant the time savings. The results…
|SAMPLE SIZE||INDIV UPDATES AVG||BETWEEN AVG||RATIO||PCT. FASTER|
Ok, so what conclusions can we make from this data?
- Not only is it faster using the BETWEEN function, but it’s significantly faster. Over 99% faster in the sample sizes I used
- The time savings (percentage-wise) tends to increase as the number of UPDATEs increase as well. Good to know because this may be something to take into account when doing that pesky 1,000,000 row update; but not so much when just on a couple rows.
- The back-and-forth, back-and-forth, back-and-forth REALLY adds up over a larger sample size. For example; going from just 500 to 5000 UPDATEs, we’ve increased the sample size 10-fold, the BETWEEN was 4x as long, yet the individual UPDATEs took over 10x as long!
- We bloggers will write about anything SQL related no matter how obvious it may be if it offers even a glimmer of additional insight into our product.
Have a Grateful day…Troy