T-SQL Copy & Paste Pattern – Increasing a performance problem

Disclaimer: The title is my assumption, because I have seen this happen this way in the past.

This blog post aims to remind you of something: what is obvious to you might not be obvious to others.

Scenario:

A client has a process that consists of a stored procedure wrapping a bunch of other stored procedures.
The process runs for about 10 hours.

Taking a look…what is running right now?

I was curious about the process; I had seen it running before but had never explored the code. After a couple of days of watching it run for so long, I decided to check which query I would catch executing at a random moment.

I got lucky and caught one with a pattern that I knew could clearly be rewritten to make it faster.

sp_WhoIsActive to the rescue

If you don't know what the sp_WhoIsActive stored procedure (Adam Machanic's creation) is, let me copy the short definition from the whoisactive.com website:

sp_whoisactive is a comprehensive activity monitoring stored procedure that works for all versions of SQL Server from 2005 through 2017.

You can download it from the download page or, if you use dbatools, you can use the command that will download and install it for you (a minimal sketch below). You can read more about it in my previous blog post New Version Of sp_WhoIsActive (v11.20) Is Available – Deployed On 123 Instances In Less Than 1 Minute.
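For reference, the dbatools route is a one-liner. This is just a sketch; "MyInstance" is a placeholder for your own instance name:

# Downloads the latest sp_WhoIsActive and compiles it in the master database
# ("MyInstance" is a placeholder)
Install-DbaWhoIsActive -SqlInstance "MyInstance" -Database master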

Using sp_WhoIsActive to get the currently running query

EXEC sp_WhoIsActive

The query

The query that was being run had the following structure:

SELECT column1
  FROM table1
 WHERE EXISTS (
               SELECT column2 FROM table2 WHERE table2.column1 = table1.column2
              )
   AND (
              EXISTS (SELECT column1 FROM table3 WHERE table3.column1 = table1.column1 AND table3.column2 = @param1 AND table3.column3 = 0)
           OR EXISTS (SELECT column1 FROM table3 WHERE table3.column1 = table1.column1 AND table3.column2 = @param2 AND table3.column3 = 0)
           OR EXISTS (SELECT column1 FROM table3 WHERE table3.column1 = table1.column1 AND table3.column2 = @param3 AND table3.column3 = 0)
           OR EXISTS (SELECT column1 FROM table3 WHERE table3.column1 = table1.column1 AND table3.column2 = @param4 AND table3.column3 = 0)
           OR EXISTS (SELECT column1 FROM table3 WHERE table3.column1 = table1.column1 AND table3.column2 = @param5 AND table3.column3 = 0)
           OR EXISTS (SELECT column1 FROM table3 WHERE table3.column1 = table1.column1 AND table3.column2 IN (@param6, @param7) AND table3.column3 = 0)
       )

However, this default call does not return the execution plan; for that, you need to run it with the @get_plans parameter:

EXEC sp_WhoIsActive @get_plans = 1

I copied the XML that contains the query plan into SentryOne Plan Explorer, and this is what I saw:

Within the red circle we can see a concatenation operator (the first one on the top left), which gathers the result of each EXISTS subquery (8 in total).

If we want to know which outer batch or stored procedure call was issued by the application or user, we can use the @get_outer_command parameter:

EXEC sp_WhoIsActive @get_plans = 1, @get_outer_command = 1

However, if we have nested calls, this will not show the batch or stored procedure that the currently running query actually belongs to.

To get that we need to use the @get_full_inner_text parameter:

EXEC sp_WhoIsActive @get_plans = 1, @get_outer_command = 1, @get_full_inner_text = 1

This way, the sql_text column will contain the whole batch or stored procedure where the query belongs.

Back to the query – The pattern

Can you see the pattern? A lot of OR EXISTS() conditions. That is odd indeed. It wouldn't be so odd if each OR EXISTS() were accessing a different table…oh, wait…they are not 🙂 and that is where the problem is.

Easy to improve

We can easily re-write the query without changing the logic or affecting the output data.

I have re-written the code in the following way:

SELECT column1
  FROM table1
 WHERE EXISTS (
               SELECT column2 FROM table2 WHERE table2.column1 = table1.column2
              )
   AND EXISTS (
               SELECT column1 
                 FROM table3 
                WHERE table3.column1 = table1.column1 
                  AND (
                           table3.column2 = @param1 
                        OR table3.column2 = @param2
                        OR table3.column2 = @param3
                        OR table3.column2 = @param4
                        OR table3.column2 = @param5
                        OR table3.column2 IN (@param6, @param7)
                      )
                  AND table3.column3 = 0
              )

This way we hit table3 just once instead of once per OR EXISTS().

The actual plan seems to have a much better shape:

A different approach would be a single IN () condition with all the variables comma-separated (sketched below). However, I preferred this form because it makes it easy to show the development team the difference between before and after.

In fact, when we use the IN operator, the optimizer expands it into multiple OR conditions. Example:
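Here is a sketch of that single-IN alternative, using the same placeholder table and parameter names as above (illustrative only, not the client's actual code):

SELECT column1
  FROM table1
 WHERE EXISTS (
               SELECT column2 FROM table2 WHERE table2.column1 = table1.column2
              )
   AND EXISTS (
               SELECT column1
                 FROM table3
                WHERE table3.column1 = table1.column1
                  AND table3.column2 IN (@param1, @param2, @param3, @param4, @param5, @param6, @param7)
                  AND table3.column3 = 0
              )

-- For the optimizer, the IN predicate above is just shorthand for the expanded OR form:
-- table3.column2 = @param1 OR table3.column2 = @param2 OR ... OR table3.column2 = @param7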

Result

With this change, I improved the query by 99%.
The query went down from ~4340 seconds to less than 30 seconds.

NOTE: The table had ~46M records.

Also, the number of logical reads for the table dive-bombed!

Before (optimizer used a Worktable):

Table ‘Worktable’. Scan count 924230, logical reads 1045662012

Table ‘table3’. Scan count 6, logical reads 4967238

SQL Server Execution Times:
CPU time = 3824887 ms, elapsed time = 4344872 ms.

After:

Table ‘table3’. Scan count 9, logical reads 829734,

SQL Server Execution Times:
CPU time = 86345 ms, elapsed time = 26074 ms.

This means that on the whole process we saved over an hour!
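By the way, output in that format comes from SQL Server's I/O and time statistics. If you want to capture the same before/after numbers for your own queries, wrap the query under test like this:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- run the query being measured here

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;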

How’d this happen?

As suggested by my title and initial disclaimer, this smells like a copy & paste pattern. Maybe something similar to:

Client: "Hey, can we have another validation for a different value?"
Dev: "Sure, it's pretty easy to do."
Also Dev: copies & pastes an existing OR EXISTS(), changes the parameter, commits it to source control, pushes it through QA testing (with little data) and it's good to go to PROD.
Client: "Thanks, it's working. Just a little bit slower, but today things are slower in general."

Wrap

When you find these kinds of patterns, invest a couple of minutes to test them with better logic.
You may end up saving a "couple" of CPU cycles and a lot of time.

Thanks for reading!

dbachecks – Going parallel – Cut off your execution times

At one of the last clients I worked with, I implemented dbachecks in order to get the current state of affairs and know how good or bad the environments are before starting to knock down the bad practices.

This client has seven different environments with more than 100 instances and more than 2000 databases.

Serial execution

A non-parallel execution (single session) took more than 2 hours.
This is not a big problem when we run it out of hours and we don't want/need to be watching or waiting for it to finish.
I set it up to run at 6 in the morning, so when I arrive at the office I can refresh the Power BI dashboard and get a recent snapshot of the environments.
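If you want to schedule something similar, one option is a PowerShell scheduled job. This is only a sketch; the job name and script path are hypothetical:

# Register a daily 06:00 run of the script that calls Invoke-DbcCheck
# ("D:\dbachecks\Run-Checks.ps1" is a hypothetical path)
$trigger = New-JobTrigger -Daily -At "6:00 AM"
Register-ScheduledJob -Name "dbachecks-daily" -FilePath "D:\dbachecks\Run-Checks.ps1" -Trigger $trigger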

Well, 2 hours does not seem too bad if we compare it with some other dbachecks users.

What if we want to bring down that 2-hour execution? Or Hiram Fleitas' (b | t) 4.6 hours?

Going parallel

First, let me remind you that this doesn't come for free! If you run multiple checks in parallel (different sessions), you will consume more resources (CPU, memory, etc.) on the machine where you are running them. Test the process and find a reasonable limit for it; otherwise this can become slower than the serial execution.

This brought some new needs.

By default, dbachecks works with the values previously saved (we set those with Set-DbcConfig or Import-DbcConfig). This means that when we start a new session and the last session has changed any configuration, that configuration is the one that will be used in the new session.

Can you see the problem?

Imagine that I want to check for databases in the Full recovery model on the production environment, and I want to start (in parallel) a new check for the development environment, where I want to check for the Simple recovery model. If this setting is not changed in the correct time frame, we can end up checking for the Full recovery model on the development environment, where we want the Simple recovery model.
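A minimal sketch of how that can go wrong (the configuration name is the one used later in this post; the sequence of sessions is hypothetical):

# Session A (production checks) - the value is persisted because -Temporary is not used
Set-DbcConfig -Name policy.recoverymodel.type -Value Full

# Session B (development checks), started afterwards - it inherits the persisted value
Get-DbcConfigValue -Name policy.recoverymodel.type   # returns 'Full'
Invoke-DbcCheck -SqlInstance "dev1"                   # DEV ends up tested against Full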

The first time I tried to run tests in parallel for environments that needed different configs, I didn't realise this detail, so I ended up with many more failed tests than expected! The bell rang when I saw that the majority of the failed tests came from one specific test…the one whose value I had changed.

-Temporary parameter to the rescue!

In my last dbachecks blog post – dbachecks – Setting temporary configuration values – I explained how this works, so if you haven't read it yet, take a look before continuing.

Ok, now that you know you can use -Temporary to run the tests without interfering with the persisted values, you may already have guessed what we will do next…

My recipe to run in parallel

Disclaimer: First, let me say that this is just one option; you can come up with a different one. Please drop a comment so I, and others, can become aware of different approaches.

  1. If you don't have a configuration file for the environment yet, start by configuring all the settings and use Export-DbcConfig to save them.
  2. Split your instances/hosts into one or more groups that can share the exact same configurations.
  3. Start a new PowerShell session and set (using Set-DbcConfig) or import (using Import-DbcConfig) the configurations saved in step 1, but don't forget to use the -Temporary parameter.
  4. Run Invoke-DbcCheck.
  5. Repeat steps 3 and 4 for each group – I encourage you to start with just 2 sessions and monitor your computer's resources. Then, if you still have room, add one more.
  6. Grab a coffee, a beer or any other drink of your choice and wait until it finishes.

Again, keep an eye on your resources and then test with one more session. Do this until you find the sweet spot for the number of parallel sessions.

Here is the code you can use.
For the 1st point:

#PROD environment
Set-DbcConfig -Name policy.recoverymodel.type -Value Full -Temporary
Export-DbcConfig -Path "D:\dbachecks\Prod_Configs.json"

 

#DEV environment
Set-DbcConfig -Name policy.recoverymodel.type -Value Simple -Temporary
Export-DbcConfig -Path "D:\dbachecks\Dev_Configs.json"

For the 2nd, 3rd and 4th points together:

#PROD instances
$sqlInstances = "prod1", "prod2", "prod3"

#Import Prod_Configs.json with -Temporary
Import-DbcConfig -Path "D:\dbachecks\Prod_Configs.json" -Temporary

#Run the checks - Don't forget to add all the parameters you usually use
Invoke-DbcCheck -SqlInstance $sqlInstances

 

#DEV instances
$sqlInstances = "dev1", "dev2", "dev3"

#Import Dev_Configs.json with -Temporary
Import-DbcConfig -Path "D:\dbachecks\Dev_Configs.json" -Temporary

#Run the checks - Don't forget to add all the parameters you usually use
Invoke-DbcCheck -SqlInstance $sqlInstances

Save these scripts in two different .ps1 files. Then open two different PowerShell sessions and call each script in its own session. Let it flow 🙂
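If you prefer to kick both off from a single window, something like the following also works (the script paths are hypothetical; point them at wherever you saved your two files):

# Launch each script in its own PowerShell process (paths are hypothetical)
Start-Process powershell.exe -ArgumentList '-NoProfile', '-File', 'D:\dbachecks\Run-ProdChecks.ps1'
Start-Process powershell.exe -ArgumentList '-NoProfile', '-File', 'D:\dbachecks\Run-DevChecks.ps1'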

Results

In my case, I was able to go from 2 hours down to about 1 hour with 3 parallel sessions. Adding a 4th session made the whole process slower.

Wrap

We saw that we may run into problems if we try to run more than one dbachecks session with different configured values. The -Temporary parameter comes in handy for this scenario when setting the values.
This way we can run two or more sessions in parallel, each one on a different environment, without them interfering with each other and, hopefully, cut our execution times.

Hope this helps! I would love to hear whether you were able to bring down your execution times, and what they were before and after.

Thanks for reading!

New version of sp_WhoIsActive (v11.20) is available – Deployed on 123 instances in less than 1 minute

Last night, I received Adam Machanic’s (b | t) newsletter “Announcing sp_whoisactive v11.20: Live Query Plans”.

For those who don’t know about it, sp_WhoIsActive is a stored procedure that provides detailed information about the sessions running on your SQL Server instance.
It is a great tool when we need to troubleshoot problems such as long-running queries or blocking (just two examples).

This stored procedure works on any version/edition since SQL Server 2005 SP1, although you will only be able to see the new feature (live query plans) if you run it on SQL Server 2016 or 2017.

If you don't receive the newsletter, you can read this announcement here and subscribe to receive the next ones here.

You can read the release notes on the download page.

Thank you, Adam Machanic!

The show-off part

Using the dbatools open-source PowerShell module, I can deploy the latest version of the stored procedure.

By running the following two lines of code, I updated sp_WhoIsActive to the latest version (the newest one is always downloaded) on my 123 instances in less than one minute (to be precise, in 51.717 seconds).

$SQLServers = Invoke-DbaQuery -SqlInstance "CentralServerName" -Query "SELECT InstanceConnection FROM CentralDB.dbo.Instances" | Select-Object -ExpandProperty InstanceConnection
Install-DbaWhoIsActive -SqlInstance $SQLServers -Database master

The first line retrieves all my instances' connection strings from my central database.
The second one downloads the latest version and compiles the stored procedure in the master database on each of the instances in that list (123 instances).
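If you want to time the deployment yourself, wrapping the call in Measure-Command is an easy way to do it (this is just a suggestion, not part of the original two lines):

# Measure how long the deployment to all the instances takes, in seconds
Measure-Command {
    Install-DbaWhoIsActive -SqlInstance $SQLServers -Database master
} | Select-Object -ExpandProperty TotalSeconds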

Thanks for reading