Category Archives: Performance Tuning

SQL Server: How Local Variables Can Reduce Query Performance

It's a common practice for database developers to use local variables in stored procedures and scripts and to apply filters on the basis of these variables. YES, these local variables can slow down your queries. Let's prove it.
Create a new table and insert dummy rows:

USE AdventureWorks
GO

CREATE TABLE TempTable
      (tempID UNIQUEIDENTIFIER, tempMonth INT, tempDateTime DATETIME)
GO

INSERT INTO TempTable (tempID, tempMonth, tempDateTime)
SELECT NEWID(), (CAST(100000 * RAND() AS INT) % 12) + 1, GETDATE()
GO 100000 -- execute this batch 100,000 times
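If you would rather not run the batch 100,000 times via GO (an SSMS/sqlcmd feature), a single set-based insert does the same job. This is only a sketch: it generates row numbers from a cross join of sys.all_objects and spreads tempDateTime forward in time, mimicking the timestamps the slow GO loop produces naturally.

-- Set-based alternative: insert 100,000 rows in one statement
INSERT INTO TempTable (tempID, tempMonth, tempDateTime)
SELECT NEWID(),
       (ABS(CHECKSUM(NEWID())) % 12) + 1,
       -- spread timestamps 10 ms apart so the range filter below has rows to find
       DATEADD(MILLISECOND, CAST(n.rn AS INT) * 10, GETDATE())
FROM (SELECT TOP (100000)
             ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
      FROM sys.all_objects AS a
      CROSS JOIN sys.all_objects AS b) AS n
GO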

-- Create an index to support our query
CREATE NONCLUSTERED INDEX [IX_tempDateTime] ON [dbo].[TempTable]
([tempDateTime] ASC)
INCLUDE ([tempID]) WITH (ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Now let's execute a simple query with hard-coded values in the WHERE clause:

SET STATISTICS IO ON
GO

SELECT * FROM TempTable
WHERE tempDateTime > '2012-07-10 03:18:01.640'
Table 'TempTable'. Scan count 1, logical reads 80, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Check out its execution plan and the index seek properties. You can see that the estimated row count is about double the actual count, but that is not a big enough difference to affect plan choice, and the optimizer has selected a proper plan for this query.

The query optimizer took its row estimate from the statistics histogram, i.e. EQ_ROWS + AVG_RANGE_ROWS (77 + 88.64286 ≈ 165.64):

DBCC SHOW_STATISTICS ('dbo.TempTable', IX_tempDateTime)

Now, let's modify our SELECT query to use a local variable and execute it. You will find that the query optimizer selects a different, more costly plan this time. WHY??

DECLARE @RequiredDate DATETIME
SET @RequiredDate = '2012-07-10 03:18:01.640'

SELECT * FROM TempTable
WHERE tempDateTime > @RequiredDate
Table 'TempTable'. Scan count 1, logical reads 481, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The huge difference between estimated and actual row counts clearly indicates that the query optimizer was unable to properly estimate the number of rows, and with this wrong estimate it selected a more costly execution plan. Basically, the query optimizer does not know the value of a local variable at optimization time and therefore cannot use the statistics histogram. It behaves differently for inequality and equality operators.

In Case of Inequality Operator:
For the inequality operator in our query, the optimizer uses a simple formula: 30% of total rows.

Estimated Rows = (Total Rows * 30) / 100 = (100000 * 30) / 100 = 30000

In Case of Equality Operator:

DECLARE @RequiredDate DATETIME
SET @RequiredDate = '2012-07-10 03:18:01.640'

SELECT * FROM TempTable
WHERE tempDateTime = @RequiredDate

If the equality operator is used with a local variable, the query optimizer derives its estimate from a different formula: Density * Total Number of Table Rows. Execute the following query to get the density value:

DBCC SHOW_STATISTICS ('dbo.TempTable', IX_tempDateTime)

All Density = 0.0007358352

Total Number of Rows in Table = 100000
Estimated Rows = Density * Total Rows = 0.0007358352 * 100000 = 73.5835
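A common workaround, not covered in the original text but standard practice, is to add OPTION (RECOMPILE). The statement is then compiled when the variable already holds its value, so the optimizer can use the statistics histogram again:

DECLARE @RequiredDate DATETIME
SET @RequiredDate = '2012-07-10 03:18:01.640'

SELECT * FROM TempTable
WHERE tempDateTime > @RequiredDate
OPTION (RECOMPILE) -- value is known at compile time, so the histogram is used

With this hint, the estimate returns to the histogram-based figure instead of the blanket 30% guess, at the cost of a compilation on every execution.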

Drop the table when it is no longer required:

DROP TABLE TempTable

SQL Server: Smarter Way of Query Load Testing at Testing Server

One of the most important decisions a database administrator must make is whether a query that performs flawlessly on a development server, where only hundreds or thousands of rows exist, will still meet its benchmarks when deployed on a production server, where the row count may run into millions.

One solution is to insert millions of rows into the testing environment and then check the execution plan, but that is really painful.

Thankfully, SQL Server has provided a better solution since SQL Server 2005: you can test a query whose base tables contain only a dozen rows but make them act as if they held millions of rows (or as many as you want). Let's try it with a simple query against AdventureWorks.

SELECT  p.ProductID, p.Name, pm.Name AS ProductModel, pmx.CultureID,
        pd.Description
FROM    Production.Product AS p
        INNER JOIN Production.ProductModel AS pm
        ON p.ProductModelID = pm.ProductModelID
        INNER JOIN Production.ProductModelProductDescriptionCulture AS pmx
        ON pm.ProductModelID = pmx.ProductModelID
        INNER JOIN Production.ProductDescription AS pd
        ON pmx.ProductDescriptionID =pd.ProductDescriptionID
WHERE   pm.Name = 'Road-150'

To check how many rows each table in the above query contains, use the following query:

SELECT OBJECT_NAME(object_id), rows FROM sys.partitions
WHERE object_id IN
(object_id('Production.ProductModel'),
object_id('Production.ProductModelProductDescriptionCulture'),
object_id('Production.ProductDescription'),
object_id('Production.Product'))
AND index_id = 1

When you execute the first query, you can see in the execution plan that the SQL Server optimizer took its row estimates from the table statistics, and the estimated and actual row counts match.

Now we will deceive the SQL Server optimizer about the number of rows in the Production.Product table. Simply run UPDATE STATISTICS, BUT with the undocumented ROWCOUNT and PAGECOUNT options:

 UPDATE STATISTICS Production.Product WITH ROWCOUNT = 10000000, PAGECOUNT = 1000000

Execute our first SELECT query again, but run DBCC FREEPROCCACHE first; it will show a different execution plan, since the optimizer now believes the table holds 10,000,000 rows.
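The sequence is simply:

DBCC FREEPROCCACHE  -- clear cached plans so the query is optimized afresh
GO
-- Now re-run the Road-150 SELECT from above and compare the execution plans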
Now we have a better picture of what the execution plan will be when the row count grows to 10,000,000, which is helpful for placing new indexes and for decisions like applying a partitioning scheme.

To restore the actual row counts, just rebuild all indexes on the Production.Product table.
ALTER INDEX ALL ON Production.Product REBUILD

Note: Don't use this method on a production server.

SQL Server has encountered occurrence(s) of I/O requests taking longer than 15 seconds to complete

This morning, while going through my regular SQL Server log reports for one of our production servers, I found an unfamiliar error:
SQL Server has encountered 52 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [E:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\TempDB.mdf] in database [TempDB] (2). The OS file handle is 0x00000884. The offset of the latest long I/O is: 0x00000457490000
The first thing I checked was whether this is a critical message.
The answer I found was YES. When we talk about I/O in SQL Server, we usually think in milliseconds, so waits of several seconds are far too long. SQL Server I/O wait times can be examined with the following query:
SELECT  *
FROM    sys.dm_os_wait_stats
WHERE   wait_type LIKE 'PAGEIOLATCH%'
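To put the raw numbers in context, the average wait per latch can be computed directly from the same DMV (a small variation on the query above):

SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       -- average wait per I/O latch; NULLIF avoids divide-by-zero
       wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM   sys.dm_os_wait_stats
WHERE  wait_type LIKE 'PAGEIOLATCH%'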
How to check your hard drive performance?
To check the server's I/O subsystem, I rely on the Performance Monitor counter PhysicalDisk: Avg. Disk Queue Length. Monitor this counter for at least 10 minutes; if it stays above 2 for any individual disk drive in the array, you almost certainly have an I/O bottleneck.

Who is the culprit, SQL Server or the operating system?
The problem lies in the SAN or local disk I/O subsystem, not in SQL Server itself. In my case, I found that the client had installed a few other applications on the same drive, and they were forcing SQL Server to wait far too long for its I/O requests to complete.

SQL Server: Too High Difference in CPU and Elapsed Time (Duration), Don't Blame IO Every Time

There can be different causes behind a very high difference between CPU and elapsed time (duration) for a query executed by SQL Server. One of the most common is an I/O problem, which can easily be observed with the following query:
SELECT  *
FROM    sys.dm_os_wait_stats
WHERE   wait_type LIKE 'PAGEIOLATCH%'
If the number of waits and the average wait time are too high, then something is wrong on the I/O side. To find the root cause, you have to check several things: queries without proper indexes (high page reads), pressure on TempDB, and hard drive and memory performance.
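For example, the queries doing the most physical reads can be pulled from the plan cache; here is a sketch using sys.dm_exec_query_stats and sys.dm_exec_sql_text (available since SQL Server 2005):

-- Top 10 cached statements by physical reads: a quick way to spot missing indexes
SELECT TOP (10)
       qs.total_physical_reads,
       qs.execution_count,
       SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
           ((CASE qs.statement_end_offset
                  WHEN -1 THEN DATALENGTH(st.text)
                  ELSE qs.statement_end_offset
             END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM   sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_physical_reads DESC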
Normally, an elapsed time up to two times higher than CPU time can be explained by I/O waits, but what if your query's elapsed time is 5, 10, or more times higher?
Recently, on one of our production servers, a query gave me astonishing time statistics.


I know it's a simple query; indexes are properly applied and it returns results quickly. The query executed in under 1 second of response time (SSMS properties show resolution down to the second), but the time statistics claimed an elapsed time of more than 6 seconds.
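For reference, these time statistics come from the standard session setting:

SET STATISTICS TIME ON
GO
-- run the query; the Messages tab then reports CPU time and elapsed time in ms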
Why is this happening?
It happened because the CPU time stamp counters on my production server are not synchronized with each other. This can be confirmed from the SQL Server log (SQL Server 2005 Service Pack 2 and later record this message):
The time stamp counter of CPU on scheduler id 13 is not synchronized with other CPUs.
The good news is that nothing is wrong with your SQL Server performance, and everything will keep working fine. The only problem is that your performance tuning process is affected, because you cannot collect correct query execution times.
This usually happens when power policies are changed or utilities that affect CPU performance are installed, and it can easily be corrected by setting your machine's power options to "Always On" or "Max Performance".

SQL Server: Idera SQL Doctor

If you don't know much about SQL Server internals and performance tuning techniques, but you still need your SQL Server's performance to be right up to the mark, you should try SQL Doctor, a really helpful tool by Idera.
SQL Doctor analyzes the performance of a SQL Server instance and provides recommendations for improving it.

SQL Server: Does Unwanted Tables in a Query or View Affect Performance

Recently a friend of mine asked whether the presence of extra tables in the join section of a query affects its performance. By extra tables I mean tables that can be removed from the query without affecting the result set. For example, the following query has extra tables (other than the vendor and contact tables) in its join section:
USE AdventureWorks
GO

SELECT  Vendor.Name,
        Contact.Title,
        Contact.FirstName,
        Contact.MiddleName
FROM    Person.Address AS a
        INNER JOIN Purchasing.VendorAddress AS VendorAddress 
        ON a.AddressID = VendorAddress.AddressID
        INNER JOIN Person.StateProvince AS StateProvince 
        ON StateProvince.StateProvinceID = a.StateProvinceID
        INNER JOIN Person.CountryRegion AS CountryRegion 
        ON CountryRegion.CountryRegionCode = StateProvince.CountryRegionCode
        INNER JOIN Purchasing.Vendor AS Vendor
        INNER JOIN Purchasing.VendorContact AS VendorContact 
        ON VendorContact.VendorID = Vendor.VendorID
        INNER JOIN Person.Contact AS Contact 
        ON Contact.ContactID = VendorContact.ContactID
        INNER JOIN Person.ContactType AS ContactType 
        ON VendorContact.ContactTypeID = ContactType.ContactTypeID 
        ON VendorAddress.VendorID = Vendor.VendorID
Though it is NOT common to have extra tables in our usual queries, it can easily happen with views. A view may join multiple tables and select columns from each of them, yet a later query against that view may use only a few of those columns. So when we execute the query above, the SQL Server query optimizer skips all tables that do not contribute to the result. Here is the execution plan of the above query.

The same query with more columns, pulling all the tables into action:

SELECT  Vendor.Name,
        ContactType.Name AS ContactType,
        Contact.Title,
        Contact.FirstName,
        Contact.MiddleName,
        a.AddressLine1,
        a.AddressLine2,
        a.City,
        StateProvince.Name AS StateProvinceName,
        a.PostalCode,
        CountryRegion.Name AS CountryRegionName,
        Vendor.VendorID
FROM    Person.Address AS a
        INNER JOIN Purchasing.VendorAddress AS VendorAddress 
        ON a.AddressID = VendorAddress.AddressID
        INNER JOIN Person.StateProvince AS StateProvince 
        ON StateProvince.StateProvinceID = a.StateProvinceID
        INNER JOIN Person.CountryRegion AS CountryRegion 
        ON CountryRegion.CountryRegionCode = StateProvince.CountryRegionCode
        INNER JOIN Purchasing.Vendor AS Vendor
        INNER JOIN Purchasing.VendorContact AS VendorContact 
        ON VendorContact.VendorID = Vendor.VendorID
        INNER JOIN Person.Contact AS Contact 
        ON Contact.ContactID = VendorContact.ContactID
        INNER JOIN Person.ContactType AS ContactType 
        ON VendorContact.ContactTypeID = ContactType.ContactTypeID 
        ON VendorAddress.VendorID = Vendor.VendorID

If we create a view from our second query and then query the view in the following style, the execution plan will be the same as for our first query.
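The view itself would be created from the second query; the original post does not show the statement, so here is a sketch using the view name it assumes:

-- vw_MyView wraps the full second query, including the "extra" tables
CREATE VIEW vw_MyView
AS
SELECT  Vendor.Name,
        ContactType.Name AS ContactType,
        Contact.Title,
        Contact.FirstName,
        Contact.MiddleName,
        a.AddressLine1,
        a.AddressLine2,
        a.City,
        StateProvince.Name AS StateProvinceName,
        a.PostalCode,
        CountryRegion.Name AS CountryRegionName,
        Vendor.VendorID
FROM    Person.Address AS a
        INNER JOIN Purchasing.VendorAddress AS VendorAddress
        ON a.AddressID = VendorAddress.AddressID
        INNER JOIN Person.StateProvince AS StateProvince
        ON StateProvince.StateProvinceID = a.StateProvinceID
        INNER JOIN Person.CountryRegion AS CountryRegion
        ON CountryRegion.CountryRegionCode = StateProvince.CountryRegionCode
        INNER JOIN Purchasing.Vendor AS Vendor
        INNER JOIN Purchasing.VendorContact AS VendorContact
        ON VendorContact.VendorID = Vendor.VendorID
        INNER JOIN Person.Contact AS Contact
        ON Contact.ContactID = VendorContact.ContactID
        INNER JOIN Person.ContactType AS ContactType
        ON VendorContact.ContactTypeID = ContactType.ContactTypeID
        ON VendorAddress.VendorID = Vendor.VendorID
GO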

SELECT  Name,
        Title,
        FirstName,
        MiddleName
FROM    vw_MyView

Hence, the SQL Server query optimizer is quite smart: it reads only those tables that actually contribute to the result, and it does not matter whether the extra tables come from your query or from a view.

SQL Server: What I Have Learned in May 2011

Dedicated to the most energetic SQL expert, Gail Shaw.


SQL Server: Which Performs Better, IN or EXISTS


It's true that you have many choices for solving a problem in T-SQL, and any of these techniques can get you your desired results. Today we will discuss two very commonly used clauses, IN and EXISTS, for filtering a query's result set based on records from a sub-query. IN is more commonly used than EXISTS, but which one is more efficient? That's what we are looking at today.
IN and EXISTS perform the same way when used for a single-column search, as follows:
USE AdventureWorks
GO
-- IN
SELECT * FROM Production.Product pr
WHERE ProductID IN
      (SELECT ProductID FROM Purchasing.ProductVendor)

-- EXISTS
SELECT * FROM Production.Product pr
WHERE EXISTS
      (SELECT 1 FROM Purchasing.ProductVendor
              WHERE ProductID = pr.ProductID)
Logical reads and elapsed time are also the same for both queries.
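The measurements quoted below come from the standard session settings:

SET STATISTICS IO ON
SET STATISTICS TIME ON
GO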
In situations where you need to filter records based on the existence of more than one column in the sub-query, you will find EXISTS much better in performance. To observe this, let's create two temporary tables.
CREATE TABLE #Cars (Make VARCHAR(50), Color VARCHAR(30), Seats INT)
CREATE TABLE #CarIssuance (Make VARCHAR(50), Color VARCHAR(30), IssuanceDate DATETIME)
GO
INSERT INTO #Cars
SELECT 'Honda', 'Black', 2 UNION ALL
SELECT 'Honda', 'White', 2 UNION ALL
SELECT 'Toyota', 'Black', 4 UNION ALL
SELECT 'Toyota', 'Silver', 4 UNION ALL
SELECT 'BMW', 'Red', 2
GO
INSERT INTO #CarIssuance
SELECT 'Honda', 'Black', '2011-05-20' UNION ALL
SELECT 'BMW', 'Red', '2011-05-03' UNION ALL
SELECT 'Toyota', 'Black', '2011-05-03'
Suppose we need all records from the #Cars table that have a matching record in #CarIssuance on both the Make and Color columns. Let's first try the traditional IN clause.
SELECT * FROM #Cars
WHERE Color  IN
 (SELECT Color FROM #CarIssuance)
 AND Make IN
      (SELECT Make FROM #CarIssuance)
Table '#CarIssuance'. Scan count 2, logical reads 8, physical reads 0, read-ahead reads 0
Table '#Cars'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0
SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 40 ms.
Now execute the same query with the EXISTS option:
SELECT * FROM #Cars Cr
WHERE
 EXISTS
 (SELECT 1 FROM #CarIssuance CI
                        WHERE Make = Cr.Make
                              AND Color = Cr.Color)
Table '#CarIssuance'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0
Table '#Cars'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0
SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.
EXISTS performed much better (we got the result in 1 ms instead of 40 ms) simply because it uses a single sub-query, whereas the IN version uses multiple sub-queries. If we rewrite the IN query to use only one sub-query, its performance matches EXISTS, but the code gets a bit mixed up, and in real life it is sometimes not even possible:
SELECT * FROM #Cars Cr
WHERE Color  IN
 (SELECT Color FROM #CarIssuance WHERE Make = Cr.Make)
Summary: EXISTS performs much better when filtering on more than one column from a sub-query, but as a good programming practice EXISTS should be preferred even when handling a single-column filter.
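As before, drop the temporary tables when they are no longer required:

DROP TABLE #Cars
DROP TABLE #CarIssuance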

SQL Server: sp_WhoIsActive, a Must Tool for a DBA


A mail with the subject "Low Database Response" or a call opening with "What happened to the database?" is normal in a SQL Server DBA's life. To cope with such situations, a DBA normally keeps a drawer full of tools and scripts. But like me, most DBAs' first check on what is actually happening is the shortest query in SQL Server: sp_who or sp_who2. The second returns almost the same data as sp_who, but with more detail.
sp_who 
sp_who2 
Though both system stored procedures are helpful for collecting basic information about problematic sessions, to reach the rows you care about you have to skim through a large result set, and more than 95% of the rows are useless. Sometimes the information provided by sp_who or sp_who2 is not enough to understand the real problem, which forces you to query several other DMVs.
Thanks to Adam Machanic, who solved this problem. His stored procedure sp_WhoIsActive is a must-have in a DBA's kit. This single stored procedure collects almost all the information a DBA needs to understand the real problem in a database. Its beauty is that it gathers information from roughly 15 DMVs, yet still returns only the relevant rows.
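A minimal usage sketch (assuming the procedure has been installed in the master database; @get_plans is one of its documented parameters):

-- Default call: one row per active session, with waits, blocking and query text
EXEC sp_WhoIsActive

-- Also attach the estimated execution plan of each active request
EXEC sp_WhoIsActive @get_plans = 1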