Is the Index hurting or helping?

The concept of index is to help retrieve data faster and hence to improve query performance. But, it is not always the case, as poorly designed index could have negative performance impact. On a high level, Non Clustered Index is like a sub table formatted in B Tree with columns defined in the index definition. Every insert,delete,update on the base table will also have to do the same operation on the non clustered index for those columns referred in the index definition. So, all this means additional work for SQL server as well as more storage. Hence, it is important to make sure that indexes are being more valuable compared to what they are costing.

Below SQL Code gives a general idea on how useful the indexes are. I am not checking for clustered index, since it exists on the base table and probably almost all tables need a clustered index.

Select DB_Name() as [DatabaseName], Schema_Name(Schema_id)+'.'+Object_Name(A.Object_ID) as [TableName], A.Name as [IndexName],A.type_desc as [IndexType],IndexSize_MB,User_updates as [IndexCostCount], user_seeks+user_scans+user_lookups as [IndexUsageCount],Last_User_Update as Last_Cost_Operation,
(Select Max(v) from (Values (last_user_lookup),(last_user_scan),(last_user_seek)) As Value(v)) as [Last_Useful_Operation]
From sys.indexes A LEFT OUTER JOIN
sys.dm_db_index_usage_stats B on A.index_id=B.index_id and A.object_id=B.object_id and B.database_ID=db_ID()
INNER JOIN sys.tables C on C.object_id=a.object_id and A.type_desc not in ('Heap','CLUSTERED')
INNER JOIN (Select (SUM(used_page_count) * 8/1024) AS IndexSize_MB,index_id,Object_id FROM sys.dm_db_partition_stats GROUP BY object_id,index_id) D ON
D.index_id=A.index_id AND D.object_id=A.object_id
order by IndexSize_MB

I am summing the user seeks,scans and lookups as to get one total value for index for usage.

Higher the IndexUsageCount compared to IndexCostCount,it generally implies that the index is useful. Of course, whether the usage and operation(seek\scan\lookup) of a particular index in a query is optimal or not cannot be decided without investigating the actual execution plan.

This information is not persisted after restart of SQL Server or database is offline.

Index Selectivity – Left Based

There are several types of indexes available but I will be referring to only clustered and non clustered indexes.

Clustered Index is a type of index where the logical and physical order of the table records are in the same order.

Non Clustered Index is a type of index where the logical and physical order of the table records are NOT in the same order. This index typically consists of specific columns defined in the index creation unlike clustered index which applies to the entire base table.

You can refer to BOL link here

There can only be one clustered index and up to 249 non clustered indexes on the table. All non clustered indexes will also include either cluster index key or ROWID depending on whether the table is clustered or heap that act as pointer back to base table. This pointer helps in creating a “look up” if additional columns are need to be fetched from the base table by the query.

The basic purpose of Indexes is to improve the performance of SQL Query and hence the design of indexes is very important as selectivity of index depends on how well the index and query logic is defined.

Query optimizer uses indexes and performs either scan or seek operation on them to fetch the data.Generally, seek is better for fetching single rows and small subset of rows,scan is better for pulling large set of rows.

While it is possible to tell the query optimizer to either use seek or scan using query hints, it is generally best practice to leave it up to the query optimizer and better investigate further to see why the optimizer is using not-so optimal plan(parameter sniffing,statistics update etc). It is also important to look the index definitions and the way queries are begin written.

I am using Adventure Works 2012 for this example,and you can download it from here.In AdventureWorks2012 database,there is a table called Person.Person and it has non clustered non unique index on LastName,FirstName,MiddleName.

Consider the below example:

Select * from Person.Person where FirstName='Gail' and MiddleName='A'
This performs a non clustered index scan but since the query returns only very small subset of data, an index seek would be an ideal operation here. The reason to do index scan is because the index is defined over LastName,FirstName and MiddleName but the query predicate is filtering only on FirstName and MiddleName. The Execution plan looks like this:


Select * from Person.Person where LastName='Erickson' and MiddleName='A' This leads to a non clustered index seek which is an optimal plan.


So,the take away from this blog is if the left most index columns are used in the query predicate, the query optimizer comes up with an optimal plan. If the left the most column is not used in the query predicate, it will always lead to index scan whether it is optimal or not. That’s why it is important to have right indexes defined so the query optimizer can make best use of the index.