Index Selectivity – Left Based

There are several types of indexes available but I will be referring to only clustered and non clustered indexes.

Clustered Index is a type of index where the logical and physical order of the table records are in the same order.

Non Clustered Index is a type of index where the logical and physical order of the table records are NOT in the same order. This index typically consists of specific columns defined in the index creation unlike clustered index which applies to the entire base table.

You can refer to BOL link here

There can only be one clustered index and up to 249 non clustered indexes on the table. All non clustered indexes will also include either cluster index key or ROWID depending on whether the table is clustered or heap that act as pointer back to base table. This pointer helps in creating a “look up” if additional columns are need to be fetched from the base table by the query.

The basic purpose of Indexes is to improve the performance of SQL Query and hence the design of indexes is very important as selectivity of index depends on how well the index and query logic is defined.

Query optimizer uses indexes and performs either scan or seek operation on them to fetch the data.Generally, seek is better for fetching single rows and small subset of rows,scan is better for pulling large set of rows.

While it is possible to tell the query optimizer to either use seek or scan using query hints, it is generally best practice to leave it up to the query optimizer and better investigate further to see why the optimizer is using not-so optimal plan(parameter sniffing,statistics update etc). It is also important to look the index definitions and the way queries are begin written.

I am using Adventure Works 2012 for this example,and you can download it from here.In AdventureWorks2012 database,there is a table called Person.Person and it has non clustered non unique index on LastName,FirstName,MiddleName.

Consider the below example:

Select * from Person.Person where FirstName='Gail' and MiddleName='A'
This performs a non clustered index scan but since the query returns only very small subset of data, an index seek would be an ideal operation here. The reason to do index scan is because the index is defined over LastName,FirstName and MiddleName but the query predicate is filtering only on FirstName and MiddleName. The Execution plan looks like this:

NC_IndexScan

Select * from Person.Person where LastName='Erickson' and MiddleName='A' This leads to a non clustered index seek which is an optimal plan.

NC_IndexSeek

So,the take away from this blog is if the left most index columns are used in the query predicate, the query optimizer comes up with an optimal plan. If the left the most column is not used in the query predicate, it will always lead to index scan whether it is optimal or not. That’s why it is important to have right indexes defined so the query optimizer can make best use of the index.