SSIS Packages in SQL 2012/14

From SQL 2012 onwards, ssis packages can be deployed in project model. This requires creating a catalog and enabling ‘CLR’ integration. Once the packages are deployed in project model, they can be browsed in the ‘Integration Services Catalog’ in the management studio. Below code retrieves the packages along with the folder and packages present in the integration services catalog.

select A.name as [FolderName],B.name as[ProjectName],C.name as [PackageName]
from [catalog].[folders] A
INNER JOIN [catalog].[projects] B ON A.folder_id=B.folder_id
INNER JOIN [catalog].[packages] C on C.project_id=B.project_id

Query to retrieve column information using catalog views

The below query helps to retrieve information on columns in a table:

Select A.Name,B.Name,D.name,A.max_length,A.precision,
A.is_identity,A.is_nullable,E.[definition]
from sys.columns A inner join sys.tables B on A.object_id=B.object_id
inner join sys.schemas C on C.schema_id=B.schema_id
inner join sys.types D on D.user_type_id=A.user_type_id
left outer join sys.default_constraints E on E.parent_column_id=A.column_id and A.object_id=E.parent_object_id
where C.name='<<SchemaName>>' and B.name='<<TableName>>'

sp_columns system stored procedure can also be used to get the information.

System Stored Procedures

There are several catalog views that can be used to query database related info. We also have system stored procedures to get same basic information.

Some of the commonly used one’s are:
1.sp_helpdb
2.sp_help
3.sp_helpfile
4.sp_databases
5.sp_tables: list the table names
sp_tables @table_owner='HumanResources',@table_type="'Table','view'"
6.sp_columns: lists the column names and types and other columns details
sp_columns @table_owner='Person',@table_name='Person'
7.sp_pkeys: lists the primary key of the table
sp_pkeys @table_owner='Person',@table_name='Person'
8.sp_fkeys: lists all the tables where the key is referred.
sp_fkeys @pktable_name='Employee',@pktable_owner='HumanResources'

Inequality Join Predicate- Nested Loop

I was reading about Joins in Kalen Delaney’s Microsoft SQL Server Internals 2012 and came across a point “a nested loops join is the only join algorithm that SQL Server can use without atleast one equijoin predicate”. This means that regardless of the dataset size or indexes, if no equality predicate exists in join condition, the optimizer will choose only ‘Nested Loop’ physical operator. Below test case verifies the same:

Create Table #Test1(sno int identity(1,1) primary key,sname varchar(6) default 'table1')
Create Table #Test2(sno int identity(5000,1) primary key,sname varchar(6) default 'table1')
GO
Insert into #Test1 Values (default)
Go 1000
Insert into #Test2 Values (default)
Go 1000
/*If no equality predicate is present in the join condition, then nested loop join is the only physical operator optimizer can use*/
Select A.sname,B.sname from #Test1 A Inner Join #Test2 B on A.sno<>B.sno and A.sname<>b.sname

Drop table #Test1
Drop table #Test2 

Below is the execution plan:
ExecutionPlan_NestedLoop

Simple Column Level Encryption

Column Level Encryption can be achieved in many ways and one of the simplest ways to accomplish is by encrypting the column with a pass phrase. This is very easy way to implement as encryption and decryption can be done with the same key(symmetric).

Create Table ColumnEncrypt(SEncrypt varbinary(max))
--Encrypt the column
Insert into ColumnEncrypt(SEncrypt)
select ENCRYPTBYPASSPHRASE('My$tr@^gP@$$W0R%','This is confidental')

select * from ColumnEncrypt
--Decrypt the data
Select convert(varchar,DECRYPTBYPASSPHRASE('My$tr@^gP@$$W0R%',SEncrypt)) as [DecryptedData] from ColumnEncrypt

The Encrypted and Decrypted SQL statement is not seen in the profiler or extended event, so, other people cannot track pass phrase.

Is the Index hurting or helping?

The concept of index is to help retrieve data faster and hence to improve query performance. But, it is not always the case, as poorly designed index could have negative performance impact. On a high level, Non Clustered Index is like a sub table formatted in B Tree with columns defined in the index definition. Every insert,delete,update on the base table will also have to do the same operation on the non clustered index for those columns referred in the index definition. So, all this means additional work for SQL server as well as more storage. Hence, it is important to make sure that indexes are being more valuable compared to what they are costing.

Below SQL Code gives a general idea on how useful the indexes are. I am not checking for clustered index, since it exists on the base table and probably almost all tables need a clustered index.

Select DB_Name() as [DatabaseName], Schema_Name(Schema_id)+'.'+Object_Name(A.Object_ID) as [TableName], A.Name as [IndexName],A.type_desc as [IndexType],IndexSize_MB,User_updates as [IndexCostCount], user_seeks+user_scans+user_lookups as [IndexUsageCount],Last_User_Update as Last_Cost_Operation,
(Select Max(v) from (Values (last_user_lookup),(last_user_scan),(last_user_seek)) As Value(v)) as [Last_Useful_Operation]
From sys.indexes A LEFT OUTER JOIN
sys.dm_db_index_usage_stats B on A.index_id=B.index_id and A.object_id=B.object_id and B.database_ID=db_ID()
INNER JOIN sys.tables C on C.object_id=a.object_id and A.type_desc not in ('Heap','CLUSTERED')
INNER JOIN (Select (SUM(used_page_count) * 8/1024) AS IndexSize_MB,index_id,Object_id FROM sys.dm_db_partition_stats GROUP BY object_id,index_id) D ON
D.index_id=A.index_id AND D.object_id=A.object_id
order by IndexSize_MB

I am summing the user seeks,scans and lookups as to get one total value for index for usage.

Higher the IndexUsageCount compared to IndexCostCount,it generally implies that the index is useful. Of course, whether the usage and operation(seek\scan\lookup) of a particular index in a query is optimal or not cannot be decided without investigating the actual execution plan.

This information is not persisted after restart of SQL Server or database is offline.

Parameter Sniffing

Parameter Sniffing is a typical problem when using stored procedures which could result in bad query plans. So, when the procedure is executed for the first time, a query plan is created and stored in procedure cache for reuse. The initial plan generated will be in perspective of parameter passed in the first execution and this plan is saved in the procedure cache and reused for subsequent procedure calls until the plan is invalidated by the update statistics or when procedure cache is cleared or code recompile happens.

So,once the plan is created, it could be reused even for different parameters which is where, it can cause problems because there is no guarantee that the plan generated for first parameter is also good plan for the subsequent parameters. This is called as Parameter Sniffing.

--Below is the scenario for Parameter Sniffing.
Create Procedure usp_PersonNamePhone(@Id int)
as
Begin
Select A.FirstName,A.Lastname, B.PhoneNumber
from [Person].[Person] A
INNER JOIN [Person].[PersonPhone] B on A.BusinessEntityID=B.BusinessEntityID
where A.BusinessEntityID<@Id
End

--You can see same plan being used in all three procedure calls.
usp_PersonNamePhone @Id= 2
usp_PersonNamePhone @Id= 200
usp_PersonNamePhone @Id= 20000

There are several ways parameter sniffing problem can be addressed and below some of the commonly used methods.Keep in mind that each method has it pros and cons.

1.Recompile every time the procedure is called and hence, generates query plan as per the input parameter.
Create Procedure usp_PersonNamePhone_V1(@Id int) WITH RECOMPILE
as
Begin
Select A.FirstName,A.Lastname, B.PhoneNumber
from [Person].[Person] A
INNER JOIN [Person].[PersonPhone] B on A.BusinessEntityID=B.BusinessEntityID
where A.BusinessEntityID<@Id
End

--Different Query plans is generated based on Input.
usp_PersonNamePhone_V1 @Id= 2
usp_PersonNamePhone_V1 @Id= 200
usp_PersonNamePhone_V1 @Id= 20000

2.Use Statement level recompile. This is better option, if the procedure has several statements and only some statements are having issues.
Create Procedure usp_PersonNamePhone_V2(@Id int)
as
Begin
Select A.FirstName,A.Lastname, B.PhoneNumber
from [Person].[Person] A
INNER JOIN [Person].[PersonPhone] B on A.BusinessEntityID=B.BusinessEntityID
where A.BusinessEntityID<@Id Option (Recompile)
End

--Different Query plans is generated based on Input.
usp_PersonNamePhone_V2 @Id= 2
usp_PersonNamePhone_V2 @Id= 200
usp_PersonNamePhone_V2 @Id= 20000

3.Optimize for unknown value by assigning a parameter value to a variable.
Create Procedure usp_PersonNamePhone_V3(@Id int)
as
Begin
declare @EntityID int
Set @EntityID=@Id
Select A.FirstName,A.Lastname, B.PhoneNumber
from [Person].[Person] A
INNER JOIN [Person].[PersonPhone] B on A.BusinessEntityID=B.BusinessEntityID
where A.BusinessEntityID<@EntityID
End

--Generalized Query plan.
usp_PersonNamePhone_V3 @Id= 2
usp_PersonNamePhone_V3 @Id= 200
usp_PersonNamePhone_V3 @Id= 20000

4.Creates plan for the specified 'optimize for' value.
Create Procedure usp_PersonNamePhone_V4(@Id int)
as
Begin
Select A.FirstName,A.Lastname, B.PhoneNumber
from [Person].[Person] A
INNER JOIN [Person].[PersonPhone] B on A.BusinessEntityID=B.BusinessEntityID
where A.BusinessEntityID<@Id Option (Optimize for (@Id=200))
End

--Query plan is created for @id=200.
usp_PersonNamePhone_V4 @Id= 2
usp_PersonNamePhone_V4 @Id= 200
usp_PersonNamePhone_V4 @Id= 20000

5.Creates plan for a Unknown value. The unknown value is calculated based on statistical data.More on this here.
Create Procedure usp_PersonNamePhone_V5(@Id int)
as
Begin
Select A.FirstName,A.Lastname, B.PhoneNumber
from [Person].[Person] A
INNER JOIN [Person].[PersonPhone] B on A.BusinessEntityID=B.BusinessEntityID
where A.BusinessEntityID<@Id Option (Optimize for unknown)
End

--Query plan is created for a unknown value.
usp_PersonNamePhone_V5 @Id= 2
usp_PersonNamePhone_V5 @Id= 200
usp_PersonNamePhone_V5 @Id= 20000

TSQL code to view Job Activity

SQL Server Agent is a very key component of SQL Server and typically used to schedule jobs,alerts etc. Whenever we need to restart SQL Server agent, it is good practice to make sure there are no running jobs. We can check this using job activity monitor or using the TSQL code below.

I prefer the TSQL code as it is easier and also can run it on central management server(CMS) to get the list of running jobs across multiple servers.

--currently running jobs in job activity monitor(SQL 2005 and above)
Select @@SERVERNAME as [DBServer],C.Name as [JobName], run_requested_date,start_execution_date,stop_execution_date
from (Select max(Session_id) as Session_Id from msdb.dbo.syssessions) A
INNER JOIN msdb.dbo.sysjobactivity B on A.Session_id=B.Session_ID
INNER JOIN msdb.dbo.sysjobs C on B.job_id=C.Job_ID
where B.stop_execution_date is null AND B.run_requested_date is not null

Deleting Old Backup Information from MSDB

MSDB is a system database which stores information on backup history,sql jobs,alerts, database email, job history …etc.With regards to backups,It stores all the backup information unless you have some delete routine to delete the information from the backup tables. Generally, there will be a retention policy which defines how long old backup files are to be retained. This can be anywhere between couple of weeks to years depending on the requirement. Once you determine your retention requirement,you might not need backup information in MSDB beyond your retention policy, as you do not have the files, this information does not really help you. Also, this will keep MSDB database size in check and will also result in smaller MSDB backup file.

So, to delete this information, there are bunch of backup tables where the information is stored. The following pic shows the corresponding tables that store backup information.
MSDB Backup Diagram

Now, Create the below procedure that deletes the data from the above tables.

If Exists(Select 1 from sys.procedures where name='usp_DeleteBackupInformation')
Begin
Drop Procedure usp_DeleteBackupInformation
End
GO
Create Procedure usp_DeleteBackupInformation(@Purgedate datetime)
as
Begin
Set nocount on
Begin Try
Begin Tran

--deletes backupfile info
Delete A from msdb.dbo.backupfile A INNER JOIN msdb.dbo.backupset B
on A.backup_set_id=B.backup_set_id Where backup_finish_date < @PurgeDate

--deletes backupfilegroup info
Delete A from msdb.dbo.backupfilegroup A INNER JOIN msdb.dbo.backupset B
on A.backup_set_id=B.backup_set_id Where backup_finish_date < @PurgeDate

--deletes restorefile info
Delete A from msdb.dbo.restorefile A INNER JOIN msdb.dbo.restorehistory B
on A.restore_history_id=b.restore_history_id INNER JOIN msdb.dbo.backupset C
on C.backup_set_id=B.backup_set_id Where backup_finish_date < @PurgeDate

--deletes restorefilegroup info
Delete A from msdb.dbo.restorefilegroup A INNER JOIN msdb.dbo.restorehistory B
on A.restore_history_id=b.restore_history_id INNER JOIN msdb.dbo.backupset C
on C.backup_set_id=B.backup_set_id Where backup_finish_date < @PurgeDate

--deletes restorehistory info
Delete A from msdb.dbo.restorehistory A INNER JOIN msdb.dbo.backupset B
on A.backup_set_id=B.backup_set_id Where backup_finish_date < @PurgeDate

--delete backupset info
Delete from msdb.dbo.backupset where backup_finish_date < @PurgeDate

--deletes backupmedia info
Delete A from msdb.dbo.backupmediafamily A where media_set_id not in
(Select distinct media_set_id from msdb.dbo.backupset)

--deletes backupmediaset info
Delete A from msdb.dbo.backupmediaset A where media_set_id not in
(Select distinct media_set_id from msdb.dbo.backupset)

Commit
End Try
Begin Catch
Select ERROR_NUMBER as ErrorNumber, ERROR_MESSAGE() as ErrorMessage,
ERROR_SEVERITY() as ErrorSeverity
Rollback
End Catch
End

Example : usp_DeleteBackupInformation '20070101'
In the above example, it will delete all the backup information prior to Jan 1st 2007.

Database Restore Report

It is very typical to frequently have a copy of production database restored on to DEV\QA\TEST Instances, so as to work with more real time data. The below query helps in figuring out details such as when the database was restored,source server of restored database, user who restored it, which backup file was used to restore… etc.

Also, this query is useful to generate a report of all backup files that were restored along with other details while testing your backups by physically restoring them.

NOTE: At this point, the script works correctly if your backup types are Full, Log, Differential.Also, it is a good practice to delete unwanted backup history from MSDB database. So, if the backup\restore history data got purged, we might not see results from some databases.

Select A.destination_database_name as [Restored_Database_Name],
A.user_name as [RestoredBy_User_Name],
A.Restore_Date,A.replace as [Replace],
Case When A.restore_type='D' then 'Full'
When A.restore_type='I' then 'Differential'
When A.restore_type='L' then 'Log' End as [Restore_Type],
B.database_name as [Backup_DatabaseName],
B.database_creation_date as [Backup_DB_Creation_Date],
B.server_name as [Backup_ServerName],
B.user_name as [Backuped_by_User],
B.backup_start_date as [Backup_DB_Start_Date],
B.backup_finish_date as [Backup_DB_Finish_Date],
Case When B.[Type]='D' then 'Full'
When B.[Type]='I' then 'Differential'
When B.[Type]='L' then 'Log' End as [Restore_Type],
D.physical_device_name as [BackupFile],Position as [Backup_File_Position],
D.family_sequence_number as [Split_Backup_Sequence]
from msdb.dbo.restorehistory A
INNER JOIN msdb.dbo.backupset B on A.backup_set_id=B.backup_set_id
INNER JOIN msdb.dbo.backupmediafamily D on D.media_set_id=B.media_set_id
INNER JOIN (Select Max(restore_history_id) as restore_history_id ,
Destination_Database_Name
from msdb.dbo.RestoreHistory
where Restore_Type='D'
Group by destination_database_name) C on C.destination_database_name=A.destination_database_name
and A.restore_history_id >=C.restore_history_id