Since the inception of SSIS, deployment has always been tricky, and many challenges came up around automating and scheduling package execution. These challenges, however, pushed Microsoft forward: we have seen the move from solution deployment in SSIS 2005/2008 to the Project Deployment model in SSIS 2012. It goes without saying that configurations have played a major role in taking SSIS to the heights it enjoys today as one of the most widely recognized ETL tools.
Personally, I like SSIS 2012 because it has solved a lot of problems that I faced in my previous projects. So let me define the problem statement for you.
“How can I deploy more than 70 packages in one shot, with the least number of changes to parameters?”
SSIS 2005/2008 helped in identifying this problem, and after quite a bit of workaround it was found that by changing the configurations in different packages, we could bind the packages with common parameters. The problem, however, was that the projects could not be deployed separately: all the projects had to be deployed along with the solution, with the common configurations. With SSIS 2012, opportunity knocked in the form of a separation between package deployment and project deployment. The uniqueness of the Project Deployment model is that we can define different configurations and parameters for each project, which allows a developer to design the project based on the defined requirements for an ETL.
This deployment model serves the purpose of distributing the projects across various layers. For example, if a developer wants to load a Stage database from a source system, we can define layers such as Logging (at the DB level), Auditing (at the DB level), Extraction, Transformation, Business Logic Implementation, Cleansing and Load. All of these layers can then be executed in one shot by creating a parent project whose master package calls the master packages of the respective layers.
The benefit of this approach is that it keeps the design modular, with each layer isolated by subject area. Also, as per the specific needs of each layer, the developer can define parameters to handle the load in a particular manner.
The solution can be kept simple by using the metadata of the packages. How? Let’s find out below.
1) Define a parent package for each project and call it <Project Name>_Master.dtsx.
2) Create a metadata table with the following attributes:
CREATE TABLE [dbo].[PackageMetadata](
    [PackageID] [int] IDENTITY(1,1) NOT NULL,
    [PackageName] [varchar](100) NULL,
    [PackageArea] [varchar](100) NULL,
    [PackageDomainName] [varchar](100) NULL,
    [IsActive] [bit] NULL,
    [CreatedDate] [smalldatetime] NULL,
    [ModifiedDate] [smalldatetime] NULL,
    [SortKey] [smallint] NULL
)
The above table holds the metadata for the packages, including the package name, the area the package belongs to (for example, the project name), the domain of the package (for example, Stage), whether the package is active, the timestamps, and the sort key. The SortKey is one of the most important attributes of the table, as it defines the flow of the data load.
Note that this is a custom-made table; we could use the SSIS Catalog tables to do the same job, but the advantage of a custom table is that we can alter and reuse the data (or metadata) based on the needs of the project/solution.
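For reference, here is a minimal sketch of how the built-in catalog views could list the deployed packages instead, assuming the projects are deployed to the SSISDB catalog; the folder name 'ETL' is a hypothetical placeholder:

```sql
-- List deployed packages per project from the SSIS catalog (SSIS 2012+).
-- The folder name 'ETL' is a placeholder for your own catalog folder.
SELECT f.name  AS FolderName,
       pr.name AS ProjectName,
       pk.name AS PackageName
FROM   SSISDB.catalog.packages AS pk
       JOIN SSISDB.catalog.projects AS pr ON pk.project_id = pr.project_id
       JOIN SSISDB.catalog.folders  AS f  ON pr.folder_id  = f.folder_id
WHERE  f.name = 'ETL'
ORDER  BY pr.name, pk.name;
```

The catalog views are read-only, which is exactly why the custom table above is handy when you need to attach extra attributes such as IsActive or SortKey.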
3) Insert the relevant data into the metadata table and define the SortKeys. For example, the SortKey can be set to 1 for all the master packages (parent packages) and to 2 for all the dependent packages (child packages).
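As an illustration, the seed data might look like the following; the package, area and domain names are hypothetical:

```sql
-- Hypothetical seed rows: one master package (SortKey = 1)
-- and two child packages (SortKey = 2) for a 'Sales' project.
INSERT INTO [dbo].[PackageMetadata]
    ([PackageName], [PackageArea], [PackageDomainName],
     [IsActive], [CreatedDate], [ModifiedDate], [SortKey])
VALUES
    ('Sales_Master.dtsx',       'Sales', 'Stage', 1, GETDATE(), GETDATE(), 1),
    ('Sales_LoadCustomer.dtsx', 'Sales', 'Stage', 1, GETDATE(), GETDATE(), 2),
    ('Sales_LoadOrders.dtsx',   'Sales', 'Stage', 1, GETDATE(), GETDATE(), 2);
```

Adding a new package to the load later is then just one more INSERT into this table.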
4) While implementing the master package, define a data flow task with an ADO.NET source. The source should be connected to the metadata table, with relevant filters on the area and the domain and a sort on the SortKey.
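The source query for that task could look like this sketch; the area and domain values are placeholders for your own filters:

```sql
-- Fetch the active packages for one area/domain,
-- ordered so the load follows the defined flow.
SELECT [PackageName]
FROM   [dbo].[PackageMetadata]
WHERE  [IsActive]          = 1
  AND  [PackageArea]       = 'Sales'   -- placeholder area
  AND  [PackageDomainName] = 'Stage'   -- placeholder domain
ORDER  BY [SortKey];
```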
5) Introduce a Foreach Loop container in the package and loop over the result set produced by the above query in the data flow task, running each package (for example, through an Execute Package Task) on every iteration.
Now we are all set: one parent package will trigger the child packages, and the load will be performed successfully. The benefit of this strategy is that it is easy to maintain and can be extended to other packages with just one record insertion. In addition, if logging is enabled, load performance and error checks can be easily monitored.