Archive for the Category ◊ Databases ◊

Wednesday, July 08th, 2015

Since the inception of SSIS, deployment has always been tricky, and many challenges came up around deploying SSIS packages and automating/scheduling their execution. These challenges pushed Microsoft forward, as we have seen the move from solution deployment in SSIS 2005/2008 to project deployment in SSIS 2012. It goes without saying that configurations have played a major role in raising SSIS to the great heights of what it is today: one of the most recognized ETL tools.

Personally, I like SSIS 2012 because it has solved a lot of the problems I faced in my previous projects. So let me define the problem statement for you.

“How can I deploy a list of more than 70 packages in one shot, with the least number of changes in parameters?”

SSIS 2005/2008 helped in identifying this problem, and after quite a workaround it was found that by changing the configurations in different packages, we can bind the packages with common parameters. The problem with that solution was that we could not deploy the projects separately: all the projects had to be deployed along with the solution, with the common configurations. With SSIS 2012, however, opportunity knocked in the form of a separation between the Project Deployment and Package Deployment models. The uniqueness of the Project Deployment model is that we can define different configurations and parameters for the respective projects, which allows a developer to design each project based on the defined requirements for an ETL.

This deployment model makes it possible to distribute the work across projects at various levels. For example, if a developer wants to load the Stage database from a Source System, we can define various layers such as Logging (at DB level), Auditing (at DB level), Extraction, Transformation, Business Logic Implementation, Cleansing and Load. All of these layers can then be executed in one shot by making a Parent Project whose master package calls the master packages of the respective layers.

The benefit of the above approach is that it is organized around both objects and subject areas. Also, as per the specific needs of each layer, the developer can define the various parameters to handle the load in a particular manner.

Now the solution can be kept simple by using the metadata of the packages. How? Let's find out below.

1)      Define a parent package for each project and call it <Project Name>_Master.dtsx

2)      Create a metadata-based table with the following attributes.

CREATE TABLE [dbo].[PackageMetadata](
   [PackageID] [int] IDENTITY(1,1) NOT NULL,
   [PackageName] [varchar](100) NULL,
   [PackageArea] [varchar](100) NULL,
   [PackageDomainName] [varchar](100) NULL,
   [IsActive] [bit] NULL,
   [CreatedDate] [smalldatetime] NULL,
   [ModifiedDate] [smalldatetime] NULL,
   [SortKey] [smallint] NULL)

The above table holds the metadata for the packages, including the package name, the area the package belongs to (for example, the project name), the domain of the package (for example, Stage), whether the package is active or not, the timestamps and the sort key. The SortKey is one of the most important attributes of the table, as it defines the flow of the data load.

Note that this is a custom-made table; we could use the SSIS Catalog tables to do the same job, but the advantage of a custom table is that we can alter and reuse the data (or metadata) based on the needs of the project/solution.

3)      Insert the relevant data in the metadata table and define the SortKeys, as sketched below. For example, SortKey can be defined as 1 for all the master packages (parent packages) and as 2 for all the dependent packages (child packages).
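
As a minimal sketch (the package, area and domain names here are assumptions for illustration only), the inserts might look like this:

-- Hypothetical sample rows; names and values are illustrative only
INSERT INTO [dbo].[PackageMetadata]
    (PackageName, PackageArea, PackageDomainName, IsActive, CreatedDate, SortKey)
VALUES
    ('StageLoad_Master.dtsx', 'StageLoad', 'Stage', 1, GETDATE(), 1),  -- parent
    ('LoadCustomers.dtsx',    'StageLoad', 'Stage', 1, GETDATE(), 2),  -- child
    ('LoadOrders.dtsx',       'StageLoad', 'Stage', 1, GETDATE(), 2);  -- child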

4)      While implementing the Master package, define a data flow task whose source is connected to the metadata table, with relevant filters on the area and the domain and a sort on the key, storing the result in an object variable (for example, via a Recordset Destination).

5)      Introduce a Foreach Loop container in the package and loop over the packages returned by the above query from the data flow task, running each one (a sketch of the query follows below).
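
As a hedged sketch, the query feeding the loop could look like the following (the filter values are hypothetical project parameters, not from the original post):

-- Query driving the master package's loop; filter values are assumptions
SELECT PackageName
FROM [dbo].[PackageMetadata]
WHERE PackageArea = 'StageLoad'        -- e.g. the project name
  AND PackageDomainName = 'Stage'      -- e.g. the layer/domain
  AND IsActive = 1
ORDER BY SortKey;                      -- SortKey defines the execution order

Each returned row is then mapped to a string variable inside the Foreach Loop and consumed by an Execute Package Task.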

(Figure: Parent Child Pattern in SSIS)

Now we are all set: one parent will trigger the child packages and the load will be performed successfully. The benefit of this strategy is that it is easy to maintain and can be extended to other packages with just one record insertion. Along with that, if logging is enabled, the load performance and error checks can be easily monitored.

Thursday, April 23rd, 2015

Phonetic algorithm:

A phonetic algorithm is an algorithm which searches for words with a similar sound; in other words, it matches words with the same or similar pronunciation. When we search for names in a database, it is difficult to find the same name under different spellings, for example “John – Jhon”, “Thomas – Tomas”, “Steven – Stephen”, “Rakesh – Rakhesh”. This problem can be solved using a phonetic algorithm.


Some of the phonetic algorithms are as below:

Soundex: Creates a four-character string: three numbers preceded by a letter.
Daitch-Mokotoff Soundex: An extended version of the above algorithm; creates six numbers as the output code.
Beider-Morse Phonetic Matching: Unlike the above two algorithms, it does not run the “sound” test on spelling, but on the linguistic properties of the language.
Kölner Phonetik: Focused on German words, following logic similar to Soundex.

In this article, I will talk about the Soundex algorithm, which is the most popular of these phonetic algorithms.


This algorithm was developed by Robert C. Russell and Margaret K. Odell and was patented in 1918. It breaks a name down into a four-character code (a single letter followed by three numbers).
Let us create the code for the word “backscratcher” with this algorithm.


Algorithm is as follows:

1. Keep first letter of the word as is.
                backscratcher -> B

2. Ignore all the vowels (A, E, I, O, U) and H, W, Y. Replace these characters with the number '0'.
                backscratcher -> B0CKSCR0TC00R

3. Replace the remaining letters with the corresponding numbers (as mentioned below)
                1 – B, F, P, V
                2 – C, G, J, K, Q, S, X, Z
                3 – D, T
                4 – L
                5 – M, N
                6 – R

                B0CKSCR0TC00R -> B022226032006

4. Remove consecutively repeated numbers
                B022226032006 -> B02603206

5. Remove all zeros and truncate to four characters. If the output is shorter than four characters, pad it with zeros.
                B02603206 -> B26326 -> B263

Below are some sample outputs when we search for names with different spellings:

Names             Code
John – Jhon       J500
Marie – Mary      M600
Raghu – Ragu      R200
Prabu – Prabhu    P610
Marc – Mark       M620


Many databases, such as Oracle, SQL Server and MySQL, implement this algorithm as a built-in function.

Syntax for SOUNDEX in SQL Server:

SOUNDEX ( character_expression )

Query to search for people whose name is pronounced like 'Stephen':

SELECT name FROM PeopleTable WHERE SOUNDEX(name) = SOUNDEX('Stephen')
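
As a quick, hedged demo (the table and its rows are hypothetical), the built-in function behaves exactly as the manual walkthrough above:

-- Hypothetical demo data
CREATE TABLE PeopleTable (name varchar(50));
INSERT INTO PeopleTable VALUES ('Stephen'), ('Steven'), ('Thomas');

SELECT name, SOUNDEX(name) AS code
FROM PeopleTable
WHERE SOUNDEX(name) = SOUNDEX('Stephen');
-- Returns both Stephen and Steven, since both encode to S315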

Thursday, February 26th, 2015

Need for Database Unit Testing

  • Database unit testing ensures iterative changes don't break existing functionality
  • Generates automated tests for programmable objects (functions, stored procedures, triggers, etc.)

  • Develop a group of tests to run with every database build

  • SQL Server unit tests establish a baseline state for your database and then help verify any subsequent changes that you make to database objects (a minimal test-body sketch follows this list)
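
As a hedged sketch of what the T-SQL body of such a test can look like (the function under test, dbo.fn_GetDiscount, is hypothetical):

-- Test body executed by the unit test; the returned value is checked
-- by a test condition (e.g. a Scalar Value condition) in Visual Studio
DECLARE @actual decimal(5,2) = dbo.fn_GetDiscount(100.00);
SELECT @actual AS actual_value;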

How to achieve it?


  • You need the latest SQL Server Data Tools integrated with Visual Studio
  • The target platform/database needs to be set appropriately, in a compatible mode (VS2012 sets SQL Server 2014 as the default, so change the target server version accordingly)
  • The NuGet package manager is a must while dealing with a remote data server
  • Make sure the MSBuild engine exists (the above two items are taken care of if you have the VS2012 Ultimate version)

What next after Unit testing?

  • Integration with daily builds (CI)
  • Running automated DB unit tests along with application unit tests, in parallel or serially
  • Getting code coverage and code analysis metrics
Monday, December 01st, 2014

Data warehousing is a big world to visit, but fortunately it's a flat world where we have only two ends: one end, where the data is available in the source as raw and unstructured data, and the other end, where the data is presentable and structured, with history available for analysis.

In this article, I am going to summarize the very first manager (the Load Manager), which performs the basic operations, chiefly loading the data into the databases from different types of data sources.


- This component performs the operations required by the extract and load process.
- The size and complexity of the load manager varies between specific solutions, from one data warehouse to another.

The load manager performs the following functions:
- Extract the data from the source system.
- Fast-load the extracted data into a temporary data store.
- Perform simple transformations into a structure similar to the one in the data warehouse.

The picture below summarizes the basic functions of the load manager.


The data is extracted from operational databases or from external information providers. Gateways are the application programs used to extract the data. A gateway is supported by the underlying DBMS and allows a client program to generate SQL to be executed at a server. Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC) are examples of gateways.

In order to minimize the total load window, the data needs to be loaded into the warehouse in the fastest possible time.
Complex transformations affect the speed of data processing, so it is more effective to load the data into a relational database prior to applying complex transformations and checks.
Gateway technology often proves unsuitable here, since gateways tend not to be performant when large data volumes are involved. That said, some gateways can still perform fast bulk loads, provided the hardware supports all the required operations.

While loading, it may be required to perform simple transformations; after these have been completed, we are in a position to do the complex checks. Suppose we are loading EPOS sales transactions; we need to perform the following steps (a sketch follows below):
- Strip out all the columns that are not required within the warehouse.
- Convert all the values to the required data types, etc.
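
As a minimal sketch (all table and column names are hypothetical), these simple transformations could look like:

-- Keep only the required columns and cast them to the target types
INSERT INTO stage.SalesTransaction (StoreId, SoldAt, Amount)
SELECT CAST(store_code AS int),
       CAST(txn_timestamp AS datetime),
       CAST(amount_text AS decimal(10,2))
FROM tmp.EposExtract;   -- the temporary data store populated by the fast load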

For more details and references, please refer to the book Data Warehousing in the Real World.

Friday, November 28th, 2014

Problem Statement - I want to send a flash report by email to my customers (data-driven email notifications)

Definition of a Flash Report – a report which summarizes the business with the help of a few key facts and figures

Requirements – MS SQL Server 2008/2012 (Database Mail configured, i.e. SMTP), HTML support

Solution -

  1. First, write a SQL query that returns the data to be displayed in the flash report
  2. Use the below query template and replace the information as per your needs

DECLARE @Body varchar(max)
DECLARE @TableHead varchar(max)
DECLARE @TableTail varchar(max)
DECLARE @mailitem_id int
DECLARE @statusMsg varchar(max)
DECLARE @Error varchar(max)
DECLARE @Note varchar(max)

SET NOCOUNT ON;
SET @mailitem_id = NULL
SET @statusMsg = NULL
SET @Error = NULL
SET @Note = NULL
SET @TableTail = '</table></body></html>';

-- HTML layout --
SET @TableHead = '<html><head>' +
'<H1 style="color: #000000">HEADER OF TABLE</H1>' +
'<style>' +
'td {border: solid black 1px;padding-left:5px;padding-right:5px;padding-top:1px;padding-bottom:1px;font-size:9pt;color:Black;} ' +
'</style>' +
'</head>' +
'<body><table cellpadding=0 cellspacing=0 border=0>' +
'<tr bgcolor=#F6AC5D>' +
'<td align=center><b>Name of column</b></td>' +
'<td align=center><b>Name of column</b></td>' +
'<td align=center><b>Name of column</b></td>' +
'<td align=center><b>Name of column</b></td>' +
'<td align=center><b>Name of column</b></td></tr>';

-- Select information for the report --
SELECT @Body = (SELECT
Column AS [TD],
Column AS [TD],
Column AS [TD],
Column AS [TD],
Column AS [TD]
FROM [DB].[dbo].[Table]
WHERE -condition-
-- (whatever else you want to do ...)
FOR XML RAW('tr'), ELEMENTS)

-- Replace the entity codes and row numbers --
SET @Body = REPLACE(@Body, '_x0020_', SPACE(1))
SET @Body = REPLACE(@Body, '_x003D_', '=')
SET @Body = REPLACE(@Body, '<tr><TRRow>1</TRRow>', '<tr bgcolor=#C6CFFF>')
SET @Body = REPLACE(@Body, '<TRRow>0</TRRow>', '')

SET @Body = @TableHead + @Body + @TableTail

-- Return output --
SELECT @Body

EXEC msdb.dbo.sp_send_dbmail
@profile_name = 'email',              -- the Database Mail profile to send from
@recipients = 'user@example.com',     -- hypothetical recipient; parameterize as needed
@mailitem_id = @mailitem_id OUT,
@subject = 'subject Email',
@body = @Body,
@body_format = 'HTML';

3. Create a stored procedure with the parameters to be passed (for example, recipients and subject); a sketch follows below
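
A minimal sketch of such a wrapper (the procedure name is hypothetical; the body is the template above):

CREATE PROCEDURE dbo.usp_SendFlashReport
    @recipients varchar(max),
    @subject    varchar(200)
AS
BEGIN
    DECLARE @Body varchar(max);
    -- ... build @Body exactly as in the template above, then:
    EXEC msdb.dbo.sp_send_dbmail
        @profile_name = 'email',
        @recipients   = @recipients,
        @subject      = @subject,
        @body         = @Body,
        @body_format  = 'HTML';
END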

4. Schedule the stored procedure as a SQL Server Agent job

Result – The scheduled procedure will email the report to the recipients' inboxes, providing updated information with every run.

Alternative Use – This concept can be used to send regular notifications based on the success/failure of processes/scripts in a project, which helps teams act faster.

Possible Additions – The script above can be altered to use CSS, producing a more professional-looking report if required.

Friday, October 31st, 2014

Generally we come across scenarios wherein queries take a long time to execute. Most of the time the reason is that some indexes are missing. The query below helps to find the list of missing indexes.

The query returns five fields:

  1. [Cost]: Determined from the user seeks and user scans on a table.
  2. [Database Name]: Name of the database where the table is present.
  3. [equality_columns]: Columns used in equality predicates, e.g. “SELECT * FROM student WHERE usn = 1”.
  4. [inequality_columns]: Columns used in inequality predicates, e.g. “SELECT * FROM student WHERE usn > 10”.
  5. [included_columns]: Columns that could be included while creating the index.



SELECT
MID.[statement] AS [Table Name]
, D.[name] AS [Database Name]
, MID.equality_columns
, MID.inequality_columns
, MID.included_columns
, ((MIGS.user_seeks + MIGS.user_scans) * MIGS.avg_total_user_cost * MIGS.avg_user_impact) AS [Cost]
FROM sys.dm_db_missing_index_groups MIG
INNER JOIN sys.dm_db_missing_index_details MID
ON MID.index_handle = MIG.index_handle
INNER JOIN sys.databases D
ON D.database_id = MID.database_id
INNER JOIN sys.dm_db_missing_index_group_stats MIGS
ON MIGS.group_handle = MIG.index_group_handle
ORDER BY [Cost] DESC
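
As a hedged follow-up (the index, table and column names are illustrative only), a returned row with equality column [usn] and included column [name] would translate into an index such as:

-- Create the suggested index; names are illustrative only
CREATE NONCLUSTERED INDEX IX_student_usn
ON dbo.student (usn)
INCLUDE (name);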


Wednesday, October 29th, 2014

Data compression is a technique used to reduce physical file sizes in SQL Server, which can greatly enhance the performance of many database applications. Data compression reduces disk I/O, thereby enhancing performance. Data compression is only available in the Enterprise edition (SQL Server 2008 & SQL Server 2008 R2) or higher.

When data compression is turned on for a table, the data is stored in a compressed format. Whenever data is read, it needs to be uncompressed. In other words, every time data passes to or from the Storage Engine, it has to be compressed or uncompressed. Even though this is an overhead on the CPU, the amount of disk I/O saved by compression increases the overall performance of SQL Server. It's good practice to enable table compression only when you notice that the size of a table is huge.




Advantages:

  1. Reduces physical disk space
  2. Improves performance by reducing the number of logical reads on a table




Disadvantages:

  1. Compression adds additional CPU cycles. If you have CPU constraints, it is better to turn table compression off


There are two types of compression:


Row-level Data Compression: Row-level data compression converts fixed-length data types into variable-length data types, freeing up empty space and thereby saving additional space. Since the data is compressed, more rows can fit into a single 8 KB page.


Page-level Data Compression: Page-level compression offers increased data compression over row-level compression. Page-level compression is effectively row-level compression plus prefix and dictionary compression.


Steps you need to follow to enable data compression:


1. Select and right-click the table which needs to be compressed.

2. Go to Storage -> Manage Compression.

3. Select a compression type.

4. Click Calculate to compare the amount of space saved when you opt for row-level or page-level compression.

5. The Data Compression wizard will prompt you to save a query which can be used to compress the selected table.
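
For those who prefer T-SQL over the wizard, a minimal equivalent sketch (the table name dbo.Orders is hypothetical):

-- Estimate the savings first
EXEC sp_estimate_data_compression_savings
     @schema_name = 'dbo', @object_name = 'Orders',
     @index_id = NULL, @partition_number = NULL,
     @data_compression = 'PAGE';

-- Then enable page-level compression
ALTER TABLE dbo.Orders REBUILD WITH (DATA_COMPRESSION = PAGE);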

Thursday, October 09th, 2014

Oracle uses the concept of the ROWID to uniquely identify each row in each table in the database. It is exposed as the pseudo-column ROWID.

Oracle8 introduced the concept of OBJECTS. Objects have identifiers which are added to the ROWID, giving an EXTENDED ROWID format which is 10 bytes long.

A ROWID is a VARCHAR2 representation of a base-64 number.

Note: the base-64 counting starts at B, not A; A is used to designate zero.

The ROWID is displayed as: OOOOOO.FFF.BBBBBB.SSS

    OOOOOO – the Data Object Number
    FFF – the Relative File Number
    BBBBBB – the Block Number
    SSS – the Slot (row) Number
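
As a hedged illustration (the hr.employees table is just an example), these components can be decomposed with the DBMS_ROWID package:

-- Decompose the extended ROWID of each row
SELECT ROWID,
       DBMS_ROWID.ROWID_OBJECT(ROWID)       AS object_no,
       DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) AS file_no,
       DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) AS block_no,
       DBMS_ROWID.ROWID_ROW_NUMBER(ROWID)   AS row_no
FROM hr.employees;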

Multiple files can have the same relative file number because Oracle assigns the number per TABLESPACE. These numbers are therefore only unique per tablespace, which means we cannot derive an absolute address directly from the new ROWID: the ROWID addressing scheme is tablespace-relative.

ROWID contains the data object number which increments when an object’s version changes.
An object’s version changes whenever a table is truncated or a partition is moved.
The data object number is not related to the Object ID.

Wednesday, October 08th, 2014

PL/SQL Server Pages (PSP) are server-side scripts that include dynamic content, including the results of SQL queries, inside web pages.

We can create the web pages in an HTML authoring tool and insert blocks of PL/SQL code.

We can compile and load a PL/SQL server page into the database with the loadpsp command-line utility.

The following command loads simple.psp into a schema (for example, the HR schema), replacing the show_employees procedure if it already exists (the credentials are placeholders):

loadpsp -replace -user hr/your_password simple.psp

Browser users can then run the show_employees procedure through a URL. An HTML page displaying the last and first names of the employees in the hr.employees table is returned to the browser through the PL/SQL gateway.
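
As a hedged sketch of what simple.psp itself might contain (the markup and query are assumptions based on the description above, not the original file):

<%@ page language="PL/SQL" %>
<%@ plsql procedure="show_employees" %>
<html>
<body>
<table>
<% FOR emp IN (SELECT last_name, first_name FROM hr.employees ORDER BY last_name) LOOP %>
  <tr><td><%= emp.last_name %></td><td><%= emp.first_name %></td></tr>
<% END LOOP; %>
</table>
</body>
</html>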

Deploying content through PL/SQL Server Pages has these advantages:

    For developers familiar with PL/SQL, server pages are the easiest way to create professional web pages that include database-generated content. You can develop web pages as you usually do and then embed PL/SQL code in the HTML
    PL/SQL Server Pages can be more convenient than using the HTP and HTF packages to write out HTML content line by line
    Processing is performed on the database server; the client browser receives a plain HTML page with no special script tags, supporting all browsers and browser levels
    Network traffic is efficient because PL/SQL Server Pages minimize the number of database round-trips
    Content can be written quickly, following a rapid, iterative development process, while maintaining central control of the software, with only a web browser required on the client system

Prerequisites for Developing and Deploying PL/SQL Server Pages

    To write a PL/SQL server page we need access to a text editor or HTML authoring tool for writing the script. No other development tool is required
    To load a PL/SQL server page we need:

    An account on the database in which to load the server pages.

    Execution rights to the loadpsp command-line utility, which is located in $ORACLE_HOME/bin.


Monday, August 25th, 2014

Table-valued parameters are a feature introduced in SQL Server 2008. In earlier versions of SQL Server, it was not possible to pass a table variable to a stored procedure as a parameter. SQL Server 2008 allows us to pass a table-valued parameter to send multiple rows of data to a stored procedure or function, without creating a temporary table or passing a long list of parameters.

Many of us have come across a situation wherein we pass a huge comma/semicolon-separated string as a parameter to a stored procedure, split that string using some function, and insert the pieces into a temp table for further use in the stored procedure.

We can overcome such hassles by using a table-valued parameter. We need to follow the below steps in order to use one:

  • Declare the table and a user-defined table type
  • Create a stored procedure which accepts the user-defined table type as a parameter
  • Create a data table with the same structure as your table-valued parameter. Remember that the columns of the data table must match the columns of the table type
  • Pass the data to the stored procedure as a SqlParameter; the type of this parameter must match the table type


Below are a few sample code snippets which illustrate creating a table-valued parameter and passing it to a stored procedure:

Declare the table and a user-defined table type (the target table DDL is inferred from the stored procedure below, and the type name OrganizationTableType is assumed for illustration, since neither was shown in full):

-- Target table (inferred from the stored procedure below)
CREATE TABLE ORGANIZATIONS
(
    OrganizationId INT NOT NULL PRIMARY KEY,
    OrganizationName VARCHAR (MAX)
);

-- User-defined table type (type name assumed for illustration)
CREATE TYPE OrganizationTableType AS TABLE
(
    OrganizationId INT NOT NULL,
    OrganizationName VARCHAR (MAX),
    PRIMARY KEY (OrganizationId)
);

Create a stored procedure which accepts the user-defined table type as a parameter (table-valued parameters must be declared READONLY):

CREATE PROCEDURE proc_ADD_ORGANIZATION
    @TableParameter OrganizationTableType READONLY
AS
BEGIN
    INSERT INTO ORGANIZATIONS (OrganizationId, OrganizationName)
    SELECT OrganizationId, OrganizationName FROM @TableParameter
END

Create a data table with the same structure as your table-valued parameter. Remember that the columns of the data table must match the columns of the table type:

DataTable CreateTable()
{
    var dt = new DataTable();
    dt.Columns.Add("OrganizationId", typeof(Int32));
    dt.Columns.Add("OrganizationName", typeof(string));
    return dt;
}

Pass the data to the stored procedure as a SqlParameter; the type of this parameter must match the table type:

//Create table
DataTable dtOrganization = CreateTable();

// Add new rows to the table
dtOrganization.Rows.Add(1, "Prowareness");
dtOrganization.Rows.Add(2, "Airtel");
dtOrganization.Rows.Add(3, "Vodafone");

using (var connection = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("proc_ADD_ORGANIZATION", connection))
{
    cmd.CommandType = CommandType.StoredProcedure;

    //Pass the table-valued parameter to the stored procedure
    SqlParameter sqlParam = cmd.Parameters.AddWithValue("@TableParameter", dtOrganization);
    sqlParam.SqlDbType = SqlDbType.Structured;

    connection.Open();
    cmd.ExecuteNonQuery();  // executes the insert on the server
}
