Archive for the Category: Databases

Wednesday, August 31st, 2016

PRAGMA is a keyword in Oracle PL/SQL that passes a directive to the compiler, instructing it to compile a program unit in a specific way.
First, let's revisit what a compiler is.
A compiler is a computer program (or a set of programs) that transforms source code written in a programming language (the source language) into another computer language (the target language), with the latter often having a binary form known as object code.

How does a pragma function?
Being part of the source code, a pragma statement directs the compiler to compile the program unit differently, thus altering the behavior of the program unit currently under compilation. Note that a pragma is interpreted only at compile time, not at execution time.

Types of pragmas:
1. AUTONOMOUS_TRANSACTION – The compiler allows schema objects such as subprograms, PL/SQL blocks, or triggers to commit their transaction within their own scope.
This Oracle feature allows simultaneous active transactions in the same database session. The autonomous pragma allows a separate active transaction that is independent of the main transaction in the session. A COMMIT/ROLLBACK done within the autonomous object does not affect the main transaction, and vice versa. All in all, it is an individual transaction, not part of the main transaction.
Practical example: consider a batch process that processes 100 records. On the 3rd record we get an exception or an error, but we want to log it and still finish processing all 100 records. In such a case, a separate PRAGMA AUTONOMOUS_TRANSACTION procedure can be created to log the errors in an independent transaction, as sketched below.
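
A minimal sketch of such a logging procedure; the ERROR_LOG table and its columns are assumptions made for illustration:

CREATE OR REPLACE PROCEDURE LOG_ERROR (
    P_RECORD_ID IN NUMBER,
    P_MESSAGE   IN VARCHAR2
)
IS
    PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
    INSERT INTO ERROR_LOG (RECORD_ID, MESSAGE, LOGGED_AT)
    VALUES (P_RECORD_ID, P_MESSAGE, SYSDATE);
    COMMIT; -- commits only this autonomous transaction, not the caller's work
END LOG_ERROR;
/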

2. RESTRICT_REFERENCES – Directs the compiler to check a function's purity during package compilation. It can be used in packages only.
The pragma was introduced to control and preserve the database state. It works as a watchdog for packaged functions, ensuring they do not modify the package or database state. It restricts the creation of the package if a member function violates the specified purity level.

The pragma checks the purity level of the function against four states:

WNDS (Writes No Database State) – DML operations restricted on any table in the database
WNPS (Writes No Package State) – Modification of package variables restricted
RNDS (Reads No Database State) – SELECT queries on database tables restricted
RNPS (Reads No Package State) – Reading of package variables restricted

Example:
CREATE OR REPLACE PACKAGE ABC
IS
    NAME VARCHAR2(50) := 'AKS';
    FUNCTION GET_NAME RETURN VARCHAR2;
    PRAGMA RESTRICT_REFERENCES(GET_NAME, WNPS); -- modification of package variables restricted
END;
/

3. EXCEPTION_INIT – The compiler associates a user-defined exception name with an Oracle error number.
The pragma EXCEPTION_INIT allows the association of a chosen error number with a user-defined exception. The error number can be any system-recognized error number.
ORD_EXCEPTION EXCEPTION;
PRAGMA EXCEPTION_INIT(ORD_EXCEPTION, -100);
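
A minimal sketch showing the pragma in context; the block simply raises and catches the associated exception:

DECLARE
    ORD_EXCEPTION EXCEPTION;
    PRAGMA EXCEPTION_INIT(ORD_EXCEPTION, -100);
BEGIN
    RAISE ORD_EXCEPTION; -- raised artificially here just to exercise the handler
EXCEPTION
    WHEN ORD_EXCEPTION THEN
        DBMS_OUTPUT.PUT_LINE('Caught error -100 as ORD_EXCEPTION');
END;
/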

4. SERIALLY_REUSABLE – The compiler reduces the package state from session level to just a server call. It can be used in the package spec only.
The pragma SERIALLY_REUSABLE enables package state initialization for each and every server call. Under the conventional behavior of packages, their state is initialized only the first time they are referenced in a session; this state then persists for the whole session and is flushed only when the session terminates.
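
A minimal sketch (the package name and variable are illustrative):

CREATE OR REPLACE PACKAGE COUNTER_PKG
IS
    PRAGMA SERIALLY_REUSABLE;
    HITS NUMBER := 0; -- re-initialized on every server call, not once per session
END COUNTER_PKG;
/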

Wednesday, August 31st, 2016

What is an Index?
An index is a data structure (most commonly a B-tree) that stores the values of a specific column in a table. An index is created on a column of a table. So the key points to remember are that an index consists of column values from one table, and that those values are stored in a data structure. Simply put, an index is a pointer to data in a table.

For example, if you want to find all pages in a book that discuss a certain topic, you first refer to the index, which lists all topics alphabetically, and are then referred to one or more specific page numbers.
An index speeds up SELECT queries and WHERE clauses, but it slows down data modification, such as UPDATE and INSERT statements. Indexes can be created or dropped with no effect on the data.

How does it help enhance the performance of a database?
Because an index is basically a data structure used to store column values, looking up those values becomes much faster. And if an index uses the most common data structure type – a B-tree – then the data structure is also sorted. Having the column values sorted can be a major performance enhancement.
Suppose we have a B-tree index on the Customer_Name column. This means that when we search for customers named "Abhishek", the entire customer table does not have to be scanned. Instead, the database will use the index to find the customers named "Abhishek", because the index will presumably be sorted alphabetically by the customer's name. It's also important to note that the index also stores pointers to the table rows so that other column values can be retrieved – read on for more details on that.
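
As an illustration, creating and using such an index might look like this (table and column names are assumptions):

CREATE INDEX idx_customer_name ON customers (customer_name);

-- This lookup can now use the index instead of scanning the whole table:
SELECT * FROM customers WHERE customer_name = 'Abhishek';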

Types of Indexes and when to use them :

Single-Column Indexes: A single-column index is one that is created on only one table column.
Unique Indexes: Unique indexes are used not only for performance, but also for data integrity. A unique index does not allow any duplicate values to be inserted into the table.

Composite Indexes: A composite index is an index on two or more columns of a table.
When deciding whether to create a single-column index or a composite index, take into consideration the column(s) that you use very frequently as filter conditions in a query's WHERE clause.
Should there be only one such column, a single-column index is the choice. Should there be two or more columns frequently used as filters in the WHERE clause, a composite index is the best choice.
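
A quick sketch of a composite index for queries that filter on both columns (names are illustrative):

CREATE INDEX idx_orders_cust_date ON orders (customer_id, order_date);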

When should indexes be avoided?
Although indexes are intended to enhance a database’s performance, there are times when they should be avoided. The following guidelines indicate when the use of an index should be reconsidered:

=> Indexes should not be used on small tables.
=> Columns that are frequently manipulated should not be indexed.

Another performance cost of indexes is that whenever you add, delete, or update rows in the underlying table, the same operations have to be performed on the index. An index needs to contain the same data as the table column(s) it covers.

Tuesday, June 07th, 2016

When setting up a CD pipeline in MS TFS for a database, we would need to answer a few questions, such as:

  • What are the best practices for database deployments in this scenario?
  • Do you deploy a new database to development and test, and upgrade scripts to acc and prod?
  • How do you keep track of the DB changes?
  • How do you deal with branches?

Here’s my two cents on the questions

  • What are best practices for database deployments in this scenario?
      • There are two ways we can deal with the database deployment scenario:
        • Source controlled Scripts
        • Source controlled database objects – preferred for complex enterprise scenarios

This article deals with the pros and cons of the above-mentioned approaches – here

  •  Tools to use
    • The data-tier application in the MS stack is best suited and available out of the box
    • From my past experience, source-controlled database objects work very well in delivering a CD pipeline, and this can be achieved using data-tier applications available from the MS stack. The most significant of these are SQLPROJ and DACPAC
    • RM supports deployment of databases out of the box
      • SQLPROJ
        • We can develop, manage, compare, and deploy changes to a database
        • Has the ability to reverse-engineer an existing database
      • DAC
        • A data-tier application, which groups database objects and supports the following operations: deploy, register, unregister, and extract
        • More on this – here
  •  Do you deploy a new database to development and test, and upgrade scripts to acc and prod?
  • Data-tier applications provide a package (DACPAC), which is a self-contained unit through which the same version of database code can be deployed across multiple environments, thus removing the need for manual scripting (a deployment sketch follows below).
  • How do you keep track of the DB changes?
  • Data-tier applications provide a way of versioning databases.
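
As an illustration, a DACPAC can be published to any environment with SqlPackage.exe; the server and database names below are placeholders:

SqlPackage.exe /Action:Publish ^
    /SourceFile:"MyDatabase.dacpac" ^
    /TargetServerName:"TEST-SQL01" ^
    /TargetDatabaseName:"MyDatabase"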

How to deal with branches?

If database objects are source controlled and physical databases are versioned, then branching would just be the same way we branch the source code.

Note:

When the customer is using MS TFS as their ALM tool, Release Management (RM) plays a crucial role in the CD pipeline. RM has inbuilt commands and actions to deploy databases using data-tier applications. All the thoughts I've put forward here are from my hands-on experience in using RM to deploy source-controlled databases to multiple environments in a CD pipeline. Please let me know if you have more questions/thoughts.

Thursday, March 10th, 2016

Many times we have been in a situation where we need to compare or manipulate data in a query based on different rows in the result set – for example, processing the current record by referring to the record just before or just after it.

In such a situation we might immediately think of using a cursor. But as we know, using a cursor is not a wise decision, as it has a performance impact.

Therefore, let's look at a very powerful feature introduced in MS SQL Server 2012: the new analytic functions LEAD() and LAG(). These functions access data from a subsequent row (LEAD) or a previous row (LAG) in the same result set, without the use of a self-join.


This can be achieved with the following simple query:

SELECT s.SalesOrderID,s.SalesOrderDetailID,s.OrderQty,
LEAD(SalesOrderDetailID) OVER (ORDER BY SalesOrderDetailID) LeadValue,
LAG(SalesOrderDetailID) OVER (ORDER BY SalesOrderDetailID) LagValue
FROM Sales.SalesOrderDetail s

With the above query we have access only to the immediately preceding or immediately following row. What if we had to access records that are N rows away from the current record?

SELECT s.SalesOrderID,s.SalesOrderDetailID,s.OrderQty,
LEAD(SalesOrderDetailID,2) OVER (ORDER BY SalesOrderDetailID) LeadValue,
LAG(SalesOrderDetailID,2) OVER (ORDER BY SalesOrderDetailID) LagValue
FROM Sales.SalesOrderDetail s 


If SQL Server is processing the current record and it is the first row, then when we use the LAG function there are no records before it, so LAG returns NULL. If we want a default value in such a situation, we can pass a third argument to replace NULL with that value, e.g. LAG(SalesOrderDetailID,2,0), as shown below.
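
A quick sketch of the three-argument form against the same table:

SELECT s.SalesOrderID,s.SalesOrderDetailID,
LAG(SalesOrderDetailID,2,0) OVER (ORDER BY SalesOrderDetailID) LagValue -- 0 instead of NULL for the first two rows
FROM Sales.SalesOrderDetail s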

The real power of LEAD and LAG can be seen when we are dealing with records that need to be grouped before some action is performed on them.


This can be achieved by using PARTITION BY in combination with LEAD or LAG:

SELECT s.SalesOrderID,s.SalesOrderDetailID,s.OrderQty,
LEAD(SalesOrderDetailID) OVER (PARTITION BY SalesOrderID
ORDER BY SalesOrderDetailID) LeadValue,
LAG(SalesOrderDetailID) OVER (PARTITION BY SalesOrderID
ORDER BY SalesOrderDetailID) LagValue
FROM Sales.SalesOrderDetail s

Tuesday, February 02nd, 2016

There are quite a few well-known advantages to using inheritance in C#, Java, and other languages.

For example, say we have a base class Asset, and derived classes for all the assets belonging to an organization – buildings, rooms, desks, computers, phones, etc. All of these can be managed through the Asset base class alone.

What if we could achieve the same with database objects (tables)? If we had a mechanism to define a base table from which all other tables could derive, it would add a lot of flexibility and sense to the overall design, while also saving a lot of code complexity in handling different scenarios.

Say there are people in the organization who are allowed to use some of the assets assigned to them. In a typical scenario we would have one table each for buildings, rooms, desks, computers, and phones, and one table for storing people. Now, if we have to track the allocation of assets to a person, we would need five more mapping tables (e.g. person to buildings, person to rooms, person to computers, etc.).

If we have a single table to track all these assets, then we need just one mapping table, i.e. "Person to Asset". This can be achieved by using table inheritance concepts in the database.


There are mainly 3 types of inheritance:

Table per hierarchy (TPH)

Here all the data is stored in a single table containing the union of all the columns of the base class and all derived classes.

Table:

[Figure: a single Assets table with columns Id, Name, Capacity, Type, IsServer, and Discriminator]

Here one row in the table is used per entity. For example, if we are storing a Room, the Id, Name, Capacity, and Type columns are populated with data, while the 'IsServer' column, which does not belong to this entity, is set to NULL. The Discriminator column identifies the type of the entity: it is populated with B for buildings, R for rooms, D for desks, and C for computers.
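
A minimal sketch of such a table (names and types are illustrative):

CREATE TABLE Assets (
    Id            INT IDENTITY(1,1) PRIMARY KEY,
    Name          VARCHAR(100) NOT NULL,
    Capacity      INT NULL,          -- used by rooms/buildings, NULL otherwise
    Type          VARCHAR(50) NULL,
    IsServer      BIT NULL,          -- used by computers, NULL otherwise
    Discriminator CHAR(1) NOT NULL   -- 'B', 'R', 'D', or 'C'
);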

Advantages

  • Flat structure, faster reads/writes
  • Simpler queries in case you are using ORM tools for data access

Limitations

  • All the entity columns should be nullable in the table; NOT NULL logic has to be handled at the ORM layer
  • Space is still allocated for columns that do not participate in the entity being stored
  • The table is not in normalized form

Table per type (TPT)

Here the data is stored in one table per class, so the data for each class resides in its own table. The relation with the base class is maintained by primary/foreign keys.


Each entity here is a new row in the Assets table with an auto-generated primary key; a corresponding entry is made in one of the child tables, depending on the type of the child, using the same key that was auto-generated in the Assets table as its primary/foreign key.
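
A minimal sketch of the TPT layout (names are illustrative):

CREATE TABLE Assets (
    Id   INT IDENTITY(1,1) PRIMARY KEY,
    Name VARCHAR(100) NOT NULL
);

CREATE TABLE Rooms (
    Id       INT PRIMARY KEY REFERENCES Assets(Id), -- shared primary/foreign key
    Capacity INT NOT NULL                           -- NOT NULL can be enforced in the DB
);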

Advantages

  • Nullable constraints can be added in the DB
  • No wastage of space, as each entity resides in its own type table
  • Tables are in normalized form

Limitations

  • Due to normalization, more join statements are required, hence lower performance

 

Table per concrete class (TPC)

Here there is no table for the base type; every type has its own table with all the columns of the base class as well as its own properties.
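
A minimal sketch of one TPC table (names are illustrative; every concrete type repeats the base-class columns):

CREATE TABLE Rooms (
    Id       INT IDENTITY(1,1) PRIMARY KEY,
    Name     VARCHAR(100) NOT NULL, -- base-class column repeated in each concrete table
    Capacity INT NOT NULL
);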

 


Advantages

  • Nullable constraints can be added in the DB
  • No wastage of space, as each entity resides in its own type table
  • Flat structure, faster reads/writes
  • Simpler queries in case you are using ORM tools for data access
  • Tables are in normalized form

Limitations

  • If we want to define a relation with respect to the base class, it is not possible with a single mapping table; we need four mapping tables to map it to another entity.

These concepts are supported out of the box by Entity Framework. For more details, please read: http://www.entityframeworktutorial.net/code-first/inheritance-strategy-in-code-first.aspx

 

Wednesday, July 08th, 2015

Since the inception of SSIS, deployment has always been tricky, and many challenges came up in automating and scheduling package execution. These challenges have also helped Microsoft a lot, as we have seen the move from solution deployment in SSIS 2005/2008 to project deployment in SSIS 2012. It goes without saying that configurations have played a major role in taking SSIS to the great heights of what it is today – one of the most recognized ETL tools.

But personally I like the SSIS 2012 because it has solved a lot of problems for me which I have faced in my previous projects. So let me define the problem statement for you.

"How can I deploy a list of more than 70 packages in one shot and with the least number of changes in parameters?"

SSIS 2005/2008 helped in identifying this problem, and after quite some workaround it was found that by changing the configurations in different packages, we can bind the packages with common parameters. But the problem with that solution is that we cannot deploy the projects separately: all the projects have to be deployed along with the solution, with the common configurations. With SSIS 2012, however, opportunity knocked with the separation of solution deployment and project deployment. The uniqueness of the project deployment model is that we can define different configurations and parameters for the respective projects, which allows a developer to design a project based on the defined requirements for an ETL.

This deployment model solves the problem of distributing the projects at various levels. For example, if a developer wants to load the stage database from a source system, we can define various layers such as logging (at DB level), auditing (at DB level), extraction, transformation, business-logic implementation, cleansing, and load. All of the above layers can be executed in one shot by making a parent project consisting of master packages that call the master packages for the respective layers.

The benefit of the above approach is that it is object and subject oriented. Also, as per the specific needs of a layer, the developer can define various parameters to handle the load in a particular manner.

Now the solution can be made simple by using the metadata of the packages. How? Let's find out below.

1) Define a parent package for each project and call it <Project Name>_Master.dtsx

2) Create a metadata table with the following attributes:

CREATE TABLE [dbo].[PackageMetadata](
   [PackageID] [int] IDENTITY(1,1) NOT NULL,
   [PackageName] [varchar](100) NULL,
   [PackageArea] [varchar](100) NULL,
   [PackageDomainName] [varchar](100) NULL,
   [IsActive] [bit] NULL,
   [CreatedDate] [smalldatetime] NULL,
   [ModifiedDate] [smalldatetime] NULL,
   [SortKey] [smallint] NULL)

The above table holds the metadata for the packages, including the package name, the area the package belongs to (for example, the project name), the domain of the package (for example, Stage), whether the package is active or not, the timestamps, and the sort key. The SortKey is one of the most important attributes of the table, as it defines the flow of the data load.

Note that this is a custom-made table; we could use the SSIS catalog tables to do the same job, but the advantage of a custom table is the ability to alter and reuse the data (or metadata) based on the needs of the project/solution.

3) Insert the relevant data in the metadata table and define the SortKeys. For example, SortKey can be defined as 1 for all the master packages (parent packages) and as 2 for all the dependent packages (child packages), as illustrated below.
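
A hedged sketch of such rows (the package and area names are made up):

INSERT INTO [dbo].[PackageMetadata]
    ([PackageName], [PackageArea], [PackageDomainName], [IsActive], [CreatedDate], [ModifiedDate], [SortKey])
VALUES
    ('Sales_Master.dtsx',  'Sales', 'Stage', 1, GETDATE(), GETDATE(), 1),
    ('LoadCustomers.dtsx', 'Sales', 'Stage', 1, GETDATE(), GETDATE(), 2),
    ('LoadOrders.dtsx',    'Sales', 'Stage', 1, GETDATE(), GETDATE(), 2)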

4) While implementing the master package, define a data flow task with an ADO.NET source. The source should be connected to the metadata table, with relevant filters on the area and the domain and a sort on the key (an illustrative query is sketched after this list).

5) Introduce a loop container in the package and loop over the packages returned by the above query from the data flow task.
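
The metadata query from step 4 might look like this (the filter values are placeholders):

SELECT [PackageName]
FROM [dbo].[PackageMetadata]
WHERE [PackageArea] = 'Sales'
  AND [PackageDomainName] = 'Stage'
  AND [IsActive] = 1
ORDER BY [SortKey]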

[Figure: Parent-Child Pattern in SSIS]

Now we are all set: one parent will trigger the child packages, and the load will be performed successfully. The benefit of this strategy is that it is easy to maintain and can be extended to other packages with just one record insertion. Along with that, if logging is enabled, the load performance and error checks can easily be performed.

Thursday, April 23rd, 2015

Phonetic algorithm:

A phonetic algorithm is an algorithm that searches for words with a similar sound. In other words, it matches words with the same or similar pronunciation. When we search for names in a database, it is difficult to find the same name under different spellings, for example "John – Jhon", "Thomas – Tomas", "Steven – Stephen", "Rakesh – Rakhesh". This problem can be solved using a phonetic algorithm.

 

Some of the phonetic algorithms are as below:

Soundex: Creates a four-character string – three numbers preceded by a letter.
Daitch-Mokotoff Soundex: An extended version of the above algorithm; creates six numbers as the output code.
Beider-Morse Phonetic Matching: Unlike the above two algorithms, it does not apply the "sound" test by spelling, but by the linguistic properties of the language.
Kölner Phonetik: Focused on German words, with the same Soundex logic.

In this article, I will talk about the Soundex algorithm, which is the most popular of these phonetic algorithms.

Soundex:

This algorithm was developed by Robert C. Russell and Margaret K. Odell and was patented in 1918. It breaks a name down to a four-character code (a single letter followed by three numbers).
Let us try to create the code for the word below with this algorithm:

"backscratcher"

The algorithm is as follows:

1. Keep first letter of the word as is.
                backscratcher -> B

2. Ignore all the vowels (A, E, I, O, U) and H, W, Y. Replace these characters with the number '0'.
                backscratcher -> B0CKSCR0TC00R

3. Replace the following letters with the corresponding number (as mapped below):
                1 – B, F, P, V
                2 – C, G, J, K, Q, S, X, Z
                3 – D, T
                4 – L
                5 – M, N
                6 – R

B0CKSCR0TC00R -> B022226032006

4. Remove consecutively repeated numbers.
                B022226032006 -> B02603206

5. Remove all zeros and truncate the result to four characters (the letter plus three digits). If there are fewer than three digits, pad the code with zeros.
                B02603206 -> B26326 -> B263

Below are some sample outputs when we search for names with different spellings:

Names            Code
John – Jhon      J500
Marie – Mary     M600
Raghu – Ragu     R200
Prabu – Prabhu   P610
Marc – Mark      M620

 

Many databases, such as Oracle, SQL Server, and MySQL, implement this algorithm as a built-in function.

Syntax for soundex in SQL:

SOUNDEX(string)

Example:

A query to search for people with a name pronounced like 'Stephen':

Select name from PeopleTable where soundex(name) = soundex('Stephen')
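
SQL Server additionally offers DIFFERENCE(), which compares the SOUNDEX codes of two strings and returns a score from 0 (no match) to 4 (strong match). A sketch against the same hypothetical table:

Select name from PeopleTable where difference(name, 'Stephen') = 4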

Thursday, February 26th, 2015

Need for Database Unit Testing

  • Database unit testing ensures iterative changes don't break existing functionality
  • Generates automated tests for programmable objects (functions, stored procedures, triggers, etc.)
  • Develop a group of tests to run with every database build
  • SQL Server unit tests establish a baseline state for your database and then help to verify any subsequent changes that you make to database objects
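
As an illustration, an SSDT SQL Server unit test is essentially a T-SQL script whose result is checked by a test condition configured in Visual Studio. A minimal sketch, with a hypothetical scalar function dbo.fn_AddNumbers:

-- Test body; the expected value (5) would be asserted via a Scalar Value
-- test condition in the Visual Studio test designer.
DECLARE @result INT = dbo.fn_AddNumbers(2, 3);
SELECT @result AS ActualValue;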

How to achieve it?

Challenges

  • You need the latest SQL Server Data Tools integrated with Visual Studio
  • The target platform/database needs to be set appropriately, with a compatible mode (VS2012 sets SQL Server 2014 as the default, so change the target server version accordingly)
  • The NuGet package manager is a must when dealing with a remote data server
  • Make sure the MSBuild engine is present (the above two items are taken care of if you have the VS2012 Ultimate version)

What next after Unit testing?

  • Integration with daily builds (CI)
  • Running automated DB unit tests along with application unit tests, in parallel or serially
  • Getting code coverage and code analysis metrics
Monday, December 01st, 2014

Data warehousing is a big world to visit, but fortunately it's a flat world where we have only two ends: one end where the data is available in the source as raw, unstructured data, and the other end where the data is presentable and structured, with history available for analysis.

In this article, I am going to summarize the very first manager (the load manager), which is required to do the basic operations, but mostly to load the data into the databases from different types of data sources.

LOAD MANAGER

Highlights
– This component performs the operations required for the extract and load process.
– The size and complexity of the load manager varies from one data warehouse solution to another.

LOAD MANAGER ARCHITECTURE
The load manager performs the following functions:
– Extract the data from the source systems.
– Fast-load the extracted data into a temporary data store.
– Perform simple transformations into a structure similar to the one in the data warehouse.

The picture below summarizes the basic functions of the load manager.

[Figure: load manager architecture]

EXTRACT DATA FROM SOURCE
The data is extracted from the operational databases or from external information providers. Gateways are the application programs used to extract data. A gateway is supported by the underlying DBMS and allows a client program to generate SQL to be executed at a server. Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC) are examples of gateways.

FAST LOAD
In order to minimize the total load window, the data needs to be loaded into the warehouse in the fastest possible time.
Complex transformations affect the speed of data processing. It is more effective to load the data into a relational database before applying complex transformations and checks.
Gateway technology often proves unsuitable, since gateways tend not to be performant when large data volumes are involved. That said, gateways do exist that can perform a faster bulk load, provided the hardware supports all the required operations.

SIMPLE TRANSFORMATIONS
While loading, it may be required to perform simple transformations. After this has been completed, we are in a position to do the complex checks. Suppose we are loading EPOS sales transactions; we need to perform the following steps:
– Strip out all the columns that are not required within the warehouse.
– Convert all the values to the required data types, etc.
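
A hedged sketch of such a simple transformation during load (the staging and source table names are made up):

-- Only the needed columns are selected, and values are converted on the way in:
INSERT INTO stage_sales (sale_id, sale_date, amount)
SELECT sale_id,
       CAST(txn_date AS DATE),
       CAST(amount AS DECIMAL(10, 2))
FROM raw_epos_feed;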

For more details and references, please refer to the book Data Warehousing in the Real World.

Friday, November 28th, 2014

Problem statement – I want to send a flash report by email to my customers (data-driven email notifications).

Definition of a flash report – a report that summarizes the business with the help of a few key facts and figures.

Requirements – MS SQL Server 2008/2012 (Database Mail configured, i.e. SMTP), HTML support

Solution

  1. First generate a SQL query to get the data that is supposed to be displayed in the flash report
  2. Use the query template below and replace the information as per your needs

DECLARE @Body varchar(max)
declare @TableHead varchar(max)
declare @TableTail varchar(max)
declare @mailitem_id as int
declare @statusMsg as varchar(max)
declare @Error as varchar(max)
declare @Note as varchar(max)

Set NoCount On;
set @mailitem_id = null
set @statusMsg = null
set @Error = null
set @Note = null
Set @TableTail = '</table></body></html>';

--HTML layout--
Set @TableHead = '<html><head>' +
'<H1 style="color: #000000">HEADER OF TABLE</H1>' +
'<style>' +
'td {border: solid black 1px;padding-left:5px;padding-right:5px;padding-top:1px;padding-bottom:1px;font-size:9pt;color:Black;} ' +
'</style>' +
'</head>' +
'<body><table cellpadding=0 cellspacing=0 border=0>' +
'<tr bgcolor=#F6AC5D>'+
'<td align=center><b>Name of column</b></td>' +
'<td align=center><b>Name of column</b></td>' +
'<td align=center><b>Name of column</b></td>' +
'<td align=center><b>Name of column</b></td>' +
'<td align=center><b>Name of column</b></td></tr>';

--Select information for the report--
Select @Body = (Select
Column As [TD],
Column As [TD],
Column As [TD],
Column As [TD],
Column As [TD]

FROM [DB].[dbo].[Table]
where -condition-
-- (whatever else you want to do …)

For XML raw('tr'), Elements)

-- Replace the entity codes and row numbers
Set @Body = Replace(@Body, '_x0020_', space(1))
Set @Body = Replace(@Body, '_x003D_', '=')
Set @Body = Replace(@Body, '<tr><TRRow>1</TRRow>', '<tr bgcolor=#C6CFFF>')
Set @Body = Replace(@Body, '<TRRow>0</TRRow>', '')

Set @Body = @TableHead + @Body + @TableTail

-- Return output
Select @Body

--Email
EXEC msdb.dbo.sp_send_dbmail
@profile_name = 'email', -- the Database Mail profile to send from
@mailitem_id = @mailitem_id out,
@recipients = 'blah@blah.co.za',
@subject = 'subject Email',
@body = @Body,
@body_format = 'HTML';

3. Create a stored procedure with the parameters to be passed (for example, recipients, subject, etc.)

4. Schedule the stored procedure via the SQL Server Agent

Result – The scheduled procedure will email the report to the recipients' mailboxes, providing updated information with every run.

Alternative use – This concept can also be used to provide regular notifications on the success/failure of processes/scripts in a project, which helps teams act faster.

Possible additions – The script above can be altered to use CSS, which can produce a more professional-looking report if required.
