Archive for the Category ◊ Databases ◊

Thursday, April 23rd, 2015

Phonetic algorithm:

Phonetic algorithm is an algorithm which searches for the word with similar sound. In other words, it matches the words with same\similar pronunciation. When we search for names in database, it is difficult to find same names with different spellings. For example, “John -Jhon” , “Thomas – Tomas”, “Steven – Stephen”, “Rakesh – Rakhesh”This problem can be solved using phonetic algorithm.

 

Some of the phonetic algorithms are as below:

Soundex: This creates a four character string with three numbers preceded by a character.
Daitch-Mokotoff Soundex: Extended version of above algorithm; creates six numbers as the output code.
Beider-Morse Phonetic Matching: Unlike above two algorithms, it does not check the “sound” test by spelling, but by the linguistic properties of the language.
Kölner Phonetik: Focussed for German words with the same Soundex logic.

In this article, I will talk about Soundex algorithm which is popular among all other phonetic algorithms.

Soundex:

This algorithm was developed by Robert C Russell and Margaret K. Odell and was patented in 1918. This algorithm breaks down the name to a four characters (single letter followed by three numbers) code.
Let us try to create code for the below word with this algorithm

“backscratcher”

Algorithm is as follows:

1. Keep first letter of the word as is.
                backscratcher -> B

2. Ignore all the vowels (A,E,I,O,U) and H,W,Y. Replace these characters with number ’0′.
                backscratcher -> B0CKSCR0TC00R

3. Replace following letters with corresponding number (as mentioned below)
                1 – B, F, P, V
                2 – C, G, J, K, Q, S, X, Z
                3 – D, T
                4 – L
                5 – M, N
6 – R

B0CKSCR0TC00R -> B02226032006

4. Remove consequently repeated numbers
                B02226032006 -> B02603206

5. Remove all zero and truncate the characters to four digits. If the output is less than three characters, then fill the rest of the characters with zeros.
                B02603206 -> B26326 -> B263

Below are some sample outputs when we search for names with different spellings:

Names Code
John – Jhon J500
Marie – Mary M600
Raghu – Ragu R200
Prabu – Prabhu P610
Marc – Mark M620

 

Many databases like Oracle, SQL, MySQL etc.. has implemented this algorithm as a built-in function.

Syntax for soundex in SQL:

SOUNDEX(string)

Example:

Query to search the people with name pronounced as ‘Stephen’

Select name from PeopleTable where soundex(name) = soundex(‘Stephen’)

Category: Databases, General  | One Comment
Thursday, February 26th, 2015

Need for Database Unit Testing

  • Database Unit Testing ensures iterative changes don’t break existing functionality
  • Generates automated tests for programmable objects (Functions, Stored Procedures,Triggers etc..)

  • Develop a group of tests to run with every database build

  • SQL Server unit tests do establish a baseline state for your database and then helps to verify any subsequent changes that you make to database object

How to achieve?

Challenges

  • Need to have latest SQL Server Data Tools integrated with Visual Studio
  • Target platform/Database needs to be set appropriately and compatible mode(while using VS2012 it sets SQL Server 2014 as default, hence change the target server version accordingly)
  • NuGet package manager is must while dealing with remote data server
  • Make sure existence of MSBuild engine(above two things will be taken care if we have VS2012 ultimate version)

What next after Unit testing?

  • Integration with daily builds(C.I)
  • Running automated unit test of DB along with application unit testing parallel or serial
  • Getting Code coverage, Code analysis metrics
Category: Databases  | Leave a Comment
Author:
Monday, December 01st, 2014

Data Warehousing is a big world to visit but fortunately its a flat world where we have only two ends. One end, where the data is available in the source as raw and unstructured data and the other end, where the data is presentable and structured with the history available for the analysis.

In this article, I am going to summarize about the very first manager (The Load Manager), which is required to do the basic operations but mostly load the data into the databases from different types of data sources.

LOAD MANAGER

Highlights
- This Component performs the operations required to extract and load process.
- The size and complexity of load manager varies between specific solutions from data warehouse to data warehouse.

LOAD MANAGER ARCHITECTURE
The load manager does the following functions.
- Extract the data from source system.
- Fast Load the extracted data into temporary data store.
- Perform simple transformations into structure similar to the one in the data warehouse.

Please find the picture below which is summarizing the basic functionalities for the load manager

dwh_loadManager

EXTRACT DATA FROM SOURCE
The data is extracted from the operational databases or the external information providers. Gateways is the application programs that are used to extract data. It is supported by underlying DBMS and allows client program to generate SQL to be executed at a server. Open Database Connection( ODBC), Java Database Connection (JDBC), are examples of gateway.

FAST LOAD
In order to minimize the total load window the data need to be loaded into the warehouse in the fastest possible time.
The complex transformations affects the speed of data processing. It is more effective to load the data into relational database prior to applying complex transformations and checks.
Gateway technology proves to be not suitable, since they tend not be performant when large data volumes are involved. But certainly there are gateways still exist too which can perform the faster bulk load provided the hardware is supporting for all the possible operations.

SIMPLE TRANSFORMATIONS
While loading it may be required to perform simple transformations. After this has been completed we are in position to do the complex checks. Suppose we are loading the EPOS sales transaction we need to perform the following checks.
- Strip out all the columns that are not required within the warehouse.
- Convert all the values to required data types. etc.

For More details and references, please refer the book  - Data Warehousing in Real World

Author:
Friday, November 28th, 2014

Problem Statement - I want to send a flash report in the email to my customers (Data Driven Email Notifications)

Definition of Flash Report – The report which is summarizing the business with the help of very few facts and information

Requirements – MS SQL Server 2008/2012 (Database Email configured i.e. SMTP), HTML Support

Solution -

  1. First generate a SQL query to get the data which is supposed to be displayed in the flash report
  2. Use the below query template and replace the information as per the need

DECLARE @Body varchar(max)
declare @TableHead varchar(max)
declare @TableTail varchar(max)
declare @mailitem_id as int
declare @statusMsg as varchar(max)
declare @Error as varchar(max)
declare @Note as varchar(max)

Set NoCount On;
set @mailitem_id = null
set @statusMsg = null
set @Error = null
set @Note = null
Set @TableTail = ‘</table></body></html>’;

–HTML layout–
Set @TableHead = ‘<html><head>’ +
‘<H1 style=”color: #000000″>HEADER OF TABLE</H1>’ +
‘<style>’ +
‘td {border: solid black 1px;padding-left:5px;padding-right:5px;padding-top:1px;padding-bottom:1px;font-size:9pt;color:Black;} ‘ +
‘</style>’ +
‘</head>’ +
‘<body><table cellpadding=0 cellspacing=0 border=0>’ +
‘<tr bgcolor=#F6AC5D>’+
‘<td align=center><b>Name of column</b></td>’ +
‘<td align=center><b>Name of column</b></td>’ +
‘<td align=center><b>Name of column</b></td>’ +
‘<td align=center><b>Name of column</b></td>’ +
‘<td align=center><b>Name of column</b></td></tr>’;

–Select information for the Report–
Select @Body= (Select
Column As [TD],
Column As [TD],
Column As [TD],
Column As [TD],
Column As [TD]

FROM [DB].[dbo].[Table]
where -condition-
(whatever you want to do else …)

For XML raw(‘tr’), Elements)

– Replace the entity codes and row numbers
Set @Body = Replace(@Body, ‘_x0020_’, space(1))
Set @Body = Replace(@Body, ‘_x003D_’, ‘=’)
Set @Body = Replace(@Body, ‘<tr><TRRow>1</TRRow>’, ‘<tr bgcolor=#C6CFFF>’)
Set @Body = Replace(@Body, ‘<TRRow>0</TRRow>’, ”)

Set @Body = @TableHead + @Body + @TableTail

– return output–
Select @Body

–Email
EXEC msdb.dbo.sp_send_dbmail
@profile_name =email’, <–This is the mail account to sent from.
@mailitem_id = @mailitem_id out,
@recipients=’blah@blah.co.za’,
@subject = ‘subject Email’,
@body = @Body,
@body_format = ‘HTML’;

3. Create a Stored Procedure with the parameters to be passed (For example- recipients, subject etc)

4. Schedule the Stored Procedure on the SQL Server Agent

Result – The scheduled procedure will email the report to the recipient box and provide the updated information with every schedule.

Alternative Use – This concept can be used to provide the regular notifications based on the success/failure of the process/scripts in the project and it is helpful to act faster.

Possible Additions – The script above can be altered to use the CSS and can provide a more professional report if required.

Category: Databases, HTML5, Others | Tags:  | Leave a Comment
Friday, October 31st, 2014

Generally we come across scenarios wherein queries take long time to execute. Most of the times the reason would be that there could be some missing indexes. The below query would help to find list of missing indexes.

The query returns four fields:

  1. [Cost]: Is determined based on user seek and user scan on a table.
  2. [Database Name]: Name of the database where the table is present
  3. [equality_columns]: displays columns using equality predicates, e.g.  “Select * from student  where usn = 1”
  4. [inequality_columns]: displays columns using inequality predicates, e.g.  “Select * from student  where usn > 10”
  5. [included_columns] : Columns that could be included while creating the index

 

SELECT

MID.[statement] AS [Table Name]

,D.name as [Database Name]

,equality_columns

,inequality_columns

,included_columns

,((MIGS.user_seeks + MIGS.user_scans) * MIGS.avg_total_user_cost * MIGS.avg_user_impact) AS [Cost]

FROM sys.dm_db_missing_index_groups MIG

INNER JOIN sys.dm_db_missing_index_details MID

ON MID.index_handle = MIG.index_handle

INNER JOIN sys.databases D

ON D.database_id = MID.database_id

INNER JOIN sys.dm_db_missing_index_group_stats MIGS

ON MIGS.group_handle = MIG.index_group_handle

ORDER BY [Cost] DESC

Category: Databases  | Leave a Comment
Wednesday, October 29th, 2014

Data Compression is a technique used to reduce physical file sizes in sql which can greatly enhance the performance of many database applications. Data compression reduces disk I/O thereby helping in enhancing the performance. Data Compression is only available in Enterprise (SQL Server 2008 & SQL Server 2008 R2) edition or higher.

When data compression is turned on for a table, the data is compressed and stored in a compressed format. Whenever data is read, it needs to uncompressed. In other words, every time data has to be passed to or from the Storage Engine, it has to be compressed or uncompressed. Even though it is an overhead on CPU, the amount of disk I/O saved by compression increases overall performance of SQL Server. It’s a good practice to enable table compression only when you notice that size of a table is huge.

 

Advantages:

 

  1. Reduces physical disk space
  2. Improves performance by reducing the number of logical reads on a table.

 

Disadvantage:

 

  1. Compression adds on additional CPU cycles. If you have CPU constraints, it’s better to turn off table compression

 

There are two types of compression:

 

Row-level Data Compression: Row-level data compression converts fixed length data types into variable length data types. This ensures freeing up empty space thereby saving additional space. Since data is compressed, more additional rows can fit into a single 8kb page.

 

Page-level Data Compression: Page-level compression offers increased data compression over row-level compression. So page level compression is equal to (row level compression + prefix and dictionary compression)

 

Steps you need to follow to enable data compression:

 

1. Select and right click on a table which needs to be compressed.

2. Go to Storage -> Manage Compression

3. Select a compression type

4. Click on calculate to compare amount of space saved when you adopt for row level compression or page level compression.

5. The Data compression wizard will prompt you to save a query which could be used for compression of the selected table.

Category: Databases  | Leave a Comment
Thursday, October 09th, 2014

Oracle uses the concept of the ROWID to uniquely identify each row in each table in the database. This is represented as the pseudo-column ROWID

Oracle8 introduced the concept of OBJECTS. OBJECTS have identifiers which are added to the ROWID giving an EXTENDED ROWID format which is 10 bytes long.

ROWID is a VARCHAR2 representation of a base 64 number.

Note:base 64 starts at B, not A, A is used to designate zero

The ROWID is displayed as : OOOOOO.FFF.BBBBBB.SSS

    OOOOOO This is the Data Object Number
    FFF This is the Relative File Number
    BBBBBB This is the Block Number
    SSS This is the Slot Number

Multiple files can have the same relative file number because Oracle assigns the number based on TABLESPACE.

This means that these numbers are only unique per tablespace which indicates that we cannot derive an absolute address directly from the new ROWID.

This means that the ROWID addressing schema is tablespace relative.

ROWID contains the data object number which increments when an object’s version changes.
An object’s version changes whenever a table is truncated or a partition is moved.
The data object number is not related to the Object ID.

Category: Databases  | Leave a Comment
Wednesday, October 08th, 2014

PL/SQL Server Pages (PSP) are server-side scripts that include dynamic content, including the results of SQL queries, inside web pages.

We can create the web pages in an HTML authoring tool and insert blocks of PL/SQL code.

We can compile and load a PL/SQL server page into the database with the loadpsp command-line utility.

This command loads simple.psp into the schema (example HR schema), replacing the show_employees procedure if it exists.


loadpsp -replace simple.psp

Browser users can run the show_employees procedure through a URL. An HTML page that displays the last and first names of employees in the hr.employees table is returned to the browser through the PL/SQL gateway.

Deploying content through PL/SQL Server Pages has these advantages:

    For developers familiar with PL/SQL, the server pages are the easiest way to create professional web pages that include database-generated content. You can develop web pages as you usually do and then embed PL/SQL code in the HTML
    PL/SQL Server Pages can be more convenient than using the HTP and HTF packages to write out HTML content line by line
    Processing is performed on the database server, the client browser receives a plain HTML page with no special script tags supporting all browsers and browser levels
    Network traffic is efficient because use of PL/SQL Server Pages minimizes the number of database round-trips
    Contents can be written quickly and follow a rapid, iterative development process. Maintaining central control of the software, with only a web browser required on the client system

Prerequisites for Developing and Deploying PL/SQL Server Pages

    To write a PL/SQL server page we need access to a text editor or HTML authoring tool for writing the script. No other development tool is required
    To load a PL/SQL server page we need:

    An account on the database in which to load the server pages.

    Execution rights to the loadpsp command-line utility, which is located in $ORACLE_HOME/bin.

simple.psp

Category: Databases  | Leave a Comment
Monday, August 25th, 2014

Table-Valued Parameters is a new feature introduced in SQL SERVER 2008. In earlier versions of SQL SERVER it was not possible to pass a table variable to a stored procedure as a parameter. But Sql Server 2008 allows us to pass table valued parameter to send multiple rows of data to a stored procedure or a function without creating a temporary table or passing so many parameters.

Many of us may come across a situation wherein we pass a huge comma/semi-colon separated string as parameter to a stored procedure and split that string using some function and insert them into a temp table for further usage in the stored procedure.

We can overcome such hassles by using a table valued parameter. We need to follow the below steps in order to use a table valued parameter:

  • Declare the table and user defined table type
  • Create a stored procedure which accepts user defined table type as a parameter
  • Create a data table or structure the same as your table valued parameter. Remember that all columns of the data table are parallel to the table data type
  • Pass the data to the stored procedure as a SqlParameter and the type of this parameter must be in parallel to table data type

EXAMPLE:

Below contains few sample code snippets which illustrate creation of table valued parameters and pass the same to a stored procedure:

Declare the table and user defined table type

CREATE TYPE dbo.TEMP_ORGANIZATIONS AS TABLE
(
OrganizationId INT NOT NULL,
OrganizationName VARCHAR (MAX),
PRIMARY KEY (OrganizationId)
)

Create a stored procedure which accepts user defined table type as a parameter

CREATE PROC proc_ADD_ORGANIZATION
(@TableParameter AS dbo.TEMP_ORGANIZATIONS READONLY)
AS
BEGIN
INSERT INTO ORGANIZATIONS (OrganizationId , OrganizationName)
SELECT OrganizationId, OrganizationName  FROM @TableParameter
END

Create a data table or structure the same as your table valued parameter. Remember that all columns of the data table are parallel to the table data type:

DataTable CreateTable()
{
var dt = new DataTable();
dt.Columns.Add (“OrganizationId”, typeof(Int32));
dt.Columns.Add (“OrganizationName”, typeof(string));
return dt;
}

Pass the data to the stored procedure as a sql parameter and the type of this parameter must be in parallel to table data type

//Create Table
DataTable dtOrganization = CreateTable();

// Add New Rows to the table
dtOrganization.Rows.Add(1, “Prowareness”);
dtOrganization.Rows.Add(2, ”Airtel”);
dtOrganization.Rows.Add(3, ”Vodafone”);

using (var connection = new SqlConnection(connectionString))
{
connection.Open();
using (var cmd = new SqlCommand(“proc_ADD_ORGANIZATION”, connection))
{
cmd.CommandType = CommandType.StoredProcedure;

//Pass table valued parameter to Store Procedure
SqlParameter sqlParam = cmd.Parameters.AddWithValue(“@TableParameter”, dtOrganization);
sqlParam.SqlDbType = SqlDbType.Structured;
cmd.ExecuteNonQuery();
}
}

Category: .Net, Databases  | Leave a Comment
Monday, August 25th, 2014

This is my second post on full text search. In the first post of the series I discussed on full text search basics and configuration.In this post I’ll be discussing on various kind of full text queries.

For running full text queries you use either full text predicates viz. Contains and FreeText or full text functions viz. ContainsTable and FreeTextTable.

Full text predicates are used in WHERE clause of the query and return true or false whereas Full text functions return Tables and hence are used in FROM clause of the query.

Full text queries queries are quite powerful and can be of following types:

  1. Simple Term : Searching for a particular word or phrase
  2. Prefix Term : searching for prefix word or phrase
  3. Proximity Term: Looking for words or phrases close to each other
  4. Generation Term : Looking for inflectional forms i.e. if  the  word is a verb the search will look for  various tenses of word .E.g  if you search for  “Start”  you will get results for Start ,Started,Starting etc.Similarly if word you are looking for is a Noun you will get singular and plural forms of it.
  5. Thesaurus : Looks for synonyms of a particular word based on thesaurus.E.g if you search for Start, search will look for Start ,Begin etc.
  6. Weighted Term: Looks for various search terms based on the weights associated with them

Now we will see how to use predicates to search in various ways mentioned above.

FreeText and Contains

FreeText is more restrictive predicate which by default does a search based on various forms of a word or phrase (that means it by default includes Inflectional forms as well as thesaurus).

Contains, unlike FreeText, gives you flexibility to do various forms of search separately.

Let’s see some examples below to make it more clear how different searches are done through FreeText and Contains.

CONTAINS

As I mentioned above, Contains provides the flexibility of running various forms of queries separately. Let’s go one by one through few examples of each form to get better understanding.

Simple Term

Return records with exact match for word “Clothe” in column EmployeeJobSector and EmployeeResume


SELECT * FROM EMPLOYEEPROFILE WHERE CONTAINS ((EMPLOYEEJOBSECTOR ,EMPLOYEERESUME), 'CLOTHE')

 Return records with exact match for word “Clothe” in any of the columns in the table EmployeeProfile


SELECT * FROM EMPLOYEEPROFILE WHERE CONTAINS (*, 'CLOTHE')

Prefix Term

Return records with values starting with word “Cloth” in column EmployeeJobSector


SELECT * FROM EMPLOYEEPROFILE WHERE CONTAINS (EMPLOYEEJOBSECTOR, 'CLOTH*')

Proximity Term

Return records where Word “Extensive” and “Experience” are at max 10 letters apart.


SELECT * FROM EMPLOYEEPROFILE WHERE CONTAINS (EMPLOYEERESUME,’NEAR ((EXTENSIVE, EXPERIENCE), 10)')

Return records where Word “Extensive” and “Experience” are at max 10 letters apart and in the same sequence (i.e. Extensive comes before Experience)


SELECT * FROM EMPLOYEEPROFILE WHERE CONTAINS (EMPLOYEERESUME,’NEAR ((EXTENSIVE, EXPERIENCE), 10, TRUE)')

Generation Term

Return records with all the inflectional forms of the word “Start” like Start, Started, and Starting etc.


SELECT * FROM EMPLOYEEPROFILE WHERE CONTAINS (*, ‘FORMSOF (INFLECTIONAL, Start)')

Thesaurus

Return records with all the synonyms of the word “Start” like Start, Begin etc.


SELECT * FROM EMPLOYEEPROFILE WHERE CONTAINS (*, ‘FORMSOF (THESAURUS, Start)')

NOTE: For this query to work you must have a Thesaurus available in you SQL Server data folder <Program Files>\Microsoft SQL Server\MSSQL11.<SQL Instance Name>\MSSQL\FTData.Usually SQL Server does not ship with a default thesaurus but it can be easily downloaded and once put in the appropriate folder queries should work fine.Refer this post for details.

FREETEXT

FreeText predicate, as I mentioned earlier, does not provide much variations and comes by default with inflectional and thesaurus search. Syntax for using FreeText is exactly the same as Contains i.e. Just replace CONTAINS with FREETEXT.

OK. So now you guys must be thinking why we need FREETEXT.

Well from my personal experience one of the case where I found it useful (and I am sure there may be more cases) is when you need to do an Inflectional, Thesaurus and NEAR search together in a single query. Below is an example

Return records with all forms of word “Start” and “Teach” (both inflectional and synonyms) occuring within 10 letter spaces of each other


SELECT * FROM EMPLOYEEPROFILE WHERE FREETEXT (EMPLOYEERESUME,’NEAR ((START, TEACH), 10)')

Above query will search for all the forms of the word like Start, Started, Teach, Taught, Teaching etc.

This kind of search is not simple to do using CONTAINS as combining FORMSOF and NEAR is not permitted in a single query.

 

CONTAINSTABLE and FREETEXTTABLE

Whenever you use full text functions the result table returns you a key column which has the unique key on which full text index is defined and a Rank column which specifies the relative rank of the result.Like predicates you have two  full text functions viz. CONTAINSTABLE (similar to CONTAINS predicate)  and FREETEXTTABLE (similar to FREETEXT predicate).

Again the difference between the two functions is that CONTAINSTABLE allows you to run various types of queries  separately whereas FREETEXTTABLE by default does a search on various forms i.e. it includes inflectional as well as Thesaurus search.

Lets see some examples of using full text functions.

Query to return all rows having words “Communication”,”Chemicals”  or “Jewellers” in either column EmployeeProfile or EmployeeJobSector with highest preference given to “Communication” and lowest to “Jewellers”.


SELECT * FROM
EmployeeProfile EP
INNER JOIN
CONTAINSTABLE(EmployeeProfile, EmployeeJobSector,
'ISABOUT ("Communications" WEIGHT(1),"Chemicals" WEIGHT(0.6)
,"Jewellers" WEIGHT(0.3)) ') AS KEY_TBL
ON EP.EmployeeId = KEY_TBL.[KEY]
ORDER BY RANK

This is an example of weight based ranking.You use ISABOUT and WEIGHT terms to  achieve this kind of ranking.Also value of weight should be between 0.0 and 1.0.

Below is the result of above query.

image

Query to return all rows having inflectional forms of word “Teach” i.e. Teach,Taught , Teaching etc in either column EmployeeProfile or EmployeeJobSector .


SELECT * FROM
EmployeeProfile EP
INNER JOIN
CONTAINSTABLE(EmployeeProfile, EmployeeResume,
'FORMSOF(INFLECTIONAL,teach) ') AS KEY_TBL
ON EP.EmployeeId = KEY_TBL.[KEY]
ORDER BY RANK DESC

Below are the query results.It seems ranking algorithm here gives higher rank to present and past tense.Not sure why but looks like thats how it works.

image

Query to return all rows having word   Extensive and Experience near to each other in column EmployeeProfile .


SELECT * FROM
EmployeeProfile EP
INNER JOIN
CONTAINSTABLE(EmployeeProfile,
'NEAR(extensive,experience)') AS KEY_TBL
ON EP.EmployeeId = KEY_TBL.[KEY]
ORDER BY RANK desc

Notice below that relevance or ranking of results is based on proximity of the words.

image

Finally below is the query using FREETEXTTABLE to show how different forms of word can be searched in the same query.

Query to return all rows having word   Start (and all the derived forms) in column EmployeeProfile .

image

Note the results include both inflectional as well as thesaurus forms.