Archive for the Category ◊ Databases ◊

Sunday, July 20th, 2014

This is the first post in a two-part series I am planning to write on full text search. In this post I will give a brief introduction to full text search and explain how to configure it in SQL Server 2012. In the next post I’ll discuss running various kinds of full text queries. I will be using SQL commands rather than the graphical interface provided by SQL Server Management Studio for configuration.

What is Full Text Search

As per MSDN:

Full-text queries perform linguistic searches against text data in full-text indexes by operating on words and phrases based on rules of a particular language such as English or Japanese. Full-text queries can include simple words and phrases or multiple forms of a word or phrase.

Basically, there are a lot of things you can do with Full Text Search that are either not possible with a normal keyword search or are tedious. Below are a few scenarios.

  1. Ordering or Ranking of results based on the match relevance
  2. Search in all or multiple columns of a table at once
  3. Search different forms of a word based on the language, e.g. searching all the tenses of a word: search for “start” and get all the rows containing “start”, “started”, “starting” etc.
  4. Search for a word and all its synonyms, e.g. search for “start” and get rows containing “start”, “begin” etc.
  5. Searching huge data with improved performance due to index support

There are many other things that become possible by combining different kinds of full text queries.

The information mentioned below is primarily applicable to SQL Server 2012 but should work with minor or no changes for SQL Server 2008 as well.

  • Pre-requisites

To use full text search, the feature should be selected while installing SQL Server 2012. If you have an existing installation that does not include this feature, you can run the setup again and choose the option when setup asks you to add features to an existing installation.

Full-Text Search Feature Selection

If you are using SQL Server 2012 Express edition (as I am for this post) then you will need SQL Server 2012 Express Edition with Advanced Services to get the full text search features. It is available as a separate download from Microsoft.

I am using a sample table (viz. EmployeeProfile) that I created and populated with some dummy values for the purpose of this post. The only requirement here is that the table on which you want to enable full text search must have a unique index defined on a single non-nullable column (or simply have a primary key column). This is called the key index.
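For reference, a minimal sketch of such a table; the column names and types here are assumptions based on how the table is used later in this post, and the primary key supplies the non-nullable unique index that full text search needs:

```sql
-- Sketch of the sample table (column names/types are assumptions).
-- The PRIMARY KEY constraint provides the required key index.
CREATE TABLE EmployeeProfile
(
    EmployeeId        INT NOT NULL
        CONSTRAINT PK_EmployeeProfile PRIMARY KEY,
    EmployeeJobSector NVARCHAR(100),
    EmployeeResume    NVARCHAR(MAX)
);
```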


  • Create Full Text Catalog

The first step is to create a full text catalog. A full text catalog is just a mechanism for organizing full text indexes: one catalog usually contains many full text indexes, but a full text index can belong to only one catalog.

Run the below command in SQL Server Management Studio to create a full text catalog:


CREATE FULLTEXT CATALOG EmployeeProfileCatalog;

You can see this catalog created under the Storage node of your database.


  • Create Full Text Index

The second step is to create a full text index on the columns you want to enable for full text search. Only one full text index can be defined per table.


CREATE FULLTEXT INDEX ON EmployeeProfile
 (
  EmployeeJobSector LANGUAGE 1033,
  EmployeeResume LANGUAGE 1033
 )
 KEY INDEX PK_EmployeeProfile
     ON EmployeeProfileCatalog;

The language code 1033 in the above command specifies English (United States).

  • Start Full Population

The next step is to populate your index through a process called population. You can also run the above CREATE FULLTEXT INDEX command in a way that starts the population immediately after the index has been created. This, however, is discouraged in real-life scenarios, as populating big tables is a performance-intensive task and is usually scheduled by DBAs during lean hours.


ALTER FULLTEXT INDEX ON EmployeeProfile
START FULL POPULATION;
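If you want to defer population from the start, the index can be created without populating it at all; a sketch of that variant (assuming the same table, key index, and catalog as above):

```sql
-- Sketch: create the index without populating it immediately;
-- the full population can then be started later, during off-peak hours.
CREATE FULLTEXT INDEX ON EmployeeProfile
 (EmployeeResume LANGUAGE 1033)
 KEY INDEX PK_EmployeeProfile
 ON EmployeeProfileCatalog
 WITH CHANGE_TRACKING OFF, NO POPULATION;
```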

That’s it! Now you are ready to run full text queries on your table. I will be writing a separate post on full text queries, but just to give you a taste, I will list some examples using the CONTAINS predicate for searching.

A simple search for two keywords across all full-text-indexed columns of the table.


SELECT * FROM EMPLOYEEPROFILE WHERE CONTAINS(*,'EXTENSIVE AND EXPERIENCE')

Results: 73 Rows


 

Search for two keywords near each other (within a distance of two terms) in the “EmployeeResume” column


SELECT * FROM [EMPLOYEEPROFILE] 
WHERE CONTAINS(EMPLOYEERESUME,
'NEAR((EXTENSIVE,EXPERIENCE),2)')

Results: 42 Rows


Search for different forms of the same keyword in the “EmployeeJobSector” column: Clothes


SELECT * FROM [EMPLOYEEPROFILE] 
WHERE CONTAINS(EMPLOYEEJOBSECTOR, 
'FORMSOF(INFLECTIONAL, CLOTHES)')

Results: Returns rows containing both “Clothes” and “Clothing”


Thursday, July 03rd, 2014

 

 

 

 

sqlmap

http://sqlmap.org/

 

How do I test a login-protected website, or one that otherwise requires authentication, with sqlmap?

                use the --cookie parameter / or capture the request and pass it on with the -r parameter / or use the --auth-type=ATYPE, --auth-cred=ACRED, and --auth-cert=ACERT parameters

How do I test a website with sqlmap that uses JSON data?

                sqlmap works with JSON request data automatically

How do I test a website with sqlmap that uses XML data?

                use the custom injection parameter pointer *

How do I test a website with sqlmap that uses SSL?

                use the --force-ssl parameter

How do I tell sqlmap to try harder?

                use the --level and --risk parameters

How do I automate a sqlmap scan?

                use the --batch and --crawl=3 parameters
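Putting a few of the answers above together, some illustrative invocations (the target URL, cookie value, and request file are placeholders, not real targets):

```shell
# Authenticated scan via a session cookie
sqlmap -u "http://target.example/item.php?id=1" --cookie="PHPSESSID=abc123"

# Replay a captured request file, forcing SSL
sqlmap -r captured_request.txt --force-ssl

# Tell sqlmap to try harder
sqlmap -u "http://target.example/item.php?id=1" --level=5 --risk=3

# Unattended crawl-and-scan
sqlmap -u "http://target.example/" --batch --crawl=3
```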

 

Tuesday, July 01st, 2014

 

Here are some tips that I have found useful over the years to optimize my SQL queries. By no means are these the perfect or best solutions; there are definitely better ways to write SQL queries. Just use them as guidelines next time you write a SQL query or do a code review.

1) Use actual column names instead of doing a SELECT *.

Reason:-

- Using the actual column names gives you more control over the query result.

Example: at times we have master tables with a lot of columns. Say a table has 15-20 columns and your query, perhaps a subquery, only needs the 2-3 most relevant fields. Selecting everything means the database server has to process and return data you will never use.

Also, today’s dynamic business needs require more or less frequent changes to the database schema. Consider this:

INSERT INTO TABLEA

SELECT * FROM TABLEB 

 

Now suppose we change the schema of TableB (add or remove a column): the statement above will fail with the error below.

“Insert Error: Column name or number of supplied values does not match table definition.”
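Listing the columns explicitly keeps the INSERT working even if TableB later gains extra columns (Col1/Col2 below are placeholder names):

```sql
-- Explicit column lists decouple the statement from schema changes
-- in columns you do not use. Col1/Col2 are placeholder names.
INSERT INTO TableA (Col1, Col2)
SELECT Col1, Col2
FROM TableB;
```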

2) Use the HAVING clause only when it is necessary.

- HAVING filters rows only after all rows have been selected and grouped, so performance takes a hit when it is used for conditions that belong in WHERE.
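A quick illustration (Orders, Amount, and CustomerId are placeholder names): row-level conditions go in WHERE so they are applied before grouping; only conditions on aggregates need HAVING.

```sql
SELECT CustomerId, SUM(Amount) AS Total
FROM Orders
WHERE OrderDate >= '2014-01-01'   -- row filter: applied before grouping
GROUP BY CustomerId
HAVING SUM(Amount) > 1000;        -- aggregate filter: must stay in HAVING
```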

3) Minimize the number of sub-queries in the query whenever possible.

4) EXISTS and IN operators.

- Generally the IN operator has the slowest performance. It may still be a reasonable choice when the filter criteria live in a small subquery.

- Use EXISTS instead of DISTINCT when joining tables that have a one-to-many relationship.

I found a link that gives a detailed walkthrough of the point above.

http://www.gregreda.com/2013/06/03/join-vs-exists-vs-in/
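As a sketch of the second point (Customers and Orders are placeholder tables): with EXISTS the server can stop probing a customer as soon as one matching order is found, and no DISTINCT is needed to collapse the one-to-many join.

```sql
-- Customers with at least one order, without DISTINCT over a join.
SELECT c.CustomerId, c.Name
FROM Customers c
WHERE EXISTS (SELECT 1
              FROM Orders o
              WHERE o.CustomerId = c.CustomerId);
```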

5) Use Union All instead of Union

Reason: the difference between UNION and UNION ALL is that UNION ALL does not eliminate duplicate rows; it simply pulls all rows from all tables matching your query and combines them into one result. The server therefore has to do less work, which makes UNION ALL faster.
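A sketch (the two tables are placeholders): when the inputs cannot overlap, or duplicates are acceptable, UNION ALL skips the sort/de-duplication step UNION would perform.

```sql
-- UNION ALL: no de-duplication pass over the combined result.
SELECT EmployeeId FROM CurrentEmployees
UNION ALL
SELECT EmployeeId FROM FormerEmployees;
```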

6) I am not a big fan of cursors and try to avoid them most of the time.

Reason:-

Cursors process rows one by one, which doesn’t work very well in terms of performance. Moreover, cursor code tends to be long and hard to maintain.

7) Use of Merge DML statement.

Reason:-

Whenever you plan to synchronize two tables, SQL Server provides the MERGE statement. Check http://msdn.microsoft.com/en-in/library/bb510625.aspx for more details.

I find it very useful and efficient for bulk operations.
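A minimal sketch of such a synchronize operation (TargetTable, SourceTable, Id, and Name are placeholder names):

```sql
-- Upsert + delete in a single statement: update matches, insert
-- missing rows, and remove rows that no longer exist in the source.
MERGE INTO TargetTable AS t
USING SourceTable AS s
    ON t.Id = s.Id
WHEN MATCHED THEN
    UPDATE SET t.Name = s.Name
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Name) VALUES (s.Id, s.Name)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
```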

8) Scalar user-defined functions.

One of the things we learn while programming is to logically group and reuse as much code as possible. Wait, I am not saying that’s wrong; read further.

In SQL, one such mechanism is the scalar user-defined function. We find it easy to hide all the dirty code under a fancy name and make the query look prettier. But there is a catch: it is heavy in terms of performance. When used inside a query, a scalar function is evaluated for each row, and once you have bulk data you start seeing performance issues.

A better solution in this case may be a table-valued function, or not using a function at all and doing the operation in the main query.
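A sketch of the inlining alternative (the function and column names are hypothetical):

```sql
-- Scalar UDF version: dbo.GetFullName is evaluated once per row.
--   SELECT dbo.GetFullName(EmployeeId) FROM Employees;

-- Inlined version: the optimizer can treat this as a plain expression.
SELECT FirstName + ' ' + LastName AS FullName
FROM Employees;
```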

 

Let me know your thoughts.

Category: Databases
Sunday, March 30th, 2014

One of the cool features of SQL/LINQ that makes a developer’s life simpler:

  •   Row number: everyone has encountered the situation of having to filter records based on one particular column whose values are not distinct.

In such cases, ROW_NUMBER in SQL comes in handy to filter the data the way we want.

Syntax : ROW_NUMBER ( )
OVER ( [ PARTITION BY value_expression , ... [ n ] ] order_by_clause )

For example, let’s say my dataset is as follows:

Name           Category   MutationDate  ReviewId
Communication  CoreValue  1/1/2012      111
Communication  CoreValue  2/1/2012      111
Communication  CoreValue  3/1/2012      111
Soft Skills    CoreValue  2/1/2012      111

If I need the latest row per Name, filtering on MutationDate <= 3/1/2012, the desired result is:

Name           Category   MutationDate  ReviewId
Communication  CoreValue  3/1/2012      111
Soft Skills    CoreValue  2/1/2012      111

I can achieve this in SQL by simply using

select Row_Number() over (Partition by Name order by MutationDate desc) as rownumber, * from tbl

which results in

rownumber  Name           Category   MutationDate  ReviewId
1          Communication  CoreValue  3/1/2012      111
2          Communication  CoreValue  2/1/2012      111
3          Communication  CoreValue  1/1/2012      111
1          Soft Skills    CoreValue  2/1/2012      111

The next step is to filter on the rownumber:

select * from (select Row_Number() over (Partition by Name order by MutationDate desc) as rownumber, * from tbl) t
where rownumber = 1

which gives me the desired result:

rownumber  Name           Category   MutationDate  ReviewId
1          Communication  CoreValue  3/1/2012      111
1          Soft Skills    CoreValue  2/1/2012      111
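A common way to write this latest-row-per-name filter is with a CTE, partitioning by Name and ordering by MutationDate descending so that row number 1 is the most recent row in each group:

```sql
WITH Ranked AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY Name
                              ORDER BY MutationDate DESC) AS rownumber, *
    FROM tbl
)
SELECT *
FROM Ranked
WHERE rownumber = 1;
```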

The queries above are straightforward when working directly against the database in SQL. The same result can be achieved when using Entity Framework, for which we need LINQ instead of SQL.

Syntax in LINQ

var result = tbl
    .GroupBy(x => x.Name)
    .SelectMany(g => g
        .OrderByDescending(x => x.MutationDate)
        .Select((x, i) => new { x.Name, x.Category, x.MutationDate, x.ReviewId, rn = i + 1 }));

Based on this result, we can then filter on the row number:

var latest = result.Where(x => x.rn == 1);

Some more cool features in my next blog….

 

 

Thursday, February 27th, 2014

We all know the ReportViewer web server control can display reports that you create locally in Visual Studio as well as remote reports hosted in SQL Server 2008 Reporting Services.

 

[Image: ReportViewer flow diagram]

 

For example, we have a report that takes 120 seconds to fetch the data and render it. It works fine when we view the report directly in SQL Server Business Intelligence Development Studio.


 

When we browse the same report from an ASP.NET web application, it starts rendering the report by displaying the Loading… icon on the screen. But after some time it stops showing any information, as shown below. In the browser developer tools console we can see the problem: “The server request timed out”.

[Screenshot: the report area goes blank and the console shows the timeout]

 

The ReportViewer web server control is an ASP.NET AJAX control used to host reports in ASP.NET projects; you can refer to ReportViewer Controls (Visual Studio) to read more about the control and how to use it. We use the ReportViewer control along with the AJAX ScriptManager, which is required to host it. The error in the browser above is caused by the ScriptManager’s AsyncPostBackTimeout, which defaults to 90 seconds.

 

To fix it, change the AsyncPostBackTimeout property to 0 (meaning no timeout) or another appropriate value.

<asp:ScriptManager ID="ScriptManager1" runat="server" AsyncPostBackTimeout="0"></asp:ScriptManager>
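For remote reports, the ReportViewer control also has its own request timeout, ServerReport.Timeout (in milliseconds), which may need raising as well; the value below is only an example, not a recommendation:

```csharp
// Raise the remote-report request timeout to 10 minutes (value in ms).
ReportViewer1.ServerReport.Timeout = 600000;
```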

 

[Note: this is one way to avoid the timeout, but it does not always hold good. We still need to tune the report queries to fetch the data quickly.]

Thursday, February 06th, 2014

Problem statement:

You are doing a lot of data fixes on a corrupted database. When you execute your ‘salvage script’ to restore the lost or corrupted data, triggers are among the events that fire during DML statement execution. Sometimes you may find the trigger execution disturbing and want to disable it. The code snippets below might come in handy to enable and disable all triggers in a database.

Solution: [Ctrl+C, Ctrl+V style]

Copy, paste, and run the below.


Enable all triggers in a Microsoft SQL Server database
 

DECLARE @EnableTriggerCommand NVARCHAR(4000)
DECLARE @TriggerTableName NVARCHAR(500)

-- One row per table that has at least one trigger defined on it
DECLARE EnableTriggerCursor CURSOR
FOR
SELECT DISTINCT NAME AS TableName
FROM sysobjects
WHERE id IN (
		SELECT parent_obj
		FROM sysobjects
		WHERE xtype = 'TR'
		)

OPEN EnableTriggerCursor

FETCH NEXT
FROM EnableTriggerCursor
INTO @TriggerTableName

WHILE @@fetch_status = 0
BEGIN
	SET @EnableTriggerCommand = N'ALTER TABLE ' + QUOTENAME(@TriggerTableName) + ' ENABLE TRIGGER ALL'

	PRINT 'Executing:   ' + @EnableTriggerCommand + CHAR(13)

	EXECUTE sp_executesql @EnableTriggerCommand

	FETCH NEXT
	FROM EnableTriggerCursor
	INTO @TriggerTableName
END

CLOSE EnableTriggerCursor

DEALLOCATE EnableTriggerCursor

 

Disable all triggers in a Microsoft SQL Server database 

DECLARE @DisableTriggerCommand NVARCHAR(4000)
DECLARE @TriggerTableName NVARCHAR(500)

-- One row per table that has at least one trigger defined on it
DECLARE DisableTriggerCursor CURSOR
FOR
SELECT DISTINCT NAME AS TableName
FROM sysobjects
WHERE id IN (
		SELECT parent_obj
		FROM sysobjects
		WHERE xtype = 'TR'
		)

OPEN DisableTriggerCursor

FETCH NEXT
FROM DisableTriggerCursor
INTO @TriggerTableName

WHILE @@fetch_status = 0
BEGIN
	SET @DisableTriggerCommand = N'ALTER TABLE ' + QUOTENAME(@TriggerTableName) + ' DISABLE TRIGGER ALL'

	PRINT 'Executing:   ' + @DisableTriggerCommand + CHAR(13)

	EXECUTE sp_executesql @DisableTriggerCommand

	FETCH NEXT
	FROM DisableTriggerCursor
	INTO @TriggerTableName
END

CLOSE DisableTriggerCursor

DEALLOCATE DisableTriggerCursor
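As a shorter alternative, the undocumented (and therefore unsupported, use with care) sp_MSforeachtable helper procedure can run the same ALTER against every table in the database:

```sql
-- '?' is replaced with each table name by sp_MSforeachtable.
EXEC sp_MSforeachtable 'ALTER TABLE ? DISABLE TRIGGER ALL';
EXEC sp_MSforeachtable 'ALTER TABLE ? ENABLE TRIGGER ALL';
```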

 


Wednesday, December 18th, 2013

As we all know, a view is just a virtual table consisting of only the rows and columns specified in its CREATE VIEW statement. Even though a view doesn’t store any data by itself, it is still possible to use a view in an INSERT, UPDATE, or DELETE statement.

The requirements for creating an updatable view are as follows:
1. The SELECT statement can’t include a DISTINCT clause.
2. The SELECT statement can’t include an aggregate function.
3. The SELECT statement can’t include a GROUP BY or HAVING clause.
4. The view can’t include the UNION or INTERSECT operator.

Below is an example of an updatable view:

CREATE OR REPLACE VIEW balance_due_view AS
SELECT vendor_name, invoice_number,
       invoice_total, payment_total, credit_total,
       invoice_total - payment_total - credit_total AS balance_due
FROM vendors JOIN invoices ON vendors.vendor_id = invoices.vendor_id
WHERE invoice_total - payment_total - credit_total > 0;

An UPDATE statement that uses the view to update the data:

UPDATE balance_due_view
SET invoice_total = 0
WHERE invoice_number = 100;

If we want to make the above view read only, we can use the WITH READ ONLY clause, as shown below.

CREATE OR REPLACE VIEW balance_due_view AS
SELECT vendor_name, invoice_number,
       invoice_total, payment_total, credit_total,
       invoice_total - payment_total - credit_total AS balance_due
FROM vendors JOIN invoices ON vendors.vendor_id = invoices.vendor_id
WHERE invoice_total - payment_total - credit_total > 0
WITH READ ONLY;

Similarly, we can perform DELETE and INSERT operations through an updatable view.
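In Oracle, you can check which columns of a view actually accept DML through the data dictionary; the view name below matches the example above:

```sql
-- UPDATABLE/INSERTABLE/DELETABLE flags per column of the view.
SELECT column_name, updatable, insertable, deletable
FROM   user_updatable_columns
WHERE  table_name = 'BALANCE_DUE_VIEW';
```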

Monday, December 02nd, 2013

We all want our applications to be fast and responsive, and several factors affect their performance. This article explains how to tune a MongoDB application using indexes.

Let’s create a collection called users and insert 10 million records into it. I am using JavaScript to insert the data into the collection, as follows.

for (i = 0; i < 10000000; i++) {
    db.users.insert(
        {
            "i" : i,
            "username" : "user" + i,
            "age" : Math.floor(Math.random() * 120),
            "created" : new Date()
        }
    );
}

 

Now, execute a simple find query and see how much time it takes to retrieve one record from the collection. To examine this, append explain() to the end of the query. For the time being, we can think of explain() as a diagnostic tool that provides information about how a query was executed. The output of explain() looks something like this:

[Screenshot: explain() output before indexing]

Here, ‘nscannedObjects’ indicates the number of documents scanned, and ‘millis’ indicates how long the query ran (i.e. the time taken to get a response from the server).

Now, create an index on the username field:

>db.users.ensureIndex({"username" : 1})

Let us run the same query we executed before creating the index.

[Screenshot: explain() output after indexing]

The number of documents scanned is now 1, and the time taken is 21 milliseconds, 174 times faster than the previous run.

To know more about how indexing works, you can check the below link:

http://docs.mongodb.org/manual/core/indexes-introduction/

Have fun!
-Rakesh SS

Thursday, July 18th, 2013

Schema design in MongoDB:

In a traditional database we normally link two entities through joins, but in MongoDB we usually don’t use joins; rather, we embed one entity inside another. Let me explain through an example:

Let us consider an employee document with fields like employee id, name, date of birth, joining date, etc. Suppose we need a report that shows the skill set rating of an employee; we can design the employee schema as shown in the diagram below.

[Diagram: employee document with an embedded skill_set_rating sub-document]

Employee document(collection):

In the above example, skill_set_rating is a sub-document that we embed into the employee document. We could also achieve the same thing with separate collections like employee, skill_set, etc., which is similar to the RDBMS approach. We follow the second approach only if the document size would exceed 16 MB; otherwise we always follow the first approach, where read operations are much faster.

Finally, schema design in MongoDB depends on how we need to show the data in our UI, reports, etc. Based on the requirement we can make use of key-value pairs, arrays, and sub-documents accordingly.

Happy mongoing :)

Thursday, June 20th, 2013

Cursor is a control structure that enables traversal over the records in a database. Cursors facilitate subsequent processing in conjunction with the traversal, such as retrieval, addition and removal of database records. – Wikipedia

When we execute the find() method (similar to a ‘select’ statement in SQL), a cursor is constructed and returned to the mongo shell. This cursor can be assigned to a variable, which allows us to access and manipulate it, as shown in the code snippet below.

var cursor = db.students.find();
while (cursor.hasNext()) {
    printjson(cursor.next());
}

cursor.hasNext() returns true if another document is present in the cursor, and cursor.next() returns the next document the cursor is pointing to. Since the mongo shell is an interactive JavaScript interpreter, we can iterate the cursor just as we would in most programming languages. We can also sort the documents in ascending or descending order using the cursor’s sort() method, as shown below.

db.students.find().sort({ score : 1 })   // ascending

db.students.find().sort({ score : -1 })  // descending

We can use limit() method on a cursor to specify the maximum number of documents the cursor should return.

db.students.find().limit(3)

At this point we know how cursors and their methods work. Let’s see how we can correlate these concepts and write queries for real-world scenarios.

Scenario 1: get the top three students of class 5 who scored the maximum marks.
Solution: db.student.find({ class: 5}).sort({ score : -1}).limit(3)

Scenario 2: get the three students of class 5 ranked just below the topper (i.e. skip the top scorer).
Solution: db.student.find({ class: 5}).sort({ score : -1}).limit(3).skip(1)

Scenario 3: use the find, limit, and skip methods to fetch result sets for implementing pagination.
Solution:
function printStudents(pageNumber, nPerPage) {
    print("Page: " + pageNumber);
    db.students.find().skip((pageNumber - 1) * nPerPage).limit(nPerPage).forEach(function(student) { print(student.name + "<p>"); });
}

We have to be a little cautious when using the skip method, since it is expensive and I/O-bound: the server has to walk from the beginning of the collection to the offset (skip) position before returning the result.
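One common way around this cost is range-based paging: remember the last _id of the current page and query past it, so the server seeks via the _id index instead of walking skipped documents. A sketch (collection and page size are assumptions):

```javascript
// Range-based paging: no skip(), just "give me documents after the
// last _id I have seen", sorted by _id.
var last = null;
function nextPage(nPerPage) {
    var query = (last === null) ? {} : { _id: { $gt: last } };
    var page = db.students.find(query).sort({ _id: 1 }).limit(nPerPage).toArray();
    if (page.length > 0) {
        last = page[page.length - 1]._id;   // remember where this page ended
    }
    return page;
}
```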

Happy Mongoing :)