Showing posts with label SQL Server. Show all posts

Friday 10 May 2019

Strange behavior on JSON_VALUE when table contains blank and non-blank values (JSON text is not properly formatted. Unexpected character '.' is found at position 0.)


A few days back we had a requirement to check whether a certain value exists in a table column where the values are stored as JSON strings. The column had been set up not to allow NULLs; when there is no value, it defaults to an empty string.
So basically the query would be similar to the one shown below:


SELECT 
 'x'
FROM
 [schema].[TableName] AS Src
WHERE
 JSON_VALUE(Src.ColumnName,'$.Root.AttributeName') LIKE 'SearchValue%'

However, when we ran this query, we got the following error:

Msg 13609, Level 16, State 2, Line 36
JSON text is not properly formatted. Unexpected character '.' is found at position 0.


Initially we thought we had typed the attribute name incorrectly, since the path is case sensitive. But in this case it was correct.

We investigated further and found out a few things. But before explaining them, we will replicate the issue. For this I will create a simple table and insert three records.


--== Create a table ==--
CREATE TABLE dbo.Employee_Information (
 Id    INT
 ,FirstName  NVARCHAR(100)
 ,LastName  NVARCHAR(100)
 ,JsonData  NVARCHAR(MAX)
)

--== Insert few rows ==--
INSERT INTO dbo.Employee_Information (
 Id
 ,FirstName
 ,LastName
 ,JsonData
)
VALUES
(1,'John','Doe','{"Employee":{"Id":1,"FirstName":"John","LastName":"Doe"}}')
,(2,'Jane','Doe','{"Employee":{"Id":2,"FirstName":"Jane","LastName":"Doe"}}')
,(3,'Luke','Skywalker','')



Now we will use the following query to find any records where the LastName starts with ‘Doe’.


SELECT 
 Id
FROM
 dbo.Employee_Information AS E
WHERE
 JSON_VALUE(E.JsonData,'$.Employee.LastName') LIKE 'Doe%'




Msg 13609, Level 16, State 2, Line 36
JSON text is not properly formatted. Unexpected character '.' is found at position 0.


**Note: The query returns results until the error occurs. Hence you may see some rows in the Results tab in SSMS.

These are the observations we made during our investigation

Observation 01

If you query the table with an additional predicate that excludes the rows with blank values in the JSON field (it’s an NVARCHAR column), the query executes successfully. If the predicate selects a row with a blank value, the query fails.


--== Success ==--
SELECT 
 Id
FROM
 dbo.Employee_Information AS E
WHERE
 JSON_VALUE(E.JsonData,'$.Employee.LastName') LIKE 'Doe%'
 AND Id IN (1,2)

--== Success ==--
SELECT 
 Id
FROM
 dbo.Employee_Information AS E
WHERE
 JSON_VALUE(E.JsonData,'$.Employee.LastName') LIKE 'Doe%'
 AND Id <> 3

--== Fail ==--
SELECT 
 Id
FROM
 dbo.Employee_Information AS E
WHERE
 JSON_VALUE(E.JsonData,'$.Employee.LastName') LIKE 'Doe%'
 AND Id = 3


Observation 02

If you add a filter to fetch only rows containing valid JSON, the execution is successful.


--== Success ==--
SELECT 
 Id
FROM
 dbo.Employee_Information AS E
WHERE
 ISJSON(E.JsonData) > 0
 AND JSON_VALUE(E.JsonData,'$.Employee.LastName') LIKE 'Doe%'
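
The ISJSON filter worked in our test, but SQL Server does not guarantee the order in which the predicates of a WHERE clause are evaluated, so the engine may still call JSON_VALUE on a blank row. A more defensive sketch, assuming the sample table created above, wraps the call in a CASE expression so that JSON_VALUE is only evaluated for rows that pass ISJSON:

```sql
--== Defensive variant: JSON_VALUE is guarded by ISJSON inside a CASE ==--
SELECT 
 E.Id
FROM
 dbo.Employee_Information AS E
WHERE
 (CASE 
   WHEN ISJSON(E.JsonData) = 1 
    THEN JSON_VALUE(E.JsonData,'$.Employee.LastName')
   -- No ELSE branch: blank or invalid rows yield NULL and are filtered out by LIKE
  END) LIKE 'Doe%'
```

With the three sample rows this should return Ids 1 and 2. The CASE form is a bit more verbose, but it does not depend on which predicate the optimizer chooses to evaluate first.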


Observation 03

Even if you use a filter to fetch only rows containing a non-blank value in the JSON field, it will fail.


--== Fail ==--
SELECT 
 Id
FROM
 dbo.Employee_Information AS E
WHERE
 E.JsonData <> ''
 AND JSON_VALUE(E.JsonData,'$.Employee.LastName') LIKE 'Doe%'


Observation 04

If you remove the blank rows and keep only rows containing valid JSON, the query executes successfully.


TRUNCATE TABLE dbo.Employee_Information
INSERT INTO dbo.Employee_Information (
 Id
 ,FirstName
 ,LastName
 ,JsonData
)
VALUES
(1,'John','Doe','{"Employee":{"Id":1,"FirstName":"John","LastName":"Doe"}}')
,(2,'Jane','Doe','{"Employee":{"Id":2,"FirstName":"Jane","LastName":"Doe"}}')

--== Success ==--
SELECT 
 Id
FROM
 dbo.Employee_Information AS E
WHERE
 JSON_VALUE(E.JsonData,'$.Employee.LastName') LIKE 'Doe%'


Observation 05

If the table contains only rows with blank values in the JSON field, the query fails.


TRUNCATE TABLE dbo.Employee_Information
INSERT INTO dbo.Employee_Information (
 Id
 ,FirstName
 ,LastName
 ,JsonData
)
VALUES
(1,'John','Doe','')
,(2,'Jane','Doe','')


--== Fail ==--
SELECT 
 Id
FROM
 dbo.Employee_Information AS E
WHERE
 JSON_VALUE(E.JsonData,'$.Employee.LastName') LIKE 'Doe%'

Hope this helps if you encounter this strange behavior during your development.

Note: All the above queries were executed on the following SQL Server version (SELECT @@VERSION):




Microsoft SQL Server 2016 (SP1) (KB3182545) - 13.0.4001.0 (X64)
     Oct 28 2016 18:17:30
     Copyright (c) Microsoft Corporation
     Developer Edition (64-bit) on Windows Server 2012 Standard 6.2 (Build 9200: ) (Hypervisor)








Monday 10 September 2018

Applying a database principal throughout the server (for all databases) for a particular user

Have you ever come across a requirement to give db_datareader access to a specific user across all the databases on a particular SQL Server instance? The task is simple as long as you don’t have many databases on the server. However, if the number of databases is very high, this can be very time consuming.

This can be done either using the GUI (SSMS) or using a T-SQL script. We will consider both options.

Using SQL Server Management Studio

To illustrate this, we will create a SQL login ‘db_user_read_only’ with the ‘public’ server role and, on the user mapping page, apply the db_datareader role for each database.

image

image

image

As mentioned, the GUI is easy to use when you have a small number of databases. But if the SQL Server contains lots of databases, this becomes a very time-consuming job, and the latter approach is much handier.

Using T-SQL

You can use the following script to apply the db_datareader principal across all the databases on a particular server.


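DECLARE 
	@Sql AS NVARCHAR(MAX)
	,@UserId AS VARCHAR(MAX) = 'YourLoginId'	-- login name; also used as the database user name

-- Build the batch that sp_MSforeachdb runs in every database
SET @Sql = CONCAT('
USE [?];
-- Re-map an existing database user to the login; otherwise create the user
IF EXISTS (SELECT 0 FROM sys.database_principals AS DP WHERE DP.name = ''',@UserId,''')
BEGIN
	EXEC sys.sp_change_users_login ''update_one'',''',@UserId,''',''',@UserId,'''
END
ELSE
BEGIN
	CREATE USER [',@UserId,'] FOR LOGIN [',@UserId,']
END
-- Runs in both cases, so the user always ends up in db_datareader
ALTER ROLE [db_datareader] ADD MEMBER [',@UserId,']
')

EXEC sys.sp_MSforeachdb 
	@command1 = @Sql
	,@replacechar = '?'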

Please note the following:

  • In the above code I haven’t excluded the system databases.
  • If the user already exists in the database, it is re-mapped to the login using sp_change_users_login; otherwise the user is created. In both cases the user is added to db_datareader.
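
If you do want to skip the system databases, one possible approach is to test the database name before switching context. The following is only a sketch: the PRINT statement is a placeholder for the CREATE USER / ALTER ROLE logic, and the list of names assumes the four standard system databases.

```sql
DECLARE 
	@Sql AS NVARCHAR(2000)

-- The replace character is substituted with each database name in turn
SET @Sql = N'
IF ''?'' NOT IN (''master'',''model'',''msdb'',''tempdb'')
BEGIN
	USE [?];
	-- Placeholder: put the user mapping statements here
	PRINT CONCAT(''Would apply db_datareader in: '', DB_NAME())
END
'

EXEC sys.sp_MSforeachdb 
	@command1 = @Sql
	,@replacechar = '?'
```

Running this first with only the PRINT in place is a cheap way to confirm which databases would be touched before applying any changes.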

Hope this might be very useful to you.

Friday 6 July 2018

Replacing sp_depends with sys.dm_sql_referencing_entities and sys.dm_sql_referenced_entities

sp_depends has been one of the most used system stored procedures in SQL Server. In fact, many of us still use it even though Microsoft has announced that it will be removed in a future release.
https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-depends-transact-sql?view=sql-server-2017
image
Alternatively, Microsoft provides two dynamic management functions (introduced with SQL Server 2008) to get similar information: sys.dm_sql_referencing_entities and sys.dm_sql_referenced_entities.
Further details on both can be found in the Microsoft documentation.
However, if you have used sp_depends you might already have noticed that the results returned by this stored procedure are not always accurate (most of the time they seem fine).
The other day I was going through these two functions in order to create a stored procedure similar to sp_depends, and thought of sharing the query so that it can be useful to anyone who depends on this procedure.


DECLARE
 @objname   AS NVARCHAR(100) = 'Website.SearchForPeople'
 ,@objclass   AS NVARCHAR (60) = 'OBJECT'


  SELECT 
   CONCAT(sch.[name],'.',Obj.[name]) AS [name]
   ,(CASE Obj.type
    WHEN 'C'  THEN 'CHECK constraint'
    WHEN 'D'  THEN 'DEFAULT (constraint or stand-alone)'
    WHEN 'F'  THEN 'FOREIGN KEY constraint'
    WHEN 'PK' THEN 'PRIMARY KEY constraint'
    WHEN 'R'  THEN 'Rule (old-style, stand-alone)'
    WHEN 'TA' THEN 'Assembly (CLR-integration) trigger'
    WHEN 'TR' THEN 'SQL trigger'
    WHEN 'UQ' THEN 'UNIQUE constraint'
    WHEN 'AF' THEN 'Aggregate function (CLR)'
    WHEN 'FN' THEN 'SQL scalar function'
    WHEN 'FS' THEN 'Assembly (CLR) scalar-function'
    WHEN 'FT' THEN 'Assembly (CLR) table-valued function'
    WHEN 'IF' THEN 'SQL inline table-valued function'
    WHEN 'IT' THEN 'Internal table'
    WHEN 'P'  THEN 'SQL Stored Procedure'
    WHEN 'PC' THEN 'Assembly (CLR) stored-procedure'
    WHEN 'PG' THEN 'Plan guide'
    WHEN 'RF' THEN 'Replication-filter-procedure'
    WHEN 'S'  THEN 'System base TABLE'
    WHEN 'SN' THEN 'Synonym'
    WHEN 'SO' THEN 'Sequence OBJECT'
    WHEN 'U'  THEN 'Table (user-defined)'
    WHEN 'V'  THEN 'VIEW'
    WHEN 'SQ' THEN 'Service queue'
    WHEN 'TF' THEN 'SQL table-valued-function'
    WHEN 'TT' THEN 'Table type'
    WHEN 'X'  THEN 'Extended stored procedure'
    ELSE 'Undefined'
   END) AS [type]
   ,Obj.create_date
   ,Obj.modify_date
   ,src.referenced_minor_name AS [column]
   ,IIF(src.is_selected   = 1,'yes','no') AS is_selected
   ,IIF(src.is_updated    = 1,'yes','no') AS is_updated
   ,IIF(src.is_select_all = 1,'yes','no') AS is_select_all
   ,IIF(src.is_insert_all = 1,'yes','no') AS is_insert_all
  FROM 
   sys.dm_sql_referenced_entities (@objname,@objclass) AS src
   JOIN sys.objects AS Obj
    ON src.referenced_id = Obj.[object_id]
   JOIN sys.schemas AS Sch
    ON Sch.[schema_id] = Obj.[schema_id]
  WHERE 1=1
  
  SELECT 
   CONCAT(Src.referencing_schema_name,'.',Src.referencing_entity_name) AS [name]
   ,(CASE Obj.type
    WHEN 'C'  THEN 'CHECK constraint'
    WHEN 'D'  THEN 'DEFAULT (constraint or stand-alone)'
    WHEN 'F'  THEN 'FOREIGN KEY constraint'
    WHEN 'PK' THEN 'PRIMARY KEY constraint'
    WHEN 'R'  THEN 'Rule (old-style, stand-alone)'
    WHEN 'TA' THEN 'Assembly (CLR-integration) trigger'
    WHEN 'TR' THEN 'SQL trigger'
    WHEN 'UQ' THEN 'UNIQUE constraint'
    WHEN 'AF' THEN 'Aggregate function (CLR)'
    WHEN 'FN' THEN 'SQL scalar function'
    WHEN 'FS' THEN 'Assembly (CLR) scalar-function'
    WHEN 'FT' THEN 'Assembly (CLR) table-valued function'
    WHEN 'IF' THEN 'SQL inline table-valued function'
    WHEN 'IT' THEN 'Internal table'
    WHEN 'P'  THEN 'SQL Stored Procedure'
    WHEN 'PC' THEN 'Assembly (CLR) stored-procedure'
    WHEN 'PG' THEN 'Plan guide'
    WHEN 'RF' THEN 'Replication-filter-procedure'
    WHEN 'S'  THEN 'System base TABLE'
    WHEN 'SN' THEN 'Synonym'
    WHEN 'SO' THEN 'Sequence OBJECT'
    WHEN 'U'  THEN 'Table (user-defined)'
    WHEN 'V'  THEN 'VIEW'
    WHEN 'SQ' THEN 'Service queue'
    WHEN 'TF' THEN 'SQL table-valued-function'
    WHEN 'TT' THEN 'Table type'
    WHEN 'X'  THEN 'Extended stored procedure'
    ELSE 'Undefined'
   END) AS [type]
   ,Obj.create_date
   ,Obj.modify_date
  FROM 
   sys.dm_sql_referencing_entities (@objname,@objclass) AS Src
   JOIN sys.objects AS Obj
    ON Obj.[object_id] = Src.referencing_id 
I have also compiled a stored procedure using this query; it can be found in the following repository: https://github.com/manjukefernando/sp_depends_v2

Wednesday 30 May 2018

Computed columns in SQL Server

Computed columns are columns whose values are derived from one or more other columns. Hence the data type of a computed column depends on the result of the expression that defines it.
Computed columns have been available in SQL Server since version 2000. But in my experience it is a feature that is used less than many others, and during discussions and interviews it is something most developers slip on or fail to answer.

Why do we need computed columns ?

First we will consider a case where we need to store details in a table without using computed columns.
Consider a table which contains employee details. We have two columns to store the employee’s first and last names. But we are also required to have a column which stores the full name, built by concatenating the first and last names. Without computed columns, we need a third column which contains the full name; it must be populated when the employee record is created and maintained whenever the names are updated. Otherwise the data integrity will be lost. (One might argue that the full name can be built in the business logic code from the first and last names. But for illustration purposes we will assume that we are maintaining it in SQL Server.)

CREATE TABLE dbo.Employee(
    Id     INT 
    ,FirstName    VARCHAR(30)
    ,LastName    VARCHAR(30)
    ,FullName    VARCHAR(61)
)
However, we can achieve the same with a computed column, with less effort than the first approach.

CREATE TABLE dbo.Employee(
    Id     INT 
    ,FirstName    VARCHAR(30)
    ,LastName    VARCHAR(30)
    ,FullName AS CONCAT(FirstName,' ',LastName)
)

Let’s insert a few records into the table we just created.

INSERT INTO dbo.Employee(Id, FirstName, LastName) 
VALUES (1,'John','Doe'),(2,'Jane','Doe')

image

PERSISTED, DETERMINISTIC or NON-DETERMINISTIC ?

A computed column can be either virtual (the default) or persisted.
Unless the column is persisted, whether the expression is deterministic or non-deterministic, the value is not physically saved in the table. Instead it is always calculated during query execution. Hence the value can differ based on the functions you use in the formula. E.g: If you use GETDATE() in the computed column, it will return a different value on each execution.

CREATE TABLE dbo.Employee2(
    Id     INT 
    ,FirstName    VARCHAR(30)
    ,LastName    VARCHAR(30)
    ,CreatedDate AS GETDATE()
)

INSERT INTO dbo.Employee2(Id, FirstName, LastName) VALUES 
    (1,'John','Doe') 


And when queried the calculated column returns different values as shown below.

image
**Note: A created-date column can be achieved using a default constraint as well (which stores the value once, at insert time). I have used this example strictly for illustration.
You can read further on deterministic and non-deterministic functions in the following Microsoft documentation.
https://docs.microsoft.com/en-us/sql/relational-databases/user-defined-functions/deterministic-and-nondeterministic-functions?view=sql-server-2017
Computed column values can be persisted by adding the keyword PERSISTED when the column is created using T-SQL, or through the table designer in SSMS.
We will drop the ‘FullName’ column and recreate it as PERSISTED.

ALTER TABLE dbo.Employee DROP COLUMN FullName;
ALTER TABLE dbo.Employee 
 ADD FullName AS CONCAT(FirstName,' ',LastName) PERSISTED;
**Note: If you drop the ‘CreatedDate’ column on Employee2 and try to recreate it as PERSISTED, it will throw an error, because a computed column can only be persisted when its expression is deterministic.
Msg 4936, Level 16, State 1, Line 45
Computed column 'CreatedDate' in table 'Employee2' cannot be persisted because the column is non-deterministic.

Now the ‘FullName’ value is computed when a row is inserted or updated, and it is physically saved in the table.
The data is read-only to the developer and is maintained by the engine. When the data in the columns used in the formula changes, the computed value changes as well.
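
As a quick check of this behaviour, the following sketch updates one of the source columns and reads the computed value back. It assumes the dbo.Employee table and the two sample rows inserted earlier; the new last name is just an illustrative value.

```sql
-- Change a column that the computed column depends on
UPDATE dbo.Employee
SET LastName = 'Smith'
WHERE Id = 2

-- FullName is maintained by the engine and now reads 'Jane Smith'
SELECT 
 Id
 ,FirstName
 ,LastName
 ,FullName
FROM
 dbo.Employee
WHERE
 Id = 2

-- Inspect how the engine stores the column (is_persisted = 1 when PERSISTED)
SELECT 
 CC.[name]
 ,CC.[definition]
 ,CC.is_persisted
FROM
 sys.computed_columns AS CC
WHERE
 CC.[object_id] = OBJECT_ID('dbo.Employee')
```

Trying to set FullName directly in an UPDATE or INSERT statement raises an error, since the column can only be changed through the columns it is derived from.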




Tuesday 20 March 2018

Data Encryption in SQL Server using T-SQL Functions (ENCRYPTBYPASSPHRASE, DECRYPTBYPASSPHRASE & HASHBYTES)

A decade ago, data was just an entity which helped a business operate smoothly. Back then data was considered business-related information simply stored in a database, to be retrieved on demand. E.g: a bunch of products, transactions such as invoices and receipts, or customer details.

But today data has become an important entity which drives business towards success. In today’s fast-moving world, companies that own data and do analytics on it have become the most successful ones.

However, one of the major concerns we have today is how to protect this data, especially the sensitive parts. Since more data is being exposed to the cloud, it’s essential to protect it from falling into the wrong hands. This has become a major problem, since hackers nowadays are well equipped and are always on the lookout for this valuable information, which is a valuable asset on the open market.

Protecting data from unauthorized access is a must. Failing to do so can have unexpected consequences; an entire business could get wiped out because of it. Hence enterprises should seriously consider protecting their data, and we will discuss how we can achieve this in SQL Server through data encryption.

Ways of Data Encryption in SQL Server

There are a few ways of encrypting data in SQL Server. We will discuss the advantages and disadvantages of each method.

SQL Server provides the following methods to encrypt data:

  • T-SQL Functions
  • Using Symmetric Keys**
  • Using Asymmetric Keys**
  • Using Certificates**
  • Transparent Data Encryption**

**Note: In this article I only plan to explain encryption/decryption using T-SQL functions. I will talk about the other methods mentioned above in future articles.

Using T-SQL Functions

Encrypting data using ENCRYPTBYPASSPHRASE 

Encryption is done using T-SQL function ENCRYPTBYPASSPHRASE.

ENCRYPTBYPASSPHRASE(passphrase,text_to_encrypt)

The first parameter is the passphrase, which can be any text or a variable of type NVARCHAR, CHAR, VARCHAR, BINARY, VARBINARY, or NCHAR. The function uses this passphrase to generate a symmetric key. The second parameter is the text to encrypt.

For illustration purposes we will create a table to hold employee details.

CREATE TABLE dbo.Employee(
    Id   			 INT
    ,EmpName   	 VARCHAR(100)
    ,EmpDOB   		 SMALLDATETIME
    ,SSN   		 VARBINARY(128)
)

This example demonstrates data encryption during an INSERT statement.

INSERT INTO dbo.Employee(
	Id
	,EmpName
	,EmpDOB
	,SSN
)
VALUES(
	1
	,'Luke'
	,'01-June-1980'
	,ENCRYPTBYPASSPHRASE('Pa$$W0rd4EnCRyPt10n','111-22-3333')
) 

image

Further details can be found in the Microsoft Documentation:
https://docs.microsoft.com/en-us/sql/t-sql/functions/encryptbypassphrase-transact-sql


Decrypting data using T-SQL function DECRYPTBYPASSPHRASE 

We will take the same details which we inserted in the first example. The encrypted data can be decrypted using the T-SQL function DECRYPTBYPASSPHRASE. Any attempt to read the data without DECRYPTBYPASSPHRASE, or without providing the correct pass-phrase, will not return the original value.

Without Decryption

SELECT 
	Id,EmpName,EmpDOB,CONVERT(VARCHAR(128),SSN) AS SSN
FROM 
	dbo.Employee
WHERE 
	Id = 1

image


With Decryption (Incorrect Pass-phrase)

SELECT 
	Id
	,EmpName
	,EmpDOB
	,DECRYPTBYPASSPHRASE('IncorrectPassword',SSN ) AS SSN 
FROM 
	dbo.Employee
WHERE 
	Id = 1

image

But providing the correct pass-phrase will return the correct details

SELECT 
	Id
	,EmpName
	,EmpDOB
	,CONVERT(VARCHAR(128),DECRYPTBYPASSPHRASE('Pa$$W0rd4EnCRyPt10n',SSN )) AS SSN
FROM 
	dbo.Employee
WHERE 
	Id = 1

image

However, there could be a requirement where you need to protect your data not from being stolen, but from being overwritten with someone else’s.

One classic example is a login table. Suppose we have a table which stores login credentials.

*Note: In real-world cases, it’s usually more secure to hash passwords rather than encrypt them. I am using encryption here for illustration purposes.

If a person has access to update the password column, he/she can easily replace the contents with their own and log on using that. This can be stopped by providing two additional values to ENCRYPTBYPASSPHRASE when details are inserted into the table.

CREATE TABLE dbo.LoginCredentails(
	UserId		INT
	,UserName	VARCHAR(20)
	,Pwd		VARBINARY(128)
)

We will insert two records to the above created table.

INSERT INTO dbo.LoginCredentails(
	UserId
	,UserName
	,Pwd
)
VALUES
	(1001,'luke.skywalker',ENCRYPTBYPASSPHRASE('Pa$$W0rd4EnCRyPt10n','force be with you',1,CAST(1001 AS sysname)))
	,(1002,'darth.vader',ENCRYPTBYPASSPHRASE('Pa$$W0rd4EnCRyPt10n','i am your father',1,CAST(1002 AS sysname)))

Please note that unlike the previous example, we are now providing two additional values to the ENCRYPTBYPASSPHRASE function. The first value is 1, which indicates whether an authenticator will be encrypted together with the password. If the value is 1, an authenticator will be added. The second value is the data from which to derive the authenticator. In this example we use the user id, so that the same value can be supplied when the password is decrypted.

Following is a function to fetch the decrypted password based on the UserId. Assume we will be using this to validate the credentials prior to login.

CREATE FUNCTION  Fn_GetUserPwdById(@UserId AS INT)
RETURNS VARCHAR(50)
AS
BEGIN	

	DECLARE @Pwd AS VARCHAR(50)
	SELECT
		@Pwd = CONVERT(VARCHAR(50),DECRYPTBYPASSPHRASE('Pa$$W0rd4EnCRyPt10n',LC.Pwd,1,CAST(LC.UserId AS sysname))) 
	FROM
		dbo.LoginCredentails AS LC
	WHERE
		LC.UserId = @UserId

	RETURN @Pwd
END

Using the aforementioned function we will retrieve the details.

SELECT 
	UserId
	,UserName
	,dbo.Fn_GetUserPwdById(UserId) AS Pwd 
FROM
	dbo.LoginCredentails




image

But simply querying the data will get you the binary string of the encrypted value.

SELECT 
	UserId
	,UserName
	,Pwd 
FROM
	dbo.LoginCredentails


Suppose a person has enough privileges to update a password with a known one (from an existing user). Without the authenticator, this would allow him/her to log in to the system impersonating any user.

UPDATE LC SET LC.Pwd = (
	SELECT LC2.Pwd FROM dbo.LoginCredentails AS LC2 
	WHERE LC2.UserName = 'luke.skywalker'
) 
FROM dbo.LoginCredentails AS LC
WHERE
LC.UserName = 'darth.vader'




image

But when the same function is used for decryption, it returns NULL for the updated record, so the login becomes invalid if the password is replaced with an existing one.

image
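
The effect of the authenticator can also be seen without any table. The following sketch encrypts a value with the authenticator derived from user id 1001 and then decrypts it twice, once with the same authenticator and once with a different one. The variable name is only illustrative.

```sql
DECLARE 
	@Cipher AS VARBINARY(8000) = ENCRYPTBYPASSPHRASE('Pa$$W0rd4EnCRyPt10n','force be with you',1,CAST(1001 AS sysname))

SELECT 
	-- Same authenticator as used during encryption: returns the original text
	CONVERT(VARCHAR(50),DECRYPTBYPASSPHRASE('Pa$$W0rd4EnCRyPt10n',@Cipher,1,CAST(1001 AS sysname))) AS SameAuthenticator
	-- Different authenticator (another user id): returns NULL
	,CONVERT(VARCHAR(50),DECRYPTBYPASSPHRASE('Pa$$W0rd4EnCRyPt10n',@Cipher,1,CAST(1002 AS sysname))) AS DifferentAuthenticator
```

This is the same situation as copying one user’s encrypted password onto another user’s row: the cipher text is valid, but it no longer matches the authenticator of the row it sits in.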


Hashing data using HASHBYTES

Apart from the above-mentioned functions, there’s another function which can be used to hash data. Unlike encryption, there’s no way to reverse the hashed data and see the raw details.

Syntax:

HASHBYTES ( 'algorithm', { @input | 'input' } )  
/*
algorithm::= MD2 | MD4 | MD5 | SHA | SHA1 | SHA2_256 | SHA2_512   
*/

There are two parameters which you require to provide. The first parameter is the algorithm which should be used for hashing. The hashing algorithm can be any of the following:

  • MD2
  • MD4
  • MD5
  • SHA
  • SHA1
  • SHA2_256
  • SHA2_512

The second parameter is the input, which needs to be hashed. This can be either a character or binary string.

The return value is VARBINARY(n). n = maximum 8000 bytes.

Example:

DECLARE 
	@TextToHash AS NVARCHAR(1000) = N'My Secret Message'

SELECT HASHBYTES('SHA1',@TextToHash) AS HashedData


image
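
Since a hash cannot be reversed, the usual way to verify a value is to hash the candidate input again and compare the two results. The sketch below uses SHA2_256 and illustrative variable and column names; note that the input must match byte for byte, so a different case or a VARCHAR instead of an NVARCHAR literal produces a different hash.

```sql
DECLARE 
	@Stored AS VARBINARY(32) = HASHBYTES('SHA2_256',N'My Secret Message')

SELECT 
	-- Identical input: the hashes match
	IIF(HASHBYTES('SHA2_256',N'My Secret Message') = @Stored,'match','no match') AS SameInput
	-- Different letter case: the hashes differ
	,IIF(HASHBYTES('SHA2_256',N'my secret message') = @Stored,'match','no match') AS DifferentCase
	-- Same text as VARCHAR: different bytes, so the hashes differ
	,IIF(HASHBYTES('SHA2_256','My Secret Message') = @Stored,'match','no match') AS VarcharInput
```

Keeping the data type of the input consistent between the time the hash is stored and the time it is checked avoids the last kind of mismatch.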


Further details can be found in the Microsoft Documentation:
https://docs.microsoft.com/en-us/sql/t-sql/functions/hashbytes-transact-sql


Hope this might be useful to you and please feel free to comment your ideas.

Saturday 10 March 2018

Strange behaviour converting NVARCHAR(MAX) to BINARY

A few days back I was writing a CLR function to be used for hashing string values. A CLR function was the only option, since T-SQL doesn’t have any functionality to convert a string to a hashed value using a key. With the HASHBYTES function you can only provide the algorithm.

DECLARE @Data NVARCHAR(4000);  
SET @Data = CONVERT(NVARCHAR(4000),'My Secret Message');  
SELECT HASHBYTES('SHA1', @Data);  

I wrote the CLR function to meet the requirement, but during testing the validation was failing, and when I went through the code I couldn’t find any issue in the function either. But on careful inspection I noticed that a variable of type NVARCHAR(n) and a variable of type NVARCHAR(MAX) give different results when converted to BINARY, which was the root cause of the issue I was facing.


DECLARE 
	@Data1	AS NVARCHAR(MAX) = '1111'
	,@Data2	AS NVARCHAR(10) = '1111'

SELECT 
	CAST(@Data1 AS BINARY(30)) AS ValueMax
SELECT 
	CAST(@Data2 AS BINARY(30)) AS ValueN


image_thumb1

As you can see in the above example, the zero bytes are represented differently for NVARCHAR(MAX) when it’s converted to BINARY.

I do not have any explanation for this. I am sharing the information in case anyone comes across this issue. Please feel free to comment.

Sunday 3 December 2017

Behaviour of IDENTITY Columns and SEQUENCES with TRANSACTIONS

A few days back, I was caught in a discussion with a couple of my colleagues regarding a problem they were facing with an IDENTITY column.

The issue was that when a transaction is rolled back, the identity value isn’t rolled back as they expected. This was causing the business application to lose the id sequence.

There is no fix or workaround for this. All that I could provide was an explanation.

I will illustrate the issue and explain why it happens.

Behaviour of IDENTITY Columns

We will create the following table to hold employee details.

CREATE TABLE dbo.EmployeeInfo(
	Id			INT IDENTITY(1,1) NOT NULL,
	EmpName		VARCHAR(100) NOT NULL
)


Now we will insert a few records into the table in the following manner.

  1. Without a transaction
  2. With a transaction. But we will rollback the transaction.
  3. With a transaction. But we will commit it.


INSERT INTO dbo.EmployeeInfo (EmpName)
VALUES('John')

BEGIN TRAN
	INSERT INTO dbo.EmployeeInfo (EmpName)
	VALUES('Jane')
ROLLBACK

BEGIN TRAN
	INSERT INTO dbo.EmployeeInfo (EmpName)
	VALUES('James')
COMMIT

SELECT 
	EI.Id
	,EI.EmpName 
FROM
	dbo.EmployeeInfo AS EI


And when checked, you could see the following results.

image

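Usually the expectation is to see the employee “James” with an Id of 2. Instead, “James” gets an Id of 3, because the value 2 was consumed by the insert that was rolled back.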

What you should understand here is that this isn’t a flaw or a bug. This is the exact intended behaviour and it has been explained in the following MSDN article.

https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql-identity-property

image
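
To see the gap for yourself, you can check the last identity value generated for the table and list the rows in Id order. This is only a sketch and assumes dbo.EmployeeInfo was populated exactly as in the example above.

```sql
-- Last identity value generated for the table, regardless of rollbacks
SELECT IDENT_CURRENT('dbo.EmployeeInfo') AS LastIdentityValue

-- The rolled-back insert shows up as a missing Id between John and James
SELECT 
	EI.Id
	,EI.EmpName 
FROM
	dbo.EmployeeInfo AS EI
ORDER BY
	EI.Id
```

If the application needs gap-free numbering, the number has to be generated by the application’s own logic rather than by an IDENTITY column.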


Behaviour of SEQUENCES

SEQUENCEs were introduced in SQL Server 2012. The purpose of SEQUENCE objects is to aid in handling auto-increment numbers, in case you prefer to handle the sequence without using an IDENTITY column.

First we will create a sequence object. The minimum required to create a sequence object is a name; here we also specify the data type. Additionally you can specify many other attributes such as the starting value, the increment etc.

CREATE SEQUENCE dbo.TempNumberSequence AS INT
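
When START WITH is omitted, an ascending sequence starts at the minimum value of its data type, which for INT is a large negative number. If you would rather start from 1, the optional attributes can be spelled out. The following is a sketch using a hypothetical sequence name, dbo.DemoNumberSequence, which is not part of the example that follows.

```sql
-- Hypothetical sequence, used only for this sketch
CREATE SEQUENCE dbo.DemoNumberSequence
	AS INT
	START WITH 1
	INCREMENT BY 1

-- Each call hands out the next number
SELECT NEXT VALUE FOR dbo.DemoNumberSequence AS FirstValue	-- 1
SELECT NEXT VALUE FOR dbo.DemoNumberSequence AS SecondValue	-- 2

-- Inspect the sequence metadata
SELECT 
	S.[name]
	,S.start_value
	,S.increment
	,S.current_value
FROM
	sys.sequences AS S
WHERE
	S.[name] = 'DemoNumberSequence'
```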

Further details regarding other options can be found on the following URL:

https://docs.microsoft.com/en-us/sql/t-sql/statements/create-sequence-transact-sql


Now we will create a table similar to the one in the previous example, but without an IDENTITY column.

CREATE TABLE dbo.EmployeeInfoSeq(
	Id			INT 
	,EmpName	VARCHAR(100) NOT NULL
)

We will insert 3 records in the same way as we did in the previous example.

DECLARE @NextSeq AS INT
SELECT @NextSeq = NEXT VALUE FOR dbo.TempNumberSequence
INSERT INTO dbo.EmployeeInfoSeq (
	Id
	,EmpName
)
VALUES (
	@NextSeq
	,'John'
)
GO

DECLARE @NextSeq AS INT
SELECT @NextSeq = NEXT VALUE FOR dbo.TempNumberSequence
BEGIN TRAN
	INSERT INTO dbo.EmployeeInfoSeq (
		Id
		,EmpName
	)
	VALUES (
		@NextSeq
		,'Jane'
	)
ROLLBACK
GO


DECLARE @NextSeq AS INT
SELECT @NextSeq = NEXT VALUE FOR dbo.TempNumberSequence
INSERT INTO dbo.EmployeeInfoSeq (
	Id
	,EmpName
)
VALUES (
	@NextSeq
	,'James'
)
GO

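Afterwards, if you check, you will see the following results. As with the IDENTITY column, the sequence number taken for the rolled-back insert is not reused, so there is a gap between “John” and “James”.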

image

Hope this will help you in your day to day development work.

Sunday 9 October 2016

Understanding JOINs in SQL Server

During my work I get the chance to review lots of T-SQL procedures and views, and I often see SQL joins misused in them. When I ask the developers about this, it’s evident that most of the time they don’t have a proper understanding of what each JOIN exactly does or how it behaves, ultimately causing the procedure or view to return an unexpected result set. Therefore I thought of writing this blog post.
When we need to fetch details from multiple tables, the JOIN clause comes to the rescue. But in SQL Server there are various types of JOINs which cater to our requirements in different ways. So it’s very important to have a good understanding of these types of JOINs and their usage.
The following types of JOINs are available in SQL Server:
  • INNER JOIN
  • OUTER JOIN
    • LEFT OUTER JOIN
    • RIGHT OUTER JOIN
    • FULL OUTER JOIN
  • CROSS JOIN
  • CROSS APPLY
  • OUTER APPLY
We will look into the aforementioned JOINs more closely. The scope of this article is to give a high-level idea of the aforementioned JOINs and the APPLY operator in SQL Server.
To illustrate the aforementioned JOINs I will use the following sample tables:
  • RepDetails
  • SalesDetails
  • RepRating
  • Settings
We consider a case where we have 5 sales reps whose details are saved in the ‘RepDetails’ table, and the sales transactions they have made are recorded in the ‘SalesDetails’ table. In the SalesDetails table we have included a few transactions for which we don’t have a matching sales rep. Similarly, in the RepDetails table there are a couple of sales reps for whom we don’t have any sales information.

--== Create Tables ==--
CREATE TABLE RepDetails(
 RepId  INT
 ,RepName VARCHAR(30)
)

CREATE TABLE SalesDetails(
 RepId  INT
 ,SaleMonth VARCHAR(6)
 ,OrderNo VARCHAR(6)
 ,SaleValue MONEY
)

CREATE TABLE RepRating(
 RepId  INT
 ,Rate  INT
 ,YearMonth VARCHAR(6)
)

CREATE TABLE Settings(
 S_Id  INT
 ,S_Desc  VARCHAR(20)
 ,S_Value VARCHAR(20)
)


--== Populate Sample Data ==--
INSERT INTO RepDetails (
 [RepId]
 ,[RepName]
) VALUES 
 (1,'Eugene Thomas')
 ,(2,'John Wheeler')
 ,(3,'Curtis Bailey')
 ,(4,'Jeffrey Garrett')
 ,(5,'Rosemarie Hubbard')

INSERT INTO SalesDetails (
 [RepId]
 ,[SaleMonth]
 ,[OrderNo]
 ,[SaleValue]
) 
VALUES 
(7,'201607','XpyDy3',839)
,(1,'201607','NR0RTp',496)
,(4,'201607','4552T4',299)
,(6,'201607','GKhkyC',877)
,(4,'201606','iyK65Z',291)
,(6,'201606','NFCszW',446)
,(7,'201606','D238bN',135)
,(1,'201607','bERDXk',304)
,(7,'201608','nykZqB',935)
,(4,'201608','R7ea5v',352)
,(6,'201606','VVjIdo',407)
,(7,'201608','vtLT4z',977)
,(2,'201608','xnHTnO',416)
,(1,'201606','jFAJIm',674)
,(6,'201606','0Q011m',480)


INSERT INTO dbo.RepRating(
 RepId
 ,Rate
 ,YearMonth
)
VALUES
 (1,1,'201608')
 ,(3,2,'201608')
 ,(4,1,'201609')
 ,(2,2,'201609')

INSERT INTO dbo.Settings(
 S_Id
 ,S_Desc
 ,S_Value
)
VALUES
 (1,'LedgerMonth','201609')
 ,(2,'TaxRate','10%')

**Note: During the illustration I will refer to the table which follows the ‘FROM’ clause as the ‘Left Table’ and the table which follows the JOIN clause as the ‘Right Table’.

INNER JOIN / JOIN

When we join two or more tables using an INNER JOIN, it returns only the rows for which matching records are found in both the left and right tables, based on the condition we supply.
image


This can be illustrated using a venn diagram as follows:
image

SELECT *
FROM
 dbo.RepDetails AS RD
 JOIN dbo.SalesDetails AS SD
  ON SD.RepId = RD.RepId

image

**Please note: We have sales reps with RepIds 1, 2, 3, 4 and 5, but the SalesDetails table holds sales for RepIds 1, 2, 4, 6 and 7. When these tables are joined, only the RepIds that exist in both tables (1, 2 and 4) return details, which gives us the result set shown above.

LEFT OUTER JOIN / LEFT JOIN

A LEFT OUTER JOIN, unlike the INNER JOIN, selects all the records from the ‘Left’ table and, based on the JOIN condition, any matching records from the ‘Right’ table. If there are no matching details in the ‘Right’ table, the right-table columns for those rows are returned as ‘NULL’.
image

This can be shown using a venn diagram as follows:
image

SELECT * 
FROM
 dbo.RepDetails AS RD
 LEFT JOIN dbo.SalesDetails AS SD
  ON SD.RepId = RD.RepId

image
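A common use of this NULL behaviour is to list the left-table rows that have no match at all. The following is a minimal sketch against the sample tables above; filtering on SD.RepId IS NULL assumes that RepId is never NULL in actual SalesDetails rows, which holds for the sample data.

```sql
-- Sales reps that have no sales at all (a so-called anti-join).
-- Unmatched rows come back with NULLs in every SD column,
-- so filtering on SD.RepId IS NULL keeps only those rows.
SELECT
 RD.RepId
 ,RD.RepName
FROM
 dbo.RepDetails AS RD
 LEFT JOIN dbo.SalesDetails AS SD
  ON SD.RepId = RD.RepId
WHERE
 SD.RepId IS NULL
```

With the sample data this should return the reps with RepId 3 and 5, since they are the only ones without a row in SalesDetails.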

RIGHT OUTER JOIN / RIGHT JOIN

A RIGHT OUTER JOIN selects all records from the ‘Right’ table and, based on the JOIN condition, any matching records from the ‘Left’ table. If there are no matching records in the left table, the left-table columns for those rows are returned as ‘NULL’.










This can be shown using a venn diagram as follows:
image

SELECT * 
FROM
 dbo.SalesDetails AS SD
 RIGHT JOIN dbo.RepDetails AS RD
  ON SD.RepId = RD.RepId

image

FULL OUTER JOIN / FULL JOIN

FULL OUTER JOIN is a mix of both LEFT and RIGHT OUTER JOINs. It returns all rows from both the ‘Left’ and ‘Right’ tables, matched according to the JOIN condition. Where a row has no match on the other side, the columns of the other table are returned as NULL.


image


This can be shown using a venn diagram as follows:
image

SELECT * 
FROM
 dbo.RepDetails AS RD
 FULL OUTER JOIN dbo.SalesDetails AS SD
  ON SD.RepId = RD.RepId


image
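To make the three kinds of rows in the result easier to tell apart, the same join can be labelled with a CASE expression. This is only an illustrative sketch: the MatchStatus column and its labels are made up for this example, and it assumes RepId is never NULL in the stored rows.

```sql
-- Label each row of the FULL OUTER JOIN by which side it came from.
SELECT
 COALESCE(RD.RepId, SD.RepId) AS RepId   -- take the RepId from whichever side has it
 ,RD.RepName
 ,SD.OrderNo
 ,CASE
   WHEN RD.RepId IS NULL THEN 'Sale without a rep'   -- only the right side matched
   WHEN SD.RepId IS NULL THEN 'Rep without sales'    -- only the left side matched
   ELSE 'Matched'
  END AS MatchStatus
FROM
 dbo.RepDetails AS RD
 FULL OUTER JOIN dbo.SalesDetails AS SD
  ON SD.RepId = RD.RepId
```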

CROSS JOIN

CROSS JOIN returns a result set in which the number of rows equals the number of rows in the ‘Left’ table multiplied by the number of rows in the ‘Right’ table, because each row in the left table is joined to each row in the right table. This is the behaviour when no condition is provided in the WHERE clause, and it is usually called a ‘Cartesian Product’. For the sample data below that is 5 x 2 = 10 rows.


image
SELECT * 
FROM
 dbo.RepDetails AS RD
 CROSS JOIN dbo.Settings AS S

image


But when a condition relating the two tables is provided via the WHERE clause, CROSS JOIN will behave like an INNER JOIN:
SELECT * 
FROM
 dbo.RepDetails AS RD
 CROSS JOIN dbo.Settings AS S
WHERE
 RD.RepId = S.S_Id

image
**Note: In a CROSS JOIN, a derived table on the right side cannot refer to a column of the left table. For example, the following code will result in an error:
SELECT * 
FROM
 dbo.RepDetails AS RD
 CROSS JOIN (SELECT * FROM dbo.Settings AS S WHERE S.S_Id = RD.RepId ) AS ST

 

CROSS APPLY behaves like an INNER JOIN and OUTER APPLY behaves like a LEFT OUTER JOIN. But the main difference between APPLY and JOIN is that the right side of the APPLY operator can reference columns of the table on the left side. This is not possible in a JOIN.
For example, suppose we need to fetch the sales rep details along with the highest-value sale each rep has made. The following query cannot be used, because the derived table refers to a column of the left table:
SELECT 
 *
FROM
 dbo.RepDetails AS RD
 JOIN(
  SELECT TOP 1 * 
  FROM 
   dbo.SalesDetails AS SD 
  WHERE 
   RD.RepId = SD.RepId 
  ORDER BY  
   SD.SaleValue DESC
 ) AS SData 
  ON 1=1
It will result in an error:
Msg 4104, Level 16, State 1, Line 78
The multi-part identifier "RD.RepId" could not be bound.



The way to achieve this is by using an APPLY.

CROSS APPLY

Considering the above requirement, we can use a CROSS APPLY in order to achieve the aforementioned.
SELECT 
 *
FROM
 dbo.RepDetails AS RD
 CROSS APPLY(
  SELECT TOP 1 * 
  FROM 
   dbo.SalesDetails AS SD 
  WHERE 
   RD.RepId = SD.RepId 
  ORDER BY  
   SD.SaleValue DESC
 ) AS SData 

image


Notice that the above sample returned three records, even though the RepDetails table contains five reps. CROSS APPLY returns a rep along with the maximum sale value only if there is a matching record in the table on the right side of the APPLY operator. (Similar to an INNER JOIN)


OUTER APPLY

Using OUTER APPLY we can achieve a result similar to CROSS APPLY. The difference is that even when there are no matching records in the table on the right side of the APPLY operator, it still returns all the rows from the left side table, with NULL values for the columns of the right side table. We will use the same query as in the above example, changing only the CROSS APPLY to an OUTER APPLY.
SELECT 
 *
FROM
 dbo.RepDetails AS RD
 OUTER APPLY(
  SELECT TOP 1 *
  FROM 
   dbo.SalesDetails AS SD 
  WHERE 
   RD.RepId = SD.RepId 
  ORDER BY  
   SD.SaleValue DESC
 ) AS SData
image
There are other things which are possible using APPLY. The following article explains these capabilities really well: http://bradsruminations.blogspot.sg/2011/04/t-sql-tuesday-017-it-slices-it-dices-it.html
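As one small illustration of such a capability, APPLY can be combined with a VALUES clause to compute an expression once per row and then reuse it by name. This is only a sketch: the 10% rate is hard-coded as an assumption (the sample Settings table stores it as the text '10%'), and the column names TaxAmount and GrossValue are made up for the example.

```sql
-- Compute a per-row value once and reuse it in the SELECT list.
-- The right side of the APPLY references SD.SaleValue from the left side.
SELECT
 SD.OrderNo
 ,SD.SaleValue
 ,Calc.TaxAmount
 ,SD.SaleValue + Calc.TaxAmount AS GrossValue
FROM
 dbo.SalesDetails AS SD
 CROSS APPLY (
  VALUES (SD.SaleValue * 0.10)   -- assumed 10% tax rate, hard-coded for illustration
 ) AS Calc (TaxAmount)
```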

Hope this will help you to understand the JOIN and the APPLY operator in SQL Server and where it can be used precisely.











Wednesday 9 March 2016

String or binary data would be truncated / Arithmetic overflow error converting numeric to data type numeric – Workaround

 

There’s nothing more annoying than getting the error ‘String or binary data would be truncated’ or ‘Arithmetic overflow error converting numeric to data type numeric’ when you need to insert data into a table using a SELECT statement. To make it more interesting, SQL Server won’t provide the name of the column (or columns) causing the issue. (This is due to the way SQL Server executes queries.)

To illustrate this I will use a small sample.

Suppose we have a table to store some Customer details:

CREATE TABLE Customer_Data(
    CustId        TINYINT
    ,CustFName    VARCHAR(10)
    ,CustLName    VARCHAR(10)
    ,MaxCredit    NUMERIC(6,2)
)

We will try to insert details into the above table. (In reality the SELECT statement could be very complex and could fetch lots of rows.)


INSERT INTO dbo.Customer_Data(
    CustId
    ,CustFName
    ,CustLName
    ,MaxCredit
)

SELECT 1 AS CustId,'John' AS CustFName,'Doe' AS CustLName,1000.00 AS MaxCredit UNION ALL
SELECT 2,'Jane','Doe',1000.00 UNION ALL
SELECT 3,'James','Whitacker Jr.',15000.00

 

This will result in the following error:

Msg 8152, Level 16, State 14, Line 48
String or binary data would be truncated.

The statement has been terminated.

The challenge here is to find out which columns are actually causing the issue. (As mentioned, in reality the number of columns could be very large.)

However, there is a small workaround which we can use to find the columns that are causing the insertion to fail. You need to do the following in order to find these columns.

1. First create a table using the same SELECT statement. (You can create either a temporary table or an actual table, based on your environment and need.) I will create two tables, one actual and one temporary, to illustrate both options.

SELECT A.*
INTO Temp_Customer_Data
FROM(
    SELECT 1 AS CustId,'John' AS CustFName,'Doe' AS CustLName,1000.00 AS MaxCredit UNION ALL
    SELECT 2,'Jane','Doe',1000.00 UNION ALL
    SELECT 3,'James','Whitacker Jr.',15000.00
) AS A


SELECT A.*
INTO #Customer_Data
FROM(
    SELECT 1 AS CustId,'John' AS CustFName,'Doe' AS CustLName,1000.00 AS MaxCredit UNION ALL
    SELECT 2,'Jane','Doe',1000.00 UNION ALL
    SELECT 3,'James','Whitacker Jr.',15000.00
) AS A

2. Use the following query to identify the columns causing the issue:

Actual Table:

;WITH Cte_Source AS (
SELECT
    C.COLUMN_NAME
    ,C.DATA_TYPE
    ,C.CHARACTER_MAXIMUM_LENGTH
    ,C.NUMERIC_PRECISION
    ,C.NUMERIC_SCALE
FROM
    INFORMATION_SCHEMA.TABLES AS T
    JOIN INFORMATION_SCHEMA.COLUMNS AS C
        ON C.TABLE_CATALOG = T.TABLE_CATALOG
        AND C.TABLE_NAME = T.TABLE_NAME
        AND C.TABLE_SCHEMA = T.TABLE_SCHEMA
WHERE
    T.TABLE_NAME = 'Temp_Customer_Data'        -- Source Table
    AND T.TABLE_SCHEMA = 'dbo'
)
,Cte_Destination AS (
SELECT
    C.COLUMN_NAME
    ,C.DATA_TYPE
    ,C.CHARACTER_MAXIMUM_LENGTH
    ,C.NUMERIC_PRECISION
    ,C.NUMERIC_SCALE
FROM
    INFORMATION_SCHEMA.TABLES AS T
    JOIN INFORMATION_SCHEMA.COLUMNS AS C
        ON C.TABLE_CATALOG = T.TABLE_CATALOG
        AND C.TABLE_NAME = T.TABLE_NAME
        AND C.TABLE_SCHEMA = T.TABLE_SCHEMA
WHERE
    T.TABLE_NAME = 'Customer_Data'        -- Destination Table
    AND T.TABLE_SCHEMA = 'dbo'
)
SELECT
    S.COLUMN_NAME
   ,S.DATA_TYPE
   ,S.CHARACTER_MAXIMUM_LENGTH
   ,S.NUMERIC_PRECISION
   ,S.NUMERIC_SCALE

   ,D.COLUMN_NAME
   ,D.DATA_TYPE
   ,D.CHARACTER_MAXIMUM_LENGTH
   ,D.NUMERIC_PRECISION
   ,D.NUMERIC_SCALE
FROM
    Cte_Source AS S
    JOIN Cte_Destination AS D
        ON D.COLUMN_NAME = S.COLUMN_NAME
WHERE
    S.CHARACTER_MAXIMUM_LENGTH > D.CHARACTER_MAXIMUM_LENGTH
    OR S.NUMERIC_PRECISION > D.NUMERIC_PRECISION

 

Temporary Table:

;WITH Cte_Source AS (
SELECT
    C.COLUMN_NAME
    ,C.DATA_TYPE
    ,C.CHARACTER_MAXIMUM_LENGTH
    ,C.NUMERIC_PRECISION
    ,C.NUMERIC_SCALE
FROM
    tempdb.sys.objects so
    JOIN tempdb.INFORMATION_SCHEMA.TABLES AS T
        ON so.name = T.TABLE_NAME
        AND so.[object_id] = OBJECT_ID('tempdb..#Customer_Data')
    JOIN tempdb.INFORMATION_SCHEMA.COLUMNS AS C
        ON C.TABLE_CATALOG = T.TABLE_CATALOG
        AND C.TABLE_NAME = T.TABLE_NAME
        AND C.TABLE_SCHEMA = T.TABLE_SCHEMA
   
WHERE
    T.TABLE_SCHEMA = 'dbo'
)
,Cte_Destination AS (
SELECT
    C.COLUMN_NAME
    ,C.DATA_TYPE
    ,C.CHARACTER_MAXIMUM_LENGTH
    ,C.NUMERIC_PRECISION
    ,C.NUMERIC_SCALE
FROM
    INFORMATION_SCHEMA.TABLES AS T
    JOIN INFORMATION_SCHEMA.COLUMNS AS C
        ON C.TABLE_CATALOG = T.TABLE_CATALOG
        AND C.TABLE_NAME = T.TABLE_NAME
        AND C.TABLE_SCHEMA = T.TABLE_SCHEMA
WHERE
    T.TABLE_NAME = 'Customer_Data'        -- Destination Table
    AND T.TABLE_SCHEMA = 'dbo'
)
SELECT
    S.COLUMN_NAME
   ,S.DATA_TYPE
   ,S.CHARACTER_MAXIMUM_LENGTH
   ,S.NUMERIC_PRECISION
   ,S.NUMERIC_SCALE

   ,D.COLUMN_NAME
   ,D.DATA_TYPE
   ,D.CHARACTER_MAXIMUM_LENGTH
   ,D.NUMERIC_PRECISION
   ,D.NUMERIC_SCALE
FROM
    Cte_Source AS S
    JOIN Cte_Destination AS D
        ON D.COLUMN_NAME = S.COLUMN_NAME
WHERE
    S.CHARACTER_MAXIMUM_LENGTH > D.CHARACTER_MAXIMUM_LENGTH
    OR S.NUMERIC_PRECISION > D.NUMERIC_PRECISION

 

Both the aforementioned queries will return the following result.

image

The reasons these three columns are returned are as follows:

1. CustId ==> In our destination table the data type of CustId is TINYINT. Even though the select query returns values within that range, the data type it returns is INT. So there is a possibility of large numbers that the destination table could not hold.

2. CustLName ==> ‘Whitacker Jr.’ exceeds the maximum length of 10 defined in the destination table.

3. MaxCredit ==> In the destination table the column is defined as NUMERIC(6,2), which means it can hold values up to 9999.99. But our insertion query contains a record with the value 15000.00.
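Since the metadata comparison only flags columns whose declared size is larger than the destination's (as CustId shows, a flagged column does not necessarily hold an offending value), it can help to follow up with a check on the actual data. The following is a minimal sketch against the Temp_Customer_Data table created in step 1. The limits are typed in by hand from the Customer_Data definition, and the flag column names are made up for the example.

```sql
-- List the staged rows that would not fit into dbo.Customer_Data,
-- with one flag column per rule (1 = the value does not fit).
SELECT
 T.*
 ,CASE WHEN T.CustId NOT BETWEEN 0 AND 255 THEN 1 ELSE 0 END AS CustId_OutOfRange   -- TINYINT range
 ,CASE WHEN LEN(T.CustFName) > 10 THEN 1 ELSE 0 END AS CustFName_TooLong            -- VARCHAR(10)
 ,CASE WHEN LEN(T.CustLName) > 10 THEN 1 ELSE 0 END AS CustLName_TooLong            -- VARCHAR(10)
 ,CASE WHEN ABS(T.MaxCredit) >= 10000 THEN 1 ELSE 0 END AS MaxCredit_TooLarge       -- NUMERIC(6,2)
FROM
 dbo.Temp_Customer_Data AS T
WHERE
 T.CustId NOT BETWEEN 0 AND 255
 OR LEN(T.CustFName) > 10
 OR LEN(T.CustLName) > 10
 OR ABS(T.MaxCredit) >= 10000
```

With the sample data only the row with CustId 3 should be returned, flagged for CustLName and MaxCredit.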

 

Hope this might be helpful to you.

Thursday 3 March 2016

Extracting Date (Excluding Time) from a DateTime value in SQL Server

 

SQL Server supports many data types where we can store the Date along with the time, such as

  • DateTime
  • SmallDateTime
  • DateTimeOffset
  • DateTime2

But in some cases it’s required to fetch only the date portion from a field of one of the aforementioned types.

There are a few ways to achieve this easily using T-SQL.

The easiest method is to CAST the DateTime value directly to a DATE type.

SELECT CAST(GETDATE() AS DATE)                                    --==> 2016-03-03

Also you can achieve this by using the CONVERT function providing different styles as per your requirement.

SELECT CONVERT(VARCHAR(24),GETDATE(),101)                        --==> 03/03/2016
SELECT CONVERT(VARCHAR(24),GETDATE(),102)                        --==> 2016.03.03

Please refer to the following URL (https://msdn.microsoft.com/en-sg/library/ms187928.aspx) for more details on the CONVERT function and supported styles.

But if your requirement is to return a DateTime type containing only the date portion (with the time set to midnight), you can use the following syntax:

SELECT CAST(FLOOR(CAST(GETDATE() AS FLOAT)) AS DATETIME)        --==> 2016-03-03 00:00:00.000
SELECT CAST(CONVERT(VARCHAR(8),GETDATE(),112) AS DATETIME)      --==> 2016-03-03 00:00:00.000
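Two other commonly seen variations that also return a DateTime at midnight are sketched below. They are alternatives to the queries above rather than part of the original list: the first counts whole days since the base date 0 (1900-01-01) and adds them back, the second goes through the DATE type.

```sql
-- Whole days since day 0 (1900-01-01), added back to day 0: a DATETIME at midnight.
SELECT DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0)

-- Strip the time via DATE, then convert back to DATETIME.
SELECT CAST(CAST(GETDATE() AS DATE) AS DATETIME)
```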

Tuesday 16 February 2016

Index REBUILD vs. REORGANIZE in SQL SERVER

A couple of days back an interesting statement (or rather a question) was brought up by one of my colleagues. Ultimately the initial statement left us with one simple question: what is the difference between Index REBUILD and REORGANIZE, and when exactly should each be used?

If you google the aforementioned you can find numerous posts/blogs regarding this. Therefore I will keep things simple and easy to understand.

Rebuilding or reorganizing an index is required when index fragmentation has reached a considerable percentage. The fragmentation percentage can be identified using the dynamic management function sys.dm_db_index_physical_stats in SQL Server.

You can find more details on this function at the following link: https://msdn.microsoft.com/en-us/library/ms188917.aspx

You can get a list of fragmented indexes using the following query:

SELECT
    OBJECT_NAME(Stat.object_id)
    ,I.name
    ,Stat.index_type_desc
    ,Stat.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(),NULL,NULL,NULL,NULL) AS Stat
JOIN sys.indexes AS I
        ON Stat.index_id = I.index_id
        AND Stat.object_id = I.object_id
WHERE
    Stat.avg_fragmentation_in_percent > 30

Executing the above query will give you a list of indexes which have more than 30% fragmentation. ‘index_type_desc’ tells you what sort of index it is (clustered, non-clustered, heap etc.).

As per the guidelines provided by Microsoft, the best practice is to Reorganize the index if the fragmentation is more than 5% but less than or equal to 30%, and to Rebuild it if it’s more than 30%.
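The guideline above can be folded into the fragmentation query so that each index is listed with a suggested action. This is a sketch only: the SuggestedAction column and its labels are made up for the example, the thresholds are simply the 5% and 30% figures quoted above, and heaps are excluded because they are handled separately.

```sql
SELECT
    OBJECT_NAME(Stat.object_id) AS TableName
    ,I.name AS IndexName
    ,Stat.index_type_desc
    ,Stat.avg_fragmentation_in_percent
    ,CASE
        WHEN Stat.avg_fragmentation_in_percent > 30 THEN 'REBUILD'
        WHEN Stat.avg_fragmentation_in_percent > 5 THEN 'REORGANIZE'
        ELSE 'NONE'
     END AS SuggestedAction
FROM sys.dm_db_index_physical_stats(DB_ID(),NULL,NULL,NULL,'LIMITED') AS Stat
JOIN sys.indexes AS I
        ON Stat.index_id = I.index_id
        AND Stat.object_id = I.object_id
WHERE
    Stat.index_id > 0        -- index_id 0 is a heap; skip it here
ORDER BY
    Stat.avg_fragmentation_in_percent DESC
```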

 

Rebuilding Indexes

  • Should be performed if the fragmentation is more than 30%
  • Operation can be done online or offline

Index rebuilding can be done using the following syntax:

In order to rebuild all the indexes on a specific table:

USE <Database_Name>
GO

ALTER INDEX ALL ON <Table_Name> REBUILD
GO

 

In order to rebuild only a specific index:

USE <Database_Name>
GO

ALTER INDEX <Index_Name> ON <Table_Name> REBUILD
GO

 

Reorganizing Indexes

  • Should be performed if the fragmentation is more than 5% but less than or equal to 30%
  • Operation is always online

Index reorganizing can be done using the following syntax:

In order to reorganize all the indexes on a specific table:

USE <Database_Name>
GO

ALTER INDEX ALL ON <Table_Name> REORGANIZE
GO

In order to reorganize only a specific index:

USE <Database_Name>
GO

ALTER INDEX <Index_Name> ON <Table_Name> REORGANIZE
GO

 

Optionally you can set many attributes during the Rebuild or Reorganize process (e.g. FILLFACTOR, SORT_IN_TEMPDB etc.). Please check the following link for more details on the available options: https://msdn.microsoft.com/en-us/library/ms188388.aspx
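As a rough illustration of how such options are supplied, the sketch below rebuilds a single index with two of the attributes mentioned above. The table dbo.Demo_Orders and the index IX_Demo_Orders_OrderNo are hypothetical names created only for this example, and the values chosen (a fill factor of 90, sorting in tempdb) are arbitrary rather than recommendations.

```sql
-- Hypothetical table and index, created only so the example is self-contained.
CREATE TABLE dbo.Demo_Orders (
    OrderId     INT NOT NULL
    ,OrderNo    VARCHAR(6) NOT NULL
)
GO

CREATE NONCLUSTERED INDEX IX_Demo_Orders_OrderNo
    ON dbo.Demo_Orders (OrderNo)
GO

-- Rebuild the index, leaving 10% free space per leaf page
-- and doing the sort work in tempdb.
ALTER INDEX IX_Demo_Orders_OrderNo ON dbo.Demo_Orders
    REBUILD WITH (FILLFACTOR = 90, SORT_IN_TEMPDB = ON)
GO
```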

However, an index REBUILD or REORGANIZE will not have any effect on HEAP fragmentation. In order to remove the heap fragmentation you can use the following syntax (*** NOT THE BEST PRACTICE):

USE <Database_Name>
GO

ALTER TABLE <Table_Name> REBUILD
GO

** Even though the aforementioned syntax will remove the HEAP fragmentation, it is considered as bad as creating and dropping a clustered index, which will leave behind lots of fragmentation on non-clustered indexes. The best practice would be to create a clustered index on the table to remove the HEAP fragmentation. You can find more details on this in the following blog post by Paul S. Randal, who has illustrated it nicely.
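For completeness, creating a clustered index as suggested above could look like the following sketch. The table dbo.Demo_Heap, its columns, and the index name are hypothetical, and the choice of the key column is an assumption for illustration; in practice the key should be picked to suit the table.

```sql
-- Hypothetical heap (a table without a clustered index).
CREATE TABLE dbo.Demo_Heap (
    Id          INT NOT NULL
    ,Payload    VARCHAR(100) NULL
)
GO

-- Creating a clustered index turns the heap into a clustered table,
-- which is kept so that later maintenance can use ALTER INDEX ... REBUILD / REORGANIZE.
CREATE CLUSTERED INDEX CIX_Demo_Heap_Id
    ON dbo.Demo_Heap (Id)
GO
```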