Manjuke's Blog: SQL


Wednesday, 19 September 2012

Introduction to SSDT (SQL Server Data Tools)

What is SSDT?

SQL Server Data Tools (SSDT) is a toolset which provides an integrated environment for database developers to carry out all their database design work for any SQL Server platform (both on and off premise) within Visual Studio. Database developers can use the SQL Server Object Explorer in VS to easily create or edit database objects and data, or execute queries.

In a previous blog entry I described how to install SSDT into the Visual Studio environment. In this post I will briefly describe a few of its features.

SSDT is not intended to replace SQL Server Management Studio (SSMS). Instead, it gives developers a complete development environment, so that they do not need to leave the Visual Studio IDE for any database-related development. The tool does not contain every feature found in SSMS, yet it provides enough functionality for most developers’ day-to-day development tasks.

I find the following features particularly interesting, and they help increase the productivity of the developer or team.

1. Design & Code view in a single screen.

Being able to see the design view and the code view on a single screen is a wonderful thing. You do not have to move between screens, and changes made in the design view are reflected in the code immediately, and vice versa.

2. Ability to add Constraints, Indexes, Foreign Keys & Triggers without changing the screen

Adding constraints, indexes, etc. is much easier. You can add them by right-clicking and choosing the relevant ‘Add <object>’ menu item.

3. It uses ‘Declarative – Model Based Development’

What this means is that there is always an in-memory representation of what a database looks like—an SSDT database model— and all the SSDT tools (designers, validations, IntelliSense, schema compare, and so on) operate on that model. This model can be populated by a live connected database (on-premise or SQL Azure), an offline database project under source control, or a point-in-time snapshot taken of an offline database project (you will work with snapshots in the upcoming exercises). But to reiterate, the tools are agnostic to the model’s backing; they work exclusively against the model itself. Thus, you enjoy a rich, consistent experience in any scenario—regardless of whether you’re working with on-premise or cloud databases, offline projects, or versioned snapshots.

4. Connected Development

Although SSDT places great emphasis on the declarative model, it in no way prevents you from working imperatively against live databases when you want or need to. You can open query windows to compose and execute T-SQL statements directly against a connected database, with the assistance of a debugger if desired, just as you can in SSMS.

5. Disconnected Development

The new SQL Server Object Explorer lets you connect to and interact with any database right from inside Visual Studio. But SSDT offers a great deal more than a mere replacement for the connected SSMS experience. It also delivers a rich offline experience with the new SQL Server Database Project type and local database runtime (LocalDB).

This is actually one of its best features. For many reasons, we are often asked to work offline, or to do our development locally and merge it later. This feature lets us maintain a local database and do all the development against it, and later merge or publish the changes to another database. By doing a schema comparison, it is also possible to find out which changes we have made.

6. Schema Comparison

This is a very valuable feature. This allows us to do a schema comparison between our development environment and the physical database (or vice versa) and find out the changes very quickly.

7. Saving snapshots of the Database

Sometimes it’s required to keep snapshots of your database at different stages of the development. And the best thing is, it even allows you to compare between two database snapshots. So it’s easy to see the changes you have done compared to the previous stages.

8. Ability to find any errors or reference issues in design time

With SSDT it is easy to identify syntax or reference issues before deploying to the database. For example, assume we have a view which refers to a few columns of a table. Usually, if someone changes or removes any of the columns the view refers to, there is no way of noticing it until the view is used in our application. SSDT, however, reports these issues when the project is built, so they can be eliminated before we apply the changes to the deployment server.
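
To make the problem concrete, the following sketch reproduces it on a live database without SSDT. The object names (dbo.Customer, dbo.vw_CustomerContact) are made up for this illustration and are not part of any real schema.

```sql
-- Hypothetical objects, used only to illustrate the broken-reference problem.
CREATE TABLE dbo.Customer (
    CustomerId INT NOT NULL,
    Name       VARCHAR(50) NOT NULL,
    Phone      VARCHAR(20) NULL
);
GO

CREATE VIEW dbo.vw_CustomerContact
AS
SELECT CustomerId, Name, Phone
FROM dbo.Customer;
GO

-- The view is not schema-bound, so dropping the column succeeds without any warning.
ALTER TABLE dbo.Customer DROP COLUMN Phone;
GO

-- The problem only surfaces now, when the view is actually used:
-- this SELECT fails with a binding error caused by the missing column.
SELECT * FROM dbo.vw_CustomerContact;
GO
```

In a database project, the same column removal would be flagged as an unresolved reference when the project is built, before anything reaches the server.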

These are some of the features which I have found very interesting and which most developers expect. Having said that, I am not claiming this is the complete set of features. There are a few missing pieces of functionality which would have made it even nicer had they been there.

You can find a good article which covers SSDT in more depth here: http://www.codeproject.com/Articles/357905/Evaluating-SQL-Server-Data-Tools

Friday, 31 August 2012

Installing SQL Server Data Tools (SSDT)

What is SQL Server Data Tools?

SQL Server Data Tools (SSDT) is a toolset which provides an integrated environment for database developers to carry out all their database design work for any SQL Server platform (both on and off premise) within Visual Studio. Database developers can use the SQL Server Object Explorer in VS to easily create or edit database objects and data, or execute queries.

More details regarding its features are available at http://blogs.msdn.com/b/ssdt/archive/2011/11/21/what-is-sql-server-data-tools-ssdt.aspx

SSDT is not intended to be a replacement for SSMS, but instead can be viewed much more as a greatly evolved implementation of DbPro. Indeed, SSMS is alive and well in SQL Server 2012, and it continues to serve as the primary management tool for database administrators who need to configure and maintain healthy SQL Server installations.

However SSDT does not get installed with either Visual Studio or SQL Server. Instead, SSDT ships separately via the Web Platform Installer (WebPI).

Download SSDT from http://go.microsoft.com/fwlink/?LinkID=241405
Once it’s downloaded, open a command window with administrative privileges (run cmd.exe as Administrator), and execute the following command :

SSDTSetup.exe /layout <destination>

<destination> is the path to which WebPI will download all the necessary installation files and where it will create the administrative installation point. This can be a location on the LAN, on a USB drive or on your local drive.

It will then start downloading the required files to the given location.

Once everything is downloaded you can execute ‘SSDTSetup.exe’ from the destination location (without any arguments). Once the installation has completed successfully, you can see the tool in the Visual Studio 2010 development environment. Select it from the ‘View’ menu.

Thursday, 8 March 2012

Locks and Duration of Transactions in MS SQL Server

A common argument I hear among developers these days concerns SQL locks. Some say that ‘locks are held for the duration of the entire transaction’, while others argue that ‘locks are only held for the duration of the statement execution’. So who is correct?

Well, both parties are correct up to a point. Lock durations actually depend on the isolation level.

As mentioned in the SQL-99 Standards, there are 4 Transaction Isolation Levels

Read Committed (Default)
Read Uncommitted
Repeatable Read
Serializable

SQL Server** provides the following two additional isolation levels (** SQL Server 2005 and upwards)

Snapshot
Read Committed Snapshot

There are several concurrency issues which can occur in a DBMS when multiple users try to access the same data. Each isolation level protects against a specific concurrency problem.

Lost Update
Dirty Read
Non-Repeatable Read
Phantom Reads

Lost Update – This can take place in two ways. In the first scenario, data that has been updated by one transaction (Transaction A) is overwritten by another transaction (Transaction B) before Transaction A commits or rolls back. (This type of lost update can never occur in SQL Server** under any transaction isolation level.)

In the second scenario, one transaction (Transaction A) reads a record and stores a value from it in a local variable, and the same record is then updated by another transaction (Transaction B). Later, Transaction A updates the record using the value in the local variable. In this scenario the update done by Transaction B can be considered a ‘Lost Update’.

Dirty Read – This is when data which has been changed by one transaction, but not yet committed, is read by a different transaction. All isolation levels except ‘Read Uncommitted’ protect against dirty reads.

Non Repeatable Read – This is when a specific set of data is accessed more than once in one transaction (Transaction A) and, between these accesses, is updated or deleted by another transaction (Transaction B). The repeatable read, serializable, and snapshot isolation levels protect a transaction from non-repeatable reads.

Phantom Read – This is when two queries in the same transaction, against the same table, use the same ‘WHERE’ clause, and the query executed last returns more rows than the first one. Only the serializable and snapshot isolation levels protect a transaction from phantom reads.
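
The second ‘Lost Update’ scenario is the one most likely to show up in everyday code, so here is a minimal sketch of it. The dbo.Account table and its values are assumptions made for this illustration; the statement attributed to Transaction B is shown as a comment because it would run in a separate session.

```sql
-- Hypothetical table for the sketch.
CREATE TABLE dbo.Account (AccountId INT PRIMARY KEY, Balance MONEY NOT NULL);
INSERT INTO dbo.Account (AccountId, Balance) VALUES (1, 100);
GO

-- Transaction A: reads the value into a local variable and writes it back later.
DECLARE @Balance MONEY;
BEGIN TRAN;
    SELECT @Balance = Balance FROM dbo.Account WHERE AccountId = 1;

    -- Transaction B runs here in another session and commits:
    --     UPDATE dbo.Account SET Balance = Balance + 50 WHERE AccountId = 1;

    -- Transaction A overwrites the row using its stale local value.
    UPDATE dbo.Account SET Balance = @Balance - 10 WHERE AccountId = 1;
COMMIT;
-- The balance is now 90; the +50 written by Transaction B has been lost.
```

Writing the change as a single statement (SET Balance = Balance - 10) avoids depending on the stale local value in the first place.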

In order to solve the above-mentioned concurrency issues, SQL Server uses the following types of locks.

Shared or S-locks - Shared locks are sometimes referred to as read locks. There can be several shared locks on any resource (such as a row or a page) at any one time. Shared locks are compatible with other shared locks.
Exclusive or X-locks - Exclusive locks are also referred to as write locks. Only one exclusive lock can exist on a resource at any time. Exclusive locks are not compatible with other locks, including shared locks.
Update or U-locks - Update locks can be viewed as a combination of shared and exclusive locks. An update lock is used to lock rows when they are selected for update, before they are actually updated. Update locks are compatible with shared locks, but not with other update locks.

Please refer to the following link to get more information regarding lock types. http://msdn.microsoft.com/en-us/library/ms175519.aspx
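
If you want to see which of these locks a session is actually holding, one option is the sys.dm_tran_locks dynamic management view. The query below is a small sketch that lists the locks of the current session; it assumes the login has the VIEW SERVER STATE permission.

```sql
-- Lists the locks requested by the current session.
SELECT
    resource_type,      -- e.g. OBJECT, PAGE, KEY, RID
    request_mode,       -- e.g. S, U, X, IS, IX, Sch-S
    request_status      -- GRANT, WAIT or CONVERT
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID;
```

Running it inside an open transaction, right after a statement, makes it easy to compare what each isolation level keeps locked.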

As I mentioned earlier, the type of lock which SQL Server acquires depends on the active transaction’s isolation level. I will now briefly describe each isolation level.

Read Committed Isolation Level – This is the default isolation level for new connections in SQL Server. This makes sure that dirty reads do not occur in your transactions. If the connection uses this isolation level, and if it encounters a dirty row while executing a DML statement, it’ll wait until the transaction which owns that row has been committed or rolled back, before continuing execution further ahead.
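
As a quick sketch, the isolation level is set per connection with SET TRANSACTION ISOLATION LEVEL, and the level currently in effect can be read back from sys.dm_exec_sessions. The numeric codes used in the CASE expression are the ones documented for the transaction_isolation_level column.

```sql
-- Set the isolation level for the current connection.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Read back the level that is in effect for this session.
SELECT
    CASE transaction_isolation_level
        WHEN 0 THEN 'Unspecified'
        WHEN 1 THEN 'Read Uncommitted'
        WHEN 2 THEN 'Read Committed'
        WHEN 3 THEN 'Repeatable Read'
        WHEN 4 THEN 'Serializable'
        WHEN 5 THEN 'Snapshot'
    END AS isolation_level
FROM sys.dm_exec_sessions
WHERE session_id = @@SPID;
```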

Read Uncommitted Isolation level - Though this is not highly recommended by experts, it is worth considering as well. It may result in a 'dirty read', but when used correctly it can provide great performance benefits.

You should consider using this isolation level only in routines where the issue of dirty reads is not a problem. Such routines usually return information that is not directly used as a basis for decisions. A typical example where dirty reads might be allowed is for queries that return data that are only used in lists in the application (such as a list of customers) or if the database is only used for read operations.

The read uncommitted isolation level is by far the best isolation level to use for performance, as it does not wait for other connections to complete their transactions when it wants to read data that these transactions have modified. In the read uncommitted isolation level, shared locks are not acquired for read operations; this is what makes dirty reads possible. This fact also reduces the work and memory required by the SQL Server lock manager. Because shared locks are not acquired, it is no problem to read resources locked by exclusive locks. However, while a query is executing in the read uncommitted isolation level, another type of lock called a ‘schema stability lock’ (Sch-S) is acquired to prevent Data Definition Language (DDL) statements from changing the table structure. Below is an example of the behavior of this isolation level.
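
A minimal sketch of that behaviour, assuming a hypothetical table named dbo.DirtyReadDemo. The statements marked Window 1 and Window 2 are meant to be run in two separate query windows, in the order shown.

```sql
-- Hypothetical table for the demonstration.
CREATE TABLE dbo.DirtyReadDemo (id INT, descr VARCHAR(20));
INSERT INTO dbo.DirtyReadDemo (id, descr) VALUES (1, 'Original');
GO

-- Window 1: change the row but leave the transaction open.
BEGIN TRAN;
    UPDATE dbo.DirtyReadDemo SET descr = 'Uncommitted' WHERE id = 1;

-- Window 2: returns 'Uncommitted' immediately instead of waiting for Window 1.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT id, descr FROM dbo.DirtyReadDemo;

-- Window 1: undo the change. Window 2 has read a value that was never committed.
ROLLBACK;
```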

Repeatable Read Isolation Level - This isolation level guarantees that dirty reads do not happen in your transaction. It also makes sure that if you issue two DML statements against the same table with the same WHERE clause, the rows read by the first statement are returned unchanged by the second. In other words, this isolation level protects against updates and deletes of earlier accessed rows, but not against inserts, which is known as the ‘Phantom’ rows concurrency problem. Note that phantom rows might also occur if you use aggregate functions, although they are not as easy to detect.

Serializable Isolation Level – This guarantees that none of the aforesaid concurrency issues can occur. It is very similar to the ‘repeatable read isolation level’, except that it also prevents ‘phantom reads’. However, using this isolation level increases the risk of blocked transactions and deadlocks compared to ‘Repeatable Read’. In return, it guarantees that if you issue two DML statements against the same table with the same WHERE clause, both of them will return exactly the same results, including the same number of rows. To protect the transaction from inserts, SQL Server needs to lock a range of an index over a column that is included in the WHERE clause with shared locks. If such an index does not exist, SQL Server needs to lock the entire table.
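
The difference between ‘Repeatable Read’ and ‘Serializable’ can be tried out with a sketch like the following. The dbo.PhantomDemo table is an assumption made for this illustration, and the insert attributed to Window 2 is shown as a comment because it has to run in a second session while Window 1 is still open.

```sql
-- Hypothetical table for the sketch.
CREATE TABLE dbo.PhantomDemo (id INT PRIMARY KEY, grp INT NOT NULL);
INSERT INTO dbo.PhantomDemo (id, grp) VALUES (1, 10);
INSERT INTO dbo.PhantomDemo (id, grp) VALUES (2, 10);
GO

-- Window 1
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;   -- repeat the test with SERIALIZABLE
BEGIN TRAN;
    SELECT COUNT(*) FROM dbo.PhantomDemo WHERE grp = 10;   -- first read

    -- Window 2, while Window 1 is still open:
    --     INSERT INTO dbo.PhantomDemo (id, grp) VALUES (3, 10);
    -- Under REPEATABLE READ the insert succeeds at once (a phantom row).
    -- Under SERIALIZABLE it waits until Window 1 commits or rolls back.

    SELECT COUNT(*) FROM dbo.PhantomDemo WHERE grp = 10;   -- second read
COMMIT;
```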

Snapshot Isolation Level – In addition to the SQL standard isolation levels, SQL Server 2005 introduced the ‘Snapshot Isolation Level’. Like the ‘Serializable Isolation Level’, it protects against all the above-mentioned concurrency issues. The main difference is that it does not achieve this by preventing other transactions from accessing rows, but by storing versions of rows while the transaction is active and tracking when a specific row was inserted.

To illustrate this I will be using a test database named ‘SampleDB’. First you have to enable the ‘Snapshot Isolation Level’ before using it:

alter database SampleDB set allow_snapshot_isolation on;
alter database SampleDB set read_committed_snapshot off;

Now we’ll create a sample table and insert a few records.

create table SampleIsolaion(
    id int,
    name varchar(20),
    remarks varchar(20) default ''
)

insert into SampleIsolaion (id,name,remarks)
select 1, 'Value A', 'Def' union
select 2, 'Value B', 'Def'
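
With the table in place, the behaviour described above can be observed using two query windows connected to SampleDB. The following is only a sketch: the value 'Value X' is made up, and the statements have to be run in the order shown, each in the window named in its comment.

```sql
-- Window 1: modify a row and leave the transaction open.
BEGIN TRAN;
    UPDATE SampleIsolaion SET name = 'Value X', remarks = 'Window 1' WHERE id = 1;

-- Window 2: read under snapshot isolation.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
    -- Not blocked: returns the last committed version ('Value A') for id = 1.
    SELECT id, name, remarks FROM SampleIsolaion;

-- Window 1: commit the change.
COMMIT;

-- Window 2: still inside the same snapshot transaction,
-- so id = 1 is still returned as 'Value A'.
    SELECT id, name, remarks FROM SampleIsolaion;
COMMIT;

-- Window 2: a new statement outside that transaction now sees 'Value X'.
SELECT id, name, remarks FROM SampleIsolaion;
```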

Read Committed Snapshot Isolation Level – This can be considered a new implementation of the ‘Read Committed’ isolation level. When this option is set, it provides statement-level read consistency, as the example later in this post shows. With this option, reads do not take any page or row locks (only Sch-S, schema stability locks); instead they read the last committed version of the data from the row versions kept in tempdb. This option is set at the database level using the ALTER DATABASE command.

I will illustrate the use of this isolation level with a sample. First enable the required isolation level.

alter database SampleDB set read_committed_snapshot on;
alter database SampleDB set allow_snapshot_isolation on;

Now let’s create a table and populate it with some sample data.

create table sample_table(
    id int,
    descr varchar(20),
    remarks varchar(20)
)

insert into sample_table
select 1,'Val A','Def' union
select 2,'Val B','Def'

Now open two query windows in SQL Server Management Studio.

--Window 1
begin tran
    update sample_table set descr = 'Val P', remarks = 'Window 1' where id = 1

Without committing, execute the following in the second window:

--Window 2
begin tran
    set transaction isolation level read committed    
    select * from sample_table

As you can see, even though Window 1 has not committed, the select reads the older values from the row versions which were created in tempdb. Had it been plain ‘Read Committed’ without the ‘Read Committed Snapshot’ option turned on, this select statement would have been blocked.
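
If the select in Window 2 is blocked anyway, it is worth checking that the options were really applied to the database. A small sketch using the sys.databases catalog view:

```sql
-- Shows the row-versioning options of the sample database.
SELECT
    name,
    is_read_committed_snapshot_on,     -- 1 when READ_COMMITTED_SNAPSHOT is ON
    snapshot_isolation_state_desc      -- ON / OFF, or a transitional state
FROM sys.databases
WHERE name = 'SampleDB';
```

Remember to commit or roll back the open transactions in both windows when you have finished experimenting.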

Friday, 5 August 2011

How to insert data using SQL Views created using multiple tables

A view can be defined as a virtual table or a stored query. The data accessible through a view is not stored in the database as a distinct object; only the select statement is stored in the database.

However, views can also be used to perform DML operations (Insert, Update & Delete).

Consider the following two tables.

CREATE TABLE STUDENT(
    STD_ID        INT,
    STD_FNAME    VARCHAR(20),
    STD_LNAME    VARCHAR(20)
)


CREATE TABLE STUDENT_PAYMENT(
    STD_ID        INT,
    PAY_AMT        MONEY,
    PAY_DATE    DATETIME
)

Now create the following views.

CREATE VIEW VW_STUDENT
AS
SELECT 
    STD_ID, 
    STD_FNAME, 
    STD_LNAME
FROM 
    STUDENT
GO  -- CREATE VIEW must be the first statement in a batch

CREATE VIEW VW_STUDENT_PAYMENT
AS
SELECT 
    STD_ID, 
    PAY_AMT, 
    PAY_DATE
FROM
    STUDENT_PAYMENT

You can insert data to the above tables using the views we have just created. And it is the same syntax that we use to insert data to tables.

INSERT INTO VW_STUDENT
SELECT 1,'Peter','Parker' UNION
SELECT 2,'James', 'Watson'

INSERT INTO VW_STUDENT_PAYMENT
SELECT 1,1000,'01/01/2011' UNION
SELECT 1,1100,'01/02/2011' UNION
SELECT 1,1200,'01/03/2011' UNION
SELECT 1,1250,'01/04/2011' UNION
SELECT 1,1375,'01/05/2011' UNION
SELECT 2,750,'01/03/2011' UNION
SELECT 2,850,'01/04/2011' UNION
SELECT 2,950,'01/05/2011'

And if you query the tables you can see that the records have been inserted correctly.

Now we will create the following view. This time we will join two tables and create a somewhat complex query.

CREATE VIEW VW_LAST_PAYMENT_DETAILS AS
    WITH CTE_STD (STD_ID,MAX_PAYDATE) AS (
        SELECT SP.STD_ID, MAX(SP.PAY_DATE) AS MAX_PAYDATE
        FROM STUDENT_PAYMENT AS SP
        GROUP BY SP.STD_ID
    )
    SELECT S.STD_ID,S.STD_FNAME,S.STD_LNAME, P.PAY_AMT,P.PAY_DATE
    FROM STUDENT AS S
    JOIN STUDENT_PAYMENT AS P ON S.STD_ID = P.STD_ID 
    JOIN CTE_STD AS Q ON P.STD_ID = Q.STD_ID AND P.PAY_DATE = Q.MAX_PAYDATE
    GROUP BY S.STD_ID,S.STD_FNAME,S.STD_LNAME, P.PAY_AMT,P.PAY_DATE

Using the above created view we can list the last payment details of each student.

So if we are required to insert last payment details using this view, how shall we do it? If you use a simple insert statement similar to the ones we used earlier, such as the following, you will end up with an error.

INSERT INTO VW_LAST_PAYMENT_DETAILS (STD_ID,PAY_AMT,PAY_DATE)
SELECT 1,4440,GETDATE()

In order to insert (update & delete) data to views created using multiple tables, you need to use an ‘Instead of trigger’.

**Please note that ‘After Triggers’ cannot be created for views.

Let’s create an instead of trigger using the following syntax.

CREATE TRIGGER TRGI_VW_PAYMENT ON VW_LAST_PAYMENT_DETAILS
INSTEAD OF INSERT
AS
BEGIN
    INSERT INTO STUDENT_PAYMENT
    SELECT STD_ID,PAY_AMT,PAY_DATE
    FROM INSERTED
END

Now using the above insert syntax, you can insert data without getting any error. If you inspect the ‘STUDENT_PAYMENT’ table you can see that the data has been inserted successfully.
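
Updates through the view can be handled in the same way. The following is a sketch of a companion ‘Instead of Update’ trigger; the trigger name TRGU_VW_PAYMENT is made up, and the code assumes that only PAY_AMT is changed while STD_ID and PAY_DATE stay as they are.

```sql
-- Hypothetical companion trigger; the name TRGU_VW_PAYMENT is made up.
CREATE TRIGGER TRGU_VW_PAYMENT ON VW_LAST_PAYMENT_DETAILS
INSTEAD OF UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Apply the new amount to the payment row shown by the view.
    -- Assumes STD_ID and PAY_DATE are not changed by the UPDATE.
    UPDATE SP
    SET PAY_AMT = I.PAY_AMT
    FROM STUDENT_PAYMENT AS SP
    JOIN INSERTED AS I
        ON SP.STD_ID = I.STD_ID
       AND SP.PAY_DATE = I.PAY_DATE;
END
GO

-- Usage: change the latest payment amount of student 2 through the view.
UPDATE VW_LAST_PAYMENT_DETAILS SET PAY_AMT = 975 WHERE STD_ID = 2;
```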

Thursday, 28 July 2011

How to Use Update Cursors in SQL Server

There can be situations where you have to use a cursor, even though the experts say to avoid cursors as much as possible. If you look closely, most of the time we use cursors to iterate through a row collection and update the same table.

In this type of situation it is better to use an update cursor than the default read-only one.

Consider the following table :

CREATE TABLE [dbo].[SAMPLE_EMPLOYEE](
    [EMP_ID] [int] NOT NULL,
    [RANDOM_GEN_NO] [VARCHAR](50) NULL
) ON [PRIMARY]

Insert a few records into the above table using the following script :

SET NOCOUNT ON
DECLARE @REC_ID        AS INT

SET @REC_ID = 1

WHILE (@REC_ID <= 1000)
BEGIN
    INSERT INTO SAMPLE_EMPLOYEE
    SELECT @REC_ID,NULL

    SET @REC_ID = @REC_ID + 1
END
SET NOCOUNT OFF

Next we will add a Primary Key using the below script (Or you can use the table designer) :

ALTER TABLE [dbo].[SAMPLE_EMPLOYEE] ADD  CONSTRAINT [PK_SAMPLE_EMPLOYEE] PRIMARY KEY CLUSTERED 
(
    [EMP_ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

** Please note: A primary key should be there if we are to use an update cursor. Otherwise the cursor will be read only.

Here is how you use the update cursor. Two things differ from a normal cursor: you have to declare which columns you are going to update with ‘for update of <columns>’ (otherwise all columns in your selection will be updatable), and you have to use ‘where current of <cursor>’ in your update statement.

SET NOCOUNT ON
DECLARE 
    @EMP_ID         AS INT, 
    @RANDOM_GEN_NO  AS VARCHAR(50),
    @TEMP           AS VARCHAR(50)

DECLARE EMP_CURSOR CURSOR FOR
SELECT EMP_ID, RANDOM_GEN_NO FROM SAMPLE_EMPLOYEE FOR UPDATE OF RANDOM_GEN_NO
OPEN EMP_CURSOR
FETCH NEXT FROM EMP_CURSOR
INTO @EMP_ID, @RANDOM_GEN_NO

WHILE (@@FETCH_STATUS = 0)
BEGIN
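    -- Cast to BIGINT so that @TEMP receives plain digits rather than a float in scientific notation
    SELECT @TEMP = CAST(FLOOR(RAND()*10000000000000) AS BIGINT)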
    UPDATE SAMPLE_EMPLOYEE SET RANDOM_GEN_NO = @TEMP WHERE CURRENT OF EMP_CURSOR
    
    FETCH NEXT FROM EMP_CURSOR
    INTO @EMP_ID, @RANDOM_GEN_NO
END

CLOSE EMP_CURSOR
DEALLOCATE EMP_CURSOR

SET NOCOUNT OFF
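
For comparison, when the new value does not depend on row-by-row logic, the same table can usually be filled without a cursor at all. The statement below is a set-based sketch, not the cursor technique described above. It swaps RAND() for NEWID(), because RAND() without a seed returns a single value for the whole statement, so the generated numbers fall in a different range from the cursor version.

```sql
-- One statement updates every row; no cursor is needed.
-- NEWID() is evaluated per row, so each row gets its own value.
UPDATE SAMPLE_EMPLOYEE
SET RANDOM_GEN_NO = CAST(ABS(CAST(CHECKSUM(NEWID()) AS BIGINT)) AS VARCHAR(50));
```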

Tuesday, 18 January 2011

Repeating a SQL row based on a value in a different column

There are times when we get requirements such as repeating SQL rows based on a value in another column. E.g.: in an inventory system, when items are received their details are saved in a table (‘ItemDetails’) with one row per item, together with its quantity (‘ItemQty’).

And we are asked to create a GUI for the end user to enter ‘Serial Numbers’ for each item, so we have to repeat each of the above item codes a number of times equal to its ‘ItemQty’. Of course we can achieve that using a SQL cursor or by iterating in C# code. In the following example I will show how to do it using plain SQL.

The task would have been very simple if we had another table (‘TempTable’) holding one row for every repetition we need.

When the two tables are joined, the ‘ItemDetails’ rows will repeat according to the row count of ‘TempTable’. But that is not very practical: it results in duplicated data, which will grow your database unnecessarily over time.

Instead we can use a single table which contains a series of numbers. These numbers start from ‘1’ and end at the maximum quantity an item can have. For this example I will take ‘10’ as the maximum value. The table has a single integer column.

Use the following T-SQL statement to create the table:

CREATE TABLE [IntermediateTable](
    [MaxQty] [int] NULL
) ON [PRIMARY]

For this example I have inserted values up to 20. In a real-world scenario, however, you may need many more values (more than 1000). In that situation you can use the following T-SQL statement to insert them.

insert into IntermediateTable
select 
    thousand.number*1000 + 
    hundred.number*100 + 
    ten.number*10 + 
    one.number
from(
    select 1 as number union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9 union select 0) one
cross join (select 1 as number union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9 union select 0) ten
cross join (select 1 as number union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9 union select 0) hundred
cross join (select 1 as number union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9 union select 0) thousand
where (thousand.number*1000 + hundred.number*100 + ten.number*10 + one.number) > 0 and (thousand.number*1000 + hundred.number*100 + ten.number*10 + one.number) <= 2000
order by (thousand.number*1000 + hundred.number*100 + ten.number*10 + one.number)

** Please Note : The above statement will insert values from ‘1’ to ‘2000’. Removing the where condition will insert values from ‘0’ to ‘9999’.

And using the following T-SQL statement we can join the table and produce the required result.

select A.*
from ItemDetails as A
join IntermediateTable as B on B.MaxQty <= A.ItemQty
where A.BatchNo = 'B1'
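
To see the join at work without creating any permanent objects, here is a self-contained sketch using table variables. The column list of @ItemDetails (BatchNo, ItemCode, ItemQty) and the sample rows are assumptions made for the illustration; the real ‘ItemDetails’ table may look different.

```sql
-- Assumed structure: the real ItemDetails table may have more columns.
DECLARE @ItemDetails TABLE (BatchNo VARCHAR(10), ItemCode VARCHAR(10), ItemQty INT);
DECLARE @Numbers     TABLE (MaxQty INT);

INSERT INTO @ItemDetails VALUES ('B1', 'ITM001', 2);
INSERT INTO @ItemDetails VALUES ('B1', 'ITM002', 3);

INSERT INTO @Numbers VALUES (1);
INSERT INTO @Numbers VALUES (2);
INSERT INTO @Numbers VALUES (3);
INSERT INTO @Numbers VALUES (4);
INSERT INTO @Numbers VALUES (5);

-- Each item row is repeated once for every number up to its quantity.
SELECT A.BatchNo, A.ItemCode, A.ItemQty, B.MaxQty AS SerialSlot
FROM @ItemDetails AS A
JOIN @Numbers AS B ON B.MaxQty <= A.ItemQty
WHERE A.BatchNo = 'B1'
ORDER BY A.ItemCode, B.MaxQty;
-- ITM001 appears 2 times (slots 1 and 2), ITM002 appears 3 times (slots 1 to 3).
```

Returning the number itself (here as SerialSlot) gives the GUI a natural index for each serial number entry.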

Thursday, 4 November 2010

Passing parameters for dynamically created SQL queries

There are times when we need to create SQL queries dynamically and pass values to parameters. You can always assign values with a syntax similar to “… WHERE ColumnName = ‘ + @Value + ‘ and …”. The disadvantage of this syntax is that you have to provide the correct formatting according to the data type of the column.

This can be avoided using the following type of solution, which passes the value as a parameter to sp_executesql. (Assume you have to get a count of the records matching a certain condition which is provided from outside the query.)

declare @Value      as nvarchar(50)
declare @Sql        as nvarchar(100)
declare @Parameters as nvarchar(100)
declare @Count      as int

set @Value = 'ValueX'
set @Sql = 'set @Count = (select count(*) from TableName where ColValue = @Value)'
set @Parameters = '@Count int output, @Value nvarchar(100)'

exec sp_executesql @Sql,@Parameters,@Count output,@Value

select @Count
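
The benefit becomes clearer with parameters of different data types. The sketch below counts user tables created since a given date; it queries the sys.objects catalog view only so that it can run on any database, and the parameter names @Since and @Type are made up for the example.

```sql
DECLARE @Sql    NVARCHAR(200);
DECLARE @Params NVARCHAR(200);
DECLARE @Count  INT;

-- No quoting or date formatting is needed inside the statement text.
SET @Sql = N'SELECT @Count = COUNT(*) FROM sys.objects WHERE create_date >= @Since AND type = @Type';
SET @Params = N'@Count INT OUTPUT, @Since DATETIME, @Type CHAR(2)';

-- Values are passed by name and converted according to the declared types.
EXEC sp_executesql @Sql, @Params,
     @Count = @Count OUTPUT,
     @Since = '20000101',
     @Type  = 'U';

SELECT @Count AS ObjectCount;
```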

Saturday, 25 September 2010

Removing Duplicate Records From a MS SQL Table – (MS SQL 2005 or above)

Have you ever been in a situation where your SQL tables contain duplicate records, because you have not defined a primary key or an auto-increment field, and you need to keep one record and delete the rest?

The usual method of doing this is to use a temporary table or to use a cursor. But there is another method of doing this using a single query in SQL 2005 or above.

To illustrate this first I will create the following table.

create table SampleTable(
    id        int not null,
    name    varchar(20) not null,
    age        int not null
    )

Now I will insert some duplicate records to the above created table.

insert into SampleTable    (id,name,age) values (1,'John',30)
insert into SampleTable    (id,name,age) values (1,'John',30)
insert into SampleTable    (id,name,age) values (1,'John',30)
insert into SampleTable    (id,name,age) values (1,'John',30)
insert into SampleTable    (id,name,age) values (1,'John',30)
insert into SampleTable    (id,name,age) values (2,'Mary',26)
insert into SampleTable    (id,name,age) values (2,'Mary',26)
insert into SampleTable    (id,name,age) values (2,'Mary',26)
insert into SampleTable    (id,name,age) values (2,'Mary',26)
insert into SampleTable    (id,name,age) values (3,'Ann',25)
insert into SampleTable    (id,name,age) values (3,'Ann',25)
insert into SampleTable    (id,name,age) values (3,'Ann',25)
insert into SampleTable    (id,name,age) values (3,'Ann',25)
insert into SampleTable    (id,name,age) values (3,'Ann',25)
insert into SampleTable    (id,name,age) values (4,'James',21)

Using the below given query you can easily find out the duplicates (number of duplicate records).

select SUM(rec_count) as rec_count from(
select COUNT (*) - 1 as rec_count from SampleTable group by CHECKSUM(*)
having COUNT(*) > 1
) T

In the above query I have removed one record per group (COUNT (*) - 1), since one should remain as a valid record. You don’t really need ‘having COUNT(*) > 1’, since for a non-duplicated record count(*) returns 1 and count(*)-1 is 0; it’s there for readability. If you execute the above query you will get 11 as the record count (15 records in total, 4 valid records, so 15-4 = 11 records).

As you can see, I have used ‘CHECKSUM(*)’. This is to avoid typing all the field names; without it the query would use ‘group by id,name,age’. Keep in mind that CHECKSUM can occasionally return the same value for different rows, so grouping by the actual columns is the safer choice on real data.

Finally we can build the query to delete the duplicates. First we must find the valid records, which should not be deleted. The way to do this is with the ‘ROW_NUMBER’ function: we assign a unique row number to each record and select the maximum row number for each group. That way we get only one record per group.

select MAX(row_num) from (
select ROW_NUMBER() over (order by checksum(*)) as row_num, CHECKSUM(*) as ChkSum  
from SampleTable
) as T Group By ChkSum

If you execute the above query, it will return row numbers 5, 9, 14 and 15 as the valid records which we must keep. We must delete only the records whose row number is not among the ones returned by that query. First we’ll select those records (for checking purposes only) using the following query.

select T.* from(
    select ROW_NUMBER() over (order by checksum(*)) as row_num, CHECKSUM(*) 
    as ChkSum from SampleTable) as T
    where T.row_num not in (
        select MAX(row_num) from (
            select ROW_NUMBER() over (order by checksum(*)) as row_num, CHECKSUM(*) 
            as ChkSum from SampleTable
        ) as T Group By ChkSum
    )

If you execute the above query and look at the result closely, row numbers 5, 9, 14 and 15 are not there, so we can be sure that we are deleting the correct set of records. To delete the duplicates we can use the following query.

    
delete T from(
    select ROW_NUMBER() over (order by checksum(*)) as row_num, CHECKSUM(*) 
    as ChkSum from SampleTable) as T
    where T.row_num not in (
        select MAX(row_num) from (
            select ROW_NUMBER() over (order by checksum(*)) as row_num, CHECKSUM(*) 
            as ChkSum  from SampleTable
        ) as T Group By ChkSum
    )

And if you query the table you will see that only one record is left for each of the four ids.
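
A commonly used alternative, also available from SQL Server 2005, is to number the rows inside each group of duplicates with a common table expression and delete everything after the first row. This is a sketch of that alternative rather than the checksum-based query above; it partitions by the actual columns, so checksum collisions are not a concern.

```sql
-- Number the rows within each group of identical (id, name, age) values.
WITH Numbered AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY id, name, age ORDER BY id) AS row_num
    FROM SampleTable
)
-- Keep row 1 of every group and delete the rest.
DELETE FROM Numbered
WHERE row_num > 1;
```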