Wednesday 30 May 2018

Computed columns in SQL Server

Computed columns are type of columns which the values are derived based on one or more other columns. Hence the data type on the computed column depends on the result of the derived column values.
Computed columns is a feature which has been there in SQL Server since version 2000. But in my experience I feel that it has been a feature which's been used less compared to many other features available, and during discussions and interviews this is something which most developers slips or fails to answer.

Why do we need computed columns ?

First we will consider a case where we need to store details on a table without the usage of computed columns.
Consider we have a table which contains employee details. We have two columns to store employee’s first and last names. But we also required to have a column which we need to store their full name as well by concatenating the first and last names. So the correct way is to have the third column which contains the full name and the data needs to be inserted during the employee record is created and it should be maintained in the case where the details are updated as well. Otherwise the data integrity will be lost. (One might debate that the full name can be built from the business logic code using the first and last names. But for the illustration purpose we would consider that we are maintaining it using SQL Server)

CREATE TABLE dbo.Employee(
    Id     INT 
    ,FirstName    VARCHAR(30)
    ,LastName    VARCHAR(30)
    ,FullName    VARCHAR(61)
)
How ever we could achieve the same with the use of a computed column and with a less effort compared to the first approach.

CREATE TABLE dbo.Employee(
    Id     INT 
    ,FirstName    VARCHAR(30)
    ,LastName    VARCHAR(30)
    ,FullName AS CONCAT(FirstName,' ',LastName)
)

Let’s insert few records to the table which we created now.

INSERT INTO dbo.Employee(Id, FirstName, LastName) 
VALUES (1,’John’,’Doe'),(2,’Jane’,’Doe')

image

PERSISTED, DETERMINISTIC or NON-DETERMINISTIC ?

The values reflected on computed column can be either deterministic or persisted.
When the values are deterministic or non-deterministic the value in the column will not be saved on to the table physically. Instead it always calculated during the query execution. Hence the value could differ based on the functions you use in the formula. E.g: If you use GETDATE() in the calculated column, it will always return a different value during each execution.

CREATE TABLE dbo.Employee2(
    Id     INT 
    ,FirstName    VARCHAR(30)
    ,LastName    VARCHAR(30)
    ,CreatedDate AS GETDATE()
)

INSERT INTO dbo.Employee2(Id, FirstName, LastName) VALUES 
    (1,'John','Doe') 


And when queried the calculated column returns different values as shown below.

image
**Note: The above mentioned can be achieved using a default constraint as well. I have used that example on strictly illustration basis.
You can further read on deterministic and non-deterministic function on the following Microsoft documentation.
https://docs.microsoft.com/en-us/sql/relational-databases/user-defined-functions/deterministic-and-nondeterministic-functions?view=sql-server-2017
Computed column values can be persisted by adding the keyword PERSISTED when the column is created using T-SQL or by the table designer in SSMS.
We will drop ‘FullName’ column and recreate the column.

ALTER TABLE dbo.Employee DROP COLUMN FullName;
ALTER TABLE dbo.Employee 
 ADD FullName AS CONCAT(FirstName,' ',LastName) PERSISTED;
**Note: If you try to drop the ‘CreatedDate’ column on Employee2 and try to create it as PERSISTED, it will throw an error. Because computed columns can only be persisted when it’s deterministic.
Msg 4936, Level 16, State 1, Line 45
Computed column 'CreatedDate' in table 'Employee2' cannot be persisted because the column is non-deterministic.

Now when the expression is evaluated during the execution, the ‘FullName’ will be saved into the table.
The data is read-only to the developer and it’s maintained by the engine. When the data is changed on the columns which was used in the formula, the computed values will be changed.