Use of DISTINCT, UNION and UNION ALL in SQL Server

Recently I was reviewing the code in one of the projects and came across a query in which multiple record sets (SELECT Statements) were combined using an UNION operator. I noticed another thing that in the individual SELECT statements, a DISTINCT clause was being used to get the distinct values and then they were using a UNION operator to combine them into a single distinct/unique record set.
While working with any operators/functions in SQL Server/T-SQL, it is important to understand how each of them work to be able to make effective use of these features and to deliver a Optimized, High Performing and Quality code.
Note the following things when working with DISTINCT, UNION and UNION ALL clauses/operators in T-SQL:
  • DISTINCT clause is used to get unique set of records from the SELECT statement.
  • UNION operator when used to combine multiple record sets, combines the rows from multiple SELECT statements and removes duplicates.
  • UNION ALL operator when used to combine multiple record sets, combines the rows from multiple SELECT statements and does not remove duplicates.
Hence while using a combination of DISTINCT & UNION or DISTINCT & UNION ALL, make sure to test the query and see if it is producing the expected result.
Let’s take a look at a simple example.
Create two simple temp tables using the below query:
CREATE TABLE #Temp1 (ID INT, Name NVARCHAR(100))
GO
CREATE TABLE #Temp2 (ID INT, Name NVARCHAR(100))
GO
Now, let’s load some sample data into these tables.
INSERT INTO #Temp1 (ID, Name)
SELECT 1, ‘One’
UNION ALL
SELECT 1, ‘One’
UNION ALL
SELECT 2, ‘Two’
UNION ALL
SELECT 3, ‘Three’
GO
INSERT INTO #Temp2 (ID, Name)
SELECT 1, ‘One’
UNION ALL
SELECT 4, ‘Four’
UNION ALL
SELECT 4, ‘Four’
UNION ALL
SELECT 5, ‘Five’
GO
Now let us take a look at the data in these tables.

Note that there is duplication of record with ID 1 in #Temp1 table and a duplication of record with ID 4 in #Temp2 table. And also there is a common ID (ID = 1) between the two tables.

Now, finally let us take a look at each of the combinations.

Combination-1 : Using only UNION and No DISTINCT

This combination where UNION operator is used to combine rows from multiple SELECT statements and there is no DISTINCT clause in the individual SELECT statements results in a final result set without any duplicates. The reason being, UNION operator removes the duplicates from the final result set.

Combination-2 : Using both UNION and DISTINCT

This combination where UNION operator is used to combine rows from multiple SELECT statements and there is a DISTINCT clause in the individual SELECT statements results in a final result set without any duplicates. The reason being, DISTINCT clause removes duplicates from the individual SELECT statements and UNION removes duplicates from the final result set.

Combination-3 : Using only UNION ALL and No DISTINCT

This combination where UNION ALL operator is used to combine rows from multiple SELECT statements and there is no DISTINCT clause in the individual SELECT statements results in a final result set with duplicates. The reason being, UNION ALL operator simply combines rows from multiple result sets and does not remove duplicates from the final result set.

Combination-4 : Using both UNION ALL and DISTINCT

This combination where UNION ALL operator is used to combine rows from  multiple SELECT statements and there is a DISTINCT clause in the individual SELECT statements results in a final result set with duplicates. The reason being, DISTINCT clause removes duplicates from the individual SELECT statements and UNION ALL simply combines rows from individual SELECT statements and does not remove duplicates from the final result set. Now, in our example since we have a common record with ID = 1 between the two tables, we will still get duplicates in the final result set.

Hope this article helps in getting a better understanding on which operator to use when it comes to using DISTINCT, UNION and UNION ALL clauses/operators in T-SQL.

Reference: SQL Server Books Online.
Advertisements

About Dattatrey Sindol (Datta)

Datta is a Microsoft BI Enthusiast, passionate developer, and a blogger. View Full Profile

Posted on March 20, 2011, in Comparisons, Interview Questions, SQL Server, T-SQL. Bookmark the permalink. Leave a comment.

What are your thoughts?

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s