Data Profiling Using T-SQL: Part 2 – Column Length Distribution

I had written an article long ago on determining Column Null Ratio (Data Profiling) using T-SQL. While I was reviewing some of my articles, it occurred to me that, I forgot to post the next part of the Data Profiling series. Hence this blog post.
In this blog post, I will be showing a simple T-SQL script to capture the Column Length Distribution for a table.
— Provide the Schema Name and the Table/View Name
DECLARE @DatabaseName NVARCHAR(255) = N’AdventureWorksDW’
DECLARE @SchemaName NVARCHAR(255) = N’dbo’
DECLARE @TableName NVARCHAR(255) = N’DimProduct’
BEGIN
SET NOCOUNT ON
— Declare the parameters internal to query
DECLARE @SQLString NVARCHAR(MAX) = N”
DECLARE @ParamDefinition NVARCHAR(MAX) = N”
DECLARE @ColumnList TABLE (
ColumnId INT IDENTITY(1,1)
, ColumnName NVARCHAR(255)
, ColumnMaxLength INT
, ColumnMinLength INT
)
DECLARE @ColumnMaxLength INT
DECLARE @ColumnMinLength INT
DECLARE @ColumnCount INT = 0
DECLARE @LoopCounter INT = 1
DECLARE @ColumnName NVARCHAR(255)
SET @SQLString =
N’SELECT COLUMN_NAME FROM ‘
+ @DatabaseName + N’.’ + N’INFORMATION_SCHEMA.COLUMNS
WHERE
DATA_TYPE IN (”CHAR”, ”NCHAR”, ”VARCHAR”, ”NVARCHAR”)
AND TABLE_SCHEMA = @SchemaName
AND TABLE_NAME = @TableName’
SET @ParamDefinition = N’@SchemaName NVARCHAR(255), @TableName NVARCHAR(255)’
INSERT INTO @ColumnList (ColumnName)
EXECUTE sp_executesql @SQLString, @ParamDefinition, @SchemaName, @TableName
SELECT @ColumnCount = COUNT(*) FROM @ColumnList
WHILE (@LoopCounter <= @ColumnCount)
BEGIN
SELECT @ColumnName = ColumnName
FROM @ColumnList
WHERE ColumnId = @LoopCounter
SET @SQLString =
N’SELECT ‘
+ ‘@ColumnMinLength = MIN(LEN(‘ + @ColumnName + ‘))’
+ ‘, @ColumnMaxLength = MAX(LEN(‘ + @ColumnName + ‘))’
+ ‘ FROM ‘
+ @DatabaseName + N’.’ + @SchemaName + N’.’ + @TableName
+ ‘ WITH (NOLOCK) ‘
SET @ParamDefinition = N’@ColumnMinLength INT OUTPUT, @ColumnMaxLength INT OUTPUT’
EXECUTE sp_executesql @SQLString, @ParamDefinition, @ColumnMinLength OUTPUT, @ColumnMaxLength OUTPUT
UPDATE @ColumnList
SET
ColumnMinLength = @ColumnMinLength
, ColumnMaxLength = @ColumnMaxLength
WHERE ColumnId = @LoopCounter
SET @LoopCounter += 1
END
SELECT
ColumnName AS [Column Name]
, ColumnMinLength AS [Column Minimum Length]
, ColumnMaxLength AS [Column Maximum Length]
FROM @ColumnList
ORDER BY [Column Name]
END
Below is a sample output of the above T-SQL script and also the output of Data Profiling Task in SSIS with the same Database Table as input.
T-SQL Script Output
SSIS Data Profiling Task Output
Here are the highlights of the above T-SQL script:
  • Takes Database Name, Schema Name and Column Name as inputs and produces the Column Length Distribution as output.
  • With few minor changes, this script can be used in any of the SQL Server versions.
  • This script is a one step process to get the Column Length Distribution whereas data profiling using SSIS Data Profiling Task is a two step process – First generate the xml file with the profile and then use Data Profile Viewer to view the generated profile.
  • Only SQL Server Management Studio (SSMS) with necessary connectivity to the required server/database is sufficient to capture the profile whereas in case of SSIS Data Profiling Task, it requires an ADO.Net Connection Manager.
  • Knowledge of T-SQL is sufficient whereas the SSIS Data Profiling Task requires some basic knowledge of SSIS.
  • It does not provide the distribution of values in each of the individual columns whereas in case of SSIS, we can get the distribution of values in each of the columns.
Since I am a great lover of T-SQL, I prefer using T-SQL as much as possible. However, SSIS being a powerful ETL tool has its own advantages and T-SQL cannot beat SSIS when it comes to ETL.
Note: This demonstration is based on SQL Server 2008 R2.
Advertisements

About Dattatrey Sindol (Datta)

Datta is a Microsoft BI Enthusiast, passionate developer, and a blogger. View Full Profile

Posted on February 13, 2012, in Code Snippets, SQL Data Profiling, SQL Server, T-SQL, T-SQL Snippets. Bookmark the permalink. Leave a comment.

What are your thoughts?

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s