Wednesday 28 May 2014

Find Diacritics (accent) marks in SQL Server



We had some issues where a downstream system was using usernames with diacritics in the URL. This was breaking the downstream system.
The following code may help you find diacritics in your data

/*
 Example script for how to find diacritical marks/accents in data
*/

/* Create some example data with and without accents */
DECLARE @exampledata TABLE (exampleString VARCHAR(50)) --has to be varchar
INSERT INTO @exampledata ( exampleString )
VALUES  (N'Peter'),(N'AURÉLIEN'),(N'JEREMY'),(N'kroužek'),(N'tomato'),(N'voël')

/*  Temp table for accent insensitive 
 Load with Accent insensitive collation */
DECLARE @insensitive TABLE (exampleString VARCHAR(50))
INSERT INTO @insensitive ( exampleString )
SELECT exampleString Collate SQL_Latin1_General_CP1253_CI_AI FROM @exampledata

/* Check for accent sensitive items */
SELECT  *
FROM    @exampledata s
WHERE   NOT EXISTS ( SELECT 1
                     FROM   @insensitive i
                     WHERE  i.exampleString = s.exampleString )
go