SQL Tidbits #3 – Finding Unicode Characters

In SQL Server you can declare string columns as either varchar or nvarchar. One main difference is the characters the column can accept. Nvarchar can accept all unicode characters where varchar can only accept characters in an 8-bit codepage. This means the character Ɛ won’t work with the varchar datatype. Fortunately, and unfortunately, SQL Server will automatically convert unsupported characters when inserting into a varchar field. But the results can be strange.

Starting with a simple table we can see the effects of various actions.

CREATE TABLE SimpleTest (
	AnsiValue varchar(10),
	UnicodeValue nvarchar(10)
);

INSERT INTO SimpleTest
VALUES ('A', 'A'),('B','B');

SELECT * FROM SimpleTest;

unicodecharaters_1

At this point there is no difference between the two fields. Any character that is acceptable in the varchar datatype also works in the nvarchar datatype. But what happens when the character is outside of the varchar acceptable range?

INSERT INTO SimpleTest
VALUES (N'ē', N'ē'),(N'Ɛ', N'Ɛ');

SELECT * FROM SimpleTest;

unicodecharaters_2

Now we can see the problem with the varchar datatype when using Unicode characters. If you look too fast, row 3 might appear the same but the conversion stripped off the bar over the e. Row 4 is even worse. Instead of the actual letter, the conversion results in a question mark. This is what happens to most Unicode characters when converting to varchar. So the good is that you won’t get errors with the implicit conversion. The bad is that you don’t know what characters you’re losing.

An area where errors can happen is outside of SQL Server when trying to write characters from an nvarchar datatype to an ASCII file. There are numerous tools for writing to files so I won’t go into that, but some tools will fail if, and usually only if, they encounter non-ASCII characters. It can be extremely frustrating when trying to write out 200,000 rows and the process fails because 1 row has an unsupported character. There are ways to resolve this. The first way is to switch to UTF-8 or UTF-16 for your file encoding. But we all know that failures happen in the middle of the night, when we least expect them, and you have to just get it working again…right now.

The other way to find unsupported characters may be apparent in the result set showing the side-by-side of AnsiValue and UnicodeValue. The varchar value that SQL Server uses in the conversion will not match the original nvarchar value. Using this we can compare the value to its converted self. If they are not equal then the column contains unsupported characters.

INSERT INTO SimpleTest
VALUES (N'Wērner', N'Wērner'),(N'FOƐ', N'FOƐ');

SELECT * FROM SimpleTest;

unicodecharaters_3

Now to find the rows where the UnicodeValue column contains unsupported characters use the following code.

SELECT * FROM SimpleTest
WHERE UnicodeValue <> CONVERT(NVARCHAR(10), CONVERT(VARCHAR(10), UnicodeValue));

unicodecharaters_4

Notice that the first two rows with standard ASCII characters don’t return. The rest contain values with the unsupported characters we expect.

SQL Tidbits #2 – Default Varchar Length

It is a pretty common thing to have to convert one datatype to another datatype. When converting to varchar it would look like the following.

SELECT CONVERT(VARCHAR(30), 12345)

This would convert the integer value 12345 into a string. Maybe it’s for a formatting reason or maybe you need to concatenate it with another string. Something of interest is that you can get lazy and use this snippet.

SELECT CONVERT(VARCHAR, 12345)

In this example you will see absolutely no difference in the results. However, this laziness can create problems. Leaving off the length does not make the length the same as anything that happens to be converted. It defaults to 30. So what would happen with this example?

SELECT CONVERT(VARCHAR, 'The young woman unzipped the carrying case for her little dog.')

There would not be an error and the return value would be “The young woman unzipped the c”, which is quite an interest start to a sentence but not the intended result. The moral of the story is to always explicitly enclose your length.

SQL Tidbits #1 – Commit (Transaction)

I’ve been busy lately, but not with writing blog posts. I’ve started a new job at an awesome company called HomeAway, moved my family to Austin, and spent much of my “spare” time doing house renovation work. But I wanted to get something out there on the off chance that somebody will gain a useful tidbit from me.

So here is a little tidbit about transactions in SQL Server. I recently came across some code that was wrapped like this.

BEGIN TRANSACTION
…
COMMIT

I’ve always used a COMMIT TRANSACTION so I wondered if you really needed the TRANSACTION part. It turns out COMMIT is not exactly the same as COMMIT TRANSACTION. COMMIT by itself is equivalent to COMMIT WORK. The WORK word is optional. This syntax is SQL-92 compatible. The action of COMMIT by itself does the same thing as COMMIT TRANSACTION. The big difference is when you use named transactions like BEGIN TRANSACTION abc. If you do this you have to use COMMIT TRANSACTION abc. COMMIT by itself, indeed, will not work.

That’s my tidbit for the day.

Finding defaults constraints

There have been times when I need to know the name of a default constraint. Most often it is because I need to modify a column or table and to do so I must first remove the constraint. The annoying thing is when there isn’t any kind of naming convention on constraints. If you define a constraint such as a varchar column with a default of empty string like DEFAULT(”), then SQL Server will generate a basically random name for the constraint. To find the name of all the constraints in a database you can use this query.

SELECT s.name AS SchemaName, t.name AS TableName,
       c.name AS ColumnName, d.name AS ConstraintName
FROM sys.default_constraints d
INNER JOIN sys.tables t
    ON d.parent_object_id = t.object_id
INNER JOIN sys.schemas s
    ON d.schema_id = s.schema_id
INNER JOIN sys.columns c
    ON t.object_id = c.object_id
    AND d.parent_column_id = c.column_id;

If you need to know the constraints on a single table then just limit based on the name from sys.tables.

SELECT s.name AS SchemaName, t.name AS TableName,
       c.name AS ColumnName, d.name AS ConstraintName
FROM sys.default_constraints d
INNER JOIN sys.tables t
    ON d.parent_object_id = t.object_id
INNER JOIN sys.schemas s
    ON d.schema_id = s.schema_id
INNER JOIN sys.columns c
    ON t.object_id = c.object_id
    AND d.parent_column_id = c.column_id
WHERE t.name = 'Subscriptions';

Finding database and table sizes with sp_spaceused

There is a handy system stored procedure called sp_spaceused that can help when you need to know the space usage of a database or table within a database. There are two ways you can use it. The first way gives some basic size information about the database itself. There are no parameters and it returns two result sets.

EXEC sp_spaceused;

This call will yield something similar to the following.

The other way to use the procedure will give you information about a particular table within the database.

EXEC sp_spaceused N'Person.Address';

When using sp_spaceused for a single table there is only one result set.

As you will notice, this procedure can only get space usage about one table at a time. There are a few techniques to get the statistics about all of the tables in a database. One is to create a temp table and use the undocumented procedure sp_msforeachtable to run sp_spaceused for each table. Rebecca Lewis has an example of this. One issue with this technique is that the result doesn’t include the schema, so if you have multiple tables with the same name that exist in different schemas, you won’t be able to tell them apart.

There are other techniques around that don’t use sp_spaceused that can give you information all in one query. One example is on Luke Hayler’s blog. Taking it a step further,  has an artcle on SQLServerPerformance.com that explains a way to track table growth.

PASS vs DevConnections

I recently attended the DevConnections conference in Las Vegas. This was my first time attending this particular conference. Having attended the PASS conference the previous 4 years, I thought I’d do a little analysis of the two from my perspective.

Sessions

The main and obvious difference is in the subject matter. PASS is designed specifically for SQL Server. It covers all things relating to SQL Server from query performance tuning, to data warehousing, to Sharepoint integration. The sessions are clearly marked for difficulty level and can get very deep and technical.

DevConnections, on the other hand, has a plethora of “connections”. They each have their own focus and so there is a little bit for everyone. If you focus on Windows management or .NET development there are sessions for each. There is also a SQL Server Connections track for DBAs. If you happen to span both camps of DBA and .NET developer then you can attend a mix of the sessions as the registration allows for access to any session at the entire conference. Strangely, the sessions under the SQL Server track were designated with numbers that gave you an idea of the experience level required, but all of the other tracks were numbered seemingly randomly.

Presenters

There are between 100 and 200 sessions at both events. That’s quite a lot for three days and there is potential for seeing lots of different presenters. I was very pleased at the overall quality of the presenters at DevConnections. Every session that I attended had a very good to excellent presenter. It is my understanding that there are only seasoned presenters at DevConnections. With PASS anyone can submit to present. The presenters are then chosen to fill all available slots. Because there are roughly the same number of sessions, but all focus on SQL Server instead of spread across different fields, there is opportunity for newer presenters to enter the scene. This can be hit or miss. I have attended a couple of session over the past 4 years that were not well delivered. But PASS also has many excellent presenters from all across the SQL Server field.

Vendors

There isn’t much surprising about the vendors. The number of vendors is about the same. There are some big ones and there are some smaller ones. At PASS the vendors are all focused in some area related to SQL Server. At DevConnections they are as diverse as the breadth of the sessions. Both conferences give ample opportunity to visit the booths during exclusive hours and around lunch time. Each conference also provides for a time one evening set aside for meeting the vendors while enjoying some food and drink.

Recordings

One big difference I see is the availability of session recordings after the conference. PASS records the audio and slides/demos of every session and makes them available for purchase during and after the conference. With these you can virtually attend any sessions that you missed due to time conflicts or non-session experiences. When registering for DevConnections you automatically get a DVD for the connection of your choosing. You can also get the other DVDs pretty cheaply. What I came to find out was that it simply contained the slides and scripts for the presentation. Sometimes, the presenter must not have gotten their materials in on time so you simply get a session description. In the SQL Server and .NET sessions in particular there is heavy emphasis on demonstration and so slides are sparse and scripts don’t tell the whole story. Because of this I found that I had difficulty deciding what I would attend and what I would miss. With the session recordings from PASS I never feel that I will miss something I want to hear.

Non-Session Experiences

At PASS there are multiple reasons I might miss a session. One simple one would be a time conflict between two sessions of interest. But PASS also offers a computer lab where you can go through lessons on how to use certain features of SQL Server. There are also tables in the exhibit hall where you can talk to experts on different areas from database design to BI. In another area you can meet with members of the SQLCAT team to discuss matters large or small with your SQL Server environment. Yet another option is hallway chat with other members of the SQL community that you only get to see in person at this event.

DevConnections offers sessions. That’s about where it stops. There was one evening that you could attend an open forum to discuss any matter thrown out. I did attend with about 75 other people out of the thousands of attendees. It was a good experience and I did get some insight out of it. However, it was after a long day of sitting through presentations.

Community

The last difference I noticed was that of community. PASS, being specific to SQL Server probably has an easier time developing ways to bring together to whole group of attendees seeing as how they can at least relate in their product focus. DevConnections, because of its diversity, seems like it has some cliques. DevConnections does have a Facebook page and a Twitter hash tag but I found that after the first day I stopped paying attention to them because people rarely posted anything, it didn’t relate to my areas of interest, or it was a vendor asking for people to stop by their booth.

Conversely, at PASS, Twitter is constantly humming. Perhaps the problem is trying to keep up. Additionally, each night there is a gathering of some sort to bring everyone together for merriment and discussion outside of sessions. It’s a chance to unwind and have a good time with each other. As I mentioned before, you also have opportunity throughout each day to sit with various people in the community and get to know fellow SQL Server professionals. PASS has also pushed to get local user groups to host events throughout the year all around the world to encourage continued learning and community. Perhaps the user groups are strong in the other areas like .NET but it isn’t obvious from the DevConnections conference what’s going on when you get back to the office.

Conclusion

Both conferences definitely have their merits. Obviously, if you are not a SQL Server professional then PASS doesn’t hold much interest for you. But if you are a SQL Server professional or wear many hats, then you may find yourself deciding which to attend. If you primarily focus on SQL Server I would suggest PASS. If you deal mostly with another area and only dabble in SQL Server then you might find DevConnections useful for learning something about many different areas. DevConnections does happen twice a year in the spring and fall so if your budget allows you could even give them both a shot. Give me a shout if you attend and hopefully I’ll see you there.

Compressing SQL Server Backups

Backups in SQL Server have the potential for a great amount of compression. Back with SQL Server 2000 I had devised a way the automate the compression of the backups after the backup jobs ran. Even with SQL Server 2005, all database backups to disk are stored as an uncompressed bak file. You have to look to third party tools like Red Gate SQL Backup to compress on the fly or perform compression of the files after the backup process completed. With SQL Server 2008 you are able to create a backup that compresses during the backup process. One unfortunate caveat is that the feature was only available in Enterprise Edition and above. But with SQL Server 2008 R2, Microsoft added the feature to Standard Edition as well. Another thing to keep in mind is that compressed backups can be restored to any edition of SQL Server since SQL Server 2008, but not any version from SQL Server 2005 and before.

Using compression during backup is simply a matter of selecting compression from a drop down, adding a switch to a T-SQL command, or setting the server default for compression. Using the SSMS interface, right click the database, go to Tasks, and select Back up. Set any of the General settings as you normally would. Under the Options page, at the bottom, is a drop down for compression. Simply choose to “Compress Backup” and that’s it.

Choose Compress Backup on Options Page

To use compression with T-SQL, all you need to do is add the COMPRESSION switch to the backup statement in the WITH clause.

BACKUP DATABASE [CAI] TO  DISK = N'E:\MSSQLBackup\CAI.bak' WITH COMPRESSION;

By default, SQL Server sets the backup compression to 0, or off. To change the default value use sp_configure.

sp_configure 'backup compression default', 1
RECONFIGURE
GO

You may want to evaluate whether or not you want the default value set to true. Books Online states that compression significantly increases CPU usage. But since most backups occur during non-peak hours, it may be a great trade off for the disk space savings.

You can analyze the effects of compression by looking at a couple of fields in the backupset table. For example, the following statement shows that the backup achieved a 5:1, or 80%, compression ratio. Instead of a 1.4GB backup it created a 283MB backup.

SELECT TOP 1 name, backup_size, compressed_backup_size, 
       backup_size/compressed_backup_size as ratio 
FROM msdb..backupset where name like 'MID%';

This backup had an 80% compression rate

Give compression a try and save some space. For more in depth discussion see this article by SQLCAT back in 2008 that talks more about compression ratios and ways to optimize the process.

Concatenate/Pivot Data with FOR XML–and Count It

I recently needed to concatenate or pivot some data into a comma-delimited list. There are numerous examples around (Ken Simmons, Rob Farley) showing how to accomplish this by using FOR XML PATH and the STUFF function. My only addition is to add the ability to count the number of items in each list. Here is a simple example to get started.

-- Uses syntax for SQL Server 2008 R2
DECLARE @table TABLE (value varchar(10));

INSERT INTO @table VALUES ('A'),('B'),('C'),('D');

-- Pivot with STUFF and FOR XML PATH
SELECT STUFF((SELECT ',' + value FROM @table
FOR XML PATH('')), 1, 1, '') as val;

Here is the result:

Often it isn’t so useful to just rotate a single column to get a list. Instead, you may want to rotate a list of values per some group of divisions. You might want a list per state or per department. Here is an example that shows how to rotate per a division.

-- Take it a step further with additional values
DECLARE @table2 TABLE (division int, value varchar(10));

INSERT INTO @table2 VALUES (1,'A'),(2, 'B'),
                   (1, 'C'),(2,'D'),(3,'E');

-- Pivot with STUFF and FOR XML PATH and DISTINCT
SELECT DISTINCT division,
  STUFF((SELECT ',' + value FROM @table2 t2 WHERE t1.division = t2.division
  FOR XML PATH('')), 1, 1, '') as val
FROM @table2 t1;

Here is the result:

For my purpose it was also going to be handy to know the number of items in each list. With just a couple of tweaks you can get the count of values per division at the same time as the rotate and even drop the DISTINCT.

-- Take it one more step doing groupings and counts</span>
DECLARE @table3 TABLE (division int, value varchar(10));

INSERT INTO @table3 VALUES (1,'A'),(2, 'B'),(1, 'C'),(2,'D'),(3,'E');

-- Pivot with STUFF and FOR XML PATH and DISTINCT
SELECT division, COUNT(*) as divisioncount,
  STUFF((SELECT ',' + value
  FROM @table3 t2 WHERE t1.division = t2.division
  FOR XML PATH('')), 1, 1, '') as val
FROM @table3 t1
GROUP BY division;

Here is the final result:

Cleaning Up Backup History

Every time there is a backup or restore action, SQL Server stores a record in msdb. In my setup I perform quite a few log backups every day. If I allow the logs to continue to grow I eventually start to see performance degradation on the backup and the restore sides. You can clean up this history in a couple of ways. One way is to set up a Maintenance Plan that cleans up the history. I use this on a day to day basis. Sometimes I may need to remove history records manually so I use this little snippet.

USE msdb
GO
DECLARE @OldestDateToKeep DATETIME
SET @OldestDateToKeep = CONVERT(VARCHAR(10), DATEADD(dd, -30, GETDATE()), 101)
EXEC sp_delete_backuphistory @OldestDateToKeep
GO

Still Here and Backing Up

I’ve been spending quite a bit of time in the .NET world lately and haven’t done much on the SQL Server side. But I thought I’d try out the WordPress IPad app to see how it feels. I did have a recent need to supply someone with a report of the backups of a database. I have been using a custom built implementation for log backups using Redgate SQL Backup. Their tool does still keep track of the backups within SQL Server. So I was able to use msdb..backups and other associated tables to create a query to give me the results I needed and even break it out by SQL Server Full backups, Redgate Full backups, and Redgate Log backups.