Tuesday, February 5, 2013

ABCs of CDC with SSIS 2012

Today I presented a session as part of Pragmatic Works free "Training on the T's", called "ABCs of CDC with SSIS 2012". The session was attended by 507 people and I received a lot of positive feedback (even though my voice as a bit rough because of a cold that I'm getting over).

This session focuses on defining what CDC (Change Data Capture) is for those that have heard of it and maybe even know that it has been offered with SQL Server 2008 and greater, but didn't really know how to use it. I also show how it is much easier to consume the CDC data now with SSIS 2012, since it now includes native components for CDC.

I have posted all of the slides and demo materials for this session to my SkyDrive for anyone to download and try out. The only thing that you will need to get this up and running in your environment is the AdventureWorksDW2012 database from CodePlex  and a SQL Server 2012 Enterprise/Developer/Evaluation Edition instance with the Database Engine and SSIS setup on (CDC only works on these editions of SQL Server). You will need to make sure you have full admin rights on this instance since the commands to setup CDC require that level of permission.

Below are a few of the most common questions that I received in the session with answers:

What is the performance impact for using CDC on a database/table?
This will vary based on the amount of changes that are being done to the database/table you are using CDC on, but Microsoft has optimized this process to use as little CPU/memory as possible. As with any process that you are going to add to a database you should test this in your own environment to know for sure what the impacts are. There is a great White Paper available on MSDN that covers many different scenarios and impacts of CDC on each.

How is CDC different from Replication, Mirroring or other Disaster Recovery/Backup Scenarios?
CDC is only going to track the changes to the data in the database/table that it is implemented on, so it will not be able to capture security changes or many other DDL changes done to a table (columns being added/removed are tracked, but with limits, see MSDN article). The main purpose behind CDC is for a quick way to track the changes to the data made over time and to periodically apply those changes to another database with the same schema. CDC is not meant to be used in place of Replication, Mirroring, Log Shipping or other disaster recovery type of scenarios. It is possible to use some of these other technologies with a database/table that has CDC enabled (see the "Change Data Capture and Other SQL Server Features" section of the MSDN article linked here). The most common use for CDC is where a source database needs to be loaded into a data warehouse or the staging area for a data warehouse and you want to get all of the data inserts, updates and deletes applied appropriately without using triggers or other custom code that will have to be maintained.

Can CDC be enabled on any database/table?
Yes it can, but it must be enabled at the database level first and then the table(s) in that database that you want to track with CDC have to be enabled separately. CDC is not on by default on any databases or tables when it is first setup. There are also some important column restrictions that are shown in the slides that I have attached above.

Is it possible to have the CDC track only Updates and Inserts and not Deletes (or any combination of these)?
The CDC stored procedures provided by Microsoft to enable CDC on the database and table do not have any options to ignore certain changes, so once it is enabled it will track all Inserts, Updates and Deletes done to that table. When you are creating the processes that use the CDC data to apply the changes to the destination database, you can choose to not implement any of the types of changes you don't want by using the _$operation column and looking for the appropriate value(s). If you are using SSIS 2012 and the new CDC Splitter, then you can choose not to implement any of the data flows off of it that you don't want to have processed.

Can the SSIS 2012 CDC components read the CDC source data from older versions of SQL Server?
Yes! The SSIS 2012 CDC components can connect to a SQL Server 2008/2008R2/2012 CDC enabled database/table and use that data to then update any SQL Server 2008/2008R2/2012 destination (along with the possibility of other databases that have the required ADO .NET data providers in SSIS 2012). See this link for details on using SSIS 2012 CDC with Oracle!

For more details on how CDC works, there are great resources available on MSDN.

If you have any other specific questions, please feel free to email me at the address in the slides attached above or send me a message on Twitter.