Change Data Capture since Last Request Package Sample

11/09/2008 21:38:10


This Readme describes the purpose of the Change Data Capture since Last Request Package Sample and the components of the sample. For a more extensive discussion of the scenario that this sample addresses, and the technology behind the sample, download the related white paper from http://download.microsoft.com/download/0/7/9/079586D0-B6B0-45DD-A6D5-AB9BAA41F120/CDCSSISLSN.docx.

Note:
This sample works only with SQL Server 2008 Enterprise and SQL Server 2008 Developer. This sample will not work with any version of SQL Server earlier than SQL Server 2008, or on any edition other than SQL Server 2008 Enterprise or SQL Server 2008 Developer.



This sample shows how to incorporate change data capture technology into an extraction, transformation, and loading (ETL) environment that uses Integration Services. This sample is a complete end-to-end sample that includes both workload generation and the initial load of target tables as part of the sample setup. After the initial load finishes, the master package runs periodically to gather the changes that have occured since the last request. The master package then updates the target replicas to reflect those changes.
The source database is the AdventureWorks2008 database. This sample uses existing tables in the AdventureWorks2008 database to populate comparable destination tables in the CDCSample schema in the AdventureWorksDW2008 database.

Note:
This sample uses log sequence numbers (LSNs) to specify the range for the change data capture functions that request change data. To see how to use datetime values to specify the query range when you query for change data, see Readme_Change Data Capture for Specified Interval Package Sample.




Important:
Samples are provided for educational purposes only. They are not intended to be used in a production environment and have not been tested in a production environment. Microsoft does not provide technical support for these samples.



Requirements

To run this sample package, the following requirements must be met:

Location of the Sample Files

This sample consists of the following files:
Location of the Solution File and Readme File
The following folder is the default location of the solution file and Readme file for this sample:
<drive>:\Program Files\Microsoft SQL Server\100\Samples\Integration Services\Package Samples\Change Data Capture Since Last Request Package Sample\
The following table provides the complete name and a short description of the files in this folder.

File Description
Change Data Capture LSN Based Sample.sln The solution file for this sample.
Readme_ChangeDataCapturesinceLastRequestPackageSample.htm The Readme file for this sample.


Location of the Project Files and Packages
The following folder is the default location of the project files and packages for this sample:
<drive>:\Program Files\Microsoft SQL Server\100\Samples\Integration Services\Package Samples\Change Data Capture Since Last Request Package Sample\Change Data Capture LSN Based Sample\
The following table provides the complete name and a short description of the files in this folder.

File Description
Change Data Capture LSN Based Sample.dtproj The project file for this sample.
SetupCDCSample.dtsx The setup package and sample test harness.
MasterCDC.dtsx The master package that performs an incremental load cycle.
CDCCustomerExtract.dtsx The child package that gathers the changes to the Customer table for the extraction interval, and then applies these changes to the destination.
CDCCreditCardExtract.dtsx The child package that gathers the changes to the CreditCard table for the extraction interval, and then applies these changes to the destination.
CDCWorkOrderExtract.dtsx The child package that gathers the changes to the WorkOrder table for the extraction interval, and apply these changes to the destination.


Note:
If you did not install this sample to the default location, you will have to modify the package variables, SQLServerInstallPath and BasePath , to correspond to the appropriate installation path. The default value of the SQLServerInstallPath package variable is <drive>:\Program Files\Microsoft SQL Server\. The value of the BasePath package variable derives from the value of the SQLServerInstallPath package variable.

Location of the Transact-SQL Script Files
The following folder is the default location of the Transact-SQL script files for this sample:
<drive>:\Program Files\Microsoft SQL Server\100\Samples\Integration Services\Package Samples\Change Data Capture Since Last Request Package Sample\Change Data Capture LSN Based Sample\Scripts\
The following table provides the complete name and a short description of the files in this folder.

File Description
CDCSetupTables.sql The sample setup file.
CDCCustomerInsert.sql The script that applies Inserts to CDCSample.Customer table.
CDCCustomerModify.sql The script that applies Updates to CDCSample.Customer table.
CDCCreditCardInsert.sql The script that applies Inserts to CDCSample.CreditCard table.
CDCCreditCardModify.sql The script that applies Updates to CDCSample.CreditCard table.
CDCWorkOrderInsert.sql The script that applies Inserts to CDCSample.WorkOrder table.
CDCWOrkOrderModify.sql The script that applies Updates to CDCSample.WorkOrder table.
CDCCleanup.sql The script that removes all objects that the sample creates.To apply this script, run sqlcmd –E –iCDCCleanup.sql .


Running the Sample

To run the package, you can use either the dtexec command prompt utility (dtexec.exe) or BI Development Studio.
To run the package by using the dtexec utility:
  1. Open a Command Prompt window.
  2. Use the Change Directory command, cd, to change the directory that contains the dtexec utility. The default location of the dtexec utility is <drive>:\Program Files\Microsoft SQL Server\100\DTS\Binn.
  3. At the command prompt, type the following command: dtexec /f "C:\Program Files\Microsoft SQL Server\100\Samples\Integration Services\Package Samples\Change Data Capture Since Last Request Package Sample\Change Data Capture LSN Based Sample\SetupCDCSample.dtsx
  4. Press ENTER.

For more information about how to use the dtexec utility run a package, see the topic "dtexec Utility" in SQL Server Books Online. To run the package in Business Intelligence Development Studio In BI Development Studio, on the File menu, point to Open, and then click Project/Solution. For Files of type, select Integration Services Project Files (.dtproj). Then, locate the Change Data Capture Since Last Request Package Sample folder, and double-click the file, Change Change Data Capture LSN Based Sample.dtproj . In Solution Explorer, in the SSIS Packages folder, right-click SetupCDCSample.dtsx, and then click Execute Package.

Sample Packages and Their Components

The sample consists of five packages:
The setup package—SetupCDCSampleThe setup package acts as a test harness for the sample. As a test harness, this package does the following processes: Performs all the required initialization. This includes creating a database snapshot. Generates a data manipulation language (DML) workload against the source tables while cycling the master package at 10 second intervals. The master package performs an incremental load of the target tables for all changes processed by the capture process since the last request. Monitors the progress of the run, and stops the sample after all the DML workload has been processed and applied to the destination tables. Validates, at the end of the run, the content of the destination tables. The setup package compares the destination tables with the content of the source tables, and then records the results in the event log. The master package—MasterCDCThe master package performs an incremental load of the target tables for the interval provided as input. To determine whether to continue with the incremental load, the master package checks the endpoints of the interval: If both endpoints for the extraction interval are in range, the master package starts the three child packages, which perform the incremental loads for the individual tables. If either the low or high endpoint is outside the interval where change data exists for a source table, the master package logs an error and stops the sample.After all the child packages have finished running, the master package logs a message to the event log that indicates the sample has finished for the interval. The master package then stops the sample. Three child packages—Extract Customer Data, Extract CreditCard Data and Extract WorkOrder DataThe structure of each of these child packages is identical: Each child package first creates the query that extracts the changed data. Then, the child package runs a data flow task that performs the incremental load of the changed data. The data flow task uses information in the result set that the query returns to split the returned rows. The data flow splits the rows into three distinct flows—insert, update, or delete—that correspond to the operation required to apply the row to the destination. The data flow task then runs the tasks that handle the various flows. The following sections describe the components found in each package.
SetupCDCSample Package Components
The SetupCDCSample package has several annotated functional groups:

Note:
To view the annotated functional groups, open the SetupCDCSample package in Business Intelligence Development Studio. To see a larger view of the package, click the four-way arrow in the lower-right corner of the Control Flow tab, and then position the view window over the package.



Initialize Environment and Enable Change Data Capture
This functional group contains a single Execute SQL task. The purpose of this task is to do the following:

Name Element Purpose
Create Tables and Enable Change Data Capture Execute SQL task Creates the sample source tables from existing AdventureWorks2008 tables:
Generate Workload
This functional group contains six Execute SQL tasks and a Script task:

Name Element Purpose
Insert Customer Table Execute SQL task Applies inserts to the AdventureWorks2008.CDCSample.Customer table.
Modify Customer Table Execute SQL task Applies updates to the AdventureWorks2008.CDCSample.Customer table.
Insert CreditCard Table Execute SQL task Applies inserts to the AdventureWorks2008.CDCSample.CreditCard table.
Modify CreditCard Table Execute SQL task Applies updates to the AdventureWorks2008.CDCSample.CreditCard table.
Insert WorkOrder Table Execute SQL task Applies inserts to the AdventureWorks2008.CDCSample.WorkOrder table.
Modify WorkOrder Table Execute SQL task Applies updates to the AdventureWorks2008.CDCSample.WorkOrder table.
Mark Workload Completion Script task Sets the following package variables:

Verify Capture Process is Running
This functional group contains one Execute SQL tasks and one Script task.
Perform Initial Load for Target Tables
This functional group contains one Script task and three Data Flow tasks:

Name Element Purpose
Create Database Snapshot Execute SQL task Creates a database snapshot for AdventureWorks2008.
Load Target Customer Table from Snapshot Data Flow task Uses an OLE DB source to extract data from the snapshot table, CDCSample.Customer, and then uses an OLE DB destination to apply the rows to the table, AdventureWorksDW2008.CDCSample.Customer.
Load CreditCard Table from Snapshot Data Flow task Uses an OLE DB source to extract data from the snapshot table, CDCSample.CreditCard, and then uses an OLE DB destination to apply the rows to the table, AdventureWorksDW2008.CDCSample.CreditCard.
Load WorkOrder Table From Snapshot Data Flow task Uses an OLE DB source to extract data from the snapshot table, CDCSample.WorkOrder, and then uses an OLE DB destination to apply the rows to the table, AdventureWorksDW2008.CDCSample.WorkOrder.


Prepare for Initial Extraction
This functional group contains one Execute SQL task, which determines an appropriate LSN range to synchronize the first incremental load to the initial load.

Name Element Purpose
Get End-points for Initial Query Interval Execute SQL task Extracts the last LSN that is included in the database snapshot from snapshot metadata. Then, increments this value to serve as the low LSN value for the first incremental load that is applied after initialization. The task then delays until the capture process has processed through this LSN value in the transaction log. After this condition is true, the task identifies the current maximum LSN as the high LSN value for the initial query interval. The task uses the function, sys.fnvarbintohexstr, to convert the binary LSN values for the initial query interval to strings that are saved as the package variables, User::ExtractStartLSN and User::ExtractEndLSN . The task uses the cdc.lsntime_mapping table to associate an approximate datetime value with the low endpoint, which the task saves as the package variable, User::ExtractStartTime .


Extract and Process Change Data
This functional group contains a For Loop container that includes two tasks:

Name Element Purpose
Cycle Master at 10 second intervals For Loop container First runs the package to gather changes for the interval, and then applies these changes to the target tables. The container then runs the Execute SQL task that first delays for 10 seconds and then computes the new extraction interval.The container stops processing when all the changes in the generated workload have been applied to the destination tables.
Run Master to Extract Data Execute Package task Runs the MasterCDC package that applies the incremental load for a given extraction interval to the destination tables.
Delay 10 Seconds and Set New Query Interval Execute SQL task Uses the function, sys.fn_cdc_hexstrtobin, to convert the string representation of the high LSN value for the previous query to a binary(10) value. Then, uses the function, sys.fn_cdc_increment_lsn, to increment this value to serve the low LSN value for the next query interval. After a delay of 10 seconds, uses the function, sys.fn_cdc_get_max_lsn, to obtain the current maximum LSN that has been processed by the capture process. This value serves as the high endpoint of the next query itnerval.If the low endpoint is greater than the maximum LSN, the task continues to delay for 10 seconds, and then obtains the current maximum LSN until the low endpoint does not exceed the maximum. The task uses the function, sys.fn_varbintohexstr, to convert the binary LSN values that act as the lower and upper bounds for the next query interval to strings. The task then saves these strings as the package variables, User::ExtractStartLSN and User::ExtractEndLSN . The task uses the cdc.lsn_time_mapping table to associate an approximate datetime value with the low endpoint, which is saved as the package variable, User::ExtractStartTime .


Validate Incremental Load and Report Run Status
This functional group contains an Execute SQL task and a Script task:

Name Element Purpose
Check for Mismatch in Replicas Execute SQL task Uses CHECKSUM to compare the replicas to the source tables. Based on this comparison, the task then sets the package variables, CustomerMismatch , CreditCardMismatch , and WorkOrderMismatch , accordingly.
Output Run Completion Status Script task Outputs a status message to the event log at the end of the run. (This message reports on the parity of the package variables, CustomerMismatch , CreditCardMismatch , and WorkOrderMismatch .) Then, sets the run status accordingly.


MasterCDC Package Components
When the SetupCDCSample package runs, five package configurations of type, "Parent package variable", enable the SetupCDCSample package to pass package variables to the MasterCDC package. The following table lists these package variables.

Parent Child
User::ExtractEndLSN ExtractEndLSN
User::ExtractStartLSN ExtractStartLSN
User::IntervalID IntervalID
User::BasePath BasePath


The following tables list the tasks, containers, data sources and destinations, and transformations that are used in the master package.

Name Element Purpose
Wait for Capture Process to Process Extraction Interval For Loop container Contains the following components:

Name Element Purpose
Check for Valid Interval Execute SQL task Sets the DataReady package variable to one of the following values:

Name Element Purpose
Log Extract Error Script task Logs an error to the event log when the DataReady package variable is set to 0.(This task runs in the For Loop Container.)
Extract Customer Data Execute Package task Runs the CDCCustomerExtract package after the For Loop container finishes, and the DataReady package variable is set to 1
Extract CreditCard Data Execute Package task Runs the CDCCreditCardExtract package after the For Loop container finishes, and the DataReady package variable is set to 1.
Extract WorkOrder Data Execute Package task Runs the CDCWorkOrderExtract package after the For Loop container finishes, and the DataReady package variable is set to 1.
Log Extraction Complete Script task Enters a message in the event log that indicates that the extraction interval has finished.This task runs after all child packages have finished.


CDCCustomerExtract Package Components
When the MasterCDC package runs, two package configurations of type, "Parent package variable", enable the MasterCDC package to pass package variables to the CDCCustomerExtract package. The following table lists these package variables.

Parent Child
User::ExtractEndLSN EndLSN
User::ExtractStartLSN StartLSN


The following table lists the tasks, containers, data sources and destinations, and transformations that the CDCCustomerExtract package uses.

Name Element Purpose
Generate SQL Data Query Script task Uses the package variables, StartLSN and EndLSN , to determine the LSN values to provide to the generated function that queries for change data. The query is deposited in the package variable, SQLDataQuery .
Process Change Data Data Flow task Uses an OLE DB source to query for change data. A data flow component then separates the returned rows into flows—insert, delete, and update—that correspond to the operation required to apply the row to the destination. An OLE DB destination handles the insert flow. Data Flow components that run SQL commands for each row handle the delete and update flows.
Query for Change Data OLE DB source Uses the query that the SQLDataQuery package variable supplies to extract change data. The query uses the connection manager, AdventureWorks2008.
Filter Based on Operation Conditional Split transformation Divides the results set that the query returns into three different flows. The transformation uses the value of the $operation column in each returned row to divide the results set:

Name Element Purpose
Customer Inserts OLE DB destination Processes the insert flow. Uses the connection manager, AdventureWorksDW2008, to connect to the destination and then inserts the specified rows into the target Customer table.
Customer Deletes OLE DB Command transformation Processes the delete flow. Uses the connection manager, AdventureWorksDW2008, to connect to the destination and then deletes the specified rows from the target Customer table.
Customer Updates OLE DB Command transformation Processes the update flow. Uses the connection manager, AdventureWorksDW2008, to connect to the destination and then updates the specified rows in the target Customer table.


CDCCreditCardExtract Package Components
When the MasterCDC package runs, two package configurations of type, "Parent package variable", enable the MasterCDC package to pass package variables to the CDCCreditCardExtract package. The following table lists these package variables.

Parent Child
User::ExtractEndLSN EndLSN
User::ExtractStartLSN StartLSN


The following table lists the tasks, containers, data sources, data destinations, and transformations that the CDCCreditCardExtract package uses.

Name Element Purpose
Generate SQL Data Query Script task Uses the package variables StartLSN and EndLSN , to determine the LSN values to provide to the generated function call that queries for change data. The query is deposited in the package variable, SQLDataQuery .
Process Change Data Data Flow task Uses an OLE DB source to query for change data. A data flow component then separates the returned rows into flows—insert, delete, and update—that correspond to the operation required to apply the row to the destination. An OLE DB destination handles the insert flow. Data Flow components that run SQL commands for each row handle the delete and update flows.
Query for Change Data OLE DB source Uses the query that the SQLDataQuery package variable supplies to extract change data. The query uses the connection manager, AdventureWorks2008.
Filter Based on Operation Conditional Split transformation Divides the results set that the query returns into three different flows. The transformation uses the value of the $operation column in each returned row to divide the results set:

Name Element Purpose
CreditCard Inserts OLE DB destination Processes the insert flow. Uses the connection manager, AdventureWorksDW2008, to connect to the destination and then inserts the specified rows into the target CreditCard table.
CreditCard Deletes OLE DB Command transformation Processes the delete flow. Uses the connection manager, AdventureWorksDW2008, to connect to the destination and then deletes the specified rows from the target CreditCard table.
CreditCard Updates OLE DB Command transformation Processes the update flow. Uses the connection manager, AdventureWorksDW2008, to connect to the destination and then updates the specified rows in the target CreditCard table.


CDCWorkOrderExtract Package Components
When the MasterCDC package runs, two package configurations of type, "Parent package variable", enable the MasterCDC package to pass package variables to the CDCWorkOrderExtract Extract package. The following table lists these package variables.

Parent Child
User::ExtractEndLSN EndLSN
User::ExtractStartLSN StartLSN


The following table lists the tasks, containers, data sources, data destinations, and transformations that the CDCWorkOrderExtract package uses.

Name Element Purpose
Generate SQL Data Query Script task Uses the package variables, StartLSN and EndLSN , to determine the LSN values to provide to the generated function call that queries for change data. The query is deposited in the package variable, SQLDataQuery .
Process Change Data Data Flow task Uses an OLE DB source to query for change data. A data flow component then separates the returned rows into flows—insert, delete, and update—that correspond to the operation required to apply the row to the destination. An OLE DB destination handles the insert flow. Data Flow components that run SQL commands for each row handle the delete and update flows.
Query for Change Data OLE DB source Uses the query that the SQLDataQuery package variable supplies to extract change data. The query uses the connection manager, AdventureWorks2008.
Filter Based on Operation Conditional Split transformation Divides the results set that the query returns into three different flows. The transformation uses the value of the $operation column in each returned row to divide the results set:

Name Element Purpose
WorkOrder Inserts OLE DB destination Processes the insert flow. Uses the connection manager, AdventureWorksDW2008, to connect to the destination and then inserts the specified rows into the target WorkOrder table.
WorkOrder Deletes OLE DB Command transformation Processes the delete flow. Uses the connection manager, AdventureWorksDW2008, to connect to the destination and then deletes the specified rows from the target WorkOrder table.
WorkOrder Updates OLE DB Command transformation Processes the update flow. Uses the connection manager, AdventureWorksDW2008, to connect to the destination and then updates the specified rows in the target WorkOrder table.


© 2008 Microsoft Corporation. All rights reserved.