Table of Contents
3.1 Network Options
3.1.1 Company Network
3.1.2 Internet
3.2 Transport Options
3.2.1 Access Methods
3.2.2 Compression
3.3 Accessing data stores
3.3.1 Accessing Databases
3.3.2 Accessing remote servers
3.3.3 Accessing cloud stores
3.4 Security
3.4.1 FTP
3.4.2 SFTP
3.4.3 SCP
3.4.4 HTTP
3.5 Data Flow Examples
4.1 Overview
4.2 QC Job Types
4.2.1 Database Synch
4.2.2 File Transfer
4.2.3 Cloud Bound
4.3 Starting a Job
4.4 Job Cancellation
5.1 Creating New Session
5.2 Modifying a Session
APPENDIX A. DATA DICTIONARY
APPENDIX B. ERROR MESSAGES
APPENDIX C. DATA PLATFORMS
APPENDIX D. GLOSSARY
Audience: Operations/Technology
Reporting
ETL
Enclosed is the latest edition of the QCSM-GUI Technical Specifications dated July 7, 2015. This document was originally published December 9, 2014 for data experts developing data integration interfaces to Oracle RDBMS. Updated versions were published as follows:
All updates to this document regarding QC are posted to the team SharePoint.
The changes described in this edition of the QCSM-GUI Technical Specifications will be available in the QCSF Production environment on August 15, 2015.
Effective July 15, 2015:
The following is an overview of the enhancements to QCSM-GUI effective with the July 15, 2015 release.
Chapter 1 |
No change |
Chapter 2 |
No change |
Chapter 3 |
No change |
Chapter 4 |
Added argument description |
Chapter 5 |
Added SQL Server versions |
Chapter 6 |
Changed Copy Vector menu |
To obtain the latest information or answers to questions about QCSM-GUI, contact Alex Buzunov. The Metrics Team provides the expertise and information necessary to use QCSM-GUI and implement data moves.
General information is maintained on the QCSM Web page.
Technical Support | Business Support |
Phone: | Phone: |
Fax: | Fax: |
E-mail: | E-mail: |
Web: | Web: |
The Query Copy Session Manager (QCSM-GUI) is part of an integrated set of tools used for seamless bi-directional data integration between major data stores. In addition, it supports self-service data extraction and load by Data Experts. The application was developed by the Metrics Team in response to the technical challenges presented by ETL and reporting requirements. The final product provides a means for ad-hoc, on-demand data transfer for upstream and downstream Oracle environments, including table, partition, and query-defined datasets. Data transfer sessions can also be synchronized with other events and orchestrated.
Figure 1.1. An Overview of the QCSM-GUI Process.
User Creates Query Copy Session → User Initiates Data Transfer → QCSM-GUI Creates Scripts Required for Data Copy → QC Performs Data Transfer → Data Is Loaded to Target DB
The QCSM-GUI Technical Specifications provide Data Experts with a resource for on-demand data transfer between staging and/or reporting environments, allowing better incident resolution, query performance tuning, data debugging, and application testing.
This document is not intended to provide information on how to develop a system for data transfer or how to do database development. Instead, it is intended to describe what such a system must deliver to your organization.
Table 1.1 describes the contents and purpose of the QCSM-GUI Technical Specifications.
Section |
Description |
|
Describes the purpose of the system and provides an overview of QCSM-GUI templates and technical requirements. |
|
Describes the purpose and requirements of parameter entry of the session. |
|
Describes data transfer configuration |
Describes each argument including permitted values. |
|
Defines detailed aspects of argument templates |
|
Defines all messages generated by QC-CLI |
|
Defines the words and phrases with meaning unique to Data Copy and QCSM-GUI |
3. TEMPLATES
When you plan a data move, you have to define certain information about it. You must assign an alphanumeric identifier to each copy request. The order identifier is a Batch Sequence Number or a timestamp and must be unique within your organization. The order identifier, together with the user name, allows QC to maintain a unique code snapshot and log files for that particular order.
The order lifecycle from origination to completion includes metadata entry by Data Experts. To facilitate the order definition process, Data Experts are required to define source and target templates (which help QC identify the data copy method), a host_map to set the execution host, and connection credentials for the source and target databases.
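As a concrete illustration, the order-identifier rules above can be sketched in a few lines of Python. The function and prefix names here are hypothetical, not QC's actual implementation:

```python
from datetime import datetime

def make_order_id(batch_seq=None):
    """Return an order identifier: a caller-supplied Batch Sequence
    Number, or a timestamp when none is given."""
    if batch_seq is not None:
        return str(batch_seq)
    # Timestamp form: YYYYMMDDHHMMSS, unique enough per user per second.
    return datetime.now().strftime("%Y%m%d%H%M%S")

def snapshot_prefix(user, order_id):
    """Combine user name and order identifier so code snapshots and
    log files for this order do not collide with other orders."""
    return f"{user}_{order_id}"
```

For example, `snapshot_prefix("abuzunov", make_order_id(12345))` yields a prefix that names both the code snapshot and the log files for that order.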
Transactional data can be synchronized using QC-CLI.
Some useful sources on the internet include:
This list of data replication resources is not exhaustive. Further information about data synchronization is available from your selected DB vendor and the internet.
QCSM-GUI does not implement real-time replication and makes no representations regarding the accuracy, integrity, or completeness of your data replication.
Data procedures:
- Pre-determined intervals throughout the day.
Data examinations include:
- maintain your synchronization procedures
- maintain a log of synchronization
- run compliance examinations against existing adequacy procedures
- check audit trail information against your organizational standards.
In order to facilitate examinations, Data Experts should document and maintain a log of synchronization procedures.
This log should contain the date and time of each synch and whether the target was truncated. The log should be maintained for the period of time, and with the accessibility, specified by your DBA.
After review, the session can be modified and resubmitted.
For additional information about QCSM-GUI technical specs and requirements, contact dedicated support.
This type of copy requires you to set the following arguments in Session Manager: source and target database, field separator, host_map, and table names.
You have to set source and target logins, source table and partition name, target partition name (if any), field separator, and host_map.
You have to set source and target logins, source table and sub-partition name, target sub-partition name (if any), field separator, and host_map.
You have to set source and target logins, query file name, target table name (if any), field separator, and host_map.
You have to set source login, source table name, field separator, and target file name.
You have to set source login, source table name, partition name, field separator, and target file name.
You have to set source login, source table name, sub-partition name, field separator, and target file name.
You have to set source login, query file name, field separator, and target file name.
You have to set target login, target table name, field separator, and source CSV file name.
You have to set source login, skip target table name, field separator, and CSV file name (file name has to include target schema, and table name).
You have to set source login, skip target table name, field separator, and CSV file name (file name has to include target schema, and table name).
You have to set source login, skip target table name, field separator, and Query file name (file name has to include target schema, and table name).
The QCSM Technical Specifications were created primarily to describe the technical requirements for transferring data between databases.
They provide detailed information about the required source and target arguments and data formats. The most basic technical requirement is defining the data vector and the source and target templates.
Session metadata has to be packaged into one archived file and sent to the remote server for execution. More than one session can be executed at the same time.
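The packaging step might look like the following sketch. The archive layout and file names are assumptions of this example; QC's actual session format is not specified here:

```python
import io
import zipfile

def package_session(files):
    """Package session metadata files ({name: bytes}) into a single
    in-memory zip archive ready to be sent to the remote server."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()
```

The resulting bytes can then be transmitted as one file, which keeps a multi-file session atomic from the remote server's point of view.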
QC does not have to be executed manually from the command line: QC Sessions are executed directly from the QCSM interface. QC bat files can also be generated from QCSM for external execution.
Bash script files are transmitted to Linux using the SFTP protocol. Remote execution is performed using SSH.
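A rough sketch of how those two steps could be driven from the desktop side. The host, user, and path values are placeholders; in QCSM the real values come from the host_map configuration:

```python
import shlex

def sftp_put_command(host, user, local, remote):
    """Build an sftp batch invocation (plus its batch text) that
    uploads one generated script file to the remote host."""
    batch = f"put {local} {remote}"
    # '-b -' tells sftp to read batch commands from stdin.
    return ["sftp", "-b", "-", f"{user}@{host}"], batch

def ssh_exec_command(host, user, remote_script):
    """Build the ssh command that runs the transferred shell script."""
    return ["ssh", f"{user}@{host}", "bash " + shlex.quote(remote_script)]
```

These argument lists could be handed to `subprocess.run`; building them as lists (rather than one shell string) avoids quoting problems with unusual file names.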
The data copy shell script generates log files, which are compressed and sent back to the calling environment. (For more information, see Section 7 – Feedback and Errors.)
You can use the QCSM GUI interface to access log files generated by the remote data copy script.
For an overview of QCSM Technical Requirements and Information Flow, see Figure 1.2.
Figure 1.2 Overview of QCSM Technical Requirements and Information Flow.
This section provides the Data Flow specification which Data Experts can use to perform data transfers.
The goal of QCSM in developing these specifications is to provide simple and reliable information-flow mechanisms that allow Data Experts to minimize development and operational complexity by using off-the-shelf data synchronization templates, while providing generic access to all data stores available on the market. This section discusses template options, security considerations, network options, and the design requirements needed to achieve this goal.
QCSM supports two execution types: local (the User's desktop) and remote (Linux). Figure 3-1 shows the high-level details of these data load options.
Figure 3-1. Execution Options.
The User has two options to copy data using QCSM.
Data copy job performance depends on network connectivity and database server load. The User can switch between remote and local load using the QCSM interface.
3.1.2 Linux Access
Remote load servers are configured in the host_map file using the QCSM interface. Multiple servers can be configured.
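For illustration, a host_map can be modeled as a small text file mapping a server alias to a host and login. The "alias host user" line format below is an assumption of this sketch; QCSM maintains the real file through its own interface:

```python
def parse_host_map(text):
    """Parse an illustrative host_map file into {alias: (host, user)}.
    Blank lines and '#' comments are ignored."""
    servers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        alias, host, user = line.split()
        servers[alias] = (host, user)
    return servers
```

With such a structure, configuring multiple servers is just a matter of adding lines, and a copy job can select its execution host by alias.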
3.2 Transport Options
QCSM provides three options for the User to transfer data between a Windows desktop and a POSIX-compliant remote host (Linux/Unix). If the transfer is via FTP, it is set with the CSV->FTP copy vector; if via SFTP, with the CSV->SFTP copy vector; if via SCP, with the CSV->PSCP copy vector. Figure 3-2 illustrates the three basic types of data transfer using QueryCopy.
Figure 3-2. Data Transfer Flows.
FTP : CSV, JSON
SFTP: CSV, JSON
SCP : CSV, JSON
3.2.1 Access Methods
Table 3-1 summarizes the transfer methods and the file formats pertinent to each transfer method.
Access Method |
FTP |
SFTP |
SCP |
CSV data files |
+ |
+ |
|
JSON data files |
+ |
+ |
|
logs |
+ |
+ |
+ |
code |
+ |
+ |
+ |
The specifications for each access method are described below.
Table 3-2 summarizes the specifications for each transfer method.
FTP |
SFTP |
SCP |
No file size limit. RFC 959 compatible. No encryption. Binary mode. US-ASCII code. |
No file size limit. SSH encryption. Binary mode. US-ASCII code. |
SSH encryption. Server-side host keys. |
Benefits of On-demand Data Synchronization Architecture |
|
Data is loaded in bulk using the native loader provided by the database manufacturer. |
The User invokes data synch using a variety of predefined and tested templates. |
Atomicity of load |
The data synch copy vector identifies the smallest unit of work (a data load), separating data transport and load from application logic and allowing the User to focus on business requirements rather than technical implementation. |
Performance |
Local data load uses the local hard drive to persist User data. High-performance load is achieved by switching to remote (server) load. |
Data Investigation |
Accessing the temporary dump file produced by a QC copy job provides an additional way to inspect raw data in CSV or JSON files. |
Scalability |
Scalability is achieved by adding new remote servers and running parts of the load on multiple hosts in parallel. |
QCSM Real Time Data Synch.
Loading Schedules and Data Stores.
5.3 Typical data strategy
The following objectives must be accomplished to implement data synchronization strategy.
6.6 Job Scheduling – Control-M
Control-M can be used as a dedicated scheduling tool for all jobs, including QC. It can schedule and monitor both batch and real-time jobs within QC.
Control-M allows basic orchestration (job dependencies).
Real time Data Sync
QC can be configured to continuously capture Oracle transactional data. This service can be stopped on demand for batch-processing cutoff or DW maintenance.
Near real time data copy.
A QC job can be scheduled to start data copy at particular intervals during the day.
Batch Data Processing.
QCSM creates QC Sessions, allowing the User full control over the data integration process. A QC job can be scheduled to run as part of a larger dependency chain of data-centric ETL and batch pipelines that already exist in your DW environment.
QCSM is architected as a multi-database system. Version 0.3.5 of QCSM supports all major RDBMS, including a number of NoSQL stores. As additional data sources are added to the QC lineup, more copy vectors are defined and added to the menus.
3.3 Access to QueryCopy logs.
3.3.1 QC Feedback
Feedback items include information conveyed from the data spooler and loader. All feedback items are available via QCSM-GUI. The paragraphs below discuss each of the feedback items (see Section 6 – Feedback and Errors).
Job Status – Job status can be obtained from the CLI window or the QCSM session status. If the job was submitted using QCSM, job status is displayed on the RUN button. (Also see the QCSM GUI User Guide.)
Job Failure – Failure of a QC job is displayed in the CLI window or the QCSM session status (Run button). A failed job can be resubmitted using QCSM-GUI or QC-CLI.
Job Logs – The application log file and Loader log files can be accessed in QCSM via the “Output” tab.
3.4 Security
Copy Job submissions require a valid user ID and password combination for the source and target databases. In addition, a valid Linux login is required if the job is executed remotely.
3.4.1 FTP
QC logs in to company servers using the supplied user ID and password. The user ID and password are stored in the host_map file for each configured Linux server. FTP sessions are not persistent, so a new session is opened for each file transmission. Some files (shards) are transmitted in parallel using multiple FTP sessions. (See the QC-CLI User Guide.)
Files sent via FTP are not encrypted and can only be sent via the company network.
3.5 Data Flow Examples
The diagrams below present typical exchanges between QC-CLI and the execution server.
In the examples, Step 1 depicts QC preparing code and submitting it to the remote server. Step 2 includes copy job submission, with the possible generation of an error. Step 3 demonstrates QC resubmitting a failed job using the Session file. (Details are in Section 5 and Appendix C.)
Typical QC-CLI information flow via FTP.
Accessing Sensitive Data via Internet.
To access financial data, strong encryption is required. Due to the confidential nature of the market data you transfer, all client software used to communicate with the execution server must use 128-bit strong encryption. Strong encryption provides the highest level of protection for information transmitted over the Internet and is particularly suited for confidential on-line transactions. NOTE: no encryption is needed for in-house transfers.
Data Perspective.
The QCSM-GUI provides up to three data perspectives with which to view the data. A perspective allows the User to secure data persisted in different ways. The available perspectives are:
Remote – data located on remote server
Cloud – data located in cloud storage
Local – data persisted on local drive.
When the User creates a session, QCSM decides which of the above perspectives is applicable.
The “New Session” form provides a mechanism to select the preferred source and target templates, which define the data perspective. Once selected, the perspective is applied to all data copy submissions.
To change a perspective, open the “New Session” form and choose new session templates. Argument values can optionally be inherited by the new session.
The objective of this section is to describe QCSM data copy facilities. It provides an overview of data copy execution, describes copy vectors and templates, and provides a representative sample of copy scenarios.
The mission of QCSM is to establish and maintain the complete life cycle of data copy in your Data Warehouse. To achieve this goal, QCSM must be able to uniquely identify all related events, including data extract and load. Furthermore, it has to be able to link related artifacts, such as load logs, to the QCSM Graphical User Interface (GUI).
The User creates a new Session, which is used to execute a data copy job.
A Session contains the information required to perform the physical data copy. A Job is the metadata related to a performed data transfer, such as logs, timestamp, and job status.
Table 4.1 lists available Session types. Each format is fully defined in Appendix C.
Session type |
Copy Vector |
Description |
Spool and Load |
ORA11G-ORA11G |
Copy from Oracle to Oracle |
Load from File |
CSV-ORA11G |
Load CSV file into Oracle table |
Extract to File |
ORA11G-CSV |
Extract table data to CSV file |
Inbound file transfer |
CSV-PSCP |
Transfer file to remote host |
Outbound file transfer |
PSCP-CSV |
Transfer file from remote host |
File splitter |
CSV-SPLIT |
Split a CSV file into many files. |
File merger |
CSV-MERGE |
Merge CSV files into one. |
3.2.1 Data Copy Execution
When the User originates a copy request to transfer data, QCSM records certain information about the order and assigns an alphanumeric identifier (Job ID) to each transfer. The Job ID may be a Batch Sequence Number or another identifier meaningful to the process. The Job ID must be unique across the company within a Business Day.
The nature of the transfer must be recorded along with the date and time of the job. Interdepartmental data copy must be approved by Compliance Control guarding the flow across Chinese Walls.
3.2.2 Data Copy Request
When the User originates a Data Copy Job and subsequently transmits data, certain information is required to be reported, including copy vector, timestamp, and status.
The Data Transfer Desk must be run as a place within the firm where transfer requests can be executed either automatically or with the assistance of a Data Expert. Finally, extra-departmental requests have to be handled with extra care and should require the gatekeeper's approval and the manager's sign-off with specific instructions attached.
3.2.3 Data Security.
Along with the requirements to protect information, the User must apply rigorous information security protocols.
Here are two obvious but vital principles the User has to employ:
Passwords. Always keep your passwords secret. Never write them down or give them out to anyone, even to IT staff, for any reason. Data dumps located on your hard drive may contain sensitive information. If you think there is a chance someone else knows your password, change it immediately.
Locking your PC. You are responsible for any actions taken when your user ID is in use. You should ensure that you log off or lock your workstation whenever you are away from your desk.
4.2.4 Session Modification
Since each Job must have a unique identifier, user-initiated modification of a job ID is typically done via QCSM-GUI.
4.2.5 Job Execution
When the User starts a copy job and QC-CLI subsequently executes it, certain information about the job is logged.
The User must identify source and target connect arguments. Passwords are hidden in QCSM-GUI. In addition, the User must identify source and target table names, delimiter, and parallel config. NOTE: if the target table is to be truncated, QCSM will prompt for user confirmation.
4.2.6 Job Cancellation
To end job execution on Windows, the User can close the QC-CLI window. This terminates all spooler and loader processes immediately and rolls back all loader transactions. Remote loads also terminate, because closing the SSH session effectively terminates all child processes initiated by the data copy shell.
4.3 Copy Vector Types
This section describes the types of Copy Vectors and related data elements. A Copy Vector defines the general direction of a data move. See Figure 4.3 for an example.
Figure 4.3. Data Vector Example.
ORA11G – ORA11G
Source - Target
There are four types of entities which can be used on either side of a Copy Vector: database, file type, file transfer tool, and file transformation tool.
Combinations of the listed types give the full set of available Copy Vectors in QC.
Table 4.3. List of Copy Vector types in QC.
Copy Vector Type |
From |
To |
Result |
Description |
Database |
Database |
Database |
Data copied |
Defines data transfer direction between databases. |
Database |
Database |
File Type |
Data extracted |
Defines data extraction direction. |
Database |
File Type |
Database |
Data loaded |
Defines data load direction. |
File Transfer |
File Type |
File Transfer Tool |
Data transferred |
|
File Transfer |
File Transfer Tool |
File Type |
Data transferred |
|
File Transformation |
File Type |
File Transformation Tool |
File transformed |
|
4.3.1 Database Copy Vectors
Each database is identified by a short identifier. For example, the Oracle 11G database is configured using the following identifier:
Figure 4.3.1 Oracle identifier
ORA11G
Oracle database 11G
If there are multiple versions of the same database, they are grouped into a database family. For example, the Oracle family includes the following Oracle database versions.
Table 4.3.1 Oracle Database identifiers.
Identifier |
Full database name |
ORA12C |
Oracle 12c |
ORA11G |
Oracle 11g |
ORA10G |
Oracle 10g |
ORA9I |
Oracle 9i |
ORA8I |
Oracle 8i |
ORA733 |
Oracle 7.3.3 |
ORAXE |
Oracle XE |
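The identifier table above can be modeled as a simple lookup. The helper name below is illustrative, not part of QC:

```python
# Identifiers and full names taken from Table 4.3.1 (Oracle family).
ORACLE_FAMILY = {
    "ORA12C": "Oracle 12c",
    "ORA11G": "Oracle 11g",
    "ORA10G": "Oracle 10g",
    "ORA9I":  "Oracle 9i",
    "ORA8I":  "Oracle 8i",
    "ORA733": "Oracle 7.3.3",
    "ORAXE":  "Oracle XE",
}

def full_name(identifier):
    """Resolve a short database identifier to its full product name."""
    return ORACLE_FAMILY[identifier.upper()]
```

For example, `full_name("ora11g")` resolves to "Oracle 11g"; an unknown identifier raises `KeyError`, which a caller could treat as an unsupported database version.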
In order to define a copy vector, the User should utilize the “short” database identifier. A copy vector consists of three parts: source DB, pipe type, and target DB.
For example:
Source target
ORA11G-ORA12C
Short dash short
The copy vector sets the direction of data transfer and the type of transport used to do the actual data copy.
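A minimal parser for the dash-separated form shown above. This is a sketch only; QC's own validation rules may differ:

```python
def parse_copy_vector(vector):
    """Split a copy vector like 'ORA11G-ORA12C' into (source, target).
    Only the single-dash 'short dash short' form is handled here."""
    source, _, target = vector.partition("-")
    if not source or not target:
        raise ValueError(f"malformed copy vector: {vector!r}")
    return source, target
```

The same function covers transport vectors such as `CSV-SFTP`, since they share the source-dash-target shape.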
4.3.2 File Types
Currently there are two data file types and one metadata file type in QC.
This means QC can import and export data in two formats, CSV and JSON-LINE, and one metadata format, DDL.
CSV abbreviates “comma-separated values” and is a format used to persist data in a flat file. JSON-LINE is a flat file format also called newline-delimited JSON; each line in such a file is a valid JSON value. DDL is a metadata record about your database object.
The User can create a copy vector for data extracts and loads from and into the CSV and JSON-LINE formats. In addition, it is possible to create a copy vector for metadata extraction and load in DDL format.
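The relationship between the two data formats can be illustrated by re-encoding CSV rows as JSON-LINE. This is a sketch; the field names are supplied by the caller, since a raw CSV extract may lack a header row:

```python
import csv
import io
import json

def csv_to_json_lines(csv_text, fieldnames):
    """Re-encode CSV rows as newline-delimited JSON: each output
    line is one valid JSON object, as the JSON-LINE format requires."""
    reader = csv.reader(io.StringIO(csv_text))
    lines = [json.dumps(dict(zip(fieldnames, row))) for row in reader]
    return "\n".join(lines)
```

Because every output line parses independently, a JSON-LINE file can be sharded or streamed line by line, which suits the parallel load model described elsewhere in this document.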
4.3.6 Data transfer types
There are three data transfer types defined in QC: FTP, SFTP, and SCP. The User can define a copy vector for files to be transferred to and from POSIX-compliant systems. In this case the copy vector will look like this:
Example 1:
Local remote
CSV – SFTP
CSV file POSIX compliant host
Example 2:
Remote local
SCP – JSON
POSIX host JSON-LINE file
4.3.7 Data transformation tools
The purpose of data mangling tools is to perform data transformation. Some examples are file split and file merge. All data transformations are local to the User's PC.
Copy vector example:
Example 1:
Local Local
CSV-SPLIT
One file Multiple files
Example 2:
Local Local
MERGE-JSON_LINE
Multiple files One file
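The split and merge transformations above can be sketched as round-robin line sharding and its inverse. This is illustrative only; QC's actual splitter may shard by bytes or by lines:

```python
def split_lines(lines, shard_count):
    """CSV-SPLIT sketch: distribute one file's rows across
    shard_count shards, round-robin."""
    shards = [[] for _ in range(shard_count)]
    for i, line in enumerate(lines):
        shards[i % shard_count].append(line)
    return shards

def merge_lines(shards):
    """Merge sketch: concatenate shard rows back into one list."""
    return [line for shard in shards for line in shard]
```

Note that round-robin splitting does not preserve the original row order on merge; when order matters, contiguous block splitting would be the safer policy.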
5.0 Session Argument Lists
A Session holds metadata about your transfer: the list of arguments needed to execute qc.exe. Arguments are divided into three groups: common, source, and target. Common arguments are “shared” by the Spooler and the Loader. Source arguments are set exclusively for the Spooler, and Target arguments are defined for the Loader.
Some Common arguments are used by QC to set up internal structures.
Major data elements of a new session are:
Data Copy Session (Sub-partition copy) |
||
Common Arguments |
Source Arguments |
Target Arguments |
-w [--copy_vector] -o [--pool_size] -r [--num_of_shards] -t [--field_term] -l [--lame_duck] -K [--keep_data_file] -V [--validate] -U [--truncate_target] -E [--ask_to_truncate] -X [--key_on_exit] -L [--email_to] -M [--log_dir] -F [--default_spool_dir] -B [--job_name] -Y [--time_stamp] -0 [--column_buckets] -1 [--job_pre_etl] -2 [--shard_pre_etl] -3 [--job_post_etl] -4 [--shard_post_etl] -C [--loader_profile] -5 [--host_map] -6 [--spool_type] -dbg [--debug_level] -spID [--status_pipe_id] |
-q [--query_sql_file] -Q [--query_sql_dir] -c [--from_table] -P [--from_partition] -S [--from_sub_partition] -j [--from_user] -x [--from_passwd] -b [--from_db_name] -e [--nls_date_format] -m [--nls_timestamp_format] -O[--nls_timestamp_tz_format] -A [--header] -W [--keep_whitespace] |
-u [--to_user] -p [--to_passwd] -d [--to_db_name] -a [--to_table] -G [--to_partition] -N [--to_sub_partition] -k [--skip_rows] |
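The three argument groups can be flattened into a qc.exe invocation roughly as follows. The flag spellings come from the table above; the dict-based session layout is an assumption of this sketch:

```python
def build_qc_args(common, source, target):
    """Flatten the three argument groups (common, source, target)
    into a qc.exe argument list, preserving group order."""
    args = ["qc.exe"]
    for group in (common, source, target):
        for flag, value in group.items():
            args.extend([flag, str(value)])
    return args
```

Keeping the groups separate until this final step mirrors the Session model: Common arguments feed both Spooler and Loader, while Source and Target arguments stay specific to one of them.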
In order to execute a new Job, the following criteria should be met.
4.3.2 Job Execution Log
Whenever the User transfers data between databases, the “Output” tab is populated in QCSM GUI. It offers access to all application and SQL*Loader logs. Users can open log files using the default editor.
It must be possible to fully verify the correct functioning of all software modules and ETL scripts through the unit and regression test harness.
Capacity |
Horizontal scale. |
The design will allow the system to scale horizontally at the individual module level. Adding capacity to the system should always be achievable by adding new host servers. |
Management |
Operations Management |
The routine running configuration of this system will not require any shell access to the production server or databases. All configuration and monitoring is done from the User PC. |
Performance |
Data copy performance is limited only by network bandwidth. |
|
4. Data Copy scenarios
This section provides a representative sample of the Data Copy Scenarios that exist in the QC Framework. It defines and illustrates a selective set of scenarios and their requirements. NOTE: QC is not a real-time system, so the representations in the scenarios do not reflect the actual time when QC jobs are submitted. They are only intended to illustrate which arguments must be set for a copy task to be completed.
Each scenario is accompanied by a list of arguments required for it to work. While each scenario illustrates key points of how QC works, they are not exhaustive.
Components and Responsibilities.
ID |
Component name |
Responsibilities |
Non-functional Requirements |
Sqlldr.exe |
Bulk loader |
Load CSV data into Oracle table |
Provides high performance bulk load protocol. |
Sqlplus.exe |
Db client shell |
Execute SQL statements. |
Provides access to data using SQL. |
Split.exe |
File splitter |
Split large files into smaller pieces |
Provides fast file split, byte or line based. |
Pscp.exe |
SCP client |
Transfer binary files |
Provides secure file transfer protocol for POSIX systems. |
Figure 6. QC Event Sequence Diagram.
Read table
Spool Data
Load Data
Table 4.2 Local Job Execution Events
Ref |
Event |
Event Details |
1 |
User submits a Job |
User selects a Session record and clicks the “Run” button. |
2 |
QCSM opens QC-CLI window. |
QCSM verifies argument values, parses them to well-formed bat command and executes it in CLI window. |
3 |
QC reads metadata. |
QC makes a SQL request to the source database to fetch table and column metadata. |
4 |
QC extracts data from source |
QC creates SQL scripts and spools data using SQL*Plus. |
5 |
QC loads data to target |
QC creates loader scripts using column metadata and invokes the loader process. |
Figure 4.1 Table Data Copy Job Events
QCSM-GUI
This scenario illustrates common Event Flow for QC-CLI and includes the following assumptions.
Table 4-3. Remote Job Execution Events
Ref |
Event |
Event Details |
1 |
User submits a Job using QCSM |
User selects a Session record and clicks the “Run” button. |
2 |
QCSM opens QC-CLI window. |
QCSM verifies argument count and values, parses them into a well-formed bat command, and opens a QC-CLI process with the user-defined arguments. |
3 |
QC reads metadata. |
QC makes SQL request to source database to fetch table and column metadata. |
4 |
QC generates shell scripts. |
QC creates SQL and Bash Shell scripts for execution on remote host. |
5 |
QC transfers compressed code file to remote host. |
QC compresses the generated shell code and transfers it to the remote host for execution. |
6 |
QC triggers data spool and load |
QC opens SSH connection to remote host and runs generated shell scripts. |
7 |
QC downloads log files |
QC transfers created log files back to User environment. |
Figure 4.2 Remote Table Data Copy Job Events
QCSM-GUI
This scenario illustrates common Event Flow for QC-CLI and includes the following assumptions.
Table 4.3 Sharded File Load Job Execution Events
Threads |
Shards |
Number of input files |
Source |
Target |
multiple |
multiple |
single |
CSV File |
Table |
todo
Ref |
Event |
Event Details |
1 |
User submits a Job |
User selects a Session record and clicks the “Run” button. |
2 |
QCSM opens QC-CLI window. |
QCSM verifies argument values, parses them to well-formed bat command and executes it in CLI window. |
4 |
QC loads file to target |
QC shards input file and executes file load using multiple loader instances. |
QCSM-GUI
This scenario illustrates common Event Flow and includes the following assumptions.
Table 4.3 Multi-File Load Job Execution Events
Threads |
Shards |
Number of input files |
Source |
Target |
multiple |
One per file |
multiple |
CSV Files |
Table |
todo
Ref |
Event |
Event Details |
1 |
User submits a Job |
User selects a Session record and clicks the “Run” button. |
2 |
QCSM opens QC-CLI window. |
QCSM verifies argument values, parses them to well-formed bat command and executes it in CLI window. |
3 |
QC loads files to target |
QC executes file load using multiple loader instances. |
QCSM-GUI
This scenario illustrates common Event Flow and includes the following assumptions.
Table 4.3 Multi-File Sharded Load Job Execution Events
Threads |
Shards |
Number of input files |
Source |
Target |
multiple |
Multiple per File |
multiple |
CSV Files |
Table |
todo
Ref |
Event |
Event Details |
1 |
User submits a Job |
User selects a Session record and clicks the “Run” button. |
2 |
QCSM opens QC-CLI window. |
QCSM verifies argument values, parses them to well-formed bat command and executes it in CLI window. |
3 |
QC loads files to target |
QC shards all input files into logical partitions and executes file load using multiple loader instances. |
QCSM-GUI
This scenario illustrates common Event Flow and includes the following assumptions.
Table 4.4 Table Data Extract Job Execution Events
Threads |
Shards |
Number of temp files |
Source |
Target |
single |
single per table |
single |
Table |
CSV File |
todo
Ref |
Event |
Event Details |
1 |
User submits a Job |
User selects a Session record and clicks the “Run” button. |
2 |
QCSM opens QC-CLI window. |
QCSM verifies argument values, parses them to well-formed bat command and executes it in CLI window. |
3 |
QC reads metadata. |
QC makes SQL request to source database to fetch table and column metadata. |
4 |
QC extracts data to CSV file(s) |
QC executes Table extract using single spooler instance. |
QCSM-GUI
This scenario illustrates common Event Flow and includes the following assumptions.
Table 4.4 Sharded Table Data Extract Job Execution Events
Threads |
Shards |
Number of temp files |
Source |
Target |
multiple |
multiple per table |
multiple |
Table |
CSV Files |
todo
Ref |
Event |
Event Details |
1 |
User submits a Job |
User selects a Session record and clicks the “Run” button. |
2 |
QCSM opens QC-CLI window. |
QCSM verifies argument values, parses them to well-formed bat command and executes it in CLI window. |
3 |
QC reads metadata. |
QC makes SQL request to source database to fetch table and column metadata. |
4 |
QC extracts data to CSV file(s) |
QC shards source Table and executes file extract using multiple spooler instances. |
QCSM-GUI
This scenario illustrates common Event Flow and includes the following assumptions.
Table 4.5 Table Data Copy Events
Threads |
Shards |
Number of temp files |
Source |
Target |
single |
single per table |
single |
Table |
Table |
todo
Ref |
Event |
Event Details |
1 |
User submits a Job |
User selects a Session record and clicks the “Run” button. |
2 |
QCSM opens QC-CLI window. |
QCSM verifies argument values, parses them to well-formed bat command and executes it in CLI window. |
3 |
QC reads metadata. |
QC makes SQL request to source database to fetch table and column metadata. |
4 |
QC extracts data to CSV file(s) |
QC extracts source Table data to temporary CSV file using single Spooler. |
5 |
QC loads data to target Table |
QC loads temporary CSV file to target Table using single Loader. |
This scenario illustrates a common Event Flow and includes the following assumptions.
Table 4.6 Sharded Table Data Copy Events
Threads | Shards | Number of temp files | Source | Target
multiple | multiple | multiple | Table | Table
Ref | Event | Event Details
1 | User submits a Job | The User selects a Session record and clicks the "Run" button.
2 | QCSM opens a QC-CLI window | QCSM validates the argument values, parses them into a well-formed bat command, and executes it in the CLI window.
3 | QC reads metadata | QC issues a SQL request to the source database to fetch table and column metadata.
4 | QC extracts data to CSV files | QC shards the source Table and extracts the data to temporary CSV files using multiple Spooler instances.
5 | QC loads data to the target Table | QC loads the temporary CSV files to the target Table using multiple Loaders.
Sharded Source table
This scenario illustrates a common Event Flow and includes the following assumptions.
Table 4.7 Partition Data Copy Events
Threads | Shards | Number of temp files | Source | Target
single | single | single | Partition | Table
Ref | Event | Event Details
1 | User submits a Job | The User selects a Session record and clicks the "Run" button.
2 | QCSM opens a QC-CLI window | QCSM validates the argument values, parses them into a well-formed bat command, and executes it in the CLI window.
3 | QC reads Partition metadata | QC issues a SQL request to the source database to fetch table and column metadata.
4 | QC extracts data to a CSV file | QC extracts the source Partition data to a temporary CSV file using a single Spooler.
5 | QC loads data to the target Table | QC loads the temporary CSV file to the target Table using a single Loader.
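The partition-scoped extract in step 4 maps naturally onto Oracle's `PARTITION (...)` extension to `SELECT` (with `SUBPARTITION (...)` for the sub-partition case). The helper below merely assembles such a statement and is illustrative; it is not part of QC.

```python
def build_partition_extract_sql(owner, table, partition, subpartition=None):
    """Build an Oracle SELECT scoped to one partition or sub-partition.

    PARTITION(...) / SUBPARTITION(...) is standard Oracle SELECT syntax;
    this helper itself is an invented illustration, not a QC API.
    """
    scope = (f"SUBPARTITION ({subpartition})" if subpartition
             else f"PARTITION ({partition})")
    return f"SELECT * FROM {owner}.{table} {scope}"

sql = build_partition_extract_sql("SCOTT", "SALES", "P_2015_Q1")
```

Scoping the SELECT this way lets the Spooler read exactly one (sub)partition without a WHERE clause on the partition key.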
This scenario illustrates a common Event Flow and includes the following assumptions.
Table 4.8 Sharded Partition Data Copy Events
Threads | Shards | Number of temp files | Source | Target
multiple | multiple | multiple | Partition | Table
Ref | Event | Event Details
1 | User submits a Job | The User selects a Session record and clicks the "Run" button.
2 | QCSM opens a QC-CLI window | QCSM validates the argument values, parses them into a well-formed bat command, and executes it in the CLI window.
3 | QC reads metadata | QC issues a SQL request to the source database to fetch table and column metadata.
4 | QC extracts data to CSV files | QC shards the source Partition and extracts the data to temporary CSV files using multiple Spooler instances.
5 | QC loads data to the target Table | QC loads the temporary CSV files to the target Table using multiple Loaders.
Sharded Source Partition
This scenario illustrates a common Event Flow and includes the following assumptions.
Table 4.9 Sub-Partition Data Copy Events
Threads | Shards | Number of temp files | Source | Target
single | single | single | Sub-Partition | Table
Ref | Event | Event Details
1 | User submits a Job | The User selects a Session record and clicks the "Run" button.
2 | QCSM opens a QC-CLI window | QCSM validates the argument values, parses them into a well-formed bat command, and executes it in the CLI window.
3 | QC reads Partition metadata | QC issues a SQL request to the source database to fetch table and column metadata.
4 | QC extracts data to a CSV file | QC extracts the source Sub-Partition data to a temporary CSV file using a single Spooler.
5 | QC loads data to the target Table | QC loads the temporary CSV file to the target Table using a single Loader.
4.4.4 Sharded Sub-Partition Data Copy
This scenario illustrates a common Event Flow and includes the following assumptions.
Table 4.10 Sharded Sub-Partition Data Copy Events
Threads | Shards | Number of temp files | Source | Target
multiple | multiple | multiple | Sub-Partition | Table
Ref | Event | Event Details
1 | User submits a Job | The User selects a Session record and clicks the "Run" button.
2 | QCSM opens a QC-CLI window | QCSM validates the argument values, parses them into a well-formed bat command, and executes it in the CLI window.
3 | QC reads metadata | QC issues a SQL request to the source database to fetch table and column metadata.
4 | QC extracts data to CSV files | QC shards the source Sub-Partition and extracts the data to temporary CSV files using multiple Spooler instances.
5 | QC loads data to the target Table | QC loads the temporary CSV files to the target Table using multiple Loaders.
Sharded Source Sub-Partition
This scenario illustrates a common Event Flow and includes the following assumptions.
Table 4.11 Query Load Job Execution Events
Threads | Shards | Number of temp files | Source | Target
single | single | single | Query File | Table
Ref | Event | Event Details
1 | User submits a Job | The User selects a Session record and clicks the "Run" button.
2 | QCSM opens a QC-CLI window | QCSM validates the argument values, parses them into a well-formed bat command, and executes it in the CLI window.
3 | QC reads metadata | QC issues a SQL request to the source database to fetch the Query column metadata.
4 | QC extracts Query data to a CSV file | QC extracts the source Query data to a temporary CSV file using a single Spooler.
5 | QC loads data to the target Table | QC loads the temporary CSV file to the target Table using a single Loader.
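The Query Load flow can be sketched as: read the query text from a file, execute it, capture the column metadata from the cursor, and spool the result to a temporary CSV. In this sketch SQLite stands in for the source database, and the file names are invented for the example.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

# A query file on disk, standing in for the Session's Query File argument.
qfile = Path(tempfile.mkdtemp()) / "extract.sql"
qfile.write_text("SELECT 1 AS id, 'alpha' AS name UNION ALL SELECT 2, 'beta'")

conn = sqlite3.connect(":memory:")           # stand-in for the source database
cur = conn.execute(qfile.read_text())
header = [d[0] for d in cur.description]     # step 3: Query column metadata

# Step 4: spool the query result to a temporary CSV file (single 'Spooler').
out = qfile.with_suffix(".csv")
with open(out, "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(header)
    w.writerows(cur)
```

Because the column metadata comes from the cursor rather than a fixed table definition, the same flow works for any well-formed query file.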
This scenario illustrates a common Event Flow and includes the following assumptions.
Table 4.12 Multi-Query Parallel Load Job Execution Events
Threads | Shards | Number of temp files | Source | Target
multiple | multiple | multiple | Query Files | Table
Ref | Event | Event Details
1 | User submits a Job | The User selects a Session record and clicks the "Run" button.
2 | QCSM opens a QC-CLI window | QCSM validates the argument values, parses them into a well-formed bat command, and executes it in the CLI window.
3 | QC reads metadata | QC issues a SQL request to the source database to fetch the Query column metadata.
4 | QC extracts Query file data to CSV files | QC extracts the source Query data to temporary CSV files using multiple Spooler instances.
5 | QC loads data to the target Table | QC loads the temporary CSV files to the target Table using multiple Loaders.
This scenario illustrates a common Event Flow and includes the following assumptions.
Table 4.13 Multi-Query Sharded Parallel Load Job Execution Events
Threads | Shards | Number of temp files | Source | Target
multiple | multiple | multiple | Query Files | Table
Ref | Event | Event Details
1 | User submits a Job | The User selects a Session record and clicks the "Run" button.
2 | QCSM opens a QC-CLI window | QCSM validates the argument values, parses them into a well-formed bat command, and executes it in the CLI window.
3 | QC reads metadata | QC issues a SQL request to the source database to fetch the Query column metadata.
4 | QC extracts Query file data to CSV files | QC extracts the source Query data to temporary CSV files using multiple Spooler instances.
5 | QC loads data to the target Table | QC loads the temporary CSV files to the target Table using multiple Loaders.
Sharded Query Files
This section and Appendix C describe the detailed specification for the layout of data extracts.
Of the available extract formats, QC uses CSV and JSON-LINE to extract data from a database. CSV is a universal format; JSON-LINE is used in Ajax applications and in NoSQL databases such as MongoDB.
QCSM supports two row formats: CSV and JSON-LINE.
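The difference between the two row formats is easiest to see by rendering the same record both ways: CSV relies on column position and delimiter quoting, while JSON-LINE carries one self-describing JSON object per line. A minimal sketch (the record contents are invented for the example):

```python
import csv
import io
import json

record = {"id": 1, "name": "Smith, J.", "dept": "ETL"}

# CSV row: positional values; fields containing the delimiter get quoted.
buf = io.StringIO()
csv.writer(buf).writerow(record.values())
csv_row = buf.getvalue().rstrip("\r\n")

# JSON-LINE row: one self-describing JSON object per line.
json_row = json.dumps(record)
```

The JSON-LINE row is larger but carries its own column names, which is why it suits schemaless targets like MongoDB.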
4.1.1 The following rules apply to CSV format:
4.1.2 JSON-LINE format rules.
A HEADER record is not required for a CSV file to be loaded into the database. There is no trailer record in a CSV file. The remaining records may appear in any order.
The Header record identifies the names and order of the columns, separated by a delimiter.
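Processing a header record then reduces to splitting on the delimiter and recording each column's position. A minimal sketch (the pipe delimiter and column names are invented for the example):

```python
def parse_header(header_record, delimiter=","):
    """Map each column name in the header record to its position."""
    names = [c.strip() for c in header_record.split(delimiter)]
    return {name: i for i, name in enumerate(names)}

order = parse_header("EMP_ID|EMP_NAME|HIRE_DATE", delimiter="|")
```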
The following is assumed while processing header record:
QCSM provides testing facilities for User Jobs.
Users transmitting information should test their Jobs thoroughly before starting a full-sized data load.
All new Sessions in QCSM are created with test presets for trial transmissions. Once the test succeeds, the User can remove the test limitations and execute a full-size load.
Testing does not relieve the User of the responsibility to manually verify the validity of data loads.
Every new Session is created with the following presets.
The following information is important for the User to meet data synch requirements.
It provides specifications not covered elsewhere and explains where to seek assistance in understanding these technical specifications.