
An Overview of the DataBee
Extract and Load Process

The DataBee software consists of three applications, each dedicated to a particular function. The applications install on a Windows PC and together form a sequential process for creating a subset database.

The steps below are a quick summary of the actions required to build a subset database with the DataBee software. Please see the embedded links for more information on each step, and the DataBee Quick Start Guide for a detailed, practical, step-by-step walk-through of the DataBee design, extract and load methodology.

In general, the actions you would perform with the DataBee software in order to create a referentially correct subset database from a much larger source version are:

Build the Extraction Set
An extraction set contains the connection information, table structure and rules required to extract a referentially correct data subset from a source database. Extraction sets are built using the Set Designer application and run using the Set Extractor application.

The development of the rules in the extraction set is the most difficult part of the DataBee process, and this is the step that will require the most effort. Fortunately the rules only need to be built once. After the extraction set has been constructed, it is simple to run it repeatedly in order to extract subset databases on demand. The DataBee Quick Start Guide discusses the best method of building an extraction set. All users are strongly advised to follow the Quick Start Guide's step-by-step iterative approach to building the extraction rules - it is very effective.
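The Set Designer's rule engine is proprietary and is not shown here, but the underlying idea - start from a driving table and then follow the relationships between tables until the chosen rows form a referentially closed set - can be sketched in a few lines of Python. The table and column names below are hypothetical, and SQLite stands in for the Oracle source database:

```python
import sqlite3

# Hypothetical source schema: ORDERS rows reference CUSTOMERS rows.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER,
                         order_year INTEGER);
    INSERT INTO customers VALUES (1,'Acme'),(2,'Globex'),(3,'Initech');
    INSERT INTO orders VALUES (10,1,2003),(11,1,2004),(12,3,2004);
""")

# Driving rule: the subset contains only the 2004 orders.
order_ids = {r[0] for r in
             db.execute("SELECT order_id FROM orders WHERE order_year = 2004")}

# Closure rule: every customer referenced by a chosen order must come too,
# otherwise the subset would break the orders -> customers relationship.
cust_ids = {r[0] for r in db.execute(
    "SELECT DISTINCT cust_id FROM orders WHERE order_id IN (%s)"
    % ",".join("?" * len(order_ids)), sorted(order_ids))}

print(sorted(order_ids), sorted(cust_ids))  # the referentially closed subset
```

A real extraction set chains many such rules across the whole schema; the iterative approach in the Quick Start Guide builds them up relationship by relationship.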

Build the Loader Set
A loader set contains the connection information, table structure and rules required to load the target database with a referentially correct set of data. Like extraction sets, loader sets are built using the Set Designer application but they are run using the Set Loader application.

The development of the rules in the loader set is usually very straightforward and takes a relatively small amount of time. The DataBee Quick Start Guide contains a chapter on building a loader set and all implementers are advised to follow the methods described there.

Perform the Extraction Operation
The extract of the database subset is performed by launching the Set Extractor application and then running the previously prepared extraction set. Detailed feedback on the extraction process is provided - so at any time you are aware of which rules are running, the tables on which they operate and the number of rows per second being extracted.

When the extraction process is complete, the ROWID values of every row from every table which needs to be moved to the target database will have been identified and stored in a temporary table in the source schema (or proxy). The extraction process simply identifies the rows (by ROWID); at no time is actual data transmitted to the Windows PC on which the Set Extractor application is installed.
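SQLite's rowid makes a convenient stand-in for Oracle's ROWID, so the end state of the extraction step - a temporary table of row identifiers, with no row data leaving the source - can be illustrated as follows. The table names are hypothetical, not DataBee's own:

```python
import sqlite3

src = sqlite3.connect(":memory:")  # stands in for the source schema
src.executescript("""
    CREATE TABLE invoices (inv_no INTEGER, paid INTEGER);
    INSERT INTO invoices VALUES (100,1),(101,0),(102,1);
""")

# A temporary table in the source schema records only the row identifiers
# of the rows selected for the subset -- never the row data itself.
src.execute("CREATE TEMP TABLE subset_rowids (table_name TEXT, row_id INTEGER)")
src.execute("""INSERT INTO subset_rowids
               SELECT 'invoices', rowid FROM invoices WHERE paid = 1""")

picked = [r[0] for r in src.execute(
    "SELECT row_id FROM subset_rowids WHERE table_name='invoices' ORDER BY row_id")]
print(picked)  # row identifiers only; the data stays in the source database
```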

Build the Structure of the Target Schema
The target schema must be created, together with the tables and indexes in it that will receive the subset data.

The quantity of data loaded into the target schema will be smaller than that in the source - sometimes much smaller. In such situations it is often desirable to adjust the tablespace and storage clause definitions of the tables and indexes in the target schema so they are not allocated more space than necessary. This requires obtaining the DDL structure of all tables and indexes and rebuilding them with new tablespace and storage clause information. Often the code for the schema DDL structure is available in a separate repository or documentation system.

A companion tool from Net 2000 Ltd. (the authors of DataBee) called the DDL Wizard is available to all DataBee users. The DDL Wizard software can generate the DDL rebuild scripts for an existing schema, and this source code can be manipulated with easy-to-apply rules which modify the tablespace and storage definitions of the tables and indexes. The structure of the subset target database can then be rebuilt with smaller storage allocations and fewer tablespaces using the modified DDL definitions.
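The DDL Wizard's actual rule syntax is not shown here, but the kind of transformation it applies - rewriting the tablespace and storage figures in captured DDL before the target schema is rebuilt - can be sketched as simple text substitution. The DDL fragment and the replacement values below are illustrative only:

```python
import re

# A fragment of Oracle-style DDL as it might be captured from the source
# schema (the tablespace name and storage figures are hypothetical).
ddl = """CREATE TABLE orders (order_id NUMBER, cust_id NUMBER)
TABLESPACE big_data
STORAGE (INITIAL 512M NEXT 128M)"""

# Rewrite rules: point the table at a smaller tablespace and shrink its
# extent sizes before the target schema is rebuilt from this DDL.
ddl = re.sub(r"TABLESPACE \w+", "TABLESPACE subset_data", ddl)
ddl = re.sub(r"INITIAL \w+", "INITIAL 1M", ddl)
ddl = re.sub(r"NEXT \w+", "NEXT 1M", ddl)

print(ddl)
```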

Perform the Load Operation
The load of the database subset is performed by launching the Set Loader application and then running the previously prepared loader set. As with the extraction operation, the process is multi-threaded for increased speed, and detailed feedback is provided on the tables being loaded, the number of rows loaded and the rows per second.

The loader set rules execute in the target schema and pull the data through a database link (or via a direct schema-to-schema copy) from the source into the target. At no time is the data transmitted to the PC running the Set Loader application; the Set Loader software simply initiates and controls the load process.
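The pull-through-a-link pattern can be sketched with SQLite, where ATTACH plays the role of the Oracle database link and the rowids recorded at extract time drive the copy. All names below are hypothetical stand-ins:

```python
import sqlite3, tempfile, os

# Two databases stand in for the source and target schemas.
srcfile = tempfile.mktemp(suffix=".db")
src = sqlite3.connect(srcfile)
src.executescript("""
    CREATE TABLE invoices (inv_no INTEGER, amount INTEGER);
    INSERT INTO invoices VALUES (100,50),(101,75),(102,20);
    CREATE TABLE subset_rowids (row_id INTEGER);
    INSERT INTO subset_rowids VALUES (1),(3);   -- chosen at extract time
""")
src.close()

tgt = sqlite3.connect(":memory:")
tgt.execute("CREATE TABLE invoices (inv_no INTEGER, amount INTEGER)")
tgt.execute("ATTACH ? AS src", (srcfile,))  # plays the database-link role

# The controlling client only issues this statement; the rows travel
# database-to-database, selected by the identifiers stored at extract time.
tgt.execute("""INSERT INTO invoices
               SELECT inv_no, amount FROM src.invoices
               WHERE rowid IN (SELECT row_id FROM src.subset_rowids)""")

loaded = list(tgt.execute("SELECT inv_no FROM invoices ORDER BY inv_no"))
print(loaded)  # only the subset rows arrive in the target
os.remove(srcfile)
```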

The Subset Schema is Complete
Once the load has finished, the destination subset database is complete. You are guaranteed that no duplicate rows will have been loaded, that all foreign keys (if present) will enable, and that all logical data relationships defined by an extraction rule will be valid. The subset database will contain every row that the rules require to be present, and there will be no extra rows.
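These guarantees can be spot-checked on any finished target. The sketch below uses SQLite's built-in foreign-key check as a stand-in for enabling constraints on a loaded Oracle schema (the tables and data are hypothetical):

```python
import sqlite3

# A completed (hypothetical) subset: every child row's parent is present.
db = sqlite3.connect(":memory:")
db.executescript("""
    PRAGMA foreign_keys = ON;
    CREATE TABLE customers (cust_id INTEGER PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         cust_id INTEGER REFERENCES customers);
    INSERT INTO customers VALUES (1),(3);
    INSERT INTO orders VALUES (11,1),(12,3);
""")

# foreign_key_check returns one row per violation; an empty result means
# every parent-child relationship in the subset resolves.
violations = list(db.execute("PRAGMA foreign_key_check"))
dupes = list(db.execute(
    "SELECT order_id FROM orders GROUP BY order_id HAVING COUNT(*) > 1"))
print(violations, dupes)  # [] [] -- referentially complete, no duplicates
```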

The load process can be repeated many times for multiple target schemas. It is not necessary to re-extract in order to load other target schemas with copies of the same subset data.

