[Net 2000 Ltd. Home][DataBee Home][DataBee Manual][DataBee FAQ]

An Overview of
DataBee Loader Sets

The DataBee software is designed to build smaller copies of large Oracle databases. In order to figure out which tables to access and which rows to load, DataBee needs information about the target schema it is dealing with (tables , table columns, existing foreign key constraints etc.) and the location of the source data. It also needs to be able to record user specified load requirements.

The collection of information about a target schema, its structure, connection details, data source and defined loader rules is called a loader set. A loader set is built and maintained by the DataBee Set Designer application.

Physically, loader sets are stored on disk as standard XML text file. So yes, technically the loader set files are user editable via a text editor. Please be aware that making manual adjustments is unsupported - the Set Designer application is the only supported solution. We do recognize that situations may sometimes arise where a quick search and replace on the loader set file may save hours of work. If you do edit the loader set files by hand, be sure to save a backup copy first and perhaps email Support@DataBee.com for some advice before proceeding.

Loader sets are built with the DataBee Set Designer application and executed with the Set Loader application. The Set Loader needs to have access to a list of ROWIDs of the rows in the source schema which will need to be copied to the target. The DataBee Set Extractor application builds the ROWID list during the execution of an extraction set. The two tools work together to create a referentially correct subset database. It is possible to load many target schemas from the results of a single extraction run.

Building A Loader Set

Loader sets are simple to build - much simpler than extraction sets. The recommended method of building a loader set is to:

  1. Create the basic structure of the loader set using the New Loader Set Wizard in the Set Designer. [tutorial (local) (internet)] When the New Loader Set Wizard completes, all of the typically required rules will have been built and populated.
  2. By default, all triggers in the target schema are disabled during the load of the data and re-enabled afterwards. If this is not desirable then edit the Disable Trigger Manager rule to inhibit the disable of any (or all) of the triggers.
  3. By default, all tables in the target schema will be truncated prior to the load. If some tables should not be disabled then edit the contents of the Truncate Manager rule to inhibit the truncate of the specified tables. Note it is necessary to truncate any table that will be loaded - the Set Loader software cannot "just load the rows which are not already there". All tables which will be loaded must be empty before the load process starts.
  4. If there are tables in the target schema which have a different column structure to that in the source schema then the transformation engine for that table in the Load Manager rule will need to be edited. [tutorial (local) (internet)] Any tables which should not be loaded can also be disabled in the Load Manager rule.
  5. Save the set in the Set Designer. If an extract has previously been performed on the source schema configured into the loader set, then open the saved loader set in the Set Loader and run it.
  6. Once the loader rules complete in the Set Loader, the appropriate extracted rows from the source schema will have been copied across to the target tables and a referentially correct subset will have been created.

The list above is just a summary. The DataBee Quick Start Guide provides a detailed, practical, step-by-step walk through of the DataBee design, extract and load methodology. The Extraction and Load Process help file provides a summary the subset database creation and population process.

Loader Set Rules

As discussed above, a loader set is a container for the source and target connection information, the table/index/foreign key structure details and for the loader rules which control the load activities. All of this information is encoded in the loader set in the form of rules.

One key rule, called a Controller Rule, describes the schema into which the data will be loaded and also provides the contact information for the schema which contains the extracted source data. The Rule Controller also contains the schema structure (tables, indexes foreign keys) and has dependent rules which describe the load actions. Every loader set has at least one Rule Controller (you can have more than one). Every other type of loader rule has a parent Rule Controller and will execute in the schema defined within that Rule Controller.

Other than the Rule Controller and the Command rules, all of the other rules which can be configured in a loader set are of a type known as manager rules. Manager rules encapsulate a large number of actions and perform those actions in parallel. If you are unfamiliar with DataBee manager rules we strongly advise reading the short What Is A Rule Manager help file.

The following is a brief summary of the rules which can be configured in a loader set. Each rule has its own detailed help file and a slightly longer summary can be found on the New Loader Rule Form help page.

Rule Controllers
A Rule Controller contains login information for the target schema and as well as the schema structure information for the target schema. Rule Controllers tell their dependent loader rules which database and schema they should connect to in order to perform their actions. The Rule Controller in a loader set also contains the connection information of the source schema which contains the extracted rows. This connection information can be the name of a database link if the data will be pulled from though a database link or the name of the source schema if a direct schema to schema copy is to be used. Usually only one Rule Controller is created in any loader set. It is possible to create a loader set with multiple Rule Controllers and to load the data which supports referential relationships between them. However, this is a very advanced topic and you should definitely contact DataBee Support prior to implementation.

Foreign Key Manager Rules
Foreign Key Manager rules are used to disable the foreign keys in the target schema prior to load and to re-enable those foreign keys afterwards. The foreign keys must be disabled prior to the load because, although the Load Manager rule will copy over every row referentially required to be present in the target schema, it cannot ensure that while the load is actually in process that referentially integrity between all tables will always be maintained. Thus the foreign keys have to be disabled before the load takes place and re-enabled afterwards.

Trigger Manager Rules
Trigger Manager rules are used to disable the triggers in the target schema prior to load and to re-enable those triggers again afterwards. It is not absolutely necessary for the triggers to be disabled during the load process. However if tables which contain INSERT triggers are loaded, those triggers will fire and the load process will slow down considerably. Often such triggers are in only place to support auditing mechanisms which are not important in a subset test and development database. In such cases, disabling the triggers while loading is the best option.

Truncate Manager Rules
A Truncate Manager rule removes all rows from the target table. By default, all tables in the target schema are truncated. If a table should not be truncated, then the Truncate Manager rule can be edited to omit that table from its truncation operations.

Load Manager Rules
Load Manager rules perform the actual copy of the data from the source to the target database. It is possible to edit the Load Manager rule to skip the load of specified tables and if the table structure is different between the source and target schemas it is possible to configure specific data transformation rules on a table.

Command Rules
Command rules can execute any Oracle SQL command or PL/SQL script. This type of rule is often used to perform administrative functions which assist with the load process. An example of this would be performing an analyze of the schema after it is loaded to ensure that the statistics on each table are current.

Rule Blocks and Loader Rule Execution Order

The Set Loader application is multi-threaded. It can, and will, run multiple rules simultaneously. The number of rules which can run in parallel is determined by the Number of Loader Threads setting on the Misc. Setup tab. Unless strictly specified (see below), each loader rule is considered by the DataBee software to be an independent entity and its order of execution is not guaranteed.

In addition, certain types of loader rule (Foreign Key Manager, Trigger Manager, Truncate Manager, and Load Manager) encapsulate a large number of subsidiary actions. If the manager rules run simultaneously, then their actions will also run simultaneously and this may not be appropriate. For example, the Truncate Manager rule will truncate the tables in the target schema and the Load Manager will populate them with extracted data. It would not be acceptable for a table to be loaded by the Load Manager only to have the Truncate Manager remove all rows from it at a later time. The order of execution of the two rules is important and clearly the Truncate Manager rule must complete its actions before the Load Manager rule activates.

An Example Loader Set Rule Blocks

Since the rules in a loader set can, and will, execute in parallel the DataBee software implements a concept called Rule Blocks to explicitly control the execution order. A rule block is the two-digit numeric prefix listed before the rule number. Rule blocks are processed in strict numeric order and all rules in a given rule block will complete before any rule in the next highest rule block begins. Inside a rule block the rules execute in random order as determined by the optimization routines and the availability of worker threads.

In the image above, six rules are listed. In this case each rule has been given its own rule block. For example, rule 10-0002 disables the foreign keys. Because it is the only rule in rule block 10 there will be no other rules operating at that time. The foreign key disable operations will proceed in parallel according to the number of worker threads set on the Set Loader Misc. Setup tab.

Once rule 10-0002 has completed, the rules in the next highest rule block will be activated. In the example above this is rule 15-0003. After that rule completes, the the chain of rules 20-0004, 30-0005, 40-0006 and 45-00007 will all execute sequentially one after the other because they are in separate rule blocks.

Note that the decision to place rule 0002 in rule block 10 and rule 0003 in rule block 15 is somewhat arbitrary. Since these rules only disable foreign keys and triggers they could reasonably execute at the same time and hence they both could be placed in the same rule block. Doing so would have no effect on the results of the loader set. Sometimes the rule block really matters though - certainly rule 0004 needs to be in a lower rule block than rule 0005 since the truncate must happen before the load. The decision of where in the execution sequence to place a rule is up to the implementer of the loader set. For example, if a Command Rule was added to the loader set it would be necessary to carefully adjust the rule block of that rule so the SQL command it contains executes at the appropriate point in the loader set.

[Net 2000 Ltd. Home][DataBee Home][DataBee Manual][DataBee FAQ]