[Net 2000 Ltd. Home][DataBee Home][DataBee Manual][DataBee FAQ]

DataBee Where Clause Rules

Where Clause rules are used to extract the initial driver rows for an subset schema. These are the rows to which the remainder of the subset data extracted by the Table-To-Table rules will be referentially related. Usually there is only one Where Clause extraction rule in an extraction set although it is possible to implement an extraction set with multiple driver tables. It is always a good idea to seek advice from the DataBee support team prior to implementing an extraction set with multiple driver tables.

Where Clause rules are always added manually using the New Where Clause Rule Form (see below for details). The creation of the Where Clause rule which defines the initial driver rows is almost always the first action taken after a new extraction set has been created by the New Extraction Set Wizard.

How to Choose a Driver Table and Rows

The choice of the driver table is fundamental to the design of the extraction set. Typically the decision is based on the intended use of the subset database. One effective method of deciding on the choice of driver table is to ask the end users of the subset database to finish the following sentence: "I would like a subset database based on...". The answer received usually determines the choice of driver table. For example, if they say "I would like a subset database based on all of the customers of this type", then probably the CUSTOMER table will be the likely choice. However, if they say "I would like a subset database based on all of the invoices issued in the last quarter", then probably the INVOICE table is a better choice.

Where Clause rules can choose the sample data based on a Where Clause. Any Where Clause which would be valid in a statement on an SQL*Plus command line will work in a DataBee Where Clause rule. Furthermore, Where Clause rules contain sophisticated methods for reducing the data quantity returned by the Where Clause. It is possible to sample the data and also limit the amount of rows by applying a cap to the total number. Note that if no Where Clause, sampling or options are chosen then the Where Clause rule is effectively an All Rows rule.

Implementing a Where Clause

The Where Clause should be added as it would be in a SELECT statement used on the SQL*Plus command line. Do not enter the entire query - only the Where Clause is needed. DataBee will add the other components of the SQL statement as required.

For example, if the desired rows could be retrieved from the INVOICE_TABLE table via a SQL query such as:

Select * from INVOICE_TABLE
where INVOICE_NUM>1000
  and INVOICE_NUM<2000
then make an entry in the Where Clause box of:
WHERE INVOICE_NUM>1000 and INVOICE_NUM<2000

Note that it is necessary to include the word "WHERE" however a terminating semicolon is optional - DataBee will strip it off if it is present.

It is possible to implement a multi-table join in a DataBee WHERE clause rule. The SQL statement which the DataBee Set Extractor application runs in the source schema appears as follows:

select SRC.rowid from SOURCE_USER.DTB_INVOICE_LINE SRC 
All that is required for a multi-table join is to lead the Where Clause with a comma and add additional join tables as you would within a normal SQL statement and then append the real WHERE conditions as usual. For example, the statement:
select SRC.rowid from SOURCE_USER.DTB_INVOICE_LINE SRC 
       , DTB_INVOICE XXX
       WHERE SRC.invoice_number > 200
       and XXX.INVOICE_NUMBER=SRC.INVOICE_NUMBER
could be constructed by implementing a Where Clause of: ", DTB_INVOICE XXX WHERE SRC.invoice_number > 200 and XXX.INVOICE_NUMBER=SRC.INVOICE_NUMBER". Note that the leading comma is necessary - DataBee will not fill it in by default. The SRC alias is always used for the main target table of the query and can be referenced in the remainder of the SQL statement as required.

Notes on Row Sampling and Row Limits

If the sampling option is enabled on a Where Clause rule, the sampling sifts the values returned by the Where Clause. For example, if a where clause states where region_ID=10 and there is a 10% sample condition applied then the rows extracted by the Where Clause rule will be ten percent of the rows returned by the Where Clause - not 10% of the rows in the entire table. Limit values are similarly applied to the rows returned by the Where Clause and sampling options.

The sampling option is quite sophisticated. For example, assume a Where Clause rule is implemented with no Where Clause or limit and a 25% sample condition. The rule will not extract just the first 25% of the rows from the table, and it is also not a mechanical extraction of every fourth row. The sampling algorithm will randomly select rows from the entire table so that the resulting number comprises 25% of the total. It is possible to implement non integer sampling values. For example, a sample of 0.1% will extract 1000 rows, taken at random, from a million row table.

Complex Where Clause Rules

One of the most common beginner mistakes when implementing Where Clause rules arises when there are multiple options which must be applied to the selection of the rows extracted for the driver table. Usually this arises when there is an AND condition in the selection criteria. For example, the end users of the subset schema might have stated: "I would like a subset database based on all of the customers in this region and also all customer of this type in any region". Clearly there are two criteria here - all customers in a specific region irregardless of the type and all customers of a certain type regardless of the region.

Faced with such a requirement, the usual response is to write a Where Clause which contains a complex UNION or MINUS operation. This will work, however, it is far easier to take advantage of the fact that DataBee always sifts out duplicate rows and implement this requirement in the form of two separate Where Clause rules. As far as DataBee is concerned, it does not matter if a rule which states WHERE REGION_ID=10 extracts some of the same rows as another rule which states WHERE CUST_TYPE=99. Any duplicate rows extracted by the two statements will simply be ignored.

Notes on Where Clause Rules

Where Clause rules execute in Stage 1 of the extraction process. The DataBee extraction process is multithreaded and can run multiple rules simultaneously. If necessary, (it usually isn't), the execution order of the Where Clause rules can be explicitly controlled by adjusting the rule block component of the rule ID number. Please see the discussion of rule blocks in the extraction set overview help page in order to understand how to explicitly control the execution order of stage 1 and stage 3 extraction rules.

Where Clause rules are created by launching the New Where Clause Rule Form using the New Extraction Rule button located on the bottom right of the main Set Designer form.

How to Create a New Where Clause Rule


[Net 2000 Ltd. Home][DataBee Home][DataBee Manual][DataBee FAQ]