The DataBee for SQL Server
Frequently Asked Questions (FAQ) List

 

This FAQ is for the SQL Server version of the DataBee software. There is a separate FAQ for the DataBee for Oracle version.

 

Section Summary

 

Questions about What DataBee Is and Does

[01.00] Where is the latest version of this FAQ?
[01.01] How about a summary of what DataBee does?
[01.02] How can I contact the authors of DataBee?
[01.03] I have an idea for a new/better feature - do you want to hear about it?
[01.04] Where can I find out more about how DataBee works?

 

Version and Operating System Questions

[02.01] What is the current release version of DataBee?
[02.02] What are the PC system requirements for DataBee?
[02.03] What are the server side requirements for DataBee?
[02.04] What server side operating systems does DataBee support?
[02.05] What SQL Server database versions does DataBee support?
[02.06] What version of DataBee am I currently using?
[02.07] How can I see or edit my license key?
[02.08] How do I upgrade DataBee?

 

'How does it work?' and Architecture Questions

[03.01] Why are there three applications instead of just one?
[03.02] Where is the extracted data stored?
[03.03] Doesn't extraction use a lot of network bandwidth?
[03.04] How is the data moved from the source to the target database?
[03.05] What are the source database temporary tables used for?
[03.06] What happens if I extract duplicate rows?
[03.07] What happens if I have foreign keys in my database?
[03.08] What happens if I do not have foreign keys in my database?
[03.09] What happens if my target database is on a different operating system?
[03.10] What happens if my target database is at a different SQL Server version?
[03.11] Can I create a referentially correct set of data from tables distributed between more than one database?
[03.12] What if I want to create a referentially correct environment involving databases located in different instances?
[03.13] What happens if I want to add a bit more data into an already populated target database?
[03.14] What if the tables in the target database have a different structure?
[03.15] What happens when I build my sets of extraction and load rules and then the database changes?
[03.16] Where are the extraction and loader rules stored?
[03.17] Can I edit the XML of an Extraction or Loader Set?
[03.18] How does DataBee know what to extract?
[03.19] Can I have multiple driver tables?
[03.20] Can I extract all rows from a table?
[03.21] Can I extract all rows from all tables less than a certain size?
[03.22] What happens if there is an error during the extract?
[03.23] What happens if there is an error during the load?
[03.24] Do I have to extract each time I want to load?
[03.25] Can I load multiple target databases from a single extract?
[03.26] Is there a command line option?
[03.27] Can DataBee extract and load tables containing columns with large datatypes such as VARBINARY?
[03.28] What are extractions set plans and why do I want one?
[03.29] Do I have to have a plan in order to create an extraction set?

 

'How do I' and 'Can I' Type Questions

[04.01] Can a table have more than one Extraction Driver rule?
[04.02] Can a table have more than one rule pointing out of it?
[04.03] Can a table have more than one rule pointing into it?
[04.04] Can a table have a rule pointing at itself?
[04.06] Can a table have more than one rule pointing at another table based on different join conditions?
[04.07] Can a table have more than one rule pointing at another table based on the same join condition?
[04.08] If table A is the source of table B in one rule can it be the target of table B in another rule?
[04.09] Why does the Set Designer sometimes not know about a column in a table?
[04.10] The rest of the rules extract really fast. Why is one so slow?
[04.11] What do the letters fk and tt mean in the Table-to-Table rule ID values?

 


 

Questions about What DataBee Is and Does

 

[01.00] Where is the latest version of this FAQ?

The latest version of the DataBee FAQ can be found on the DataBee web site at http://www.DataBee.com/dbs_faq.htm

 

[01.01] How about a summary of what DataBee does?

The basic function of DataBee is to construct smaller SQL Server databases which are composed of a subset of the rows contained in a larger SQL Server database. This is a common requirement as it is quite undesirable (in terms of cost, time and performance) to give developers, testers and trainers full size copies of databases to work on. DataBee consists of three applications each of which is dedicated to a single function.
The Set Designer application finds, edits and collects extraction rules together into a group known as an Extraction Set. The Set Designer also builds the rules which control the load operations - this is called a Loader Set.

 

The Set Extractor application uses the Extraction Set to identify the rows in the source database that will be required to build the subset. The identifiers for these rows are stored in a temporary table in the source database.

 

The Set Loader application uses the rules in the Loader Set to load the data into the target database. It can load by direct copy or by pulling the rows through a database link. The Set Loader application also handles the enabling of the relevant constraints and triggers.

 

Another Net 2000 Ltd. software product called Data Masker can be used to sanitize and scrub data. Data Masker is designed to protect sensitive information in test databases by replacing it with realistic looking but false information.

 

[01.02] How can I contact the authors of DataBee?

DataBee is published by Net 2000 Ltd. Any email to Info@Net2000Ltd.com will get to us. Our full contact details are on this page.

 

[01.03] I have an idea for a new/better feature - do you want to hear about it?

For sure we do. Please E-mail Info@Net2000Ltd.com.

 

[01.04] Where can I find out more about how DataBee works?

Ah... you mean all of the technical internal workings. Section 3 of this FAQ contains some of the more commonly asked questions. The overview page on the DataBee website contains a useful summary and we would be happy to provide an interactive Tutorial for detailed training. If you would like a more theoretical overview, have a look at the Database Subsetting: What You Need to Know whitepaper.

 


 

Version and Operating System Questions

 

[02.01] What is the current release version of DataBee?

The current production release of the DataBee for SQL Server software is v06.00

 

[02.02] What are the PC system requirements for DataBee?

DataBee runs on the Windows XP, Windows Vista, Windows 7, Windows 8, Windows 10, Windows 2008 Server or Windows 2012 Server operating systems with the Microsoft supplied .NET 4.0 runtime environment is installed. There are no really large hardware requirements - any 1Ghz (or better) PC with more than 500Mb of memory should be sufficient. A reasonably large monitor is desirable and the screen resolution should be 1024x768 or greater.

 

The DataBee software itself takes up very little space - 60 Mb of disk space should be more than sufficient.

 

[02.03] What are the server side requirements for DataBee?

Not very much. DataBee requires some temporary tables to be created in the target SQL Server database (or a proxy). These tables can be deleted after the subsetting operation completes. No other components are needed. DataBee requires no installation of software of any sort on the server and requires no server side disk space other than that used for the temporary tables.

 

[02.04] What server side operating systems does DataBee support?

All of them actually. DataBee can be used to subset data on any server that will run an SQL Server database. DataBee is server neutral, it runs on a PC and connects via the standard Microsoft standard ADO technologies. The server side operating system is largely irrelevant to it.

 

[02.05] What SQL Server database versions does DataBee support?

DataBee supports SQL Server versions 2005, 2008, 2012, 2014 and 2016 as well as Azure SQL Server and AWS RDS SQL Server.

 

[02.06] What version of DataBee am I currently using?

Every copy of DataBee contains version information in the "About DataBee" dialog box. To see this dialog box start any DataBee application and click on the Subset logo in the upper right hand corner.

 

[02.07] How can I see or edit my License key?

To view the License Key dialog box press the Registration Key button in the "About DataBee" dialog box. See the FAQ question [02.06] for information on the "About DataBee" dialog box.

 

[02.08] How do I upgrade DataBee?

Just get the latest version from the web site (http://www.DataBee.com) and re-install. You do not need to uninstall first and you can skip as many versions as you wish. Your saved settings and license keys will be preserved.

 

[02.09] What other databases besides SQL Server does DataBee support?

An Oracle version of DataBee is available.

 


 

'How do I' and 'Can I' Type Questions

 

[03.01] Why are there three applications instead of just one?

To make it easier to use. The process of creating a referentially correct subset database is divisible into three separate and distinct actions. At any one time you are either "Extracting", "Loading" or "Designing Rules". Because the DataBee software has a separate application for each of these operations (Set Extractor, Set Loader and Set Designer) the learning curve is much reduced. For example, if you are using the Set Extractor application, then every available tool or action is appropriate to the process of extraction and you will not have to mentally disregard many options which are appropriate only to design and loading. The DataBee applications all communicate with one another so they can actually be thought of as just different interfaces into the same functional application.

 

[03.02] Where is the extracted data stored?

Nowhere. The term "Extraction" is slightly misleading since the DataBee software does not actually extract data. The process of extraction consists of identifying the data rows in the source database which will need to be moved to the target database in order to create a referentially correct subset there. The data content of the tables in the source database does is not moved or copied anywhere during the extraction process.

 

[03.03] Doesn't extraction use a lot of network bandwidth?

No, it uses hardly any at all. As discussed in FAQ item [03.02] the extraction process is primarily concerned with collecting a referentially correct list of row identifiers and the actual row data is not moved or copied at any time during the extract process. The row identifiers themselves are stored in the temporary tables however they are never transported to the PC running the DataBee software. The DataBee Set Extractor application finds, uses and records the row identifiers it extracts entirely within the SQL Server database.

 

[03.04] How is the data moved from the source to the target database?

Well, it absolutely does not travel down to the PC running the DataBee software and then back up to the target database. The Set Loader application is used to perform the load and there are two methods which can be used. The data can be pulled through a database link (which runs from the target into the source database) or, if the source and target databases are co-located in the same database, it is possible to do a direct copy. Either way, the data required in the target database is transferred directly between the servers. The PC running the DataBee software only manages and controls the process. It delegates the actual transport of the data to the SQL Server - which is, of course, optimized for that task.

 

[03.05] What are the source database temporary tables used for?

DataBee requires temporary tables to be installed on the source database. These tables are:
DBSSTAT_TSTATS
This table contains table information needed during the extraction process. It will have one row for every table which has at least one row extracted for it.
DBSSTAT_RSTATS
This table contains rule information needed during the extraction process. It will have one row for every extraction rule which has executed.
DBSRID*
These tables are used to record the row identifiers (PK or UK) of each row in the source database which will need to be copied to the target database in order to form a referentially correct subset there. There will be one table for each extracted table and that table will have one row for each extracted row identifier.

 

[03.06] What happens if I extract duplicate rows?

You never do. It is the job of the DataBee software to sift out any duplicate rows returned by the extraction rules and that is exactly what it does. In other words, all of your primary and unique keys will be valid in the target database. Duplicate extracted rows are just not something you need to concern yourself with when you are using the DataBee software.

 

[03.07] What happens if I have foreign keys in my database?

DataBee will automatically build rules to ensure that the referential integrity of each foreign key is maintained during the extraction process. After you load the subset data into the target database your foreign key relationships will be referentially correct and they will enable.

 

[03.08] What happens if I do not have foreign keys in my database?

You will need to manually build rules which model the data relationships. The DataBee software provides a number of innovative tools which assist with the discovery and bulk building of the extraction rules. Yes, there will be a development effort, but it is not such a terribly labour intensive process.

 

[03.09] What happens if my target database is on a different operating system?

The operating system on which the database runs is not something you need to concern yourself with. The source and target databases can be on the same or different operating systems and the entire operation will be transparent to you. The DataBee software performs all of its operations using the standard Microsoft ADO technologies and this nicely handles operating system differences.

 

[03.10] What happens if my target database is at a different SQL Server version?

If the target is at a higher SQL Server version then the database version differences are of no concern and the load operation will function normally. If the target is at a lower SQL Server version than the source database then, in theory, it is possible that the data simply cannot be re-created in the target because support for those data types does not exist. In reality, if your source database structure doesn't contain any features not implementable on the earlier SQL Server version, then you will probably find that the subset creates just fine.

 

[03.11] Can I create a referentially correct set of data from tables distributed between more than one database?

Yes, the DataBee software handles this transparently. You simply create multiple Rule Controllers and run the extraction process as usual.

 

[03.12] What if I want to create a referentially correct environment involving databases located in different instances?

Well this is a bit more complex but it certainly is do-able. If you need this functionality please get in contact with the DataBee support team and we will provide you with implementation specific advice.

 

[03.13] What happens if I want to add a bit more data into an already populated target database?

Ahhh.. you mean "delta loading". This is not possible with version 01.00 of the DataBee for SQL Server software although we plan to add it in a future release.

 

[03.14] What if the tables in the target database have a different structure?

No problem. The Set Loader application contains a tool called a transformation engine which can adjust the subset data as it is loaded in to a target database. The transformation engine can remove a column from an extracted data stream or supply a default value for a column which exists in the table in the target database but not in the source.

 

[03.15] What happens when I build my sets of extraction and load rules and then the database changes?

The DataBee software is pretty robust when faced with this sort of situation. Unless the columns involved in the join conditions are changed then major modifications are not likely to be required to a set of extraction rules. If new tables are added, then new rules will probably have to be added to ensure the data they contain is extracted. Rules enforcing foreign keys can be refreshed at the press of a button.

 

[03.16] Where are the extraction and loader rules stored?

Extraction and Loader rules are stored as part of an Extraction or Loader Set on disk as an XML file. The DataBee software does not use a database repository or any other storage mechanism.

 

[03.17] Can I edit the XML of an Extraction or Loader Set?

Technically, no, but the XML in the Extraction and Loader Sets is readily interpretable by a human being and some users have been known to indulge in a bit of judicious (but unsupported) search and replace and manual editing.

 

[03.18] How does DataBee know what to extract?

You supply it with a set of Driver rows. These rows are used as the basis for the extraction set and every other extracted row is related to that initially supplied set of rows.

 

[03.19] Can I have multiple driver tables?

Certainly you can do this. It is a bit of an advanced topic but it is a common requirement.

 

[03.20] Can I extract all rows from a table?

No problem. Just put an All Rows rule on the table and all rows for that table will be extracted.

 

[03.21] Can I extract all rows from all tables less than a certain size?

Sure, you can do this - and on lots of other criteria besides size as well. The Safe All Rows Rule Generator tool is designed to assist with the bulk creation of All Rows Rules.

 

[03.22] What happens if there is an error during the extract?

You just restart the extraction set. The DataBee software will figure out what it was doing before the error and restart from that point. There is no need to start again at the beginning.

 

[03.23] What happens if there is an error during the load?

You fix the problem (perhaps it was something simple like a storage issue) and resume the load. The DataBee software will figure out where in the load process it had got to before the error occurred and restart from that point. There is no need to start again at the beginning.

 

[03.24] Do I have to extract each time I want to load?

No, you can load and re-load a database as many times as you wish from a single extract.

 

[03.25] Can I load multiple target databases from a single extract?

Yes, you can load and re-load as many target databases as you wish from a single extract. In fact, you can run multiple loads at the same time.

 

[03.26] Is there a command line option?

Yes, the DataBee software can be run from the command line for both for the extraction and load operations. In fact, it is possible to supply the connection information on the command line in order to dynamically determine the database on which the operations will execute.

 

[03.27] Can DataBee extract and load tables containing columns with large datatypes such as VARBINARY?

Yes it can, in fact DataBee handles such data transparently.

 

[03.28] What are extraction set plans and why do I want one?

Fundamentally, a plan is just an annotation associated with a table which indicates that it would be desirable to extract "All", "a Subset" or "No Rows" from a table. The benefit of using a plan is derived from the fact that there are tools within the DataBee software to compare how well the rows you have extracted match what you want to extract. When building extraction rules a plan makes it easy to find and focus on extracting rows for tables which do not match the plan.

 

[03.29] Do I have to have a plan in order to create an extraction set?

Plans are recommended but not strictly necessary. You can add the plan information for the tables, in whole or in part, at any time using the Set Designer tool.

 


 

'How do I' and 'Can I' Type Questions

 

[04.01] Can a table have more than one Extraction Driver rule?

Absolutely, place as many Extraction Driver rules on a table as you need to extract the rows you wish. You do not need to worry about extracting duplicate rows either - DataBee will automatically take care of removing them. See the About Extraction Driver Rules help page for more information.

 

[04.02] Can a table have more than one rule pointing out of it?

Yes, DataBee is specifically designed for this. See the Table-To-Table Rules page for more information.

 

[04.03] Can a table have more than one rule pointing into it?

Yes, DataBee is specifically designed for this. See the Table-To-Table Rules page for more information.

 

[04.04] Can a table have a rule pointing at itself?

Yes, DataBee transparently handles self-referential rules.

 

[04.06] Can a table have more than one rule pointing at another table based on different join conditions?

Yes, DataBee will take each rule in turn and make sure the target table has the rows it needs in order to satisfy both rules.

 

[04.07] Can a table have more than one rule pointing at another table based on the same join condition?

No, it is not possible to create such duplicate rules. The DataBee software will detect and warn you if you should try to create one.

 

[04.08] If table A is the source of table B in one rule can it be the target of table B in another rule?

Yes, DataBee will take each rule in turn and make sure the target table has the rows it needs in order to satisfy both rules.

 

[04.09] Why does the Set Designer sometimes not know about a column in a table?

This can happen if the database information associated with an Extraction or Loader Set is out of date. This situation is easy to fix - just use the tools in the Rule Controller to refresh the database structure.

 

[04.10] The rest of the rules extract really fast. Why is one so slow?

There are two common causes.
  1. Are the join columns used in the rule indexed in the target table? If the join columns in the target table are not indexed, the only option the SQL Server database will have is to do a full table scan. This will be slow if there are a lot of rows in the target table. You can easily see which rules have unindexed join columns by using the Set Designer Unindexed Rule Target Checker Tool. Prominently displayed icons also turn green or red when viewing the rule columns and these indicate the indexed status of the columns used in the rule.

  2. Do both tables in the rule have up-to-date statistics? Odds are one or both do not and SQL Server is simply putting together an inefficient execution plan.

 

[04.11] What do the letters fk and tt mean in the Table-to-Table rule ID values?

They indicate whether the rule is derived from a foreign key fk or was manually entered to support a logical relationship tt.