How to Set Table Specific Processing Options 2018-11-13T13:02:40+00:00

How to Set Table Specific Processing Options

There are a number of rare cases in which table-specific processing rules are necessary (please remember, these are rare cases):

  • The Salesforce metadata API reports fields in a table that are not actually present.
  • The default LIMIT option for a table always times out.
  • The Salesforce SOAP API returns an invalid response for an Attachment while the REST API works properly.

In all of these cases, CopyStorm supports overriding the default processing rules for a table through an XML configuration file named “TableRuleRegistry.xml” stored in a CopyStorm configuration directory. The best practice for creating a configuration directory is to:

  • Create a “config” subdirectory in the directory where .copyStorm files are kept.
    • This directory should not be inside the CopyStorm installation directory.
  • Set the “Config Directory” parameter on the CopyStorm advanced tab to point to the newly created config directory.
  • Store the “TableRuleRegistry.xml” configuration file in the newly created config directory.

For other best practices on how to install CopyStorm and your configuration files, please see our Installation Best Practices article.

A quick and dirty solution is to store custom configurations in a subdirectory named “config” in the CopyStorm installation directory, but this makes upgrading the CopyStorm installation more difficult:

  • If CopyStorm.bat is in the directory “C:\bin\CopyStorm”
  • Create a file named “C:\bin\CopyStorm\config\TableRuleRegistry.xml”

Task: Prevent Large Attachments from being extracted from Salesforce

The following TableRuleRegistry.xml file will cause CopyStorm to exclude Attachment and ContentVersion records whose data size is 1,000,000 bytes or more (the “&lt;” in each predicate is the XML escape for the “<” character):

<TableRules>
    <TableRule name="Attachment"  predicate="bodyLength&lt;1000000" />
    <TableRule name="ContentVersion"  predicate="contentSize&lt;1000000" />
</TableRules>

Task: Prevent fields from being extracted from Salesforce

The following TableRuleRegistry.xml file will cause CopyStorm to exclude the LastViewedDate and LastReferencedDate fields when processing the Salesforce Opportunity table (columns will be created in the database but never populated):

<TableRules>
    <TableRule name="Opportunity" >
        <ExcludeField name="LastViewedDate" />
        <ExcludeField name="LastReferencedDate" />
    </TableRule>
</TableRules>

The following TableRuleRegistry.xml file will cause CopyStorm to exclude all fields called “LastViewedDate” from tables in the “XYZ” package (columns will be created in the database but never populated):

<TableRules>
    <TableRule regexp="XYZ__.*" >
        <ExcludeField name="LastViewedDate" />
    </TableRule>
</TableRules>

Task: Specify exactly which fields are extracted from Salesforce

The following TableRuleRegistry.xml file will cause CopyStorm to only include the “Name” field when processing the Salesforce Opportunity table (other columns will be created in the database but never populated):

<TableRules>
    <TableRule name="Opportunity" >
        <IncludeField name="Name" />
    </TableRule>
</TableRules>

Note that:

  • When an IncludeField is specified any ExcludeField rules are ignored.
  • The following fields are always copied, even if a configuration directive excludes them:
    • SystemModStamp
    • CreatedDate
    • LastModifiedDate
    • LoginTime
    • isDeleted
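
As an illustration of the first note, the following sketch places both directive types on one table. Going by the rule above, the ExcludeField entry would be ignored and only the “Name” field (plus whichever always-copied fields the table has) would be populated. Combining the two directives like this is shown for illustration only and is not a recommended configuration:

<TableRules>
    <TableRule name="Opportunity" >
        <IncludeField name="Name" />
        <!-- Ignored, because an IncludeField is present on this rule -->
        <ExcludeField name="LastViewedDate" />
    </TableRule>
</TableRules>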

Task: Exclude Formulas from CopyStorm tables

Default CopyStorm behavior is to back up formula columns just like any other column. To change this behavior, use the “excludeFormulas” directive, which causes CopyStorm to ignore formulas on a table. Note that the database columns corresponding to Salesforce formulas will still be created but will always have null values.

The following TableRuleRegistry.xml file will cause CopyStorm to exclude formulas for the Opportunity table.

<TableRules>
    <TableRule name="Opportunity" excludeFormulas="true" />
</TableRules>

Starting with CopyStorm version 8.37.1 the default formula exclusion rule can be set for all tables by setting the “excludeFormulas” attribute on the table “default”. The “default” table does not correspond with a table name in Salesforce or the CopyStorm database, but is used to specify global table processing options.

The following TableRuleRegistry.xml file will cause CopyStorm to exclude formulas for all tables (unless there is a rule overriding the global default for individual tables):

<TableRules>
    <TableRule name="default" excludeFormulas="true" />
</TableRules>
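
A per-table override of the global default might look like the following sketch. It assumes that “false” is an accepted value for “excludeFormulas” and that a rule naming a specific table takes precedence over the “default” rule; this article does not confirm either detail:

<TableRules>
    <!-- Global default: skip formulas on every table -->
    <TableRule name="default" excludeFormulas="true" />
    <!-- Assumed override: keep formulas for Opportunity only -->
    <TableRule name="Opportunity" excludeFormulas="false" />
</TableRules>

Under those assumptions, formulas would be skipped everywhere except on the Opportunity table.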

Task: Change the default SOQL LIMIT value used for a table when reading from Salesforce

The following TableRuleRegistry.xml file will cause CopyStorm to read Opportunity records 17,777 at a time:

<TableRules>
    <TableRule name="Opportunity" limit="17777" />
</TableRules>

Task: Change the Field used to determine if a record has been modified

Default CopyStorm behavior is to use the SystemModStamp field to determine when a record has been modified. This ensures that all changes are mirrored in CopyStorm, but occasionally a site will want to use the LastModifiedDate column instead.

The following TableRuleRegistry.xml file will cause CopyStorm to use the “LastModifiedDate” column to determine if “Account” records have changed:

<TableRules>
    <TableRule name="Account" timestamp="LastModifiedDate" />
</TableRules>

Since this would be tedious to apply to all tables, another option is to change the global default for all tables. Note that if a table does not have the specified timestamp column, CopyStorm will automatically adjust to the most pertinent column contained in the table.

The following TableRuleRegistry.xml file will cause CopyStorm to use the “LastModifiedDate” to determine if any record has changed:

<TableRules>
    <TableRule name="default" timestamp="LastModifiedDate" />
</TableRules>
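
The two forms can presumably be combined in one file. The sketch below assumes that a table-specific rule takes precedence over the “default” rule for the “timestamp” attribute, which this article does not state explicitly:

<TableRules>
    <!-- Global default: detect changes using LastModifiedDate -->
    <TableRule name="default" timestamp="LastModifiedDate" />
    <!-- Assumed override: Account keeps using SystemModStamp -->
    <TableRule name="Account" timestamp="SystemModStamp" />
</TableRules>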

Task: Force CopyStorm to use a specific database table name

Default CopyStorm behavior is to create database tables with names matching the corresponding table in Salesforce. To override the database name for a specific Salesforce table, use the “databaseTableName” parameter.

The following TableRuleRegistry.xml file will cause CopyStorm to store “Account” records in a table named “AccountMaster”:

<TableRules>
    <TableRule name="Account" databaseTableName="AccountMaster" />
</TableRules>

Task: Add database specific table creation options

Default CopyStorm behavior is to use the default RDBMS options. To specify non-default options for table creation use the “CreateTableRule” configuration directive.

The following TableRuleRegistry.xml file will cause CopyStorm to use specific creation options in MySQL and SQL Server:

<TableRules>
    <TableRule name="default" >
        <CreateTableRule database="MySQL" option="ENGINE=ISAM"/>
        <CreateTableRule database="SQLServer" option="WITH(DATA_COMPRESSION=PAGE)"/>
    </TableRule>
</TableRules>

If the database type matches the “database” parameter the specified option will be appended to the generated CREATE TABLE statement. For example:

CREATE TABLE Account(id CHAR(18) PRIMARY KEY, name NVARCHAR(80) NULL, …) ENGINE=ISAM;

To apply a rule to all tables, set the “name” of the TableRule to “default”.

Task: Limit the amount of time CopyStorm will spend on a table

Default CopyStorm behavior is to process each table until all records are backed up. Setting the “maxRuntime” parameter will cause CopyStorm to stop processing a table after an amount of time has elapsed. The maxRuntime is specified in seconds.

The following TableRuleRegistry.xml file will cause CopyStorm to stop processing Attachment records and continue with the next table in the backup set after 10 minutes have elapsed:

<TableRules>
    <TableRule name="Attachment" maxRuntime="600" />
</TableRules>

Task: Set table specific BULK API processing parameters

Default CopyStorm behavior is for all tables to use the Bulk API parameters specified on the CopyStorm Advanced tab. Several of these parameters can be overridden on a per-table basis.

The following TableRuleRegistry.xml file will cause CopyStorm to use table-specific Bulk API rules for the Account table:

  • Switch to the Bulk API after 5000 records have been processed by the SOAP API.
  • Read 100,000 records per Bulk API transaction.

<TableRules>
    <TableRule name="Account" bulkThreshold="5000" bulkLimit="100000" />
</TableRules>

Task: Force the REST API to be used for a table

The following TableRuleRegistry.xml file will cause CopyStorm to always use the REST API for the Attachment table:

<TableRules>
    <TableRule name="Attachment" useRestAPI="true" />
</TableRules>

Task: Change the Record Deletion Policy for a table

Default CopyStorm behavior is to use the same record deletion policy on all tables, specified by the “Delete Older Than” option on the Advanced tab.

The following TableRuleRegistry.xml file will cause CopyStorm to never remove deleted records from the Attachment table (overriding the “Delete Older Than” global parameter):

<TableRules>
    <TableRule name="Attachment" purgeDays="" />
</TableRules>

Task: Force a table to back up by ID Order

Default CopyStorm behavior is to back up all records in timestamp order. For rare cases where a table does not contain a timestamp column, a table can be forced to back up in ID order. In nearly all cases, this option is only used internally by CopyStorm for Salesforce system tables that do not follow the normal rules.

The following TableRuleRegistry.xml file will cause CopyStorm to back up the PartnerNetworkRecordConnection table by ID (this is an example of a Salesforce system table that cannot be backed up in timestamp order):

<TableRules>
    <TableRule name="PartnerNetworkRecordConnection" ignoreTimestamp="true" />
</TableRules>

Task: Force a table to be TRUNCATED before each copy

The following TableRuleRegistry.xml file will cause CopyStorm to delete all data in the “PartnerNetworkRecordConnection” table before starting each copy:

<TableRules>
    <TableRule name="PartnerNetworkRecordConnection" truncate="true" />
</TableRules>

Task: Force a table to back up using a Timebox Window

A number of Salesforce tables do not support sorting by their timestamp column, and on some other tables a backup run by a user without the ViewAllData permission will get stuck. In these extremely rare cases, CopyStorm can be forced to back up a table using a sliding timebox approach.

The following TableRuleRegistry.xml file will cause CopyStorm to back up the LoginEvent table using a sliding timebox on the EventDate field:

<TableRules>
    <TableRule name="LoginEvent"  timestamp="EventDate" timebox="90" timeboxEpoch="2014-01-01" />
</TableRules>

This XML rule means:

  • Use a sliding 90-day window when backing up LoginEvent.
  • Start the first search on 1-Jan-2014 or the most recent EventDate in the LoginEvent backup table.

The system will back up LoginEvent entries in roughly the following order:

1-Jan-2014 to 31-Mar-2014
1-Apr-2014 to 30-Jun-2014
1-Jul-2014 to 30-Sep-2014
etc… until the current date is reached.

Subsequent backups will start the first timebox at the most recent EventDate in the LoginEvent backup table.

Summary

A single TableRuleRegistry.xml file can contain multiple rules and any or all features can be specified for a specific table.

However, overriding the built-in rules is bad practice unless Capstorm instructs you to do so. The built-in rules have been validated on many Salesforce instances and are generally optimal.
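
For reference, here is a sketch of a single file that combines several of the rules shown earlier. It reuses only attributes and values from the examples above; the particular combination is illustrative, not recommended, and has not been validated against a live CopyStorm installation:

<TableRules>
    <!-- Global default applied to every table -->
    <TableRule name="default" excludeFormulas="true" />
    <!-- Several options on one table: size filter, REST API, 10 minute cap -->
    <TableRule name="Attachment" predicate="bodyLength&lt;1000000" useRestAPI="true" maxRuntime="600" />
    <!-- Attribute and child directive together -->
    <TableRule name="Opportunity" limit="17777" >
        <ExcludeField name="LastViewedDate" />
    </TableRule>
</TableRules>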

CONTACTS

Product Questions: info@capstorm.com
Technical Support: support@capstorm.com
Phone: +1 314.403.2143