How Does CopyStorm Work?
This article explains how CopyStorm optimally decides which records are copied from Salesforce.
The Most Important Point
CopyStorm’s default behavior is to perform an incremental copy from Salesforce every time it runs. There are no special parameters to set — just use the defaults and CopyStorm will incrementally update your Salesforce backup.
How CopyStorm Selects Records to Copy
CopyStorm minimizes the number of records processed during each run by using record timestamps stored in the CopyStorm database from previous runs.
For example, if the most recent timestamp for an Account in the CopyStorm database is 14-Jul-2013 at 12:00:00.000 then there is no need to read records from Salesforce that were modified before this timestamp.
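The idea can be sketched in a few lines of Python. This is only an illustration, not CopyStorm's code: the function name records_to_copy and the in-memory record layout are assumptions made for the example.

```python
from datetime import datetime

def records_to_copy(salesforce_records, most_recent_in_backup):
    """Return the records modified at or after the newest timestamp already backed up.

    Hypothetical helper: each record is a dict with a "systemmodstamp" datetime.
    """
    if most_recent_in_backup is None:
        # Nothing has been copied yet, so every record is a candidate.
        return list(salesforce_records)
    return [r for r in salesforce_records
            if r["systemmodstamp"] >= most_recent_in_backup]

# Most recent Account timestamp in the backup: 14-Jul-2013 12:00:00
watermark = datetime(2013, 7, 14, 12, 0, 0)
accounts = [
    {"id": "A1", "systemmodstamp": datetime(2013, 7, 14, 11, 59, 0)},
    {"id": "A2", "systemmodstamp": datetime(2013, 7, 14, 12, 0, 0)},
    {"id": "A3", "systemmodstamp": datetime(2013, 7, 15, 9, 30, 0)},
]
selected = records_to_copy(accounts, watermark)
```

Here A1 is skipped because it was modified before the stored timestamp, while A2 and A3 are selected.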
The following diagram illustrates how CopyStorm reads records for each selected table:
The details are a bit more complex than the above diagram — for example, records are not read one at a time in the code. However, this is the exact logical process followed by CopyStorm.
Advanced Timestamp Details
This section provides a bit more detail on exactly how timestamp values are used by CopyStorm. You do not need to grasp this section to understand how CopyStorm works.
Salesforce has an interesting quirk: records are not committed to the Salesforce database in timestamp order. As a result, the same query can return different counts when run a minute apart:
- At 13:01 a SOQL query indicates that there are 1000 records created earlier than 13:00.
- At 13:02 the same SOQL query may indicate that there are 1001 records created earlier than 13:00.
It seems that records written to Salesforce “take a little while” to be committed to the database AND are committed in no particular order.
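A small simulation may help show why this matters for an incremental copy. It is a sketch under stated assumptions: the records, the integer time units, the visible_at field, and the 90-unit hold-back are all invented for illustration and are not taken from Salesforce or CopyStorm.

```python
def visible(records, query_time, since, until):
    """Ids of records that are committed at query_time and stamped within [since, until]."""
    return [r["id"] for r in records
            if r["visible_at"] <= query_time and since <= r["stamp"] <= until]

# "stamp" is the record timestamp; "visible_at" is when the commit becomes queryable.
records = [
    {"id": "R1", "stamp": 100, "visible_at": 101},
    {"id": "R2", "stamp": 105, "visible_at": 190},  # committed late
    {"id": "R3", "stamp": 110, "visible_at": 111},
]

# Naive copy: read everything up to "now", then resume from the newest stamp seen.
naive_first = visible(records, query_time=120, since=0, until=120)
naive_second = visible(records, query_time=200, since=110, until=200)

# Holding back the upper bound (hypothetical latency of 90 units) avoids the gap.
latency = 90
safe_first = visible(records, query_time=120, since=0, until=120 - latency)
safe_second = visible(records, query_time=200, since=0, until=200 - latency)
```

In the naive runs R2 is never copied: it was not yet visible during the first run, and its timestamp is older than the point the second run resumes from. With the held-back upper bound, the first run reads nothing and the second run reads all three records.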
CopyStorm overcomes this Salesforce quirk using two parameters on the Advanced tab:
- The SFDC Commit Latency parameter indicates how long to wait for Salesforce to commit records. The default is 300 seconds, which means that CopyStorm will not read a record from Salesforce until its timestamp is at least 300 seconds old.
- The Timestamp Latency parameter indicates how many seconds to subtract from the most recent timestamp found in the CopyStorm database before it is used as the lower bound of the query.
To sum up the details, the SOQL used by CopyStorm really looks like:
- SELECT * FROM TT WHERE TT.systemmodstamp >= (MostRecent - "Timestamp Latency") AND TT.systemmodstamp <= (CURRENT_TIME - "SFDC Commit Latency")
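As a rough sketch of how the two parameters shape the query window, the Python function below builds the query text from the formula above. The function name build_incremental_query and its arguments are hypothetical, the datetimes are assumed to be in UTC, and "SELECT *" is kept only to mirror the shorthand used in this article. The 300-second default comes from the SFDC Commit Latency description above; no default is assumed for Timestamp Latency.

```python
from datetime import datetime, timedelta

def build_incremental_query(table, most_recent, now,
                            timestamp_latency_s, commit_latency_s=300):
    """Build the time-windowed query text for one table (illustrative only).

    most_recent and now are naive datetimes assumed to be in UTC.
    """
    # Lower bound: step back from the newest timestamp already in the backup.
    lower = most_recent - timedelta(seconds=timestamp_latency_s)
    # Upper bound: ignore records too recent to be reliably committed.
    upper = now - timedelta(seconds=commit_latency_s)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # ISO 8601 form used for datetime literals
    return (f"SELECT * FROM {table} "
            f"WHERE {table}.systemmodstamp >= {lower.strftime(fmt)} "
            f"AND {table}.systemmodstamp <= {upper.strftime(fmt)}")

query = build_incremental_query(
    "Account",
    most_recent=datetime(2013, 7, 14, 12, 0, 0),
    now=datetime(2013, 7, 14, 13, 0, 0),
    timestamp_latency_s=60,  # example value, not a documented default
)
```

With these example values the window runs from 11:59:00 (one minute before the most recent stored timestamp) to 12:55:00 (five minutes before the current time).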
LIMIT Related Details
In Salesforce a query of the form “SELECT * FROM TT” will often fail with a timeout error on large tables. To avoid this problem CopyStorm runs every SOQL query with a LIMIT. For most tables the LIMIT value is set to 20,000, but it is configured to be smaller for a handful of tables known to have problems with a limit this high.
The default LIMIT value can be overridden on the Advanced tab using the “Row LIMIT per Query” option — however, changing this parameter generally results in non-optimal behavior.
When all records in a batch have exactly the same timestamp, CopyStorm dynamically increases the LIMIT value until either the end of the records is reached or the batch contains more than one timestamp. This adjustment is rarely noticed unless a Salesforce table has hundreds of thousands of records with exactly the same timestamp.
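The adjustment described above might look like the following sketch. It is not CopyStorm's implementation: the function fetch_batch, the query_fn callback, the "ts" field, and the choice to double the LIMIT on each retry are all assumptions made for illustration.

```python
def fetch_batch(query_fn, lower_bound, limit):
    """Fetch one batch, growing the limit while the whole batch shares one timestamp.

    query_fn(lower_bound, limit) is a hypothetical callback returning up to
    `limit` records with ts >= lower_bound, ordered by ts.
    """
    while True:
        batch = query_fn(lower_bound, limit)
        if len(batch) < limit:
            return batch, limit  # end of records reached
        if batch[0]["ts"] != batch[-1]["ts"]:
            return batch, limit  # timestamps differ, so the next query can advance
        limit *= 2  # assumed growth strategy; the real increment is not documented here

# Five records share timestamp 1, followed by two records with timestamp 2.
data = [{"id": i, "ts": 1} for i in range(5)] + [{"id": i, "ts": 2} for i in range(5, 7)]

def fake_query(lower_bound, limit):
    rows = [r for r in data if r["ts"] >= lower_bound]
    return rows[:limit]

batch, final_limit = fetch_batch(fake_query, lower_bound=1, limit=2)
```

Starting from a limit of 2, the first two attempts return full batches with a single timestamp, so the limit grows to 4 and then 8, at which point all seven records fit and the end of the data is reached.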
Other Quirky Details
If you would like to know about further Salesforce quirks CopyStorm has to manage (and be convinced that you don’t want to write your own CopyStorm) take a look at this blog post: SOQL’s Seven Deadly Sins.