Optimizing CopyStorm/Restore Performance

CopyStorm/Restore performs well with its default settings, but a few best practices can improve performance further.

Index Your CopyStorm Database

To determine which records should be restored, CopyStorm/Restore performs many reference field lookups on its CopyStorm database. If reference fields in the CopyStorm database are not indexed, performance can suffer significantly. To add all the necessary indexes to the CopyStorm database:

  • On the CopyStorm Advanced tab make sure that the Create Indexes option is checked.
  • Run CopyStorm to automatically create any missing indexes.
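To illustrate why an index on a reference field matters, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for a CopyStorm database. The table layout, the index name, and the choice of SQLite are assumptions made for illustration only. CopyStorm creates its own indexes when the Create Indexes option is checked, so nothing here needs to be run against a real CopyStorm database.

```python
import sqlite3

# Stand-in for a CopyStorm table: Contact rows that reference an Account.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Contact (Id TEXT, AccountId TEXT)")
conn.executemany(
    "INSERT INTO Contact VALUES (?, ?)",
    [(f"C{i}", f"A{i % 100}") for i in range(10_000)],
)

def query_plan(connection):
    """Return SQLite's plan description for a lookup by reference field."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT Id FROM Contact WHERE AccountId = ?",
        ("A7",),
    ).fetchall()
    return rows[0][3]  # the fourth column holds the human-readable detail

plan_before = query_plan(conn)  # no index yet: every row must be scanned

# Hypothetical index name; it indexes the reference field used in the lookup.
conn.execute("CREATE INDEX idx_contact_accountid ON Contact (AccountId)")
plan_after = query_plan(conn)   # the lookup can now search the index

print(plan_before)
print(plan_after)
```

The first plan reports a scan of the whole table, while the second reports a search that uses the new index. The same effect, repeated across many reference field lookups, is why missing indexes slow a restore.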

Store Your Tracker Database On a Local Disk

CopyStorm/Restore tracks the progress of a restore using a small-footprint database called a “Tracker Database”. This database is named on the Global Parameters tab and, by default, is stored in the directory “$HOME/.capstorm/copyStormRestoreSets”. If your home directory is not on a local disk, performance can be improved by moving the tracker database to a location on a local disk. For example:

  • If the tracker database is “fred” then it will be stored at:
    • $HOME/.capstorm/copyStormRestoreSets/fred.mv.db
  • If the tracker database is “C:\tmp\fred” then it will be stored at:
    • C:\tmp\fred.mv.db

If the location already contains a tracker database, the pre-existing one will be used. If not, CopyStorm/Restore will automatically create a new one.
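The naming rule shown in the two examples above can be sketched as a small function. This is only an illustration of the rule as described here: the function name and the separator check are assumptions, and CopyStorm/Restore's own path handling may differ in detail.

```python
def tracker_db_file(name, home):
    """Return the file a tracker database name would map to.

    Hypothetical helper: a bare name is placed in the default directory
    under the home directory, while a name that already contains a path
    is used as given. Either way the ".mv.db" suffix is appended.
    """
    has_path = "/" in name or "\\" in name
    if has_path:
        return name + ".mv.db"
    return home + "/.capstorm/copyStormRestoreSets/" + name + ".mv.db"

# Bare name: stored in the default directory under the home directory.
print(tracker_db_file("fred", "/home/me"))

# Explicit path: stored next to the given path on a local disk.
print(tracker_db_file(r"C:\tmp\fred", "/home/me"))
```

Here “/home/me” stands in for the value of $HOME.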

Increase CopyStorm/Restore Parallelism

To upload data to Salesforce, CopyStorm/Restore uses 1 to 10 virtual data entry clerks (also called Salesforce Writers). The Salesforce Writers enter data into Salesforce concurrently. In addition to the number of Salesforce Writers, the amount of work given to each writer in a single batch can be adjusted to increase performance.

There are two parameters on the Global Parameters tab which control the level of parallelism:

  • The “Default Max Per Update” parameter controls the amount of work given to a single Salesforce Writer.
    • This parameter defaults to 200, but a value of up to 600 can often improve performance.
  • The “# Salesforce Writers” parameter controls the number of concurrent threads used to write to Salesforce.
    • This parameter can range from 1 to 10.
    • Higher values increase the level of parallelism (i.e., the number of virtual data clerks) but also increase the load on your network.
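The interaction between the two parameters can be sketched with a thread pool. This is a simplified model, not CopyStorm/Restore's implementation: the function names are hypothetical, and write_batch only counts records where a real writer would send one update request to Salesforce.

```python
from concurrent.futures import ThreadPoolExecutor

def make_batches(records, max_per_update):
    """Split records into consecutive batches of at most max_per_update."""
    return [
        records[i:i + max_per_update]
        for i in range(0, len(records), max_per_update)
    ]

def write_batch(batch):
    # Placeholder for one update request; a real writer would call Salesforce.
    return len(batch)

def restore(records, max_per_update=200, writers=4):
    """Return (number of batches, number of records written)."""
    batches = make_batches(records, max_per_update)
    # Each worker thread plays the role of one Salesforce Writer.
    with ThreadPoolExecutor(max_workers=writers) as pool:
        written = list(pool.map(write_batch, batches))
    return len(batches), sum(written)

records = list(range(1000))
print(restore(records, max_per_update=200, writers=4))   # (5, 1000)
print(restore(records, max_per_update=400, writers=10))  # (3, 1000)
```

A larger “Default Max Per Update” means fewer, larger requests, while more writers means more requests in flight at once. Both reduce the number of round trips a restore waits on, at the cost of more network load.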

The values that work best for you depend on several factors:

  • The speed of your internet connection to Salesforce. This is the most important factor.
  • Salesforce limits the number of active update requests per Salesforce login for your entire Organization.
    • Running too many restores at the same time can cause this limit to be hit.
    • This issue has been observed when a customer was running three separate restores concurrently where each restore used the same Salesforce credentials and 10 concurrent threads.
  • More concurrent threads use more system memory. This is unlikely to be an issue on a 64-bit system but is worth considering.
  • The performance of your database can be a factor but rarely causes issues.

In Capstorm’s test labs we typically use values of:

  • Default Max Per Update = 400
  • # Salesforce Writers = 10

High parallelism settings can, under certain circumstances, degrade restore performance. This has been observed when restoring to a table with a computed column that aggregates data from another table. If this happens:

  • CopyStorm/Restore will still work but may have to make multiple restore requests for the same batch of records.
    • This happens internally in the program and the only visible effect is on restore performance.
  • CopyStorm/Restore will automatically lower the level of parallelism on a table when it is causing restore performance issues.
  • The level of parallelism can also be controlled for a specific table by setting table specific parameters on the Restore Set Editor tab.
    • Table specific parameters override global defaults.
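The override rule can be pictured as a simple lookup in which table specific values take precedence over the global ones. The dictionary keys, the “Account” entry, and its values are hypothetical and chosen only for illustration. In the product these settings are made on the Global Parameters and Restore Set Editor tabs.

```python
# Global defaults, using the values from Capstorm's test labs quoted above.
GLOBAL_DEFAULTS = {"max_per_update": 400, "salesforce_writers": 10}

# Hypothetical table specific settings for a table that restores poorly
# under high parallelism.
TABLE_OVERRIDES = {
    "Account": {"max_per_update": 50, "salesforce_writers": 1},
}

def effective_parameters(table):
    """Merge global defaults with any table specific overrides."""
    return {**GLOBAL_DEFAULTS, **TABLE_OVERRIDES.get(table, {})}

print(effective_parameters("Account"))  # table specific values win
print(effective_parameters("Contact"))  # no overrides: global defaults apply
```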