How Can I Do a Complete Restore?
Capstorm is commonly asked, “How can I do a complete restore of my Salesforce?” The short answer is that you cannot. This article explains why and describes the kinds of restores that occur in practice.
Background
For a person coming from a traditional database world, how to do a complete restore is a simple question. Traditional databases allow an operator to easily disable database access and replace the image with a backup — this technique allows an operator to put a database into a known state at any desired point in time.
Salesforce, however, is unlike a traditional database in several important ways, making a traditional database restore impossible:
- It is impossible to get an exact point-in-time backup of Salesforce using the Salesforce API.
- Salesforce cannot be shut down during the restore process. This often means that data entered by employees may be overwritten while the restore is in progress.
- The restore process will change Salesforce.
  - A restore is analogous to hundreds of data entry clerks rapidly entering data into Salesforce.
  - Salesforce has no idea that the “clerks” are really a restore process, as API calls and “clerks” look the same to Salesforce.
- Not all tables and metadata in Salesforce are writable from the API.
- There is no such thing as an empty Salesforce database — even an “empty” sandbox has dozens (or more) tables populated with data.
There is another Salesforce “twist” that can further complicate restores. Restoring Salesforce metadata (e.g. ApexClasses, Components, PageLayouts, Triggers, and SObjects) often requires dependency knowledge that is not necessary when restoring a traditional relational database. This knowledge is not always provided by the Salesforce API. Because the API limits the size of a single metadata deployment, a large metadata restore has to be split into several deployments, and each component must be deployed after the components it depends on. Knowledge of metadata dependencies is therefore an absolute requirement. Total automation of a metadata restore would require a “compiler/semantic analysis” solution that understands the dependencies between all Salesforce metadata types. With the exception of internal Salesforce development, there is no absolute authority on how to perform this dependency analysis.
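To make the ordering problem concrete, here is a minimal Python sketch of splitting metadata into size-limited batches that respect dependencies. It assumes the dependency map is already known, which is exactly the hard part described above, and it assumes there are no circular dependencies. The component names, the `deployment_batches` helper, and the batch size are hypothetical and are not part of any Capstorm product or Salesforce API.

```python
from graphlib import TopologicalSorter


def deployment_batches(dependencies, max_batch_size):
    """Group components into ordered batches of at most max_batch_size.

    dependencies maps each component name to the set of components it
    depends on. A component never appears in an earlier batch than any
    of its dependencies.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    batches = []
    while sorter.is_active():
        # Components whose dependencies have all been placed in earlier batches.
        ready = sorted(sorter.get_ready())
        for start in range(0, len(ready), max_batch_size):
            batches.append(ready[start:start + max_batch_size])
        sorter.done(*ready)
    return batches


# Hypothetical dependency map, for illustration only.
example = {
    "Invoice__c": set(),
    "InvoiceService": {"Invoice__c"},
    "InvoiceTrigger": {"Invoice__c", "InvoiceService"},
    "InvoiceLayout": {"Invoice__c"},
}
batches = deployment_batches(example, max_batch_size=2)
```

With this sample map the custom object lands in the first batch, the layout and service class in the second, and the trigger in the last. Real metadata often contains circular references, which this sketch rejects rather than resolves.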
Restores in Practice
In practice, restores fall into three general classifications: Sandbox, Production, and Migration.
Sandbox Restore
The main objective of a sandbox restore is to populate an empty development sandbox with a rich subset of data from a production backup. The sandbox generally has one of the following purposes:
- To provide a rich set of data for a developer.
- To provide a rich set of data for a trainer to use in a class.
  - In this case, multiple sandboxes are populated using the same restore process.
Since sandboxes rarely have enough space to hold all of the production data, a user needs to select the data important to their task. For example, a developer may want to restore all Opportunities won in the past 6 months with a value greater than $50,000 along with their associated Accounts, Contacts, Cases, and related custom objects. The process a developer would follow is:
- Select the records and relationships to restore.
- Press a button to restore the records and relationships.
- Optionally add additional records and relationships.
- Press a button to restore the additional records and relationships.
- Repeat the two steps above until the sandbox contains all desired data.
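The selection step in the example above (Opportunities won in the past 6 months worth more than $50,000, plus their related records) can be sketched in Python as follows. This is only an illustration of the filtering logic over backup rows held in memory as dictionaries. The `select_for_sandbox` helper, the 183-day approximation of "6 months", and the sample rows are assumptions, and CopyStorm/Restore itself is driven through its own interface rather than code like this.

```python
from datetime import date, timedelta


def select_for_sandbox(opportunities, contacts, today,
                       min_amount=50_000, window_days=183):
    """Pick recently won, high-value Opportunities and their related rows."""
    cutoff = today - timedelta(days=window_days)
    selected = [
        opp for opp in opportunities
        if opp["IsWon"] and opp["CloseDate"] >= cutoff and opp["Amount"] > min_amount
    ]
    # Follow the Opportunity -> Account relationship, then Account -> Contact.
    account_ids = {opp["AccountId"] for opp in selected}
    related_contacts = [c for c in contacts if c["AccountId"] in account_ids]
    return selected, account_ids, related_contacts


# Hypothetical backup rows, for illustration only.
opportunities = [
    {"Id": "O1", "IsWon": True, "CloseDate": date(2024, 5, 15), "Amount": 75_000, "AccountId": "A1"},
    {"Id": "O2", "IsWon": True, "CloseDate": date(2024, 6, 1), "Amount": 20_000, "AccountId": "A2"},
    {"Id": "O3", "IsWon": False, "CloseDate": date(2024, 6, 10), "Amount": 90_000, "AccountId": "A3"},
    {"Id": "O4", "IsWon": True, "CloseDate": date(2023, 1, 10), "Amount": 120_000, "AccountId": "A2"},
]
contacts = [
    {"Id": "C1", "AccountId": "A1"},
    {"Id": "C2", "AccountId": "A2"},
]
selected, account_ids, related_contacts = select_for_sandbox(
    opportunities, contacts, today=date(2024, 7, 1)
)
```

Only the first Opportunity passes all three conditions, so only its Account and that Account's Contact are pulled along. Cases and custom objects would be followed through their own relationship fields in the same way.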
A less frequent type of sandbox restore is to refresh the data in a sandbox from current production data.
CopyStorm/Restore supports all of these restore cases and sees many of them every day.
Production Restore
Production restores are more varied than sandbox restores but tend to fall into these categories:
- Column Restore
  - Restore a handful of columns in a table due to data corruption.
- Hierarchy Restore
  - Restore an Account (or another top-level record) and all associated tables after it was accidentally deleted and removed from the recycle bin.
- Polymorphic Restore
  - Restore the Attachments / Notes for a given set of Contacts (or another object).
The common characteristic of production restores is that an operator has surgically precise knowledge about what needs to be restored.
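As an illustration of how narrow such a restore can be, the following Python sketch builds the updates for a column restore: for each record it keeps only the named columns whose current value differs from the backup. The `column_restore_updates` helper and the sample rows are hypothetical, and the sketch assumes both the backup and the current data are available as dictionaries keyed by record Id.

```python
def column_restore_updates(backup_rows, current_rows, columns):
    """Return {record_id: {column: backup_value}} for columns that differ."""
    updates = {}
    for record_id, backup in backup_rows.items():
        current = current_rows.get(record_id)
        if current is None:
            # The record itself is gone; recovering it is a hierarchy
            # restore, not a column restore, so it is skipped here.
            continue
        changed = {
            column: backup.get(column)
            for column in columns
            if current.get(column) != backup.get(column)
        }
        if changed:
            updates[record_id] = changed
    return updates


# Hypothetical Contact rows, for illustration only.
backup = {
    "003A": {"Email": "ann@example.com", "Phone": "555-0100", "Title": "CFO"},
    "003B": {"Email": "bob@example.com", "Phone": "555-0101", "Title": "CTO"},
    "003C": {"Email": "cy@example.com", "Phone": "555-0102", "Title": "COO"},
}
current = {
    "003A": {"Email": "corrupted", "Phone": "555-0100", "Title": "Chief"},
    "003B": {"Email": "bob@example.com", "Phone": "555-0101", "Title": "CTO"},
}
updates = column_restore_updates(backup, current, columns=["Email", "Phone"])
```

Restricting the update to the named columns leaves every other field, including edits made after the backup, untouched. That matters because Salesforce stays in use while the restore runs.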
Though it is technically possible for a massive restore to involve hundreds of tables and millions of records on a production Salesforce, this would mean that either the Salesforce infrastructure has failed or your Salesforce data governance has suffered a massive failure/breach. In practice this almost never happens. The worst production disasters Capstorm has seen have involved:
- A Salesforce data center failure lost 6 hours of production updates.
- A key logger corrupted specific fields in a customer’s Salesforce for a period of 2 months.
- A programming error deleted a large set of attachments.
- A mistake in a trigger corrupted a known set of fields.
From a data recovery standpoint, all of these were surgical-level restores, since enough precision was available to identify the corrupted or missing data.
Migration Restore
Migration restores involve moving large hierarchies of records from one Salesforce instance to another where:
- The target Salesforce instance already contains a lot of records.
- The source instance records may or may not match records already in the target.
  - How to determine record matching varies by record type.
- The table structure of the two instances is not the same.
  - Data transformation is likely required to move records.
- The Salesforce system tables in the two instances do not match.
- Plus dozens of other subtleties.
Doing a migration is generally much more complex than a sandbox or production restore. The core issue is a classic ETL problem:
- Extract the data from the source Salesforce.
- Transform the data into the structure required by the target Salesforce.
- Load the data into the target Salesforce.
Capstorm products deal with the Extract problem (CopyStorm) and the Load problem (CopyStorm/Restore). The Transform problem is highly specific to each migration and is what typically makes a migration project difficult.
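The Transform step, together with the record matching described earlier, can be sketched in Python. This is a deliberately small illustration under stated assumptions: the `transform` and `plan_load` helpers, the field mapping, the choice of `Name` as the match key, and the sample rows are all hypothetical, and a real migration would need a different mapping and matching rule for each record type.

```python
def transform(record, field_map, converters=None):
    """Rename source fields to target fields, converting values if asked."""
    converters = converters or {}
    row = {}
    for source_field, target_field in field_map.items():
        if source_field in record:
            value = record[source_field]
            convert = converters.get(source_field)
            row[target_field] = convert(value) if convert else value
    return row


def plan_load(source_records, target_index, match_key, field_map, converters=None):
    """Split transformed records into inserts and updates.

    target_index maps a match-key value to the Id of the record that
    already exists in the target instance.
    """
    inserts, updates = [], []
    for record in source_records:
        row = transform(record, field_map, converters)
        target_id = target_index.get(row.get(match_key))
        if target_id is None:
            inserts.append(row)  # no match in the target: create a new record
        else:
            updates.append({"Id": target_id, **row})  # match: update in place
    return inserts, updates


# Hypothetical source rows and mapping, for illustration only.
source = [
    {"Name": "Acme", "Region__c": "emea", "Legacy_Code__c": 1},
    {"Name": "Globex", "Region__c": "apac"},
]
field_map = {"Name": "Name", "Region__c": "Territory__c"}
inserts, updates = plan_load(
    source,
    target_index={"Acme": "001A"},
    match_key="Name",
    field_map=field_map,
    converters={"Region__c": str.upper},
)
```

Fields missing from the mapping, such as `Legacy_Code__c` here, are dropped on purpose. Deciding what to drop, rename, or convert is the migration-specific work that no generic tool can do for you.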
Several Capstorm customers use another type of migration restore: bringing data from many disparate systems into Salesforce. In these cases:
- Data from many sources is populated into a CopyStorm-created schema using customer-written code.
- CopyStorm/Restore is used to restore the data into a production Salesforce.
The highest-volume example of this type of migration that Capstorm knows of restores millions of records every day, collected from more than 50 disparate data sources.
Conclusions
If your requirement is to “click a few boxes and restore any Salesforce instance to the exact state of a backup” then we wish you good luck. However, if your operational requirements fall into the restore categories described above, the Capstorm suite of products is your best choice. Of course, you could write the tools in-house and discover the less-documented underbelly and restrictions of the Salesforce API on your own time. If you want to be convinced that there is an underbelly, check out this article we posted a few years ago: Salesforce’s Seven Deadly Sins.