Data Retention Policy

The relational database industry has decades of experience supporting strong data retention policies. What is a “data retention policy”? A data retention policy is a set of practices that retain historical corporate data for a specified period of time and, in the case of databases, allows historical data to be restored. A good data retention policy supports tasks like “Restore this data to how it was last Wednesday”.

Who is the best at supporting, developing, and executing data retention policies?

  • Relational database vendors
  • Database administrators

CopyStorm does not directly support a data retention policy for one simple reason:

  • CopyStorm could never be better than the tools relational database vendors already have in place for implementing retention policies.

However, there is good news. Defining, setting up, and administering a data retention policy is a fairly simple task. Most importantly, recovering data to how it was at a specific point in time is easy.

Key Decisions for a Data Retention Policy

To help your local DBA set up a data retention policy, you will need to have a conversation like:

  • Mr. DBA, I need my Salesforce backup database to be treated like a production database with point-in-time recovery capabilities.
  • I want to be able to recover data that is up to 6 months old.
  • I am making updates to the database every 20 minutes (depending on how frequently you run CopyStorm).

Your DBA will turn your request into specific actions supporting data retention and point-in-time recovery, for example:

  • Run a full backup every Saturday at 01:00.
  • Run a log backup every 5 minutes.
  • Retain backups onsite for 4 weeks.
  • Retain backups in offsite storage for 6 months.

What Are the Steps for Data Recovery?

Suppose we need to recover an account that was accidentally deleted 3 days ago. Here are the steps:

  1. Ask the DBA to create a new database that contains the Salesforce backup as it was at 28-Jan-2016 at 12:00.
  2. Use CopyStorm/Restore with the newly restored database to restore the account (and all of its related data) to Salesforce.
  3. If you are not licensed for CopyStorm/Restore, the Salesforce data loader provides an alternate solution.
    1. CopyStorm/Restore is far less onerous to use than the Salesforce toolkit — we highly recommend buying a copy before attempting a Salesforce data restore.
    2. See here for more information on CopyStorm/Restore

Why is Data Retention Not Built Into CopyStorm?

We have already covered the primary reason — any capability offered by CopyStorm would be amateur compared to the mature capabilities offered by database vendors. If a company wants to trust a vendor with data retention, they should trust a vendor that has been great at it for decades.

There are also a number of technical issues causing CopyStorm to rely on database vendors to solve this problem.

Salesforce instances tend to have a dynamic structure. Tables, columns, and relationships are frequently added, modified, and deleted both by Salesforce and by Salesforce’s customers. CopyStorm faithfully reproduces structural and data changes in its database while keeping a “hot” copy of Salesforce. These facts turn a task like “Restore data to how it was last Wednesday” into a complex problem that would often involve steps like:

  • Remembering how all tables and columns in Salesforce were defined last Wednesday.
  • If any columns were redefined or deleted, pretend like they have not changed while doing the restore.
  • If any relationships were deleted in Salesforce, do not try to restore them but know about them so I can be warned.
  • Ensure that all backup jobs can still function while performing the restoration.

These types of issues (plus dozens more) are massively complex to track — especially when considering that they frequently change over time. Relational database vendors have a much more mature method of tracking these changes reliably over time.

I Do Not Have a Full Time DBA — Help!

The manual for Backup and Recovery for SQL/Server is not my favorite choice for a good book on a winter evening, but if you want to be able to restore data to a point in time a little bit of reading and understanding is required. Fortunately, understanding the concepts is the hard part because the tools are almost all point-and-click. For example, SQL/Server provides wizards that:

  • Ask how often a full backup should  be done.
  • Ask how often an incremental backup should be done.
  • Ask how long backups should be retained.
  • Schedule a job that implements the backup policy.

Restoring with SQL/Server is almost as simple as picking a point-in-time for the recovery.

If you are using Oracle then you probably have a DBA who already knows all about redo and archive logs, if not then start here.

If you are using MySQL, then start with the manual.

If you are using PostgreSQL or MariaDB, then try this as a starting point.