How Does CS:Govern Work With Base64 Chunking?

All databases limit how much data a single column can hold, and the databases supported by CS:Govern are no exception.  Some data originating in Salesforce consists of large documents (Attachment and ContentVersion are good examples) that can easily exceed the database's maximum column size.

To get around this problem, CopyStorm introduced a feature known as “chunking”.  Because the field types for these large documents are Base64, the technique is commonly referred to as Base64 Chunking.

When chunking a Base64 field value, CopyStorm breaks the value into as many chunks as it takes to store it in the database across several records.  A separate table, CopyForceTableFieldChunk, holds the chunks as Base64 values.  Later, when the field value is restored to Salesforce, the chunks are recombined (we refer to this as “stitching”) and restored as a whole.
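The chunk-and-stitch idea can be sketched in a few lines of Python.  This is a minimal illustration only, not CopyStorm's implementation: the function names and the chunk size are hypothetical, and the choice to cut the Base64 text on 4-character boundaries (so that each chunk is itself valid Base64) is an assumption made for the example.

```python
import base64


def split_into_chunks(b64_value: str, max_chunk_chars: int) -> list[str]:
    """Split a Base64 string into pieces no longer than max_chunk_chars.

    Assumption for this sketch: pieces are cut on 4-character boundaries,
    so every chunk can be decoded on its own.
    """
    if max_chunk_chars < 4:
        raise ValueError("max_chunk_chars must be at least 4")
    size = max_chunk_chars - (max_chunk_chars % 4)  # round down to a multiple of 4
    return [b64_value[i:i + size] for i in range(0, len(b64_value), size)]


def stitch_chunks(chunks: list[str]) -> str:
    """Recombine chunks, in their original order, into the full Base64 value."""
    return "".join(chunks)


# Example: a 2560-byte document split into chunks of at most 1000 characters.
document = bytes(range(256)) * 10
encoded = base64.b64encode(document).decode("ascii")
pieces = split_into_chunks(encoded, 1000)
restored = base64.b64decode(stitch_chunks(pieces))
```

In this sketch each element of `pieces` would correspond to one record in a chunk table, and `restored` equals the original `document` once the pieces are stitched in order.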

The same holds true for the Snapshotting system, except that the field value chunks are stored in the CopyForceArchiveFieldChunk table.  Also, unlike “normal” CopyStorm field values, which are chunked only if the Chunk Base64 Values configuration flag is enabled, Snapshot Base64 field values are ALWAYS chunked, regardless of how big or small those values may be.

Since chunked values are always stored in a separate table from the Salesforce object table in the CopyStorm database, one might wonder what value is placed into the object table column.  When chunking is performed, CopyStorm and the Snapshot system both place a “pointer” into the column: a JSON representation of the metadata pointing to the respective chunk table.
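To make the pointer idea concrete, here is a small Python sketch.  The actual layout of the JSON pointer is not described in this article, so every key name below (chunkTable, recordId, field, chunkCount) is invented for illustration, and an in-memory dictionary stands in for the chunk table.

```python
import json


def store_with_pointer(chunk_table: dict, record_id: str, field_name: str,
                       chunks: list[str]) -> str:
    """Store chunks in a stand-in chunk table and return a JSON pointer.

    The pointer keys are hypothetical; they are not the real CopyStorm format.
    """
    for index, chunk in enumerate(chunks):
        chunk_table[(record_id, field_name, index)] = chunk
    pointer = {
        "chunkTable": "CopyForceTableFieldChunk",
        "recordId": record_id,
        "field": field_name,
        "chunkCount": len(chunks),
    }
    return json.dumps(pointer)


def resolve_pointer(chunk_table: dict, pointer_json: str) -> str:
    """Follow a pointer back to its chunks and stitch them together in order."""
    pointer = json.loads(pointer_json)
    return "".join(
        chunk_table[(pointer["recordId"], pointer["field"], index)]
        for index in range(pointer["chunkCount"])
    )
```

In this sketch the object table column would hold only the small string returned by `store_with_pointer`, while the document content itself lives in the chunk table and is recovered with `resolve_pointer`.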

When CS:Govern is used to protect Base64 chunked field values (such as the Body field of an Attachment, or the VersionData field of a ContentVersion record), two operations have to happen:

  1. The “pointer” column value has to be encrypted in the storage table, and the field value has to be masked.  This is all standard CS:Govern functionality.
  2. The chunks need to be individually encrypted and those encrypted values need to be placed into the GuardianTableFieldChunk table or the GuardianArchiveFieldChunk table, depending on whether the record is being created for CopyStorm or the Snapshotting system.

Note that field values for CS:Govern protected fields are ALWAYS chunked, regardless of whether the Chunk Base64 Values configuration flag is enabled.  If a field value was not originally chunked but is later protected under CS:Govern, then when a Custodian is run on that value, the field will become a set of one or more chunks stored in the tables mentioned above.

CS:Govern creates INSTEAD OF INSERT triggers on the CopyStorm chunk tables (CopyForceTableFieldChunk and CopyForceArchiveFieldChunk).  This is significant because inserting chunks into those tables happens in the same transaction as encrypting the chunk values and placing them into the respective CS:Govern chunk tables, so if any error occurs while processing the field value chunks, the whole operation fails together.  As a result, there is no transient state in which unencrypted chunks reside in the CopyStorm tables.  The chunks either reside in an encrypted state in one of the CS:Govern chunk tables, or they do not reside anywhere.
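The all-or-nothing behavior can be demonstrated with a small, self-contained Python and SQLite sketch.  Several things here are assumptions made for the demo rather than CS:Govern's actual implementation: SQLite only allows INSTEAD OF triggers on views, so a view stands in for the CopyForceTableFieldChunk table; the column names, the plain_chunk_storage table, and the trigger name are hypothetical; and the 'enc:' prefix is a placeholder, not real encryption.

```python
import sqlite3

SCHEMA = """
CREATE TABLE GuardianTableFieldChunk (
    record_id TEXT, chunk_index INTEGER, encrypted_chunk TEXT NOT NULL
);
-- Hypothetical backing table; it should stay empty at all times.
CREATE TABLE plain_chunk_storage (
    record_id TEXT, chunk_index INTEGER, chunk TEXT
);
-- A view stands in for the chunk table (SQLite restriction, see lead-in).
CREATE VIEW CopyForceTableFieldChunk AS
    SELECT record_id, chunk_index, chunk FROM plain_chunk_storage;
CREATE TRIGGER protect_chunk_on_insert
INSTEAD OF INSERT ON CopyForceTableFieldChunk
BEGIN
    -- Simulated processing error: a missing chunk cannot be protected.
    SELECT RAISE(ABORT, 'chunk could not be encrypted')
    WHERE NEW.chunk IS NULL;
    -- Placeholder transform, NOT real encryption.
    INSERT INTO GuardianTableFieldChunk
    VALUES (NEW.record_id, NEW.chunk_index, 'enc:' || NEW.chunk);
END;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the demo schema and trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def insert_chunks(conn: sqlite3.Connection, record_id: str, chunks: list) -> bool:
    """Insert all chunks for one record in a single transaction.

    Returns True if every chunk was protected and stored, False if an
    error rolled the whole set back.
    """
    try:
        with conn:  # commits on success, rolls back on any exception
            for index, chunk in enumerate(chunks):
                conn.execute(
                    "INSERT INTO CopyForceTableFieldChunk "
                    "(record_id, chunk_index, chunk) VALUES (?, ?, ?)",
                    (record_id, index, chunk),
                )
        return True
    except sqlite3.Error:
        return False
```

With this sketch, `insert_chunks(conn, "rec1", ["QUJD", "REVG"])` stores two protected rows in GuardianTableFieldChunk, while `insert_chunks(conn, "rec2", ["R0hJ", None])` fails on its second chunk and leaves no rows for rec2 in either table.  In both cases plain_chunk_storage stays empty, which mirrors the guarantee described above.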