How Does CopyStorm Base64 Chunking Work?

All programming languages, databases, and most applications have limits on the maximum size of values they can store or manipulate. For example, Salesforce has a 2 GB limit on the size of an uploaded file, and PostgreSQL has a 1 GB limit on the size of a column value.

When CopyStorm downloads a large file from Salesforce, it streams the Base64-encoded file body to the file system (which has no size limit other than available disk space). Without CopyStorm’s Base64 Chunking feature, the Base64-encoded file body is written directly to the database, where it is subject to the database's size limits (e.g. 1 GB in PostgreSQL).

The CopyStorm Base64 Chunking feature breaks large files into chunks and stores these chunks in a separate table named CopyForceTableFieldChunk. By default, files are broken into 100 MB chunks, but this size is configurable. For example, if the Body field of a ContentVersion record is 500 MB and the Base64 Chunking feature is enabled, the 500 MB file is broken into 5 parts that are stored as 5 separate entries in the CopyForceTableFieldChunk table.
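
The arithmetic in the example above can be sketched in Python. This is only an illustration, not CopyStorm's implementation: the function names are hypothetical, and the sketch assumes that sizes are counted in characters of the Base64 string and that "100 MB" means 100,000,000.

```python
DEFAULT_CHUNK_SIZE = 100_000_000  # assumption: "100 MB" taken as 100,000,000


def chunk_count(content_size: int, chunk_size: int = DEFAULT_CHUNK_SIZE) -> int:
    """Number of chunks needed for a value of content_size (ceiling division)."""
    if content_size < 0 or chunk_size <= 0:
        raise ValueError("content_size must be >= 0 and chunk_size must be > 0")
    return (content_size + chunk_size - 1) // chunk_size


def split_into_chunks(value: str, chunk_size: int) -> list[str]:
    """Split a Base64 string into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be > 0")
    return [value[i:i + chunk_size] for i in range(0, len(value), chunk_size)]
```

With these assumptions, chunk_count(500_000_000) evaluates to 5, matching the 500 MB example. When the size is not an exact multiple of the chunk size, the final chunk is simply shorter than the others.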

With Base64 Chunking enabled, there is no practical database storage limit on the size of field values in Salesforce objects that can be brought into the CopyStorm database, because CopyStorm is no longer subject to the database's maximum value size.

Base64 Chunk Usage

The Base64 Chunking feature is disabled by default when CopyStorm is initially configured. To enable it, check the Chunk Base64 Values checkbox on the Database Schema configuration editor.

When this checkbox is:

  • Unchecked, Base64 field values are always stored in-line in the primary CopyStorm data tables.
  • Checked, Base64 field values that are larger than the chunk size are stored in the CopyForceTableFieldChunk table.
    • In this case, a pointer value is stored in the primary CopyStorm data table.

The pointer is the string representation of a JSON object containing field metadata. For example:

#AAA:{
  "numChunks": "5",
  "chunkSize": "100000000",
  "contentSize": "500000000",
  "salesforceId": "0686300000190d9AAA",
  "hash": "bcfc403cba99fda2224395a843d9397b"
}

All chunked values are prefixed with “#AAA:”. Inlined Base64 values never begin with this prefix because the “#” character is not valid in Base64 strings.
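
As a rough sketch of how a reader of the CopyStorm tables might tell a pointer from an in-line value, the Python below checks for the prefix and parses the JSON that follows it. The function name is hypothetical; only the “#AAA:” prefix and the field names shown in the example pointer come from this page.

```python
import json

CHUNK_POINTER_PREFIX = "#AAA:"


def parse_chunk_pointer(field_value: str):
    """Return the pointer metadata as a dict, or None for an in-line Base64 value."""
    if not field_value.startswith(CHUNK_POINTER_PREFIX):
        return None
    return json.loads(field_value[len(CHUNK_POINTER_PREFIX):])


# The example pointer from this page.
pointer_text = """#AAA:{
  "numChunks": "5",
  "chunkSize": "100000000",
  "contentSize": "500000000",
  "salesforceId": "0686300000190d9AAA",
  "hash": "bcfc403cba99fda2224395a843d9397b"
}"""

metadata = parse_chunk_pointer(pointer_text)
# Numeric fields appear as JSON strings in the example, so convert explicitly.
num_chunks = int(metadata["numChunks"])
```

Because “#” cannot occur in Base64 text, the prefix test alone is enough to distinguish the two cases; no attempt to decode the value is needed.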

Base64 Snapshot Chunking

All of the Base64 chunking information on this page also applies to CopyStorm Snapshots, with two differences:

  1. CopyStorm Snapshots store Base64 values in a secondary chunked data table named CopyForceArchiveFieldChunk.
  2. CopyStorm Snapshots will always chunk Base64 values, regardless of whether Base64 chunking is enabled in the primary CopyStorm database.

Overriding System Default Chunk Sizes

To override the system default chunk size, create a Base64ChunkParameters.xml file and place it in the [CopyStorm]/config/ directory. CopyStorm reads the chunk size for each data source type from this file. In the example below, the chunk size for every data source type has been changed to 5 MB (5000000):

<Base64ChunkParameters>
    <DataSourceType name="SQLServer" chunkSize="5000000" />
    <DataSourceType name="Oracle" chunkSize="5000000" />
    <DataSourceType name="MySQL" chunkSize="5000000" />
    <DataSourceType name="H2" chunkSize="5000000" />
    <DataSourceType name="PostgreSQL" chunkSize="5000000" />
    <DataSourceType name="RedShift" chunkSize="5000000" />
    <DataSourceType name="Snowflake" chunkSize="5000000" />
    <DataSourceType name="Unknown" chunkSize="5000000" />
</Base64ChunkParameters>

Chunks created after this file is updated use the new chunk size.
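
Before placing the file in the config directory, it can be useful to confirm that it is well-formed and that every chunk size is a positive integer. The Python below is one possible check. The helper name is hypothetical, and the positive-integer rule is an assumption; CopyStorm's own validation of this file is not described here.

```python
import xml.etree.ElementTree as ET


def load_chunk_sizes(xml_text: str) -> dict[str, int]:
    """Map each DataSourceType name to its chunkSize, raising on malformed input."""
    root = ET.fromstring(xml_text)  # raises ParseError if the XML is not well-formed
    if root.tag != "Base64ChunkParameters":
        raise ValueError(f"unexpected root element: {root.tag}")
    sizes = {}
    for element in root.findall("DataSourceType"):
        size = int(element.attrib["chunkSize"])  # raises if missing or not an integer
        if size <= 0:
            raise ValueError(f"chunkSize must be positive, got {size}")
        sizes[element.attrib["name"]] = size
    return sizes


# A shortened version of the example file, for illustration.
example = """<Base64ChunkParameters>
    <DataSourceType name="PostgreSQL" chunkSize="5000000" />
    <DataSourceType name="Unknown" chunkSize="5000000" />
</Base64ChunkParameters>"""

sizes = load_chunk_sizes(example)
```

To check a real file, pass its contents to load_chunk_sizes, for example the text read from Base64ChunkParameters.xml.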

Deleting Base64 Chunk Records

CopyStorm automatically deletes records from the CopyForceTableFieldChunk table as their parent records are deleted from the primary CopyStorm data tables. If the CopyStorm tables are truncated, or if records are deleted from them by something other than CopyStorm, chunk records can be left behind. In that case, use the CopyStorm/Medic Delete Orphaned Base64 Chunks tool to scan for and delete any CopyForceTableFieldChunk records that point to records no longer present in the primary CopyStorm data tables.

Snapshot chunked records contained in the CopyForceArchiveFieldChunk table will be automatically cascade-deleted when their parent records are deleted.