Base64 Export
When storing Files and Attachments in the CopyStorm database, CopyStorm mirrors the Salesforce API specification for the field. Salesforce object fields of type Base64 (such as Attachment.Body and ContentVersion.VersionData) are therefore stored in the CopyStorm database as Base64 text strings. This representation is an industry standard and avoids problems that can arise when storing binary values in a database. CopyStorm also removes size limits on large text fields through “chunking”: breaking large blocks of content into chunks that fit comfortably in the database, given the data model limitations of the database engine. For example, Salesforce supports ContentVersion files as large as 2 GB, but the largest text string that can be stored in a PostgreSQL database is approximately 1 GB. With chunking, there is no practical limit on the size of a stored field value.
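The idea behind chunking can be illustrated with a minimal Python sketch. It is not CopyStorm's actual storage format, which is not described here. The function names `to_chunks` and `from_chunks` and the chunk size are hypothetical and chosen only for illustration.

```python
import base64

def to_chunks(raw: bytes, chunk_chars: int) -> list[str]:
    """Base64-encode raw bytes and split the text into fixed-size pieces."""
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")
    text = base64.b64encode(raw).decode("ascii")
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]

def from_chunks(chunks: list[str]) -> bytes:
    """Rejoin the pieces in their original order and decode back to bytes."""
    return base64.b64decode("".join(chunks))

# Sample binary payload; a real file could be gigabytes in size.
payload = bytes(range(256)) * 10
chunks = to_chunks(payload, chunk_chars=1000)   # hypothetical chunk size
restored = from_chunks(chunks)
```

Because the pieces are rejoined before decoding, the chunk boundary does not need to align with Base64's 4-character groups.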
This storage format can make it complex to view the contents of a Salesforce File or Attachment. For example, if the Base64 content is a PDF file, then you cannot view the PDF using a normal PDF viewer without exporting the Base64 text from the database and decoding the file content.
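To show what "exporting and decoding" involves when done by hand, here is a small Python sketch. It assumes the Base64 text has already been read from the database as a string. The helper name `export_base64` and the sample content are hypothetical.

```python
import base64
import tempfile
from pathlib import Path

def export_base64(text: str, target: Path) -> int:
    """Decode Base64 text, write the bytes to target, return bytes written."""
    compact = "".join(text.split())            # drop any line breaks
    data = base64.b64decode(compact, validate=True)
    target.write_bytes(data)
    return len(data)

# Stand-in for a value read from the database: a tiny fake PDF header.
stored_text = base64.b64encode(b"%PDF-1.4\n% sample only\n").decode("ascii")

with tempfile.TemporaryDirectory() as tmp:
    pdf_path = Path(tmp) / "sample.pdf"
    written = export_base64(stored_text, pdf_path)
    # A real PDF begins with the "%PDF" marker once decoded.
    is_pdf = pdf_path.read_bytes().startswith(b"%PDF")
```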
The Base64 Export tool provides a way to export Files and Attachments directly from the CopyStorm database. The tool is compatible with all database storage types used by CopyStorm: Base64 strings, chunked files, and CS:Govern-encrypted files. Just as there is no practical limit on how large a document can be stored in the primary CopyStorm database or a Snapshot, there is no practical limit on how large a document can be exported to the file system (as long as you have enough disk space to store the file!).
Running Base64 Export in a GUI
To run the Base64 Export tool in the CopyStorm/Medic GUI, open the tool from the main menu using Other Tools => Data Synchronization => Export Base64 Objects to File System. This will open the following GUI:
The DataSet Selector drop-down box in the top-left of this tool lists the primary CopyStorm Database along with any Snapshot policies that the user has defined. In the screen shot above, the primary CopyStorm Database has been selected, which means that the Base64 Exporter will export files stored in the main CopyStorm data tables.
Below that drop-down box is the list of Available Table Fields. The Available Table Fields list allows selection of the specific File or Attachment object and field(s) to be exported.
Base64 Export Parameters
When exporting content from the source DataSet to the file system, the tool must, at a minimum, know where to place the exported files. CopyStorm/Medic builds the output location by combining the “Output Path” folder path with a rule that specifies which sub-folders, if any, the files should be exported into. The Output Path setting is shown in the GUI screen shot above, and the sub-folder rule is set by the Folder Names drop-down box. This drop-down has two choices:
- No Sub-Folders: Files are exported into a flat structure in the Output Path folder location.
- ObjectType: Files are exported into a folder whose name is based upon the Salesforce object name. For example, all ContentVersion documents will be exported into a folder named [Output Path]/ContentVersion/.
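The two Folder Names choices can be sketched in Python as follows. This is an illustration of the rule as described above, not the tool's implementation, and the function name `output_dir` is hypothetical.

```python
from pathlib import Path

def output_dir(output_path: str, folder_rule: str, object_name: str) -> Path:
    """Combine the Output Path with the sub-folder rule for one object."""
    base = Path(output_path)
    if folder_rule == "No Sub-Folders":
        return base                      # flat structure
    if folder_rule == "ObjectType":
        return base / object_name        # one folder per Salesforce object
    raise ValueError(f"unknown Folder Names rule: {folder_rule!r}")

flat = output_dir("exports", "No Sub-Folders", "ContentVersion")
nested = output_dir("exports", "ObjectType", "ContentVersion")
```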
As not all File objects in Salesforce provide a file name field, the Default File Extension is used to provide a reasonable default value to use in cases where a file name cannot be established.
The Decrypt Compliance Data checkbox tells the Base64 Export tool whether to attempt to decrypt Base64 fields that have been encrypted with CS:Govern. When it is checked and a field is encrypted with CS:Govern, CopyStorm/Medic attempts to decrypt the field values: if you have permission to view the plaintext, the file data is exported; if you do not, CopyStorm/Medic exports the CS:Govern masked value.
File Rules
To configure rules for exporting files from a specific field, click on the field name in the left-hand side of the Base64 Exporter. This will display a dialog that looks like the following:
This dialog configures which records should be exported and sets restrictions that limit the total disk consumption of the exporter. The Base64 Export tool allows limits on both the number of files exported and the total disk space taken up by the export process. These limits are applied per object. For example, if you enter 5 into the Max Exported Files field for the Attachment object and 15 for the ContentVersion object, and more than 5 Attachment records and more than 15 ContentVersion records match the selection parameters, then 20 files will be written in total: 5 Attachment files and 15 ContentVersion files.
A blank Max Exported Files value means there is no limit: ALL scanned candidate records matching the selection criteria will be exported. The same is true for the Max Disk Space Consumption parameter: if it is left blank, CopyStorm/Medic will not limit the amount of disk space used by the export.
Having values entered for BOTH the Max Exported Files as well as the Max Disk Space Consumption data entry fields causes both rules to be applied – whichever limit is exceeded first will stop the processing of the object type.
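The per-object limits can be modeled with a short Python sketch. It rests on several assumptions not stated in this document: sizes are in bytes, `None` stands for a blank field, and processing stops before writing a file that would push the total past the disk limit. The names `Limits` and `select_within_limits` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass
class Limits:
    max_files: Optional[int] = None   # None models a blank field: no limit
    max_bytes: Optional[int] = None   # assumed unit: bytes

def select_within_limits(sizes: Iterable[int], limits: Limits) -> list[int]:
    """Return the sizes of the files exported for ONE object type."""
    exported: list[int] = []
    total = 0
    for size in sizes:
        if limits.max_files is not None and len(exported) >= limits.max_files:
            break                      # file-count limit reached first
        if limits.max_bytes is not None and total + size > limits.max_bytes:
            break                      # disk-space limit reached first
        exported.append(size)
        total += size
    return exported

# Limits are applied separately to each object type.
attachments = select_within_limits([100] * 8, Limits(max_files=5))
content_versions = select_within_limits([100] * 20, Limits(max_files=15))
both_limits = select_within_limits([100] * 8, Limits(max_files=5, max_bytes=250))
```

With these inputs, the Attachment and ContentVersion runs together yield the 20 files from the example above, while the run with both limits stops on the disk-space rule.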
Having limits on the Max Exported Files as well as the Max Disk Space Consumption helps you prevent runaway situations where millions of files or terabytes of data match the criteria – filling up all space on the output drive. You can also play with the tool with a small number for one or the other (or both) and, once comfortable with how it works, leave those data entry fields blank to allow for a complete export.
File Name and Duplicate (Clobbering) Rules
The Base64 Export tool will use the following logic to determine the name of the file to be exported to the file system:
- First, the tool looks for a title field in the scanned candidate record. If the title field is not blank, its value is used. If there is no title field or the title field is blank, the Salesforce Id is used as the file name.
- If the File Names drop-down box is set to PreserveFileName, the file name is the one determined in the first step.
- If the File Names drop-down box is set to ID_Filename and a non-blank title was found in the first step, the file name is the Salesforce record Id, followed by an underscore, followed by the title. Otherwise, the file name is just the Salesforce Id.
- If the File Names drop-down box is set to Object_ID_Filename and a non-blank title was found in the first step, the file name is the Salesforce record’s Object name, followed by an underscore, the Salesforce record Id, another underscore, and the title. Otherwise, it is the Salesforce record’s Object name followed by an underscore and the Salesforce record Id.
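The naming logic above can be expressed as a small Python function. This is a sketch of the rules as written, not the tool's code. The function name `export_file_name` and the sample record Id are hypothetical, and the Default File Extension handling is left out.

```python
from typing import Optional

def export_file_name(rule: str, object_name: str,
                     record_id: str, title: Optional[str]) -> str:
    """Apply the File Names rule to one candidate record."""
    has_title = bool(title and title.strip())
    if rule == "PreserveFileName":
        return title if has_title else record_id
    if rule == "ID_Filename":
        return f"{record_id}_{title}" if has_title else record_id
    if rule == "Object_ID_Filename":
        prefix = f"{object_name}_{record_id}"
        return f"{prefix}_{title}" if has_title else prefix
    raise ValueError(f"unknown File Names rule: {rule!r}")

rec_id = "00P000000000001AAA"   # made-up Id for illustration
name_a = export_file_name("PreserveFileName", "Attachment", rec_id, "Plan.pdf")
name_b = export_file_name("ID_Filename", "Attachment", rec_id, "Plan.pdf")
name_c = export_file_name("Object_ID_Filename", "Attachment", rec_id, None)
```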
When exporting files to a directory that already contains files, CopyStorm/Medic either writes new output files without overwriting the existing ones or overwrites the existing files, depending on the Duplicate Rule described below.
Duplicate file names can arise in several ways, for example:
- Files were exported to the target folder from an earlier CopyStorm/Medic session.
- A target Output Path was selected that already contained files, some of which happen to have the same names as files to be exported.
- The user clicks the Export button multiple times – each click causing the tool to re-scan and re-export candidate records.
- The user has selected the No Sub-Folders option for the Folder Names drop-down box and there happen to be similarly titled documents in Salesforce even though the Object types are different. For example, there might be an Attachment called “Company Regulations.pdf” as well as a ContentVersion object with that exact same name.
The Duplicate Rule drop-down box is used to configure the Base64 Export tool to handle these situations. Here are the options:
- Skip: If the tool attempts to write a file that already exists in the target folder, it skips exporting that file (in other words, it will not clobber the existing file).
- Overwrite: If the tool attempts to write a file that already exists in the target folder, it overwrites that file (in other words, it will clobber the existing file).
- Version: If the tool attempts to write a file that already exists in the target folder, it adds the conventional suffix for such cases to the new file name: a space followed by a number in parentheses. Behind the scenes, the tool keeps incrementing that number until a unique file name is generated.
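The three Duplicate Rule options can be sketched in Python as follows. Two details are assumptions, since the description above does not specify them: the suffix is placed before the file extension, and numbering starts at 1. The function name `resolve_target` is hypothetical.

```python
from pathlib import Path
from typing import Optional

def resolve_target(target: Path, rule: str) -> Optional[Path]:
    """Return the path to write to, or None if the file should be skipped."""
    if not target.exists():
        return target                  # no conflict, rule does not apply
    if rule == "Skip":
        return None
    if rule == "Overwrite":
        return target
    if rule == "Version":
        n = 1                          # assumed starting number
        while True:
            # Assumed placement: "name (n).ext", suffix before the extension.
            candidate = target.with_name(f"{target.stem} ({n}){target.suffix}")
            if not candidate.exists():
                return candidate
            n += 1
    raise ValueError(f"unknown Duplicate Rule: {rule!r}")
```

Under these assumptions, a second export of "Company Regulations.pdf" with the Version rule would be written as "Company Regulations (1).pdf", and a third as "Company Regulations (2).pdf".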
Record Selector Rules
The Base64 Export Rules dialog shown in the screen shot above has a Select Records tab next to the File Rules tab. Clicking on that tab will show a view that looks like this:
Like the File Rules, the Select Records tab defines rules that are specific to the Object selected in the Available Table Fields list. These rules, however, have nothing to do with how and where files are stored in the target file system. Instead, they determine which records in the selected DataSet are candidates for export. This allows much more targeted selection than the file rules, and the functionality offers options similar to those in the CopyStorm/Restore application. To view the CopyStorm/Restore documentation for record selection, you can go here (Records to Restore Editor).
Note that not all of the UI elements used in the dialogs for Restore apply to Base64 Export.
Point-In-Time Export From Snapshot
Beyond selecting which records should be exported, the Base64 Exporter also allows point-in-time selection of matching records. The following screen shot shows available options for Point-In-Time Export. These are available when a Snapshot is selected in the DataSet Selector:
When a Snapshot is selected in the DataSet Selector, you can choose which points in time the Base64 field values should be exported from. The screen shot above shows the A Point In Time selection. Here are the available choices:
- Most Recent
- A Point In Time
- A Single Date
- Between Two Times
An in-depth discussion of how the CapStorm Snapshot sub-system works can be found here (How do CopyStorm Archives Work).
The following shows available options when the Most Recent selection is made:
The following screen shot shows available options when the A Single Date selection is made:
The following screen shot shows available options when the Between Two Times selection is made:
If either of the following options is selected, it is possible that multiple files match the selection criteria:
- A Single Date: The time span starts at midnight of the selected date and ends just prior to midnight of the following day – resulting in a 24 hour period.
- Between Two Times: The time span starts at the specified start time and ends at the specified end time.
In those two cases, there may be many versions of a record with a given Salesforce Id for the selected Salesforce object. To choose which version(s) of the record to export, you can select the Earliest, the Latest, or All versions for that Salesforce Id.
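The time-window and version selection described above can be sketched in Python. This is an illustration under assumptions: each snapshot version is modeled as a `(record_id, timestamp)` pair, the window end is treated as exclusive, and the names `single_date_window` and `pick_versions` are hypothetical.

```python
from datetime import date, datetime, time, timedelta

def single_date_window(day: date) -> tuple[datetime, datetime]:
    """A Single Date: midnight of the day up to (not including) next midnight."""
    start = datetime.combine(day, time.min)
    return start, start + timedelta(days=1)

def pick_versions(versions, start, end, keep):
    """Filter versions to [start, end) and keep Earliest, Latest, or All per Id."""
    by_id = {}
    for rec_id, stamp in versions:
        if start <= stamp < end:
            by_id.setdefault(rec_id, []).append(stamp)
    result = []
    for rec_id, stamps in by_id.items():
        stamps.sort()
        if keep == "Earliest":
            chosen = stamps[:1]
        elif keep == "Latest":
            chosen = stamps[-1:]
        elif keep == "All":
            chosen = stamps
        else:
            raise ValueError(f"unknown selection: {keep!r}")
        result.extend((rec_id, s) for s in chosen)
    return result

versions = [
    ("A", datetime(2024, 3, 1, 8, 0)),
    ("A", datetime(2024, 3, 1, 12, 0)),
    ("A", datetime(2024, 3, 2, 0, 0)),    # falls outside the 1 March window
    ("B", datetime(2024, 3, 1, 23, 59)),
]
start, end = single_date_window(date(2024, 3, 1))
latest = pick_versions(versions, start, end, "Latest")
```

For Between Two Times, the same `pick_versions` call would be given the user's start and end times directly. Whether the tool treats the end time as inclusive is not stated here.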
Adding a Field to the Base64 Exporter
Available tables are provided by a Base64ExportTableRuleRegistry.xml configuration file. This file contains metadata that defines which Salesforce object fields are candidates for the Base64 Export process, and contains the following built-in definitions.
To add a new table to the Base64 Exporter, create a new Base64ExportTableRuleRegistry.xml file and add it to the [CopyStormMedic]/config/ directory.
Each rule in this XML document specifies the following:
- The name of the Salesforce object (tableName)
- The name of the Salesforce Base64 field (contentsFieldName)
- The name of the Salesforce field whose value is the title of the document (titleFieldName)
- The name of the Salesforce field whose value is the length (in bytes) of the document (contentLengthFieldName)
All attributes are required with the exception of the titleFieldName attribute. If the titleFieldName attribute is not provided, the Salesforce Id field value will be used to name the exported files.
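The required and optional attributes can be checked with a short Python sketch. The XML layout used here is an assumption: the `<rules>` and `<rule>` element names are invented for illustration and may not match the real Base64ExportTableRuleRegistry.xml format. Only the four attribute names come from the list above. The Name, BodyLength, and ContentSize field values are illustrative choices.

```python
import xml.etree.ElementTree as ET

REQUIRED = ("tableName", "contentsFieldName", "contentLengthFieldName")

# HYPOTHETICAL layout: element names are assumed, not taken from the product.
SAMPLE = """
<rules>
  <rule tableName="Attachment" contentsFieldName="Body"
        titleFieldName="Name" contentLengthFieldName="BodyLength"/>
  <rule tableName="ContentVersion" contentsFieldName="VersionData"
        contentLengthFieldName="ContentSize"/>
</rules>
"""

def load_rules(xml_text: str) -> list[dict]:
    """Parse rules and verify that every required attribute is present."""
    rules = []
    for node in ET.fromstring(xml_text.strip()).iter("rule"):
        missing = [name for name in REQUIRED if not node.get(name)]
        if missing:
            raise ValueError(f"rule is missing attributes: {missing}")
        rule = {name: node.get(name) for name in REQUIRED}
        # Optional: when absent, exported files are named by Salesforce Id.
        rule["titleFieldName"] = node.get("titleFieldName")
        rules.append(rule)
    return rules

rules = load_rules(SAMPLE)
```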
Running Base64 Export as a Batch Job
The Base64 Export tool cannot be executed as a batch job.