Though MySQL claims support for all UTF-8 characters, experience indicates that not all MySQL variants support the full extended 4 byte UTF-8 character set. When MySQL fails to recognize an extended UTF-8 character and error like this will be displayed:
java.sql.BatchUpdateException: Incorrect string value: ‘\xF0\x9F\x98\x8A’ for column ‘CommentBody’ at row 48
To get around this issue, you can configure CopyStorm to replace unsuppported 4 byte UTF-8 characters with a substitution.
Note: The extended 4 byte UTF-8 character set contains like 100 variations of a smiley face, trees, hamburgers, etc…
Character mappings are defined in a “UTF8Tool.xml” configuration file that can be placed in either the “config” subdirectory of the CopyStorm installation or in a custom config directory you have created. The following UTF8Tool.xml will cause CopyStorm to replace the 4 byte UTF-8 characters for two smiley faces to be converted to their more old fashioned counterparts:
This rule will cause any text containing the specified smiley face characters to have the characters be replaced by “:)” or “:(” before data is written to the database.
Since MySQL reports UTF-8 problems using a hex byte sequence, you will need a way to convert hex byte sequences to a character suitable for XML. Capstorm has used the tool located at:
To use the tool enter the hex digits from MySQL into the box labeled “UTF-8 Code” (e.g. 20 E2 88 9A) and press the Convert button. Copy the generated unicode character into the “from” parameter in UTF8Tool.xml. Note that you can use the regular UTF-8 character in the “from” parameter rather than the decimal or hex representation.