MySQL and MariaDB Data Import - Accented characters (letters) are replaced by question marks or other characters after upgrading to New York release


Description

After upgrading to New York some of your MySQL / MariaDB data sources can experience issues while loading data if you have an old version of MySQL DB server which can not negotiate on the collation type.

You may experience the international accented characters as question marks. For example, instead of à, è, etc, you may observe question marks (?) within strings.

Release or Environment

New York and onwards

Cause

New York release bundles a new and improved version of MySQL / MariaDB JDBC driver, and it's also included with MID Server. When MariaDB JDBC driver initiates a connection against a DB server, it also detects what the target DB server is configured for character set and collation type. Some old deployments of MySQL server seem to be incompatible for this process and the connection collation type and character set falls back to a collation and character set which is not UTF-8. Then, MariaDB JDBC driver processes this result set assuming it's UTF-8, which causes incorrect character conversions.

Resolution

In order to instruct the DB to send the result set in UTF-8, you will need to do additional configuration on the data source. To get the form ready, you will need to add the "Connection URL" field to the data source form. In order to do that;

  1. Open your MySQL / MariaDB data source record.
  2. Right-click on the gray header of the form and select "Configure -> Form layout"
  3. Select "Connection URL" from the left list and click on the ">" button to add it to the form. You can move the field to where you want in the form.

After you have made the field visible, you can start configuring your MySQL / MariaDB data source.

  1. Make sure you have all the required fields populated to connect to your MySQL data source.
  2. Check the value of the "Connection URL" field, and confirm that it's populated. (Please take a look at the Additional Information section on this KB article as well)
  3. Append the following string at the end of "Connection URL", and save the data source record:

?sessionVariables=collation_connection=utf8_swedish_ci,character_set_connection=utf8,character_set_client=utf8,character_set_results=utf8

The end result should look like:

jdbc:mysql://10.10.10.10:3306/DB_NAME?sessionVariables=collation_connection=utf8_swedish_ci,character_set_connection=utf8,character_set_client=utf8,character_set_results=utf8

 

Then import your data one more time. This time accented characters should work fine.

Additional Information

"Connection URL" on Data Source form is an auto-populated field, which is filled with the values gathered from other fields, and in normal circumstances, it shouldn't be modified manually. If you make a change on any other field values (like DB Name) then the field content will change after the save operation.