DeutschEnglish

Submenu

 - - - By CrazyStat - - -

9. May 2014

Updating to PHP 5.4 causes missing Text

Filed under: PHP,Server Administration — Tags: , , , , , , — Christopher Kramer @ 14:29

After updating from PHP5.3 to PHP 5.4, on some sites text was missing. No error could be found in the error log so I had to dig into the code to find out what was going on.

The root cause is that with PHP5.4, the default character set expected by htmlentites(), htmlspecialcharacters() and html_entity_decode() changed from ISO-8859-1 to UTF-8. So if a script passes ISO-8859-1 characters like German “Umlaute” (öäüÖÄÜß) to one of these functions without specifying the charset with the corresponding parameter, these functions will return an empty string. And unfortunately, with PHP 5.4, they also removed the error message that PHP 5.3 recorded in the logfile in this case. This makes finding the problem a lot more difficult.

So what can you do about it? You could

  1. Use PHP 5.3 😉
    Here is a blog post on downgrading to PHP 5.3. on Debian Wheezy
  2. change the used charset to UTF-8
    This might require changing the character set in files, databases or config files, depending on what is used on the site.
    I explained in a blog post how to change the charset in Typo3 to UTF-8 back in 2012.
  3. Provide ISO-8859-1 as a parameter to all calls of htmlspecialcharacters() etc.

So for the third option, what you have to do is find places like this:

htmlspecialchars($string);

And replace them with something like:

htmlspecialchars($string, ENT_COMPAT | ENT_XHML, 'ISO-8859-1');

The problem is that it’s hard to do this automatically. What is easy to do, is replace all htmlspecialchars()-calls with calls to htmlspecialchars_PHP5-3() etc. and place these functions there:

function htmlspecialchars_PHP5-3($string, $ent=ENT_COMPAT, $charset='ISO-8859-1') {
    return htmlspecialchars($string, $ent, $charset);
}

function htmlentities_PHP-5-3($string, $ent=ENT_COMPAT, $charset='ISO-8859-1') {
    return htmlentities($string, $ent, $charset);
}

function html_entity_decode_PHP-5-3($string, $ent=ENT_COMPAT, $charset='ISO-8859-1') {
    return html_entity_decode($string, $ent, $charset);
}

So just do a search & replace over all files and make sure that all scripts have a file included that contains these functions.

Recommendation

Try my Open Source PHP visitor analytics script CrazyStat.

9. June 2012

Typo3 and other charsets than UTF-8 (latin1 / ISO-8859-1, …)

Filed under: PHP,Server Administration,Typo3 — Tags: , , , , , , , — Christopher Kramer @ 12:30

When updating a Typo3 installation to Typo3 4.5.x, I had problems with charsets and explained the solution here.

Now updating an installation of Typo3 to 4.6.x, I ran into another charset problem: The backend now was completely UTF-8 and therefore, changing texts in the backend caused them to be stored as UTF-8. As the frontend was still ISO-8859-1, special characters (Umlaute) over there got messed up. Maybe there is a way out of this as well ($TYPO3_CONF_VARS['BE']['forceCharset'] I guess), but this clearly shows that Typo3-developers drop support for other charsets slowly and that it might be easier to switch to UTF-8.

In the release notes of Typo3 4.5, I found the following passage:

UTF8 by default: New installations will use UTF8 automatically. Keep in mind that we will be deprecating all other charsets in the release of 4.5, but still support those charsets. 4.7 or maybe even 4.6 will be the first “UTF-8 only” release. When upgrading from older releases to 4.5, you will have to specifically set $TYPO3_CONF_VARS['BE']['forceCharset'] and $TYPO3_CONF_VARS['BE']['setDBinit'] in your localconf.php. An Upgrade Wizard will help you with that.

In the release notes of Typo3 4.6, I could not find a word about UTF-8, but in the release notes of 4.7, it is clearly stated:

check you database if it is utf-8 encoded – TYPO3 4.7 only will work with utf-8.
[…]
The forceCharset option has been deprecated in version 4.5. UTF-8 is now enforced. Even though other values than “utf-8” have not been possible anymore for some time, the option’s value has been queried at plenty of places within the whole core. These references, the option in the Install Tool, as well as many defaults with charset “iso-8859-1” in several classes have been changed, so TYPO3 now works UTF-8-only internally.

So it is clearly time to make the switch.

It is not that complicated – everything is described very well over here.

As the official wiki is very long and explains lots of stuff you might just not care, here are the basic steps:

  • Backup Database and Files
  • Set the charset in your webserver (e.g. “AddDefaultCharset utf-8” in a .htaccess)
  • Adjust some settings in localconf.php:
    // For backend charset
     $TYPO3_CONF_VARS['BE']['forceCharset'] = 'utf-8';
     $TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET NAMES utf8;'; 
    
     // For GIFBUILDER support
     // Set it to 'iconv' or 'mbstring'
     $TYPO3_CONF_VARS['SYS']['t3lib_cs_convMethod'] = 'mbstring';
     // For 'iconv' support you need at least PHP 5.
     $TYPO3_CONF_VARS['SYS']['t3lib_cs_utils'] = 'mbstring';
  • Adjust your typoScript (change language to your needs):
    config.locale_all = de_DE.utf-8
  • Convert your templatefiles to UTF-8 (and remap them if you use TemplaVoila) – usually in fileadmin/templates
  • Convert your DB to UTF-8
    1. Backup it first if you have not yet (believe me!)
    2. Paste this tool into fileadmin
    3. Run it by opening it in the browser (http://example.com/fileadmin/db_utf8_fix.php)
    4. If everything says “OK”, change the constant “SIMULATE” to false
    5. Run it again
    6. Clean cache of Typo3
    7. Check your site (esp. special characters). If the content is messed up or parts are missing, do the following:
      1. Restore the backup of the database (yes, I told you!)
      2. Uncomment lines 108 – 123 in db_utf8_fix.php
      3. Run it in browser againClean cache in Typo3
    8. Clean all cache in Typo3 Backend

You can find more detailed information here. There are also lots of other ways described how to convert the database.

Happy converting!

 

Update 2014-05-05: Changed link to db_utf8_fix-script as the original site is reported to be attacked and does not host the script anymore. I cannot check if the script at snipplr is exactly the same, but it looks so.