19. March 2019

PHP: DateTime::createFromFormat fails for string read from CSV

Filed under: PHP - Christopher Kramer @ 19:31

I wrote a small PHP script to import a dozen CSV files exported from Excel into a database. The CSV import basically looked like this:

<?php
$f = "file.csv";	
if (($handle = fopen($f, "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
         $date = DateTime::createFromFormat('d.m.Y', $data[0]);
         // Inserting this along with other data into a DB
    }
} ?>

The problem was that $date was false. So I fetched the errors like this:

var_dump(DateTime::getLastErrors());

And this returned:

array(4) {
  ["warning_count"]=>
  int(0)
  ["warnings"]=>
  array(0) {
  }
  ["error_count"]=>
  int(1)
  ["errors"]=>
  array(1) {
    [0]=>
    string(22) "Unexpected data found."
  }
}

So I added var_dump($data[0]), which gave string(13) "31.01.2019". That looked right, but on closer inspection a string length of 13 is too long for a 10-character date. I tried trim(), but without luck. Then I remembered a similar problem I had before, where invisible leading bytes turned out to be the UTF-8 Byte Order Mark (BOM). This is a sequence of “invisible” bytes at the beginning of a text file that indicates which Unicode encoding the file uses (UTF-8, UTF-16, …) and, where it matters, its endianness (big-endian or little-endian). Microsoft Office programs such as Excel or Word like to write it at the beginning of a file, but other programs may do so as well.
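To make such hidden bytes visible, it helps to print the byte length together with a hex dump of the value. The following is only a minimal sketch: describeBytes() is a hypothetical helper name, and the sample value is built by hand to mimic the first CSV field.

```php
<?php
// Hypothetical helper: byte length plus a space-separated hex dump of a string.
function describeBytes(string $value): string
{
    return strlen($value) . ' bytes: ' . implode(' ', str_split(bin2hex($value), 2));
}

// Hand-built sample: UTF-8 BOM followed by the date, as in the first CSV field.
$field = "\xEF\xBB\xBF" . '31.01.2019';

echo describeBytes($field), PHP_EOL;
// 13 bytes: ef bb bf 33 31 2e 30 31 2e 32 30 31 39

// trim() only strips whitespace and NUL bytes by default, so the BOM survives.
var_dump(trim($field) === $field); // bool(true)
```

The three leading bytes ef bb bf account for the difference between the 13 bytes reported and the 10 visible characters.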

So the solution is simple: In the first line, strip the BOM if it is there:

<?php
$f = "file.csv";
$bom = pack('CCC', 0xEF, 0xBB, 0xBF);
$firstline=true;
if (($handle = fopen($f, "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
         if ($firstline and substr($data[0], 0, 3) === $bom)
             $data[0] = substr($data[0], 3);
         $firstline=false;
         $date = DateTime::createFromFormat('d.m.Y', $data[0]);
         // Inserting this along with other data into a DB
    }
} ?>

This checks whether the first field of the first line starts with the three bytes of the UTF-8 BOM added by Excel and, if so, removes them. Now the date parses fine. If your file has a different BOM, e.g. for UTF-16 (where the BOM is only two bytes long), you need to change the definition of $bom and the lengths passed to substr() accordingly. Just check your file in a hex editor to see its first bytes. I use PSPad, a great text editor that includes a hex editor: for my file, it shows EF BB BF as the first three bytes, which are the BOM.
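An alternative to patching the first field is to consume the BOM directly from the file handle before the fgetcsv() loop starts. The sketch below is not part of the original script: skipBom() is a hypothetical helper, it assumes a seekable stream (a regular file is), and it only recognises the standard Unicode BOM signatures.

```php
<?php
// Hypothetical helper: consume a leading BOM from an open, seekable handle.
// Returns the encoding name the BOM stands for, or null if there is no BOM.
function skipBom($handle): ?string
{
    // Longer signatures first: the UTF-32LE BOM begins with the UTF-16LE one.
    $boms = [
        'UTF-32BE' => "\x00\x00\xFE\xFF",
        'UTF-32LE' => "\xFF\xFE\x00\x00",
        'UTF-8'    => "\xEF\xBB\xBF",
        'UTF-16BE' => "\xFE\xFF",
        'UTF-16LE' => "\xFF\xFE",
    ];

    $start = ftell($handle);
    $head  = (string) fread($handle, 4);

    foreach ($boms as $encoding => $bom) {
        if (strncmp($head, $bom, strlen($bom)) === 0) {
            // Position the handle right after the BOM.
            fseek($handle, $start + strlen($bom));
            return $encoding;
        }
    }

    // No BOM found: go back to where we started.
    fseek($handle, $start);
    return null;
}

// Usage, assuming the same file as above:
// $handle = fopen('file.csv', 'r');
// skipBom($handle);
// while (($data = fgetcsv($handle, 1000, ';')) !== false) { ... }
```

Note that this only skips the marker. A file that is actually UTF-16 or UTF-32 would still have to be converted before fgetcsv() can split it sensibly, which this sketch does not attempt.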

9. June 2012

Typo3 and other charsets than UTF-8 (latin1 / ISO-8859-1, …)

Filed under: PHP, Server Administration, Typo3 - Christopher Kramer @ 12:30

When updating a Typo3 installation to Typo3 4.5.x, I had problems with charsets and explained the solution here.

Now, while updating a Typo3 installation to 4.6.x, I ran into another charset problem: the backend was now completely UTF-8, so texts changed in the backend were stored as UTF-8. As the frontend was still ISO-8859-1, special characters (umlauts) got messed up there. There may be a way around this as well ($TYPO3_CONF_VARS['BE']['forceCharset'], I guess), but it clearly shows that the Typo3 developers are slowly dropping support for other charsets and that it might be easier to switch to UTF-8.

In the release notes of Typo3 4.5, I found the following passage:

UTF8 by default: New installations will use UTF8 automatically. Keep in mind that we will be deprecating all other charsets in the release of 4.5, but still support those charsets. 4.7 or maybe even 4.6 will be the first “UTF-8 only” release. When upgrading from older releases to 4.5, you will have to specifically set $TYPO3_CONF_VARS['BE']['forceCharset'] and $TYPO3_CONF_VARS['BE']['setDBinit'] in your localconf.php. An Upgrade Wizard will help you with that.

In the release notes of Typo3 4.6, I could not find a word about UTF-8, but in the release notes of 4.7, it is clearly stated:

check you database if it is utf-8 encoded – TYPO3 4.7 only will work with utf-8.
[…]
The forceCharset option has been deprecated in version 4.5. UTF-8 is now enforced. Even though other values than “utf-8” have not been possible anymore for some time, the option’s value has been queried at plenty of places within the whole core. These references, the option in the Install Tool, as well as many defaults with charset “iso-8859-1” in several classes have been changed, so TYPO3 now works UTF-8-only internally.

So it is clearly time to make the switch.

It is not that complicated – everything is described very well over here.

As the official wiki is very long and explains lots of things you might not care about, here are the basic steps:

  • Backup Database and Files
  • Set the charset in your webserver (e.g. “AddDefaultCharset utf-8” in a .htaccess)
  • Adjust some settings in localconf.php:
    // For backend charset
     $TYPO3_CONF_VARS['BE']['forceCharset'] = 'utf-8';
     $TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET NAMES utf8;'; 
    
     // For GIFBUILDER support
     // Set it to 'iconv' or 'mbstring'
     $TYPO3_CONF_VARS['SYS']['t3lib_cs_convMethod'] = 'mbstring';
     // For 'iconv' support you need at least PHP 5.
     $TYPO3_CONF_VARS['SYS']['t3lib_cs_utils'] = 'mbstring';
  • Adjust your TypoScript (change the language to your needs):
    config.locale_all = de_DE.utf-8
  • Convert your template files to UTF-8 (and remap them if you use TemplaVoila) – usually in fileadmin/templates
  • Convert your DB to UTF-8
    1. Back it up first if you have not done so yet (believe me!)
    2. Paste this tool into fileadmin
    3. Run it by opening it in the browser (http://example.com/fileadmin/db_utf8_fix.php)
    4. If everything says “OK”, change the constant “SIMULATE” to false
    5. Run it again
    6. Clean cache of Typo3
    7. Check your site (esp. special characters). If the content is messed up or parts are missing, do the following:
      1. Restore the backup of the database (yes, I told you!)
      2. Uncomment lines 108 – 123 in db_utf8_fix.php
      3. Run it in the browser again
      4. Clean cache in Typo3
    8. Clean all cache in Typo3 Backend
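
For the template conversion step, a small PHP script can do the job if your editor cannot. This is only a sketch under assumptions: toUtf8() is a hypothetical helper, it requires the mbstring extension, and it treats any content that is not already valid UTF-8 as ISO-8859-1. The path in the usage comment is an example, not something Typo3 prescribes.

```php
<?php
// Hypothetical helper: return $content as UTF-8.
// Assumption: input that is not valid UTF-8 is ISO-8859-1 (latin1).
function toUtf8(string $content): string
{
    if (mb_check_encoding($content, 'UTF-8')) {
        // Already valid UTF-8 (this includes plain ASCII): leave untouched,
        // so running the script twice does not double-encode anything.
        return $content;
    }
    return mb_convert_encoding($content, 'UTF-8', 'ISO-8859-1');
}

// Usage (example path, back up the files first):
// foreach (glob('fileadmin/templates/*.html') as $file) {
//     file_put_contents($file, toUtf8(file_get_contents($file)));
// }
```

The validity check is a heuristic: a latin1 file whose special characters happen to form valid UTF-8 sequences would be left unconverted, so check the result in the browser afterwards.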

You can find more detailed information here. It also describes lots of other ways to convert the database.

Happy converting!

Update 2014-05-05: Changed the link to the db_utf8_fix script, as the original site is reported to have been attacked and does not host the script anymore. I cannot check whether the script at snipplr is exactly the same, but it appears to be.