DeutschEnglish

Submenu

 - - - By CrazyStat - - -

19. March 2019

PHP: DateTime::createFromFormat fails for string read from CSV

Filed under: PHP — Tags: , , , , , — Christopher Kramer @ 19:31

I wrote a small PHP script to import a dozen csv files exported from Excel into a database. The CSV import basically looked like this:

<?php
$f = "file.csv";	
if (($handle = fopen($f, "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
         $date = DateTime::createFromFormat('d.m.Y', $data[0]);
         // Inserting this along with other data into a DB
    }
} ?>

And now what happened is that $date was false. So I fetched the errors like this:

var_dump(DateTime::getLastErrors());

And this returned:

array(4) {
  ["warning_count"]=>
  int(0)
  ["warnings"]=>
  array(0) {
  }
  ["error_count"]=>
  int(1)
  ["errors"]=>
  array(1) {
    [0]=>
    string(22) "Unexpected data found."
  }
}

So I added var_dump($date), but it gave string(13) "31.01.2019", which looked right. But looking closely, the string length of 13 seems a bit long for a 10 character date, right? I tried trim() , but without luck. And then I remembered that I had a similar problem before where invisible empty space was due to the UTF-8 Byte Order Mark (BOM). This is a sequence of “inivisible” bytes at the beginning of a textfile that define in which unicode encoding the file is (UTF-8, UTF-16, …) and its endianess (big-endian or little-endian). Microsoft Office programs such as Excel or Word like to write this to the beginning of a file, but other programs may do so as well.

So the solution is simple: In the first line, strip the BOM if it is there:

<?php
$f = "file.csv";
$bom = pack('CCC', 0xEF, 0xBB, 0xBF);
$firstline=true;
if (($handle = fopen($f, "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
         if ($firstline and substr($data[0], 0, 3) === $bom)
             $data[0] = substr($data[0], 3);
         $firstline=false;
         $date = DateTime::createFromFormat('d.m.Y', $data[0]);
         // Inserting this along with other data into a DB
    }
} ?>

So this just checks whether the first three bytes in the file match the UTF-8 BOM added by Excel and in case it detects them, it remove these bytes. Now the date parses fine. If your file has a different BOM, e.g. for UTF-16, you may need to change the definition of $bom. Just check your file in a hex-editor to find the first three bytes. This is PSPad, a great text-editor that includes a HEX-editor:

Note the first three bytes EF BB BF, which are the BOM.

If this helped you to solve your problem faster, please drop a comment below. This motivates me to keep writing these articles.

Recommendation

Try my Open Source PHP visitor analytics script CrazyStat.