I wrote a small PHP script to import a dozen csv files exported from Excel into a database. The CSV import basically looked like this:
<?php $f = "file.csv"; if (($handle = fopen($f, "r")) !== FALSE) { while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) { $date = DateTime::createFromFormat('d.m.Y', $data[0]); // Inserting this along with other data into a DB } } ?>
And now what happened is that $date
was false. So I fetched the errors like this:
var_dump(DateTime::getLastErrors());
And this returned:
array(4) { ["warning_count"]=> int(0) ["warnings"]=> array(0) { } ["error_count"]=> int(1) ["errors"]=> array(1) { [0]=> string(22) "Unexpected data found." } }
So I added var_dump($date)
, but it gave string(13) "31.01.2019"
, which looked right. But looking closely, the string length of 13 seems a bit long for a 10 character date, right? I tried trim()
, but without luck. And then I remembered that I had a similar problem before where invisible empty space was due to the UTF-8 Byte Order Mark (BOM). This is a sequence of “inivisible” bytes at the beginning of a textfile that define in which unicode encoding the file is (UTF-8, UTF-16, …) and its endianess (big-endian or little-endian). Microsoft Office programs such as Excel or Word like to write this to the beginning of a file, but other programs may do so as well.
So the solution is simple: In the first line, strip the BOM if it is there:
<?php $f = "file.csv"; $bom = pack('CCC', 0xEF, 0xBB, 0xBF); $firstline=true; if (($handle = fopen($f, "r")) !== FALSE) { while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) { if ($firstline and substr($data[0], 0, 3) === $bom) $data[0] = substr($data[0], 3); $firstline=false; $date = DateTime::createFromFormat('d.m.Y', $data[0]); // Inserting this along with other data into a DB } } ?>
So this just checks whether the first three bytes in the file match the UTF-8 BOM added by Excel and in case it detects them, it remove these bytes. Now the date parses fine. If your file has a different BOM, e.g. for UTF-16, you may need to change the definition of $bom
. Just check your file in a hex-editor to find the first three bytes. This is PSPad, a great text-editor that includes a HEX-editor:
Note the first three bytes EF BB BF
, which are the BOM.
If this helped you to solve your problem faster, please drop a comment below. This motivates me to keep writing these articles.