I wrote a small PHP script to import a dozen csv files exported from Excel into a database. The CSV import basically looked like this:
<?php
$f = "file.csv";
if (($handle = fopen($f, "r")) !== FALSE) {
while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
$date = DateTime::createFromFormat('d.m.Y', $data[0]);
// Inserting this along with other data into a DB
}
} ?>
And now what happened is that $date was false. So I fetched the errors like this:
var_dump(DateTime::getLastErrors());
And this returned:
array(4) {
["warning_count"]=>
int(0)
["warnings"]=>
array(0) {
}
["error_count"]=>
int(1)
["errors"]=>
array(1) {
[0]=>
string(22) "Unexpected data found."
}
}
So I added var_dump($date), but it gave string(13) "31.01.2019", which looked right. But looking closely, the string length of 13 seems a bit long for a 10 character date, right? I tried trim() , but without luck. And then I remembered that I had a similar problem before where invisible empty space was due to the UTF-8 Byte Order Mark (BOM). This is a sequence of “inivisible” bytes at the beginning of a textfile that define in which unicode encoding the file is (UTF-8, UTF-16, …) and its endianess (big-endian or little-endian). Microsoft Office programs such as Excel or Word like to write this to the beginning of a file, but other programs may do so as well.
So the solution is simple: In the first line, strip the BOM if it is there:
<?php
$f = "file.csv";
$bom = pack('CCC', 0xEF, 0xBB, 0xBF);
$firstline=true;
if (($handle = fopen($f, "r")) !== FALSE) {
while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
if ($firstline and substr($data[0], 0, 3) === $bom)
$data[0] = substr($data[0], 3);
$firstline=false;
$date = DateTime::createFromFormat('d.m.Y', $data[0]);
// Inserting this along with other data into a DB
}
} ?>
So this just checks whether the first three bytes in the file match the UTF-8 BOM added by Excel and in case it detects them, it remove these bytes. Now the date parses fine. If your file has a different BOM, e.g. for UTF-16, you may need to change the definition of $bom. Just check your file in a hex-editor to find the first three bytes. This is PSPad, a great text-editor that includes a HEX-editor:

Note the first three bytes EF BB BF, which are the BOM.
If this helped you to solve your problem faster, please drop a comment below. This motivates me to keep writing these articles.