:::: MENU ::::
Monthly Archives: February 2018

Importing a CSV UTF-8 – BOM (Byte Order Mark) Microsoft Excel.

Before I start, here is a little credit to @mark_hamstra for putting me on the right track with fixing this issue today.

CSVs

Although CSVs seem like simple files for transferring data and importing into websites, they actually end up being quite complicated when you start to need to convert to UFT-8.

EXCEL

From what I can tell, one of the biggest issues around CSVs originates from Excel. In fact it was only exporting CSV files from Excel that I was having any issues with.

None of the other programs I was using seemed to add a BOM at the start of the file. Numbers, Open Office and Google Sheets all seemed to work fine.

Solution

There are a couple of solutions that I found worked for me.

Solution 1 (courtesy of https://stackoverflow.com/questions/32184933/remove-bom-%C3%AF-from-imported-csv-file )

$file = ‘something.csv';
$content = file_get_contents($file);
var_dump(removeBomUtf8($content));
function removeBomUtf8($s){
   if(substr($s,0,3)==chr(hexdec('EF')).chr(hexdec('BB')).chr(hexdec('BF'))){
      return substr($s,3);
    } else {
      return $s;
    }
}

Solution 2

If you know for a fact that your file always has a BOM, you can simply use an fseek() to skip the first 3 bytes, which is the length of the BOM. This solution works fine, however you need to be completely sure that the csv file always has a BOM. If it doesn’t you will probably run into problems.

$file = fopen(“nam_of_cv.csv”, "r");
fseek($file, 3);