@mirrorstage opened this Issue on October 2nd 2015

It seems that the CSVs produced by the Reporting API are not "true" CSV: they are served with Content-Type: application/vnd.ms-excel. This is fine for those who want to work in Excel, but in other use cases the files cause issues. For example, if you try to use one of these in R, attempts to read it into memory as an object fail.

One suggestion: allow text/csv as the content-type.

I've written up the issues I found with Piwik CSVs and R here, with more details about what appears to make R choke: http://forum.piwik.org/read.php?2,129529 Happy to append them to this issue report if needed.

@MCMic commented on October 15th 2015

I have much the same problem.
The weird extra bytes at the beginning of the CSV response from the API cause a lot of tools to crash or report the file as broken.
Also, the encoding is not UTF-8, so I need to convert it. I have a problem with line endings too, which might be related.

I ended up writing a PHP script that fetches the PHP-serialized export and creates the CSV myself :-/
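For anyone who would rather clean up the CSV on the client side than rebuild it, here is a minimal Python sketch of that conversion. It assumes the extra leading bytes are a byte order mark (the thread only suspects a UTF-16 BOM, so the function sniffs for one rather than hard-coding it) and falls back to UTF-8 when no BOM is present. The names `decode_export` and `read_rows` are made up for this example and are not part of Piwik.

```python
import codecs
import csv
import io

# BOMs this sketch recognizes, mapped to a codec that strips them on decode.
# UTF-32 is deliberately not handled here.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def decode_export(raw, fallback="utf-8"):
    """Decode raw export bytes to str, dropping a leading BOM if there is one
    and normalizing CRLF / CR line endings to LF."""
    encoding = fallback  # assumption: no BOM means the bytes are UTF-8
    for bom, name in _BOMS:
        if raw.startswith(bom):
            encoding = name
            break
    text = raw.decode(encoding)
    return text.replace("\r\n", "\n").replace("\r", "\n")


def read_rows(raw):
    """Parse the decoded export into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(decode_export(raw), newline="")))


if __name__ == "__main__":
    # Simulated export: UTF-16LE with BOM and CRLF line endings.
    sample = codecs.BOM_UTF16_LE + "label,nb_visits\r\ncafé,3\r\n".encode("utf-16-le")
    for row in read_rows(sample):
        print(row)
```

Writing `decode_export(raw)` back out with `encoding="utf-8"` should give a file that BOM-sensitive tools can read, provided the BOM assumption above holds for your export.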

@mirrorstage commented on October 15th 2015

We found a fix by changing two defaults in the file Csv.php in the Piwik package.
Filepath: ~/piwik/piwik/plugins/API/Renderer/Csv.php

In line 34, change true to false:
$convertToUnicode = Common::getRequestVar('convertToUnicode', true, 'int', $this->request);

In line 57, change application/vnd.ms-excel to text/csv:
Common::sendHeader("Content-Type: application/vnd.ms-excel", true);

We changed the content-type first, but that didn't help with the extra bytes (a UTF-16 BOM, I think). Changing the convertToUnicode flag worked.

However, we didn't reset the content-type to the default before we changed the flag. That means we didn't exactly isolate the problem, so I can't say whether this is solely a Unicode issue or whether the convertToUnicode flag and the content-type default are interacting somehow. If we have some time in the next few days, I'll see if we can test it.
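Since line 34 reads convertToUnicode with Common::getRequestVar, the flag looks like it can be overridden per request, which would avoid patching Csv.php. The Python sketch below builds such a request URL. This is an assumption drawn from that line of code and was not tested in this thread; the Content-Type header would still be application/vnd.ms-excel. The helper name `build_csv_export_url` and the example host are made up for illustration.

```python
from urllib.parse import urlencode


def build_csv_export_url(base_url, method, id_site, period, date,
                         token_auth, convert_to_unicode=False):
    """Build a Reporting API URL that asks for CSV and sets the
    convertToUnicode request variable explicitly (0 = off, 1 = on)."""
    params = {
        "module": "API",
        "method": method,
        "idSite": id_site,
        "period": period,
        "date": date,
        "format": "CSV",
        "token_auth": token_auth,
        # Assumption: overriding the default (true) via the request is enough
        # to skip the Unicode conversion that adds the extra leading bytes.
        "convertToUnicode": 1 if convert_to_unicode else 0,
    }
    return base_url.rstrip("/") + "/index.php?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical host and site id; replace with your own installation.
    print(build_csv_export_url(
        "https://piwik.example.org", "VisitsSummary.get",
        1, "day", "yesterday", "anonymous",
    ))
```

Fetching the same report once with `convert_to_unicode=True` and once with `False`, and comparing the first few bytes of each response, would also be one way to isolate the flag from the content-type change.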

@mattab commented on November 26th 2015 Member

Thanks for the suggestion. Maybe we could improve this in Piwik 3.0.0! I think it's very important that the CSV export creates CSV files that work well with all tools. But we need to make sure our CSV files can also contain exotic Unicode characters (which is the reason we implemented this 'convert to unicode' feature).
