Google Sheets allows you to import data directly from HTTPS in the formula field. An example could be:
You would hope the data would look like as follows:
However, it imports like this:
If anyone has some insights why that is the case, be great to let us know
I have investigated this further. Importing via IMPORTDATA() with the file in UTF-8 encoding does not have this issue. Importing with the file import feature (File > Import) of sheets also does not have this issue with either UTF-8 or UTF-16. It seems IMPORTDATA() does not support UTF-16 encoding.
The changes made here: https://github.com/matomo-org/matomo/issues/8898#issuecomment-148374890 let you import directly.
@jmumby Welcome in the hell of CSV files and Excel.
For whatever insane reasons Excel cannot read a simple UTF-8 file that contains text separated by simple ";" or "," characters.
If you try to open this very simple UTF-8 file as a example.csv in Excel, it won't work any will only see corrupted characters:
product,€ test,2 something,20
(I can't test this myself as I don't have Windows, but when I tested this years ago, I already lost enough nerves)
The only way one can output a text file that is readable and writable* by Excel is to encode it as UTF-16 and add a Byte-Order-Mark (BOM) at the start of the file (which is a simple FEFF character that should not be needed by any software). And to make Matomo work with Excel, this is what Matomo does at the reporting API:
➜ ~/tmp curl "https://demo.matomo.org/index.php?date=yesterday&expanded=1&filter_limit=10&format=CSV&idSite=62&language=de&method=UserCountry.getContinent&module=API&period=day&segment=&token_auth=anonymous&translateColumnNames=1" -o test.csv ➜ ~/tmp file -kr test.csv test.csv: CSV text - , Little-endian UTF-16 Unicode text, with very long lines ➜ ~/tmp hexdump test.csv -n 16 0000000 feff 0042 0065 007a 0065 0069 0063 0068 0000010
Now interestingly not even
less can read the file anymore as it is detected as a binary format:
➜ ~/tmp less test.csv "test.csv" may be a binary file. See it anyway? <FF><FE>B^<a class='mention' href='https://github.com/e'>@e</a>^<a class='mention' href='https://github.com/z'>@z</a>^<a class='mention' href='https://github.com/e'>@e</a>^<a class='mention' href='https://github.com/i'>@i</a>^<a class='mention' href='https://github.com/c'>@c</a>^<a class='mention' href='https://github.com/h'>@h</a>^<a class='mention' href='https://github.com/n'>@n</a>^<a class='mention' href='https://github.com/u'>@u</a>^<a class='mention' href='https://github.com/n'>@n</a> [...]
And this is also the reason why Google sheets doesn't work anymore. Google doesn't detect a normal text encoding at the beginning of the file, so it falls back to Windows-1252 where FF FE corresponds to ÿþ
I don't really have a solution for this apart from adding at GET parameter that makes Matomo output plain normal UTF-8 files.
* bonus fun fact: There are combinations of encodings Excel can read, but then not write again or even worse, write but then not read again.
bonus bonus fun fact: The behavior of Excel also differs between versions and whether you open the file normally or use the import CSV dialog.
Adding an encoding parameter in the request may be a good solution.