Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When importing a CSV from the API with Google Sheets IMPORTDATA the data is garbled #16166

Closed
jmumby opened this issue Jul 1, 2020 · 11 comments
Labels
answered For when a question was asked and we referred to forum or answered it. Bug For errors / faults / flaws / inconsistencies etc.
Milestone

Comments

@jmumby
Copy link

jmumby commented Jul 1, 2020

Google Sheets allows you to import data directly from HTTPS in the formula field. An example could be:

=IMPORTDATA("https://example.com/analytics/index.php?date=yesterday&filter_limit=10&flat=1&format=CSV&idCustomReport=3&idSite=5&language=en&method=CustomReports.getCustomReport&module=API&period=day&reportUniqueId=CustomReports_getCustomReport_idCustomReport--3&segment=&token_auth=XYZ&translateColumnNames=1")

You would hope the data would look like as follows:

image

However, it imports like this:

image

https://support.google.com/docs/answer/3093335?hl=en

@tsteur tsteur added the Bug For errors / faults / flaws / inconsistencies etc. label Jul 1, 2020
@tsteur
Copy link
Member

tsteur commented Jul 1, 2020

If anyone has some insights why that is the case, be great to let us know

@jmumby
Copy link
Author

jmumby commented Jul 1, 2020

I have investigated this further. Importing via IMPORTDATA() with the file in UTF-8 encoding does not have this issue. Importing with the file import feature (File > Import) of sheets also does not have this issue with either UTF-8 or UTF-16. It seems IMPORTDATA() does not support UTF-16 encoding.

@jmumby
Copy link
Author

jmumby commented Jul 2, 2020

The changes made here: #8898 (comment) let you import directly.

@Findus23
Copy link
Member

Findus23 commented Jul 2, 2020

@jmumby Welcome in the hell of CSV files and Excel.
https://donatstudios.com/CSV-An-Encoding-Nightmare

For whatever insane reasons Excel cannot read a simple UTF-8 file that contains text separated by simple ";" or "," characters.

If you try to open this very simple UTF-8 file as a example.csv in Excel, it won't work any will only see corrupted characters:

product,€
test,2
something,20

(I can't test this myself as I don't have Windows, but when I tested this years ago, I already lost enough nerves)

The only way one can output a text file that is readable and writable* by Excel is to encode it as UTF-16 and add a Byte-Order-Mark (BOM) at the start of the file (which is a simple FEFF character that should not be needed by any software). And to make Matomo work with Excel, this is what Matomo does at the reporting API:

~/tmp curl "https://demo.matomo.org/index.php?date=yesterday&expanded=1&filter_limit=10&format=CSV&idSite=62&language=de&method=UserCountry.getContinent&module=API&period=day&segment=&token_auth=anonymous&translateColumnNames=1" -o test.csv
➜  ~/tmp file -kr test.csv
test.csv: CSV text
- , Little-endian UTF-16 Unicode text, with very long lines
➜  ~/tmp hexdump test.csv -n 16
0000000 feff 0042 0065 007a 0065 0069 0063 0068
0000010

Now interestingly not even less can read the file anymore as it is detected as a binary format:

➜  ~/tmp less test.csv 
"test.csv" may be a binary file.  See it anyway? 
<FF><FE>B^@e^@z^@e^@i^@c^@h^@n^@u^@n
[...]

And this is also the reason why Google sheets doesn't work anymore. Google doesn't detect a normal text encoding at the beginning of the file, so it falls back to Windows-1252 where FF FE corresponds to ÿþ

I don't really have a solution for this apart from adding at GET parameter that makes Matomo output plain normal UTF-8 files. (Update from the future: &convertToUnicode=0 is exactly that parameter that makes the Matomo API output regular UTF-8)

* bonus fun fact: There are combinations of encodings Excel can read, but then not write again or even worse, write but then not read again.
bonus bonus fun fact: The behavior of Excel also differs between versions and whether you open the file normally or use the import CSV dialog.

@jmumby
Copy link
Author

jmumby commented Jul 2, 2020

Adding an encoding parameter in the request may be a good solution.

@tsteur
Copy link
Member

tsteur commented Jul 29, 2020

The changes made here: #8898 (comment) let you import directly.

@jmumby do you remember if both changes where needed or was maybe &convertToUnicode=0 change enough?

@mattab mattab modified the milestones: 4.0.0 RC, 4.1.0 Jul 29, 2020
@jmumby
Copy link
Author

jmumby commented Jul 29, 2020

The changes made here: #8898 (comment) let you import directly.

@jmumby do you remember if both changes where needed or was maybe &convertToUnicode=0 change enough?

It appears to work with application/vnd.ms-excel

@tsteur
Copy link
Member

tsteur commented Jul 29, 2020

Sweet. so maybe we don't need to change anything and only document to set the URL parameter &convertToUnicode=0 to make this work @jmumby @mattab

mattab added a commit to matomo-org/developer-documentation that referenced this issue Aug 17, 2020
*   **convertToUnicode**; by default (when set to 1), the output is sent in `UTF-16LE`. If set to 0, then the output will be encoded in `UTF-8`.

refs matomo-org/matomo#16166 (comment)
@sgiehl
Copy link
Member

sgiehl commented Nov 27, 2020

Can we close this issue? Guess nothing needs to be changed in the code. We could maybe only update this FAQ https://matomo.org/faq/new-to-piwik/how-to-import-matomo-data-in-to-google-data-studio/ or create a new one for importing into google spreadsheet directly...

@tsteur
Copy link
Member

tsteur commented Nov 29, 2020

Ideally we do both maybe @sgiehl

@mattab
Copy link
Member

mattab commented Nov 29, 2020

fyi the FAQ for google data studio already has the hint see: Replace the “EXPORT_URL” in the above code with the URL copied in STEP 2 and append &convertToUnicode=0 at the end of the URL

I suppose this could be closed, unless we wanted to cut & paste the first 2 sections of the data studio FAQ into a new "How do I import Matomo reports in Google sheets?" and cross-link the 2 FAQs?

@tsteur tsteur closed this as completed Nov 30, 2020
@tsteur tsteur added the answered For when a question was asked and we referred to forum or answered it. label Nov 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
answered For when a question was asked and we referred to forum or answered it. Bug For errors / faults / flaws / inconsistencies etc.
Projects
None yet
Development

No branches or pull requests

5 participants