Hello,
one of our clients uses Matomo (I do not know exactly which version), and some HTTP responses fail when downloading reports because of charset issues in the Content-Disposition header.
Here's a hex dump of one of those responses:
00000000 43 6f 6e 74 65 6e 74 2d 44 69 73 70 6f 73 69 74 Content-Disposit
00000010 69 6f 6e 3a 20 61 74 74 61 63 68 6d 65 6e 74 3b ion: attachment;
00000020 20 66 69 6c 65 6e 61 6d 65 3d 22 45 78 70 6f 72 filename="Expor
00000030 74 20 5f 20 4d 61 69 6e 20 6d 65 74 72 69 63 73 t _ Main metrics
00000040 20 5f 20 44 65 63 65 6d 62 65 72 20 31 33 2c 20 _ December 13,
00000050 32 30 32 30 20 e2 80 93 20 4a 61 6e 75 61 72 79 2020 – January
00000060 20 31 31 2c 20 32 30 32 31 2e 63 73 76 22 0d 0a 11, 2021.csv"..
00000070 54 72 61 6e 73 66 65 72 2d 45 6e 63 6f 64 69 6e Transfer-Encodin
00000080 67 3a 20 63 68 75 6e 6b 65 64 0d 0a 43 6f 6e 74 g: chunked..Cont
Right after "2020", there's the – character, which is an en dash encoded as e2 80 93 in UTF-8.
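As a quick sanity check of the bytes in the dump, here is a small sketch. It uses Python purely for illustration (Matomo itself is PHP), and the variable names are made up for this example. It shows that the en dash (U+2013) becomes e2 80 93 in UTF-8 and cannot be represented in ISO-8859-1 at all.

```python
# U+2013 EN DASH, the character that follows "2020" in the dumped header.
EN_DASH = "\u2013"

# UTF-8 encodes it as three bytes, matching the hex dump.
utf8_bytes = EN_DASH.encode("utf-8")
print(" ".join(f"{b:02x}" for b in utf8_bytes))  # e2 80 93

# ISO-8859-1 only covers U+0000..U+00FF, so the en dash cannot be encoded.
try:
    EN_DASH.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)  # False
```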
According to https://tools.ietf.org/html/rfc6266#section-4, when using the filename="" format, the name between double quotes should be in the ISO-8859-1 charset (https://tools.ietf.org/html/rfc2616#section-2.2), or in RFC 2047 format, like this: =?iso-8859-1?q?this=20is=20some=20text?=
(for what it's worth, I never see anything in that format lately)
If the filename must include UTF-8 characters, it should use the filename*= parameter (unquoted), with a value like this: UTF-8''%c2%a3%20and%20%e2%82%ac%20rates
(the exact format is defined in https://tools.ietf.org/html/rfc5987#section-3.2.2 )
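To make that value format concrete, here is a rough sketch in Python (illustration only, not Matomo code). The helper name `encode_ext_value` and the constant `ATTR_CHAR_SAFE` are hypothetical. It percent-encodes the UTF-8 bytes of every character that is not an RFC 5987 attr-char and prefixes the result with the charset and an empty language tag.

```python
from urllib.parse import quote

# attr-char from RFC 5987 section 3.2.1 is ALPHA / DIGIT plus a few symbols.
# quote() always leaves letters, digits and "_.-~" unescaped, so only the
# remaining attr-char symbols need to be listed as safe.
ATTR_CHAR_SAFE = "!#$&+^`|"


def encode_ext_value(text: str) -> str:
    """Return text as an RFC 5987 ext-value using the UTF-8 charset."""
    return "UTF-8''" + quote(text, safe=ATTR_CHAR_SAFE, encoding="utf-8")


# The pound sign and euro sign example from the RFC.
print(encode_ext_value("\u00a3 and \u20ac rates"))
# UTF-8''%C2%A3%20and%20%E2%82%AC%20rates
```

The hex digits come out in upper case here; percent-encoding is case-insensitive, so this is equivalent to the lower-case form quoted above.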
Unfortunately, I do not control this deployment of matomo, so my ability to test patches is limited, but I can request further information.
maybe related to #9580
The query was generated with a call to a URL with this format: https://domain/index.php?date=2020-12-13,2021-01-11&expanded=1&filter_limit=-1&format=CSV&format_metrics=1&idSite=1&language=en&method=API.get&module=API&period=day&token_auth=<token>&translateColumnNames=1
Hi,
I think the code responsible for this is the following:
https://github.com/matomo-org/matomo/blob/c870770157a3e9c893308967dc274c8feac5d4be/core/DataTable/Renderer/Csv.php#L311-L321
It just generates a nice filename and then puts it into the header without caring about the right encoding.
If you have an idea how this could be fixed, it would be great if you could create a PR.
> one of our clients uses matomo and some HTTP responses fail when downloading reports,
Hi @Geal does the download not work at all in this case? Do you know what browser is being used there or is it maybe some server that fetches the file?
Downloads failed because the HTTP response went through our reverse proxy, which rejects invalid headers. This is independent of the browser used; it can even be reproduced with a curl command.
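For illustration, here is a minimal sketch of the kind of check a strict intermediary might apply. The proxy's actual rule is not described in this thread, so the "printable ASCII plus tab" rule and the function name `non_ascii_offsets` are assumptions. Run against the header line from the hex dump above, it flags exactly the three bytes of the en dash.

```python
def non_ascii_offsets(header_line: bytes) -> list:
    """Return offsets of bytes outside printable ASCII (horizontal tab allowed)."""
    return [
        i for i, b in enumerate(header_line)
        if not (b == 0x09 or 0x20 <= b <= 0x7E)
    ]


# The header line from the hex dump, re-encoded as UTF-8 (without the CRLF).
raw = (
    'Content-Disposition: attachment; filename="Export _ Main metrics _ '
    'December 13, 2020 \u2013 January 11, 2021.csv"'
).encode("utf-8")

offsets = non_ascii_offsets(raw)
print([hex(o) for o in offsets])  # ['0x55', '0x56', '0x57']
```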
The utf-8 data can be transformed properly with rawurlencode: https://stackoverflow.com/a/25704866
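Here is a hedged sketch of what a fixed header could look like, written in Python for illustration rather than as a patch to Matomo's PHP. The function name `content_disposition` and the choice to replace non-ASCII characters with "_" in the fallback are assumptions. `quote(..., safe="")` leaves only letters, digits and "-._~" unescaped, which is the same set PHP's `rawurlencode` keeps.

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Build an attachment header value with an ASCII fallback and filename*."""
    # Fallback for clients that ignore filename*: replace anything outside
    # printable ASCII with "_", then escape backslashes and double quotes
    # so the result is a valid quoted-string.
    fallback = "".join(c if 32 <= ord(c) < 127 else "_" for c in filename)
    fallback = fallback.replace("\\", "\\\\").replace('"', '\\"')
    # Percent-encode the UTF-8 bytes of the real name for filename*.
    extended = quote(filename, safe="")
    return "attachment; filename=\"{}\"; filename*=UTF-8''{}".format(
        fallback, extended
    )


name = "Export _ Main metrics _ December 13, 2020 \u2013 January 11, 2021.csv"
header = content_disposition(name)
print(header)
```

The resulting value contains only ASCII bytes, so it should pass a proxy that rejects raw UTF-8 in headers, while clients that understand filename* still get the en dash.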
The code should know the actual encoding of the filename (ASCII, ISO-8859-1, UTF-8 or others) and specify it in the header. Are there any guarantees on the encoding used in Matomo?
AFAIK the encoding should be UTF-8 or UTF-16 (there should be some parameter to request data in UTF-16). Since it seems like an easy fix, we will schedule this issue. Cheers @Geal