CSV reports can fail because HTTP Content-Disposition header has invalid characters in the filename field #17209

Closed
Geal opened this issue Feb 10, 2021 · 6 comments


Geal commented Feb 10, 2021

Hello,
one of our clients uses Matomo (I do not know which version exactly), and some HTTP responses fail when downloading reports because of charset issues in the Content-Disposition header.
Here's a hex dump of one of those responses:

00000000      43 6f 6e 74 65 6e 74 2d 44 69 73 70 6f 73 69 74         Content-Disposit
00000010      69 6f 6e 3a 20 61 74 74 61 63 68 6d 65 6e 74 3b         ion: attachment;
00000020      20 66 69 6c 65 6e 61 6d 65 3d 22 45 78 70 6f 72          filename="Expor
00000030      74 20 5f 20 4d 61 69 6e 20 6d 65 74 72 69 63 73         t _ Main metrics
00000040      20 5f 20 44 65 63 65 6d 62 65 72 20 31 33 2c 20          _ December 13,
00000050      32 30 32 30 20 e2 80 93 20 4a 61 6e 75 61 72 79         2020 – January
00000060      20 31 31 2c 20 32 30 32 31 2e 63 73 76 22 0d 0a          11, 2021.csv"..
00000070      54 72 61 6e 73 66 65 72 2d 45 6e 63 6f 64 69 6e         Transfer-Encodin
00000080      67 3a 20 63 68 75 6e 6b 65 64 0d 0a 43 6f 6e 74         g: chunked..Cont

Right after "2020", there's the "–" character, which is an en dash encoded as e2 80 93 in UTF-8.
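To double-check the byte sequence seen in the dump, here is a tiny sketch (the variable name is only illustrative) that prints the UTF-8 bytes of U+2013:

```php
<?php
// U+2013 (en dash) written with a Unicode escape so the source file encoding does not matter.
$enDashHex = bin2hex("\u{2013}");

// Prints the UTF-8 bytes of the en dash as hex.
echo $enDashHex, "\n"; // e28093
```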

According to https://tools.ietf.org/html/rfc6266#section-4, when using the filename="" format, the name between double quotes should be in the ISO-8859-1 charset (https://tools.ietf.org/html/rfc2616#section-2.2), or in RFC 2047 format, like this: =?iso-8859-1?q?this=20is=20some=20text?= (for what it's worth, I rarely see that format used these days).

If the filename must include non-ASCII characters encoded as UTF-8, it should use the filename* parameter, like this: filename*=UTF-8''%c2%a3%20and%20%e2%82%ac%20rates (the exact format is defined in https://tools.ietf.org/html/rfc5987#section-3.2.2 ).
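As a rough illustration of that format, the sketch below builds the RFC 5987 value from a UTF-8 string. The helper name encodeExtValue is hypothetical and not part of Matomo; it assumes the input is already valid UTF-8.

```php
<?php
// Hypothetical helper (not a Matomo function): builds an RFC 5987 ext-value
// of the form UTF-8''<percent-encoded bytes>, assuming $utf8Text is valid UTF-8.
function encodeExtValue(string $utf8Text): string
{
    // rawurlencode() leaves only A-Z a-z 0-9 - _ . ~ unencoded,
    // all of which are allowed unescaped in an RFC 5987 value.
    return "UTF-8''" . rawurlencode($utf8Text);
}

// U+00A3 is the pound sign, U+20AC is the euro sign.
echo encodeExtValue("\u{00A3} and \u{20AC} rates"), "\n";
// UTF-8''%C2%A3%20and%20%E2%82%AC%20rates
```

The hex digits come out in upper case here, while the RFC example uses lower case; percent-encoding is case-insensitive, so both forms are equivalent.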

Unfortunately, I do not control this deployment of Matomo, so my ability to test patches is limited, but I can request further information.

maybe related to #9580

Author

Geal commented Feb 10, 2021

the query was generated with a call to a URL with this format:
https://domain/index.php?date=2020-12-13,2021-01-11&expanded=1&filter_limit=-1&format=CSV&format_metrics=1&idSite=1&language=en&method=API.get&module=API&period=day&token_auth=<token>&translateColumnNames=1

@Findus23
Member

Hi,

I think the code responsible for this is the following:

    $prettyDate = $period->getLocalizedLongString();
    $meta = $this->getApiMetaData();
    $name = !empty($meta['name']) ? $meta['name'] : '';
    $fileName .= ' _ ' . $name
        . ' _ ' . $prettyDate . '.csv';
}
// silent fail otherwise unit tests fail
Common::sendHeader('Content-Disposition: attachment; filename="' . $fileName . '"', true);

It just generates a nice filename and then puts it into the header without caring about the right encoding.

If you have an idea how this could be fixed, it would be great if you could create a PR.

@tsteur tsteur added the Bug For errors / faults / flaws / inconsistencies etc. label Feb 10, 2021
@tsteur tsteur added this to the Priority Backlog (Help wanted) milestone Feb 10, 2021
Member

tsteur commented Feb 10, 2021

> one of our clients uses matomo and some HTTP responses fail when downloading reports,

Hi @Geal does the download not work at all in this case? Do you know what browser is being used there or is it maybe some server that fetches the file?

Author

Geal commented Feb 10, 2021

Downloads failed because the HTTP response went through our reverse proxy, which rejects invalid headers. It is independent of the browser that is used; it can even be reproduced with a curl command.

The UTF-8 data can be transformed properly with rawurlencode: https://stackoverflow.com/a/25704866

The code should know the actual encoding of the filename (ASCII, ISO-8859-1, UTF-8 or others) and specify it in the header. Are there any guarantees on the encoding used in Matomo?
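For illustration, here is a minimal sketch of what such a header could look like, assuming the filename is UTF-8. The function buildContentDisposition is hypothetical and is not necessarily how the issue was eventually fixed in Matomo. It sends an ASCII-only fallback in filename="" plus the percent-encoded UTF-8 name in filename*.

```php
<?php
// Hypothetical helper, not taken from Matomo: builds a Content-Disposition
// header line, assuming $utf8FileName is a UTF-8 string.
function buildContentDisposition(string $utf8FileName): string
{
    // ASCII-only fallback: replace every code point outside printable ASCII with "_".
    // With the /u flag, preg_replace() returns null if the input is not valid UTF-8.
    $fallback = preg_replace('/[^\x20-\x7E]/u', '_', $utf8FileName);
    if ($fallback === null) {
        $fallback = 'download'; // arbitrary placeholder name for invalid input
    }
    // Backslash and double quote would need escaping inside a quoted-string; replace them too.
    $fallback = str_replace(['\\', '"'], '_', $fallback);

    // filename* carries the real name, percent-encoded as described in RFC 5987.
    return 'Content-Disposition: attachment; filename="' . $fallback . '"'
        . "; filename*=UTF-8''" . rawurlencode($utf8FileName);
}

// The filename from the hex dump, with the en dash written as \u{2013}.
$header = buildContentDisposition(
    "Export _ Main metrics _ December 13, 2020 \u{2013} January 11, 2021.csv"
);
echo $header, "\n";
```

The resulting header line contains only ASCII bytes, so a strict proxy should accept it, while clients that understand filename* can still show the en dash.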

Member

tsteur commented Feb 10, 2021

AFAIK the encoding should be UTF-8 or UTF-16 (there should be some parameter to request data in UTF-16). Since it seems like an easy fix, we will schedule this issue. Cheers @Geal

@tsteur tsteur added the Help wanted Beginner friendly issues or issues where we'd highly appreciate community's help and involvement. label Feb 10, 2021
@flamisz flamisz self-assigned this Feb 25, 2021
Contributor

flamisz commented Mar 2, 2021

fixed by #17276

@flamisz flamisz closed this as completed Mar 2, 2021
@mattab mattab changed the title from "HTTP Content-Disposition header has invalid characters in the filename field" to "CSV reports can fail because HTTP Content-Disposition header has invalid characters in the filename field" May 17, 2021
@mattab mattab modified the milestones: 4.7.0, 4.5.0 Aug 25, 2021