Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warning: htmlspecialchars() Invalid multibyte sequence in argument in core/DataTable/Renderer.php on line 223 #3259

Closed
mattab opened this issue Jul 5, 2012 · 19 comments
Assignees
Labels
Bug For errors / faults / flaws / inconsistencies etc.
Milestone

Comments

@mattab
Copy link
Member

mattab commented Jul 5, 2012

Reported in forum
Input data:

                <url>http://example/?c_id=6&amp;sc_id=97&amp;chars=%EC%E5%F8%EA%EE%E2%FB%E5&amp;brand=ZELMER</url>

                <url>http:/example/?c_id=7&amp;sc_id=206&amp;chars=%E3%E0%E7%EE%E2%FB%E5</url>

with API request: index.php?module=API&method=Actions.getPageUrls&idSite=1&period=day&date=today&format=xml&token_auth=x

results in:

Warning: htmlspecialchars() [<a href='function.htmlspecialchars'>function.htmlspecialchars</a>]: Invalid multibyte sequence in argument in //public_html/core/DataTable/Renderer.php on line 223

Backtrace -->
#0 Piwik_ErrorHandler(...) called at [:]#1 htmlspecialchars(...) called at [//public_html/core/DataTable/Renderer.php:223]#2 Piwik_DataTable_Renderer::formatValueXml(...) called at [//public_html/core/DataTable/Renderer/Xml.php:342]#3 Piwik_DataTable_Renderer_Xml->renderDataTable(...) called at [//public_html/core/DataTable/Renderer/Xml.php:337]#4 Piwik_DataTable_Renderer_Xml->renderDataTable(...) called at [//public_html/core/DataTable/Renderer/Xml.php:142]#5 Piwik_DataTable_Renderer_Xml->renderTable(...) called at [//public_html/core/DataTable/Renderer/Xml.php:33]#6 Piwik_DataTable_Renderer_Xml->render(...) called at [//public_html/core/API/ResponseBuilder.php:221]#7 Piwik_API_ResponseBuilder->getRenderedDataTable(...) called at [//public_html/core/API/ResponseBuilder.php:325]#8 Piwik_API_ResponseBuilder->handleDataTable(...) called at [//public_html/core/API/ResponseBuilder.php:79]#9 Piwik_API_ResponseBuilder->getResponse(...) called at [//public_html/core/API/Request.php:130]#10 Piwik_API_Request->process(...) called at [//public_html/plugins/API/Controller.php:27]#11 Piwik_API_Controller->index(...) called at [:]#12 call_user_func_array(...) called at [//public_html/core/FrontController.php:138]#13 Piwik_FrontController->dispatch(...) called at [//public_html/index.php:53]

Bug fix ?
I'm not sure if we can reproduce this error with this input data, and I'm not sure which fix is correct in this case. Is it a bug in PHP? is it that we truncate the input data too early?

we could fix it by keeping the original input data if the result of htmlspecialchars is empty. But then it could maybe cause some XSS or other security issue if someone manages to generate bogus data leading to this error in order to print out a xss?

@mattab
Copy link
Member Author

mattab commented Jul 5, 2012

A temporary quick fix for users with this issue is as follwos:

Index: core/DataTable/Renderer.php
===================================================================
--- core/DataTable/Renderer.php (revision 6483)
+++ core/DataTable/Renderer.php (working copy)
@@ -220,7 +220,7 @@
            && !is_numeric($value)) 
        {
            $value = html_entity_decode($value, ENT_COMPAT, 'UTF-8');
-           $value = htmlspecialchars($value, ENT_COMPAT, 'UTF-8');
+           $value = @htmlspecialchars($value, ENT_COMPAT, 'UTF-8');

But this is only temporary, definitely not the ideal fix.

if it fails and returns empty string, maybe we could filte rthe initial input to a-zA-Z0-9 or similar and keep only the safe characters?

@mattab
Copy link
Member Author

mattab commented Aug 2, 2012

Fix suggested by Anthon: if(function_exists('iconv')) $value = @iconv('UTF-8', 'UTF-8//IGNORE', $value);

I see in the php doc: http://www.php.net/manual/en/function.iconv.php#108643

that maybe this doesn't work in all php versions?

The comment suggests:
ini_set('mbstring.substitute_character', "none");
$text= mb_convert_encoding($text, 'UTF-8', 'UTF-8');

Any particular thoughts?

@robocoder
Copy link
Contributor

We can try to run some tests between php versions and compare.

The more portable, the better.

@anonymous-matomo-user
Copy link

I have a similar error at some actions at "Visitors --> Visitor Log":

There is an error. Please report the message (Piwik 1.8.3) and full backtrace in the Piwik forums (please do a Search first as it might have been reported already!).

Warning: htmlspecialchars(): Invalid multibyte sequence in argument in /srv/www/htdocs/piwik/core/SmartyPlugins/modifier.escape.php on line 31

Backtrace -->
#0 Piwik_ErrorHandler(...) called at [:]#1 htmlspecialchars(...) called at [/srv/www/htdocs/piwik/core/SmartyPlugins/modifier.escape.php:31]#2 smarty_modifier_escape(...) called at [/srv/www/htdocs/piwik/tmp/templates_c/%%46^462^46227F0A%%visitorLog.tpl.php:337]#3 include(...) called at [/srv/www/htdocs/piwik/libs/Smarty/Smarty.class.php:1263]#4 Smarty->fetch(...) called at [/srv/www/htdocs/piwik/core/View.php:133]#5 Piwik_View->render(...) called at [/srv/www/htdocs/piwik/core/Controller.php:153]#6 Piwik_Controller->renderView(...) called at [/srv/www/htdocs/piwik/plugins/Live/Controller.php:84]#7 Piwik_Live_Controller->getVisitorLog(...) called at [:]#8 call_user_func_array(...) called at [/srv/www/htdocs/piwik/core/FrontController.php:138]#9 Piwik_FrontController->dispatch(...) called at [/srv/www/htdocs/piwik/index.php:53]

@mattab
Copy link
Member Author

mattab commented Sep 25, 2012

There is a similar bug, reported in the forums

@mattab
Copy link
Member Author

mattab commented Oct 20, 2012

@capedfuzz maybe similar to the other bug you're dealing with.

@diosmosis
Copy link
Member

This bug is due to query parameter values being encoded in non-UTF-8 charsets. When archiving, there's no way to know what this other encoding is, and so we have to use UTF-8 w/ htmlspecialchars. I came up w/ two possible ways of fixing this one:

  • Either associate the charset w/ the URLs in the DB when tracking (using the 'cs' query param I added in umlauts in sitesearch keywords #3450). (Not sure how to do this efficiently)
  • Or, add a charset field to the piwik_site table and let users set it if their site does not use UTF-8. We can display an info here saying it's better to just use UTF-8.

@mattab
Copy link
Member Author

mattab commented Oct 24, 2012

when the charset is defined and not utf-8, maybe in the Tracker, we should also re-encode the URL in UTF-8, since it's likely the URL parameters are encoded in the page charset...?

@diosmosis
Copy link
Member

Replying to matt:

when the charset is defined and not utf-8, maybe in the Tracker, we should also re-encode the URL in UTF-8, since it's likely the URL parameters are encoded in the page charset...?

I tried that and it works in that it displays the label correctly, but then the URL becomes invalid. As in, if you visit the link, it won't work anymore. As far as I can tell, there has to be some way to store the charset of the query params somewhere (like in the piwik_site table).

@mattab
Copy link
Member Author

mattab commented Oct 27, 2012

How can we quickly fix this problem, since the proper solution would have performance overhead which we can't consider for this use case.

The goal is to see a quick fix, for example, would if(function_exists('iconv')) $value = @iconv('UTF-8', 'UTF-8//IGNORE', $value); work to force convert to Utf8 the url when the htmlspecialchars return empty?

@diosmosis
Copy link
Member

Replying to matt:

How can we quickly fix this problem, since the proper solution would have performance overhead which we can't consider for this use case.

The goal is to see a quick fix, for example, would if(function_exists('iconv')) $value = @iconv('UTF-8', 'UTF-8//IGNORE', $value); work to force convert to Utf8 the url when the htmlspecialchars return empty?

From what I remember of my test, this would strip the non-UTF8 encoded characters from the string, so I guess "www.site.com?q=%EC%ED..." would turn into "www.site.com?q=". Since this only affects the query string, we might be able to do this to each individual query param and for stuff that doesn't have UTF8 encoding, we just display the encoded data (ie, %EC%ED)? Not sure if it'll work, though, but when I get a chance I'll check it out.

@diosmosis
Copy link
Member

(In [7336]) Refs #3259, strip non-UTF-8 chars from strings in outputted XML/HTML so htmlspecialchars won't fail.

@diosmosis
Copy link
Member

I've applied one part of a fix for this. This will make sure the warning doesn't display and will make sure at least SOMETHING is shown as the label for strings w/ non-UTF-8 chars. The non-UTF-8 stuff won't show though.

The only way I can think of to fix that is to look for each %.. encoded item and check if the decoded char is invalid UTF-8. If so, it gets double encoded so html_entity_decode won't turn it into the non-UTF-8 value. This seems excessive, but I can't think of another way of fixing this (specifying a charset column for piwik_site might work, but would likely take a while to implement).

@mattab
Copy link
Member Author

mattab commented Oct 31, 2012

Looks good to me! Thanks for researching and finding this fix, which is better than before. We cant really afford to store charsets per website... too complicated.

However, I have an idea for a possible "ultimate" solution: could we use http://php.net/manual/en/function.mb-detect-encoding.php ?? :) that looks interesting. Maybe we could run this function when htmlspecialchars return empty string (prior to your commmit) ? would it work to detect encoding of the URL containing non utf8 characters?

@diosmosis
Copy link
Member

Replying to matt:

Looks good to me! Thanks for researching and finding this fix, which is better than before. We cant really afford to store charsets per website... too complicated.

However, I have an idea for a possible "ultimate" solution: could we use http://php.net/manual/en/function.mb-detect-encoding.php ?? :) that looks interesting. Maybe we could run this function when htmlspecialchars return empty string (prior to your commmit) ? would it work to detect encoding of the URL containing non utf8 characters?

We can tell if it's not UTF-8, but that doesn't mean we can find out what the true encoding is. Some non-UTF8 strings can still be interpreted as UTF-8 though the result will be gibberish.

@mattab
Copy link
Member Author

mattab commented Oct 31, 2012

This function will find out the encoding I think: http://php.net/manual/en/function.mb-detect-encoding.php -- it might work on the non utf8 URL, so you could display a label that works with non UTF8 URLs without modifying the URLs, simply detect encoding > convert from detectedEncoding to UTF8 > Parse URL To get the Pretty name <label>

@diosmosis
Copy link
Member

The warning is gone, for now I think that's good enough.

@mattab
Copy link
Member Author

mattab commented Jan 5, 2014

We have a variation of similar error in: #3859

@mattab
Copy link
Member Author

mattab commented May 13, 2014

Invalid multibyte sequence in argument in /html/piwik/vendor/twig/twig/lib/Twig/Extension/Core.php on line 927

This bug has occurred again in Piwik 2.2.2. I created ticket at #5157

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For errors / faults / flaws / inconsistencies etc.
Projects
None yet
Development

No branches or pull requests

4 participants