Can we solve the "GBK encoding" problem with Chinese search engines?
Hi, can you further describe what issue you are referring to and how to reproduce it?
Chinese search engines such as the following use GBK encoding rather than UTF-8. Matomo does not recognize this and decodes everything as UTF-8, so the search terms come out garbled.
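To illustrate the symptom, here is a minimal Python sketch (not Matomo code) of what happens when a GBK percent-encoded search term is decoded as UTF-8. The sample term is made up for the demonstration.

```python
from urllib.parse import quote, unquote

# Hypothetical search term, percent-encoded the way a GBK page would send it.
term = "中文搜索"
encoded = quote(term, encoding="gbk")

# Decoding the GBK bytes as UTF-8 yields replacement characters (mojibake).
wrong = unquote(encoded, encoding="utf-8", errors="replace")

# Decoding with the charset the engine actually used recovers the term.
right = unquote(encoded, encoding="gbk")

print(encoded)
print(repr(wrong))
print(right)
```

The same bytes are only readable when the decoder knows which charset the search engine used.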
@qq383762126 The charset for Sogou is defined as gb2312. The search term detected for the URL you posted should be
Which version of Matomo are you using?
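For reference, a per-engine charset definition like the one mentioned above could be applied roughly as follows. This is a sketch under assumptions, not Matomo's actual implementation: the `ENGINE_CHARSETS` table, the `extract_search_term` helper, the host name, and the `query` parameter name are all hypothetical.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical lookup table: search engine host -> charset of its query string.
ENGINE_CHARSETS = {"www.sogou.com": "gb2312"}


def extract_search_term(url, param="query"):
    """Return the search term from a referrer URL, decoded with the engine's charset."""
    parts = urlsplit(url)
    # Fall back to UTF-8 for hosts without a charset entry.
    charset = ENGINE_CHARSETS.get(parts.hostname, "utf-8")
    values = parse_qs(parts.query, encoding=charset, errors="replace").get(param)
    return values[0] if values else None


# Build a sample referrer URL whose query is percent-encoded in gb2312.
url = "https://www.sogou.com/web?query=" + quote("搜索", encoding="gb2312")
print(extract_search_term(url))
```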
I just checked my Matomo, and I too have around 5% of records from Sogou that are unreadable, for example 娉缃nag (which has no meaning). But they account for less than 5% of traffic from Sogou. My guess is that Sogou serves a non-UTF-8 version of its interface to users on some older systems, like IE6 on Windows XP. Since my blog is about Linux, its readers usually don't use Windows XP, so I didn't notice this 5% of mojibake until now.
It sounds like those browsers are already sending invalid UTF-8 to Matomo, so there is little that can be fixed here. And as long as Matomo now gets valid UTF-8 data, with https://github.com/matomo-org/matomo/issues/9785 it should be possible to store any Unicode character.
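As a rough way to check whether a given percent-encoded term is valid UTF-8 at all, something like the following Python sketch could be used. The helper name `is_valid_utf8` is made up here, and this is not part of Matomo.

```python
from urllib.parse import unquote_to_bytes


def is_valid_utf8(percent_encoded):
    """Return True if the percent-decoded bytes form valid UTF-8."""
    raw = unquote_to_bytes(percent_encoded)
    try:
        raw.decode("utf-8")  # strict decoding raises on invalid sequences
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("%E6%90%9C%E7%B4%A2"))  # UTF-8 bytes
print(is_valid_utf8("%CB%D1%CB%F7"))        # GBK bytes
```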