Chinese GBK code disorder code problem #12732

qq383762126 · 2018-04-17T02:54:25Z

Can Matomo correctly handle search terms from Chinese search engines that use GBK encoding?

Findus23 · 2018-04-17T07:13:47Z

Hi, can you further describe what issue you are referring to and how to reproduce it?

qq383762126 · 2018-04-17T07:34:40Z

Some Chinese search engines, such as the one in the URL below, encode search terms in GBK rather than UTF-8. Matomo does not recognize this and decodes everything as UTF-8, so the search terms come out garbled.
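To illustrate the mismatch described above, here is a minimal Python sketch. It is not Matomo code; the sample term and the variable names are assumptions chosen for illustration. It percent-encodes a term as GBK bytes, then decodes it once with the wrong charset and once with the right one.

```python
from urllib.parse import quote, unquote

# Hypothetical sample term; any Chinese text behaves the same way.
term = "专利查询"

# What a GBK-based search engine would put into the referrer URL:
# the GBK bytes of the term, percent-encoded.
gbk_query = quote(term.encode("gbk"))

# Decoding those bytes as UTF-8 yields garbled text.
wrong = unquote(gbk_query, encoding="utf-8", errors="replace")

# Decoding them with the charset that was actually used restores the term.
right = unquote(gbk_query, encoding="gbk")

print(gbk_query)
print(repr(wrong))
print(right)
```

The garbled result depends only on which charset is assumed at decode time, which is why the charset has to be known (or detected) per search engine.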

http://www.sogou.com/web?query=%E4%B8%93%E5%88%A9%E6%9F%A5%E8%AF%A2&ie=utf8&_ast=1523950080&_asf=null&w=01029901&cid=&s_from=result_up&sut=6728&sst0=1523950055276&lkt=1%2C1523950053745%2C1523950053745&sugsuv=001D5390DED1583F5A8E5819DE6D1320&sugtime=1523950055276

sgiehl · 2018-04-21T14:01:24Z

@qq383762126 The charset for Sogou is defined as gb2312. The search term detected for the URL you posted should be 专利查询.
Which version of Matomo are you using?
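As a rough sketch of charset-aware extraction (Python, not Matomo's actual implementation): the function names, the alias table, and the idea of reading the charset from the `ie` parameter are assumptions made for this example. The default of GBK stands in for the gb2312 charset mentioned above, since GBK is a superset of GB2312.

```python
from urllib.parse import urlsplit, unquote_to_bytes

# Hypothetical alias table mapping values of the "ie" parameter to codecs.
CHARSET_ALIASES = {"utf8": "utf-8", "utf-8": "utf-8", "gbk": "gbk", "gb2312": "gbk"}

def raw_params(url):
    """Return the query parameters as undecoded bytes."""
    params = {}
    for pair in urlsplit(url).query.split("&"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # "+" means a space in query strings; percent-escapes become raw bytes.
        params[key] = unquote_to_bytes(value.replace("+", " "))
    return params

def extract_search_term(url, term_param="query", charset_param="ie",
                        default_charset="gbk"):
    """Decode the search term using the charset the URL declares, if any."""
    params = raw_params(url)
    declared = params.get(charset_param, b"").decode("ascii", errors="ignore").lower()
    charset = CHARSET_ALIASES.get(declared, default_charset)
    return params.get(term_param, b"").decode(charset, errors="replace")

url = "http://www.sogou.com/web?query=%E4%B8%93%E5%88%A9%E6%9F%A5%E8%AF%A2&ie=utf8"
print(extract_search_term(url))  # 专利查询
```

With `ie=utf8` present, the term is decoded as UTF-8; without it, the sketch falls back to the assumed per-engine default.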

fengkaijia · 2018-04-28T04:04:37Z

I just checked my Matomo instance, and I also have around 5% of the records from Sogou being unreadable, for example 杩琛￠】 or 娉缃nag (which have no meaning). It only accounts for less than 5% of my traffic from Sogou, though. My guess is that Sogou serves a non-UTF-8 version of its interface to users on older systems, such as IE6 on Windows XP. Since my blog is about Linux, my readers rarely use Windows XP, so I didn't notice this mojibake until now.

Findus23 · 2020-05-24T18:39:47Z

It sounds like those browsers are already sending invalid UTF-8 to Matomo, so there is little that can be fixed here. And as long as Matomo receives valid UTF-8 data, it should now be possible to store any Unicode character thanks to #9785.
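For anyone who wants to clean up such terms outside Matomo, one possible heuristic is sketched below in Python. It is an assumption for illustration, not something Matomo does: `decode_term` is a hypothetical helper that tries strict UTF-8 first and falls back to GB18030 (a superset of GBK) only when the bytes are not valid UTF-8.

```python
def decode_term(raw: bytes) -> str:
    """Decode raw search-term bytes, preferring UTF-8 (hypothetical helper)."""
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    try:
        # Assumed fallback for Chinese search engines.
        return raw.decode("gb18030")
    except UnicodeDecodeError:
        # Last resort: keep what can be read and mark the rest.
        return raw.decode("utf-8", errors="replace")

print(decode_term("专利查询".encode("utf-8")))
print(decode_term("专利查询".encode("gbk")))
```

This is only a heuristic: some short GBK byte sequences are also valid UTF-8, so the first branch can occasionally accept the wrong interpretation.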

Findus23 closed this as completed May 24, 2020

Findus23 added the answered label May 24, 2020
