Umlauts in logfiles lead to discrepancies in the pageview statistic #10530
Labels: Bug, not-in-changelog
I have found some discrepancies in the page statistics of Piwik (Dashboard -> Actions/Pages).
When I sort by pageviews ascending:
When I sort by pageviews descending:
Problem:
We import our data from the access logs with the import_logs.py script (Piwik 2.16.2).
I have analysed the problem briefly and believe it is related to character encoding.
If the logfile contains umlauts (ü, ä, ...) in the URL, the problems described above occur, even though the log entries are imported into the database without an error.
Database-Encoding: utf8
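
To check whether a given logfile is affected, one option is to scan it for raw non-ASCII bytes before importing. The following is only a minimal sketch: the helper name `find_non_ascii_lines` is hypothetical and not part of import_logs.py, and it assumes that any byte >= 0x80 in a log line marks an unencoded character such as an umlaut.

```python
def find_non_ascii_lines(lines):
    """Yield (line_number, line) for each bytes line that contains a byte >= 0x80.

    `lines` is any iterable of bytes objects, for example a file opened
    in binary mode. Line numbers start at 1.
    """
    for number, line in enumerate(lines, start=1):
        # Iterating over a bytes object yields integers in the range 0..255.
        if any(byte >= 0x80 for byte in line):
            yield number, line


# Example usage (the path "access.log" is a placeholder):
#
#     with open("access.log", "rb") as logfile:
#         for number, line in find_non_ascii_lines(logfile):
#             print(number, line)
```

Reading the file in binary mode avoids having to guess its encoding just to detect the problematic lines.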
Workaround:
After replacing these umlauts in the logfile with a proper encoding and re-importing it into a new site, everything works as expected.
I have also tried to replace the umlauts in the database and to regenerate the archive data, which also leads to correct results.
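
The logfile workaround could be scripted roughly as follows. This is a sketch under assumptions, not the exact procedure used here: it takes "proper encoding" to mean UTF-8 percent-encoding, it assumes the source logfile is Latin-1 encoded (adjust `src_encoding` otherwise), and the function names are hypothetical. For simplicity it encodes every non-ASCII character in the line, not only those in the URL field.

```python
import re

# Matches one or more consecutive non-ASCII characters.
NON_ASCII = re.compile(r"[^\x00-\x7f]+")


def percent_encode_non_ascii(line):
    """Replace each run of non-ASCII characters with its UTF-8 percent-encoding."""
    def encode(match):
        return "".join("%{:02X}".format(b) for b in match.group(0).encode("utf-8"))
    return NON_ASCII.sub(encode, line)


def convert_log(src_path, dst_path, src_encoding="latin-1"):
    """Write a copy of the logfile in which non-ASCII characters are percent-encoded.

    src_encoding is an assumption about the original logfile and must match
    how the web server actually wrote it.
    """
    with open(src_path, encoding=src_encoding) as src, \
            open(dst_path, "w", encoding="ascii") as dst:
        for line in src:
            dst.write(percent_encode_non_ascii(line))
```

With this sketch, a request line containing `/k\u00e4se` (an "ä") becomes `/k%C3%A4se`, and the converted file can then be passed to import_logs.py in place of the original.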