Umlauts in logfiles lead to discrepancies in the pageview statistic #10530

Closed
lato333 opened this issue Sep 20, 2016 · 3 comments
Labels
Bug: For errors / faults / flaws / inconsistencies etc.
not-in-changelog: For issues or pull requests that should not be included in our release changelog on matomo.org.


lato333 commented Sep 20, 2016

I have found some discrepancies in Piwik's page statistics (Dashboard -> Actions/Pages).
If I sort by pageviews ascending:
[image: sort2]

If I sort by pageviews descending:
[image: sort1]

Problem:

  • I get different results (see images) for the same statistic, depending on the sort order.
  • The pageviews of the subpages do not add up to the total (2+2 != 12 and 1+1+6 != 12).

We import our data from the access logs with the import_logs.py script (Piwik 2.16.2).

I have analysed the problem a little and I believe it is related to encoding.

If the logfile contains umlauts (ü, ä, ...) in the URL, the problems described above occur, even though the log entries are imported into the database without an error.

Database-Encoding: utf8

mysql> select name from piwik_log_action where idaction=5386055;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| name                                                                                                                                                                           |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| testsite/XLCubedWeb/WebForm/ShowReport.aspx?rep=009+wissensbilanzen+universit�ten/004+kernprozesse+-+lehre+und+weiterbildung/012+2-a-10+studienabschlussquote.xml&toolbar=true |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Workaround:
After replacing these umlauts in the logfile with a proper encoding and importing it again into a new site, everything works as expected.

I have also tried replacing the umlauts in the database and regenerating the archive data, which also leads to correct results.
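
The log-file workaround above could be sketched roughly as follows. This is only an illustration under an assumption the report does not confirm: that the problematic lines contain Latin-1 (ISO-8859-1) bytes rather than valid UTF-8. The helper names `normalize_line` and `normalize_log` are hypothetical and are not part of import_logs.py.

```python
def normalize_line(raw: bytes) -> str:
    """Decode one raw log line to text.

    Valid UTF-8 is kept as-is. Anything else is ASSUMED to be Latin-1,
    which is a guess about the source encoding, not a fact from the report.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


def normalize_log(src_path: str, dst_path: str) -> None:
    """Rewrite a log file so that every line is valid UTF-8."""
    # Read bytes so that invalid sequences do not raise while reading;
    # newline="" keeps the original line endings untouched on write.
    with open(src_path, "rb") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for raw in src:
            dst.write(normalize_line(raw))
```

The rewritten file can then be imported as usual. Writing to a new file rather than editing in place keeps the original log available for comparison.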

Member

mattab commented Sep 26, 2016

Hi @lato333, thank you for the report, the investigation and the workaround.

Could you please let us know how to reproduce this error? Which URLs on your website can be used to reproduce the problem? If we can reproduce it, we can try to fix the issue in Piwik core.

Maybe it is related to #9785, but maybe not.

@mattab mattab added the Bug For errors / faults / flaws / inconsistencies etc. label Sep 26, 2016
@mattab mattab added this to the 3.0.0 milestone Sep 26, 2016
Author

lato333 commented Oct 5, 2016

Sorry for the delay in my reply.
I have investigated the problem further, and umlauts are a problem only in some cases.

If we use umlauts in a folder name in both forms (encoded, e.g. GET /%C3%A4/test.xml, and unencoded, e.g. GET /ä/test.xml), the folder is displayed incorrectly in the 'Actions/Pages' pane.
When the umlaut is in a filename, it is displayed correctly.
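
For reference, the two request forms above denote the same path when the unencoded one is UTF-8: `%C3%A4` is the percent-encoded UTF-8 byte sequence for `ä`. A minimal sketch using Python's standard `urllib.parse` shows the equivalence. `canonical_path` is a hypothetical helper for illustration only; it is not how import_logs.py or Piwik handles paths, and it covers only the path, not the query string.

```python
from urllib.parse import quote, unquote

encoded = "/%C3%A4/test.xml"   # percent-encoded form from the log
unencoded = "/ä/test.xml"      # raw form from the log

# Both forms map to each other when UTF-8 is used throughout.
assert unquote(encoded) == unencoded
assert quote(unencoded) == encoded


def canonical_path(path: str) -> str:
    """Return one percent-encoded spelling for a URL path (hypothetical helper).

    Decodes any existing escapes first, then re-encodes everything except
    "/" so both spellings of the same folder collapse to a single string.
    """
    return quote(unquote(path), safe="/")


assert canonical_path(encoded) == canonical_path(unencoded)
```

If both spellings were stored as separate actions, folder rows could be grouped inconsistently, which might explain totals that do not add up. That is a guess, not something the report confirms.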

I have also attached a sample logfile, with different cases: access_log.txt

Hope this helps.

Member

mattab commented Dec 10, 2023

Thanks for contributing to this issue. As it has been a few months since the last activity and we believe this is likely not an issue anymore, we will now close this. If that's not the case, please do feel free to either reopen this issue or open a new one. We will gladly take a look again!

@mattab mattab closed this as not planned Dec 10, 2023
@sgiehl sgiehl added the not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. label Dec 11, 2023