Filtered monthly page views do not sum up to filtered yearly page views #17509
Comments
Hi @AJHoeh, thanks for creating the issue. I can confirm those numbers: there really is a difference between the yearly report and the sum of the monthly reports (I even tried getting the monthly report with a date in the middle of the month, just to make sure this is not a weird issue with the edges of the months). Sorry about this. We will do our best to investigate what could be causing the difference.
@AJHoeh if you can, it may be worth invalidating the reports for the entire year and reprocessing them (see https://matomo.org/faq/how-to/faq_155/), then checking whether the numbers come out right afterwards. For example you could run a console command like
Please note that regenerating all reporting data may take a long time, and that if you have configured raw data deletion you may not want to invalidate the reports, as data could become unavailable if the raw data has already been deleted. Generally, as part of Matomo 4 and also in the next 4.3.0 release, we've made several improvements around these things that might have fixed it already. Partially refs #10127, but there it's more about days vs. other periods.
BTW @AJHoeh, I'm not quite sure what you mean here by URLs on different websites. Are those different websites (with different domains) tracked in one Matomo site, or do you mean different sites in Matomo?
Hey @flamisz and @tsteur, thanks for the fast reply! Just so I don't send you down the wrong road: it occurred to me that I didn't test whether the bug is also present for unfiltered data, so it is possible that this is a more general issue and has nothing to do with filtering. @tsteur thanks for the advice. "Regularly delete old raw data" is not enabled, so I will try that next week when I am back at work. Is there some kind of command to check whether the respective raw data is still available? I am pretty cautious with this kind of stuff. I should probably back up the db beforehand anyway...
That's just me being stupid and giving irrelevant information, which in the end is more confusing than helpful. It's about performing the described procedure for each of two sites in Matomo, which is our concrete use case but irrelevant for the issue itself. I'll edit out the "on different websites" part and elaborate more on the context to keep the issue as clear as possible.
Unfortunately there isn't. If you have access to your database and are familiar with MySQL, you could check using the query below:
select idvisit from matomo_log_visit where visit_last_action_time < '2020-01-01 00:00:00' LIMIT 1
You might need to adjust the table prefix.
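Since the table prefix differs between installations, here is a minimal Python sketch that only builds the query string from the comment above for a given prefix and cutoff. The helper name `raw_data_check_query` is made up for illustration and is not part of Matomo; it assumes the default `matomo_` prefix unless told otherwise, and it does not connect to any database.

```python
import re


def raw_data_check_query(cutoff: str, table_prefix: str = "matomo_") -> str:
    """Return the raw-data check query for a given cutoff and table prefix.

    Hypothetical helper: it only formats the SQL text quoted in the thread.
    """
    # The prefix is interpolated into the SQL text, so accept plain identifiers only.
    if not re.fullmatch(r"[A-Za-z0-9_]*", table_prefix):
        raise ValueError("table prefix may only contain letters, digits and '_'")
    # Expect the same 'YYYY-MM-DD HH:MM:SS' shape used in the original query.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", cutoff):
        raise ValueError("cutoff must look like '2020-01-01 00:00:00'")
    return (
        f"SELECT idvisit FROM {table_prefix}log_visit "
        f"WHERE visit_last_action_time < '{cutoff}' LIMIT 1"
    )


if __name__ == "__main__":
    # Default prefix, same cutoff as in the comment above.
    print(raw_data_check_query("2020-01-01 00:00:00"))
    # A different (assumed) prefix, e.g. for older installations.
    print(raw_data_check_query("2020-01-01 00:00:00", table_prefix="piwik_"))
```

If the resulting query returns a row, raw visits older than the cutoff are still present in the database.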
Hi all,
#17516 is only for the visits log, not for reports in general (they use very different underlying code). I'm quite certain that visitors being counted twice is not the issue, but at the same time you can never rule anything out. Given the way these reports are generated, I very much doubt that this is the case here. Good that you brought it up, though.
@tsteur Thanks for the guidance, I appreciate it. The data was there, so I invalidated the reports and manually archived the data again. Unfortunately, the result is exactly the same (actually that's probably a good thing, otherwise I wouldn't have a clue what's going on and whether I could trust any of our data anymore). If there is anything I can do or test to help find out what causes the bug, please let me know. What @heurteph-ei said was my first thought as well; it only seems almost too simple... but sometimes that's exactly the reason why things are overlooked, so it's definitely worth looking into :)
@AJHoeh Maybe you can try to find some visits through segmentation, using one or the other of the filters below:
@heurteph-ei there should already be a feature request for this 👍
Okay, so I conducted a few experiments:
So it seems this is really an issue related to the combined use of filters and some kind of sub-periods. Thanks @heurteph-ei for the hint. I tried adding visitStartServerHour==23;visitServerHour==0 to my initial test requests for our website, but the returned monthly values summed up to only 29 nb_visits and 44 nb_hits, so I am unsure whether a larger window could explain the observed (~100-fold) difference. I tried requesting data for a given month and segmenting it by setting visitEndServerMonth to the following month, but it seems that is not supported for period=month, @tsteur? This seems really weird to me as well and is probably not intended?
If you select a month, the report is expected to return visits only from that month, and only if the month number in the segment is also that of the selected month. Not sure it's clear what I mean? You might be looking at the visitor log, which uses very different logic from regular reports; there is some logic there to show a few extra visits from the very beginning of the following month, but this is not the case for a report.
@tsteur Sorry for the late reply:
I am not sure whether I got that. What I meant was a request with e.g. date=2020-01-01 period=month segment=visitEndServerMonth==2. These requests do not return data for me, so I suppose either this is not supported or visits are always attributed to the month in which they ended (and not started). It is also possible that my logic is flawed here.
This is true for the very last test I described (starting at "However"), for which I provided the screenshot. I thought the web interface and visit log would be the easiest tools to spot the phenomenon in question. But if the log works differently from the other reports, just forget about it. Everything else was done by requesting the API with method=Actions.getPageUrls. EDIT: Just upgraded to 4.3 and the problem persists.
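For anyone who wants to repeat the request described in the comment above, here is a rough Python sketch that assembles the Reporting API URL. The parameters `date`, `period`, `segment` and `method=Actions.getPageUrls` are the ones quoted in the thread; the host `https://matomo.example.org`, `idSite=1`, the `anonymous` token and the helper name `page_urls_request` are assumptions for illustration only. It only builds the URL and does not send anything.

```python
from urllib.parse import urlencode


def page_urls_request(base_url, id_site, date, period, segment=None,
                      token_auth="anonymous"):
    """Build an Actions.getPageUrls request URL (hypothetical helper)."""
    params = {
        "module": "API",
        "method": "Actions.getPageUrls",
        "idSite": id_site,
        "period": period,
        "date": date,
        "format": "JSON",
        "token_auth": token_auth,  # placeholder; use your own token
    }
    if segment:
        # urlencode percent-encodes the '==' and ';' used in segment expressions.
        params["segment"] = segment
    return f"{base_url.rstrip('/')}/index.php?{urlencode(params)}"


if __name__ == "__main__":
    # January 2020, segmented on visits that ended in February.
    print(page_urls_request(
        "https://matomo.example.org",  # assumed host
        1,                             # assumed site id
        "2020-01-01",
        "month",
        segment="visitEndServerMonth==2",
    ))
```

Calling it once per month with period=month and once with period=year gives the set of requests whose results are being compared in this issue.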
When requesting data from the page URL table and filtering it for a specific URL, the data returned by separate requests for each month of a year does not sum up to the data returned by a single request for the whole year. This holds true both for the web interface and for the API.
Expected Behavior
The values of a metric for all months of a year should sum up to the value of the same metric for the whole year.
Current Behavior
The sum of the monthly values does not match the yearly value (in my case: exceeds it).
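The comparison between expected and current behavior can be sketched in a few lines of Python. This assumes the rows for the filtered URL have already been fetched and reduced to one dict per month plus one dict for the year; the metric names nb_visits and nb_hits are the ones mentioned in the thread, while the function name and the numbers in the example are invented purely for illustration and are not the values from this issue.

```python
def metric_discrepancies(monthly_rows, yearly_row, metrics=("nb_visits", "nb_hits")):
    """Return (sum of monthly values - yearly value) for each metric.

    A result of 0 means the months add up to the year; a positive value
    means the monthly sum exceeds the yearly value.
    """
    result = {}
    for metric in metrics:
        monthly_sum = sum(row.get(metric, 0) for row in monthly_rows)
        result[metric] = monthly_sum - yearly_row.get(metric, 0)
    return result


if __name__ == "__main__":
    # Invented numbers, two months only, to keep the example short.
    monthly = [
        {"nb_visits": 10, "nb_hits": 15},
        {"nb_visits": 12, "nb_hits": 20},
    ]
    yearly = {"nb_visits": 20, "nb_hits": 35}
    print(metric_discrepancies(monthly, yearly))
```

With these made-up inputs the nb_visits sum exceeds the yearly value by 2 while nb_hits matches, which is the kind of mismatch described here.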
Steps to Reproduce (for Bugs)
Example
In the given (and randomly chosen) example the values are:
I admit the differences are quite small here, but there honestly shouldn't be any at all. Furthermore, for our own data it looks like this, which is really non-negligible:
Context
We need total (unique) pageviews for a certain subset of URLs. Usually we report them on a yearly basis, but we now needed them for every month of a year and realized that the monthly values do not add up to the respective yearly value.
Your Environment