@peterbo opened this Issue on February 17th 2020 Contributor

Loading the dashboard with the All Goals report (with more than 2 goals) can take quite some time. This has been getting worse since the 3.12 release.

I notice a lot of queries like this when refreshing the dashboard:

UPDATE archive_numeric_2020_01 SET value = 4 WHERE name LIKE 'done%' AND idsite IN (195) AND (((date1 <= '2020-02-17' AND '2020-02-17' <= date2)))

One dashboard refresh seems to trigger dozens of these requests with some of them take 5+ seconds to finish (probably because of locking).

SELECT count(*) FROM archive_numeric_2020_01 where name LIKE 'done%' shows that there are more than 100k rows that have to be checked for updating, which seems quite inefficient.

This seems to be triggered in the function updateArchiveAsInvalidated to clear overlapping archives. Can this be moved to being done asynchronously via cron?

@tsteur commented on February 17th 2020 Member

@peterbo I suppose browser archiving is enabled?

@peterbo commented on February 17th 2020 Contributor

@tsteur Hey Thomas! No, disabled in all instances. Also purging archives / raw data is not active.

browser-archiving

@peterbo commented on February 17th 2020 Contributor

Seems like this has been reported before: https://github.com/matomo-org/matomo/issues/7195 (for blob tables)
Could this be a regression? Dashboard loading on a bigger instance with 30+ goals takes more than 1,5 minutes on a seriously strong server.

@tsteur commented on February 17th 2020 Member

@peterbo are there maybe ranges viewed or so? Or a segment selected? Cause generally it should not trigger these methods as far as I can see.

@peterbo commented on February 17th 2020 Contributor

@tsteur I was wondering about that too, but it's a normal "all visitors" setting for a single day (doesn't matter if you include the current date or not):

goal-trigger1

Could this be caused by a (premium) plugin? This would be something that all my tested instances have in common. (e.g. Cohort, CustomReports, Heatmap, FormAnalytics)

@peterbo commented on February 17th 2020 Contributor

I created another, easier testcase: A Dashboard only with the Goal overview report. You can see how long the queries are running (because they're blocking each other) and that the widget is still loading as long as the queries take to execute:

goal-trigger3

But it seems that also (probably) sparklines and/or graphs in general are issueing these queries, because I can find them in the processlist with only the Visitor overview widget on the dashboard.

@tsteur commented on February 17th 2020 Member

I can reproduce it. To reproduce it issue a tracking request, and then open eg a goal sparkline eg https://example.com/index.php?forceView=1&viewDataTable=sparkline&module=Goals&action=get&idGoal=12&allow_multiple=0&only_summary=1&widget=1&containerId=GoalsOverview&idSite=1&period=day&date=2020-01-19,2020-02-17&segment=&showtitle=1&random=3861&columns=conversion_rate&colors=%7B%22backgroundColor%22%3A%22%23ffffff%22%2C%22lineColor%22%3A%22%23162c4a%22%2C%22minPointColor%22%3A%22%23ff7f7f%22%2C%22maxPointColor%22%3A%22%2375bf7c%22%2C%22lastPointColor%22%3A%22%2355aaff%22%2C%22fillColor%22%3A%22%23ffffff%22%7D

Make sure to have a breakpoint at the updateArchiveAsInvalidated method.

It likely regressed by
https://github.com/matomo-org/matomo/commit/e34f66be743325e1cb9264ce0df301d40bef6d0c

What happens is it calls Goals.getMetrics which calls then calls eg Goal.getMetrics(segment=newVisit) and Goal.getMetrics(segment=ReturningVisits).

These segments aren't preprocessed so as long as browser segment archiving is enabled (which it usually is) it will launch the archiving and part of this invalidate all reports.

image

This was always an issue. However, in 3.12 we started invalidating reports for today. This means whenever there is a request for today, a report will need to be invalidated and the mentioned queries are executed. So for people that were importing visits for previous day this was already a problem. But now this is a lot more severe cause we invalidate for today's report so every time a new tracking request comes in, it will execute these queries again by the looks.

I suppose a workaround for now be to configure these segments in config.ini.php:
image

The disadvantage being that it will generates archives for all reports for these segments and not just the goal reports.

Not sure how to fix this issue in general yet. If browser archiving is disabled, an easy fix could be to not invalidate reports in a regular request. However, for segments we typically would want to invalidate these reports even during a regular request. I suppose, during a regular web request, we could only execute the invalidation logic if there is a report to invalidate for the selected date. Then people would still have the problem for today though.

Will need to see how to best fix it.

@diosmosis commented on February 17th 2020 Member

@tsteur could we create some mechanism to perform this logic after the HTTP request is finished? eg, like using register_shutdown_function() (or something better, that's just an example). We could also make sure only one invalidation query runs every N minutes via a Lock (like, say, every 15 minutes) so parallel requests don't all trigger invalidation. What do you think?

@tsteur commented on February 17th 2020 Member

@diosmosis register_shutdown_function might not work when we actually want to invalidate the reports because the user selected a segment and this segment needs invalidating and re-generation of the report. It's quite a tricky situation because archiving is disabled, and we do archive the goals report with those 2 particular segments, so somehow it would need to know that archiving is disabled for that report when one of the two segments is given. I haven't checked yet, but maybe if we are in ArchiveProcessor::processDependentArchive() then we could generally not execute that invalidation logic?

@tsteur commented on February 17th 2020 Member

I haven't checked yet, but maybe if we are in ArchiveProcessor::processDependentArchive() then we could generally not execute that invalidation logic?

This would potentially do the trick I reckon and also fix it for MediaAnalytics etc

@diosmosis commented on February 17th 2020 Member

We don't invalidate for specific segments right? If not, then that should work 👍

@tsteur commented on February 17th 2020 Member

👍

@diosmosis commented on February 18th 2020 Member

@tsteur started looking into this and noticed there's some code there already that ensures invalidation for the same idSite only happens once per request: https://github.com/matomo-org/matomo/blob/4.x-dev/core/Archive.php#L488-L492 . can you confirm that's what the code is supposed to do?

@tsteur commented on February 18th 2020 Member

@diosmosis that's generally good 👍 In this case it's though about not launching it at all if we are currently in ArchiveProcessor::processDependentArchive(). This should maybe fix the issue

@diosmosis commented on February 18th 2020 Member

@tsteur do you mean not launching archiving at all? or archive invalidation? If archive invalidation, then the code I linked to should have the same effect, shouldn't it? Ie:

  1. Archive for idSite=1, period = week is requested
  2. archiving starts
  3. archives invalidated, idSite 1 marked as processed in transient cache
  4. archive for day in week is requested
  5. idSite 1 is in transient cache, so invalidation for day is not done
@tsteur commented on February 18th 2020 Member

@diosmosis I actually wonder if that even made sense what I said. Looking at it again the moment this invalidate method is called, we might actually not even be within ArchiveProcessor::processDependentArchive()

The goal be yes to not even launch archiving when browser archiving is disabled because we're archiving for Goals reports these 2 segments during CLI archiving with the ArchiveProcessor::processDependentArchive(). It's also used by Media Analytics btw.

Not sure if / how this can be avoided. I suppose something like this be rather a hack but also can't think of anything else right now in https://github.com/matomo-org/matomo/blob/4.x-dev/core/Archive.php#L618-L624

if (isBrowserArchivingDisabled() && !isArchiveTriggered() && in_array('Goals', $plugins) && in_array($segment, array($newVisitSegment, $returningVisitSegment))) {.  //do not launch archiving }
@diosmosis commented on February 18th 2020 Member

The goal be yes to not even launch archiving when browser archiving is disabled because we're archiving for Goals reports these 2 segments during CLI archiving with the ArchiveProcessor::processDependentArchive().

I still don't think I'm clear on the issue here. If browser archiving is disabled, then we currently don't launch archiving at all in Archive.php. So the code should never get to ArchiveProcessor::processDependentArchive() in that case?

@tsteur commented on February 18th 2020 Member

image

It executes this API call which has a segment returningVisit or newVisit attached. It goes then into the Archive class, and while browser archiving is disabled, segment archiving through browser might be enabled. So it launches the archiver but it shouldn't. And you're totally right I now see it never goes into ArchiveProcessor::processDependentArchive() unfortunately making it harder to solve.

@diosmosis commented on February 18th 2020 Member

I see, the API call triggers multiple Archive queries... ok, well, it's a lot easier to solve when we know where the actual problem is :)

@diosmosis commented on February 18th 2020 Member

Is it just invalidation that causes the issue? We could simply disable invalidation if the Archive object has a segment, but the root query doesn't have a segment. Ie, if ::processRequest() is used w/ segment= but the original $_SERVER['REQUEST_URI'] segment is empty different, don't invalidate. Do you think that would work?

@tsteur commented on February 18th 2020 Member

I'm not sure. We might still need to invalidate archives when there is a segment as otherwise user might get an outdated report?

if ::processRequest() is used w/ segment= but the original $_SERVER['REQUEST_URI'] segment is empty different, don't invalidate.

I see maybe like this it could work 👍

@diosmosis commented on February 18th 2020 Member

Actually that won't work either for the same reason as before, https://github.com/matomo-org/matomo/blob/4.x-dev/core/Archive.php#L456 should save the idSite in the list as already processed, then when the next archive query is run, the invalidation will be skipped. This issue probably needs some more investigation.

@tsteur commented on March 15th 2020 Member

I suppose this might be fixed with https://github.com/matomo-org/matomo/pull/15616

Moving it into 3.13.5 so we can confirm it there. It would be still an issue if someone is tracking regularly historic data for previous days but this would have always been an issue before as well.

Powered by GitHub Issue Mirror