Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple requests may archive the same range request at the same time #18088

Closed
tsteur opened this issue Oct 1, 2021 · 17 comments · Fixed by #18243, #18399 or #18493
Closed

Multiple requests may archive the same range request at the same time #18088

tsteur opened this issue Oct 1, 2021 · 17 comments · Fixed by #18243, #18399 or #18493
Assignees
Labels
Bug For errors / faults / flaws / inconsistencies etc. not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. Regression Indicates a feature used to work in a certain way but it no longer does even though it should.
Milestone

Comments

@tsteur
Copy link
Member

tsteur commented Oct 1, 2021

We noticed a while ago that multiple requests may launch archiving for the same period.

To reproduce this:

  • Disable browser archiving & Disable browser archiving for a segment
    • eg set [General]enable_browser_archiving_triggering = 0 enable_general_settings_admin = 0 browser_archiving_disabled_enforce = 1
  • Have a site with say 30 active goal
  • Have generally data for a few days
  • Trigger archiving using ./console core:archive
  • Then go the Goals Overview page and request a date range eg 2021-09-02,2021,09,29

You might then notice that all does goals trigger the archiving. Eg you could execute a query like

select count(*) from archive_numeric_2021_09 where ts_archived > DATE_SUB(NOW(), INTERVAL 5 MINUTE) and ts_archived < NOW() and name like 'done%'; -- number of unique added archives in the last few minutes which may be pretty much roughly like the number of goals
select count(*) from archive_numeric_2021_09 where ts_archived > DATE_SUB(NOW(), INTERVAL 5 MINUTE) and ts_archived < NOW() ;// how many archive entries were added overall

-- same for blob
select count(*) from archive_blob_2021_09 where ts_archived > DATE_SUB(NOW(), INTERVAL 5 MINUTE) and ts_archived < NOW() ;

I'm meaning that each of these requests like in the screenshot seem to be launching the archiving (eg index.php?forceView=1&viewDataTable=sparklines&module=Goals&action=get&idGoal=7&allow_multiple=0&only_summary=1&idSite=12&period=week&date=today&segment=&showtitle=1&random=1482):

image

As I'm seeing this recently more often I think this was a regression somewhere between 4.3 and 4.4. I quickly looked through the archiving related changes in 4.3.0...4.4.1 and couldn't find anything so far that would explain this. This one peaked my interest but pretty sure it cannot be what caused it: https://github.com/matomo-org/matomo/pull/17657/files

Possibly the regression was in 4.2.0 because of https://github.com/matomo-org/matomo/pull/17379/files maybe

I wonder if this one maybe previously prevented that each request launches the archiving at the same time? Once we can reproduce the issue, we could maybe see if it works with that PR and wouldn't launch multiple archives.

Let me know if there are any issues reproducing this issue.

refs #17774

The goal is that the archiving is triggered only once for all these requests. Meaning below query should return 1:

select count(*) from archive_numeric_2021_09 where ts_archived > DATE_SUB(NOW(), INTERVAL 5 MINUTE) and ts_archived < NOW() and name like 'done%'; -- number of unique added archives in the last few minutes which may be pretty much roughly like the number of goals
@tsteur tsteur added Bug For errors / faults / flaws / inconsistencies etc. Regression Indicates a feature used to work in a certain way but it no longer does even though it should. labels Oct 1, 2021
@tsteur tsteur added this to the 4.6.0 milestone Oct 1, 2021
@mattab
Copy link
Member

mattab commented Oct 1, 2021

fyi this might be related/similar to the old report at #8444

@peterhashair
Copy link
Contributor

@tsteur Just trying to understand this. So every single sparkline axis is loading from the archive_numeric_20xx_xx like a cache table when the enable_browser_archiving_triggering is disabled, it won't insert into archive_numeric_20xx_xx table, only ./console core:archive does insert?

@tsteur
Copy link
Member Author

tsteur commented Oct 20, 2021

enable_browser_archiving_triggering is disabled, but for range dates by default we still initiate browser archiving and aggregate all the different periods to a range. Meaning it will insert data into the archive tables when viewing it in the browser. Does this help?

@peterhashair
Copy link
Contributor

@tsteur kind of, so the action=get request for the same period should only launch archive once, instead of each request per once.

@tsteur
Copy link
Member Author

tsteur commented Oct 20, 2021

yes 👍

@tsteur
Copy link
Member Author

tsteur commented Nov 18, 2021

This is actually not fixed yet. I added 2 new comments in DEV-2306 with details. It either doesn't help, or it helped but there are still issues.

We can try another fix in Matomo 4.7 @peterhashair

Maybe one potential problem with this is that all requests eventually try to get that lock vs they should maybe just wait until the lock is released if the lock is held by someone else and then not try to get the lock again. I don't think this works using the get_lock method though and we would need to use our other lock. This might not be the problem though. It's just some random thought.

@tsteur tsteur reopened this Nov 18, 2021
@tsteur tsteur modified the milestones: 4.6.0, 4.7.0 Nov 18, 2021
@peterhashair
Copy link
Contributor

@tsteur ah on my local it doesn't do duplications, to reproduce this. Maybe can we dump the database from the testing server?

@tsteur
Copy link
Member Author

tsteur commented Nov 18, 2021

@peterhashair there are pretty much no duplicates here either except for the goals flag. I'm not sure that's the problem though. I was going to export the data but it has been deleted again already by some task. I can export it again if that's helpful but not sure it would help too much. Because we have disabled range again for this Matomo I can't easily regenerate it right now but could do later if useful.

@peterhashair
Copy link
Contributor

peterhashair commented Nov 19, 2021

@tsteur maybe that's not related to the lock or multiple requests, we can cache the select to the lazy cache instead of MySQL select, not sure that helps. But I can see the CPU is cluster caused by the writer, is that mean insert cause the problem not the select?

@peterhashair
Copy link
Contributor

@tsteur it seems like this is really slow when there is a large date range.

public function getAnnotationCountForDates($idSite, $date, $period, $lastN = false, $getAnnotationText = false)

@tsteur
Copy link
Member Author

tsteur commented Nov 23, 2021

Good find, I can see how that one might be a bit slow. Takes 2s here for me when requesting a range of 9 months. It generally does only a one simple DB query though so it wouldn't cause the big DB load. We could create another issue to improve the performance of this script eventually.

@peterhashair
Copy link
Contributor

peterhashair commented Nov 23, 2021

@tsteur on my local is wired, if you run it just by itself, it takes 1-2s, but if you run it on the goal overview page. it takes 42s on my one.

image

@tsteur
Copy link
Member Author

tsteur commented Nov 24, 2021

We might want to try our other lock see https://github.com/matomo-org/matomo/compare/lockalt?expand=1

The benefit here is that not every request will try to acquire the lock but only one process. And as soon as it's released this lock, the other processes won't fight to get the lock but will just know the archive is finished and continue.

@peterhashair let's also try to move all these goal specific visualisations into one request
image

Happy to show how this could be done. Generally the lock we're having is already helping, but we want to avoid extreme concurrency when people have heaps of goals. We can try to put this into one view and then see if it loads faster etc.

@peterhashair
Copy link
Contributor

peterhashair commented Nov 25, 2021

@tsteur our lock looks great, but when in the larger period, eg: more than the 6-month range of period, it throws a 504 time out see screenshot. CPU usage on MySQL server is about the same 400%+, I have a feeling is the select clause of the CPU usage high, but not too sure, I trying to merge some of the select queries to see if that helps.
143328529-5592a092-8df2-4b4e-aeba-75845d263874

@peterhashair
Copy link
Contributor

@tsteur merged sparkline combine into 1 request, let me know if that still causing CPU load problems.

@tsteur
Copy link
Member Author

tsteur commented Dec 10, 2021

great, thanks 👍

@sgiehl
Copy link
Member

sgiehl commented Dec 10, 2021

reopening, as the PR actually broke the goals overview and therefor was reverted.

@sgiehl sgiehl reopened this Dec 10, 2021
@justinvelluppillai justinvelluppillai added the not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. label Feb 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For errors / faults / flaws / inconsistencies etc. not-in-changelog For issues or pull requests that should not be included in our release changelog on matomo.org. Regression Indicates a feature used to work in a certain way but it no longer does even though it should.
Projects
None yet
5 participants