
Improve multi threading in archiving to be more linear #5363

Closed
mgazdzik opened this issue Jun 18, 2014 · 14 comments
Labels
c: Performance - For when we could improve the performance / speed of Matomo.
Enhancement - For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc.
wontfix - If you can reproduce this issue, please reopen the issue or create a new one describing it.
Milestone

Comments

@mgazdzik
Contributor

Currently we can manually trigger many core:archive commands, and each of them starts a separate process for archiving idsites using a common queue. However, each of those commands can also spawn up to 3 (or more, if changed in the file) processes computing segments. As a result, we cannot trigger as many archiving processes for idsites as we would like, because in the worst case we end up with 3x more processes computing segmented data at the same time.

Therefore it would be good to have a common limit saying how many processes in total can be spawned. Basically it should work the same for idsites as it works for segments now. That way we would be able to set an upper limit on the number of processes regardless of whether they are working on idsites or segments. It would also make it easier to manage the number of processes: instead of multiple lines in crontab, only a single parameter would need changing to increase the number of threads.
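As an editorial illustration of what such a common, host-wide limit could look like, here is a minimal PHP sketch; nothing in it is existing Matomo code, and the class name and lock-file layout are invented. Every core:archive process competes for the same fixed set of slots before spawning a child, whether that child archives an idsite or a segment:

```php
<?php
// Hypothetical sketch (not Matomo code): one global cap on archiving
// processes, shared by every core:archive invocation on the host.
// A worker must acquire a slot before spawning a child process; if all
// slots are taken it waits and retries, regardless of whether the child
// would work on an idsite or on a segment.
class GlobalProcessLimiter
{
    public function __construct(
        private string $lockDir,      // e.g. /tmp/piwik-archive-slots (assumed path)
        private int $maxProcesses = 3 // the single knob replacing per-crontab limits
    ) {
        if (!is_dir($this->lockDir)) {
            mkdir($this->lockDir, 0777, true);
        }
    }

    /** Try to grab one of the N global slots; returns a lock handle or null. */
    public function acquireSlot()
    {
        for ($i = 0; $i < $this->maxProcesses; $i++) {
            $handle = fopen($this->lockDir . "/slot-$i.lock", 'c');
            if ($handle !== false && flock($handle, LOCK_EX | LOCK_NB)) {
                return $handle; // the slot is ours until we release it
            }
            if ($handle !== false) {
                fclose($handle);
            }
        }
        return null; // all slots busy: caller should wait and retry
    }

    /** Give the slot back so another process (sites or segments) can use it. */
    public function releaseSlot($handle): void
    {
        flock($handle, LOCK_UN);
        fclose($handle);
    }
}
```

Because the lock files live on disk and flock() works across processes, the same cap applies no matter how many core:archive commands the crontab starts.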

@hpvd

hpvd commented Jun 18, 2014

Maybe this could also fit into this topic:
"Parallelization of processes for best performance"
#4905

@mattab
Member

mattab commented Jul 1, 2014

See related ticket where this originates from: #5396

@mgazdzik mgazdzik added this to the 2.x - The Great Piwik 2.x Backlog milestone Jul 8, 2014
@mattab mattab removed the P: normal label Aug 3, 2014
@mattab mattab modified the milestones: Mid term, Short term Aug 4, 2014
@mattab mattab added the Major label Aug 4, 2014
@mattab mattab modified the milestones: Short term, Piwik 2.6.0 Aug 4, 2014
@mattab mattab removed the Major label Aug 4, 2014
@mattab mattab added the Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. label Sep 15, 2014
@mattab
Member

mattab commented Sep 18, 2014

Maybe we need a new parameter --max-threads=X that would limit the max number of concurrent segment processing threads started by this main core:archive thread?

@mgazdzik
Contributor Author

Yes, I think that is what is needed. It would also be great if this parameter limited the total number of threads, i.e. we could process 10 segments, or 10 sites, or a mix in any other proportion.
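To make that concrete, here is a rough standalone sketch of how a single --max-threads=X option could cap the combined pool; the helper commands in $jobs are placeholders, and this is not the real core:archive implementation:

```php
<?php
// Hypothetical illustration only: one --max-threads=X option caps the total
// number of concurrently running archiving children, whether they process
// an idsite or a segment.
$options    = getopt('', ['max-threads:']);
$maxThreads = max(1, (int) ($options['max-threads'] ?? 3));

// Mixed work items: site jobs and segment jobs go through the same pool.
// These command lines are placeholders; in Matomo they would be
// climulti:request invocations.
$jobs = [
    'php archive_site.php --idsite=1',
    'php archive_site.php --idsite=2',
    'php archive_segment.php --idsite=1 --segment=returningVisitors',
];

$running = [];
while ($jobs || $running) {
    // Top up the pool until the global cap is reached.
    while ($jobs && count($running) < $maxThreads) {
        $cmd  = array_shift($jobs);
        $proc = proc_open($cmd, [], $pipes);
        if ($proc !== false) {
            $running[] = $proc;
        }
    }
    // Reap finished children so new ones can start.
    foreach ($running as $key => $proc) {
        if (!proc_get_status($proc)['running']) {
            proc_close($proc);
            unset($running[$key]);
        }
    }
    usleep(200000); // avoid busy-waiting
}
```

Whether the next job archives a site or a segment makes no difference to the pool, which is the point of a single shared limit.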

@hpvd

hpvd commented Sep 18, 2014

Of course we like that idea :-)
see #4905

What do you think about a setting where an advanced user can enter the maximum number of cores/threads that are available on the server / that Piwik should use at most?
- this may allow maximum performance for Piwik
- but would not slow down other tasks on the server / the server as a whole

=> Maybe the max thread value could be a global setting?

@mattab
Member

mattab commented Sep 22, 2014

It's a bit more complicated than we initially thought. I'm moving this to 2.8.0 to avoid a change in core:archive prior to our release due Wednesday.

@mattab mattab modified the milestones: Piwik 2.8.0, Piwik 2.7.0 Sep 22, 2014
@mattab mattab modified the milestones: Piwik 2.9.0, Piwik 2.8.0 Oct 8, 2014
@diosmosis diosmosis self-assigned this Oct 15, 2014
@mattab mattab modified the milestones: Piwik 2.10.0 , Piwik 2.9.0 Nov 6, 2014
@mattab
Member

mattab commented Nov 6, 2014

Moved to 2.10.0 as we don't have enough time left

@mattab
Member

mattab commented Dec 6, 2014

I'm moving this out of the current milestone, because we need to think a bit more about this project, in particular our mid-term goals around scheduling archiving jobs.

Notes:

Maybe this issue depends on discussion in #6638

cc @diosmosis @mnapoli @tsteur

@mattab mattab modified the milestones: Short term, Piwik 2.10.0 Dec 6, 2014
@mattab mattab removed the Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. label Dec 10, 2014
@mattab
Member

mattab commented Mar 24, 2015

I guess what we need here is a simple new plugin that implements a job queue for certain climulti:request calls. This is similar to how QueuedTracking has a Redis queue of pending requests.
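As an illustration of that idea, here is a minimal sketch of a Redis-backed FIFO for archiving jobs, in the spirit of QueuedTracking; the class, method and key names are invented, this is not an existing Matomo plugin API, and it assumes the phpredis extension:

```php
<?php
// Hypothetical sketch of a job queue for climulti:request calls, backed by a
// Redis list used as a FIFO (similar in spirit to QueuedTracking).
class ArchiveJobQueue
{
    private const KEY = 'archiving:pending-jobs'; // assumed key name

    public function __construct(private \Redis $redis)
    {
    }

    /** Producer side: core:archive enqueues the request it wants executed. */
    public function enqueue(string $request): void
    {
        $this->redis->lPush(self::KEY, $request);
    }

    /** Consumer side: pop the oldest job, blocking up to $timeout seconds. */
    public function dequeue(int $timeout = 5): ?string
    {
        // brPop returns [key, value], or an empty result on timeout.
        $item = $this->redis->brPop([self::KEY], $timeout);
        return empty($item) ? null : $item[1];
    }
}
```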

@mattab mattab added the Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. label Mar 24, 2015
@mnapoli
Contributor

mnapoli commented Mar 24, 2015

@mattab I don't understand how the work queue is going to help. Actually, I think I don't understand the issue: from what I could gather from the ticket, the problem is too many threads / not enough control over those threads?

@mattab
Member

mattab commented Mar 24, 2015

Right now each core:archive script will trigger 1-N new calls, and there could already be some running in the background, which could overload the server.

Having a queue (e.g. FIFO) gives us the ability to decide to process at most 1 or N jobs at a time, giving us a controlled environment. Does that make sense?
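Continuing the same hypothetical sketch from above, the consumer side could look like this; each worker takes one job at a time off the FIFO, so running N workers means at most N archiving jobs execute concurrently, in the order core:archive enqueued them:

```php
<?php
// Hypothetical worker loop: run N copies of this script to allow at most
// N archiving jobs at a time. Assumes the ArchiveJobQueue sketch above and
// the phpredis extension; the Redis host/port are placeholders.
$redis = new \Redis();
$redis->connect('127.0.0.1', 6379);

$queue = new ArchiveJobQueue($redis);

while (($job = $queue->dequeue(30)) !== null) {
    // $job is the command line enqueued by core:archive, e.g. a
    // climulti:request invocation; run it and wait for completion
    // before taking the next one.
    passthru($job, $exitCode);
    if ($exitCode !== 0) {
        error_log("Archiving job failed with code $exitCode: $job");
    }
}
```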

@mnapoli
Contributor

mnapoli commented Mar 25, 2015

Thanks, I see the point of the queue now.

each core:archive script will trigger 1-N new calls and there could be already some running in the background

Is it because core:archive doesn't wait for the requests to finish processing? Or is it because we can run multiple core:archive in parallel (but in that case this is a problem we create ourselves)? Or is there another reason why "there could be already some running in the background" (assuming archiving in the browser is disabled)?

@mattab
Member

mattab commented Mar 25, 2015

There are several reasons this could happen, for example if users add 100 sites overnight, or many segments (see also #7483)...

It is by design that users can trigger several archiving scripts, e.g. to make better use of multiple CPUs on the server. When a script is already running (it could run for days, for example), if we block other core:archive calls, then data for 'yesterday', 'today', etc. may be missing. That would not be a good solution, which is why I think we need some kind of job queue and then a way to order the jobs (e.g. FIFO or some other logic we decide at the time).

I think I will leave it in Mid term for now, until we again hit a problem that requires this solution.

@mattab mattab modified the milestones: Mid term, Short term Mar 25, 2015
@mattab mattab removed the Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical. label Sep 23, 2015
@mattab
Member

mattab commented Dec 2, 2015

Not needed for now

@mattab mattab closed this as completed Dec 2, 2015
@mattab mattab added the wontfix If you can reproduce this issue, please reopen the issue or create a new one describing it. label Dec 2, 2015