@mgazdzik opened this Pull Request on June 17th 2014 Contributor
@mgazdzik commented on June 17th 2014 Contributor

Just a side note: maybe the parameter responsible for concurrent segment requests could be generalized? That way we would not use the server's full processing power only for segments; if there are "free" threads, we could also start processing further idsites.
For example:

  • we have 10 concurrent threads
  • site 1 has 0 segments (we only process bare metrics) - 1 process
  • site 2 has 5 segments (we process bare metrics, then segments) - 6 processes
  • sites 3, 4 and 5 have 0 segments, so we can start processing bare metrics for them as well.

Currently the threading option is only used for segments, so a single big idsite can block multi-threaded processing for a long time, while other threads could be carrying on with further idsites.
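The accounting behind this example can be sketched in a few lines. This is only an illustration of the proposed shared budget, not how core:archive works; the names `requests_for_site` and `sites_that_fit` are hypothetical, and the per-site counts follow the example above (one request for bare metrics plus one per segment).

```python
from typing import Dict, List


def requests_for_site(segment_count: int) -> int:
    """One request for the bare metrics plus one request per segment."""
    return 1 + segment_count


def sites_that_fit(segments_per_site: Dict[int, int], max_threads: int) -> List[int]:
    """Admit sites in idsite order while the shared thread budget allows.

    Hypothetical helper: assumes a single budget covers both sites and segments.
    """
    admitted: List[int] = []
    used = 0
    for idsite in sorted(segments_per_site):
        needed = requests_for_site(segments_per_site[idsite])
        if used + needed > max_threads:
            break
        admitted.append(idsite)
        used += needed
    return admitted


# Sites 1, 3, 4, 5 have no segments; site 2 has 5 segments.
example = {1: 0, 2: 5, 3: 0, 4: 0, 5: 0}
print(sites_that_fit(example, max_threads=10))
```

With a budget of 10 threads, all five sites can be in progress at once (1 + 6 + 1 + 1 + 1 = 9 requests), instead of the idle threads waiting for site 2 to finish.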

Also, the current implementation might be dangerous when the concurrent segment count is big. For example, when you launch 5 separate core:archive commands, each of them can launch another 3 threads, which gives 15 threads on the server instead of the desired 5.
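The worst case is a simple product, shown here as a small sketch. The function name `worst_case_threads` is made up for illustration, and it assumes every command reaches its full per-website concurrency at the same time.

```python
def worst_case_threads(commands: int, concurrent_requests_per_website: int) -> int:
    """Upper bound on simultaneous requests if every command is fully busy."""
    return commands * concurrent_requests_per_website


print(worst_case_threads(5, 3))  # 15
```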

@mattab commented on June 18th 2014 Member

Currently the threading option is only used for segments, so a single big idsite can block multi-threaded processing for a long time, while other threads could be carrying on with further idsites.

IIRC, to work around this issue we implemented the possibility of running several core:archive commands in parallel. Each process will then calculate data for a different website: http://dev.piwik.org/trac/ticket/4903

the current implementation might be dangerous when the concurrent segment count is big

+1, this is a limitation of our algorithm. Maybe we can solve this problem separately? Feel free to create a ticket if it's important that we investigate :+1:

@mgazdzik commented on June 18th 2014 Contributor

As for parallel threading: yes, it is possible to launch several commands. However, this is very inconvenient, as each of them calculates only one idsite but can calculate multiple segments, which makes it pretty hard to tune archiving in a consistent way (there should always be the same number of threads, regardless of whether we're going through sites or segments). I'll create a ticket on Trac to improve this part, as it will be big progress in terms of archiving tuning.

edit:

Created a ticket on Trac: http://dev.piwik.org/trac/ticket/5363

@mattab commented on July 1st 2014 Member

Also, the current implementation might be dangerous when the concurrent segment count is big. For example, when you launch 5 separate core:archive commands, each of them can launch another 3 threads, which gives 15 threads on the server instead of the desired 5.

In the case that several core:archive commands are triggered, maybe it's best to always use --concurrent-requests-per-website=1? This way you can be sure that each command will process one request at a time and that the maximum number of threads is predictable.
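As a rough sketch of this setup, the snippet below only builds the argument lists for N such commands and does not run them. The helper name `build_archive_commands` and the `./console` path are assumptions for illustration, and any other options a given installation needs are left out.

```python
from typing import List


def build_archive_commands(command_count: int, console: str = "./console") -> List[List[str]]:
    """Return one argv list per core:archive command, each limited to one request.

    The console path is an assumption; adjust it to the actual installation.
    """
    return [
        [console, "core:archive", "--concurrent-requests-per-website=1"]
        for _ in range(command_count)
    ]


commands = build_archive_commands(5)
# With one request per command, the thread ceiling equals the command count.
print(len(commands))  # 5
```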

@mgazdzik commented on July 1st 2014 Contributor

I think it could be a temporary workaround for the problem. Generally it will be quite problematic to manage such a cronjob with e.g. 45 core:archive commands ;) Also, one wouldn't be able to easily launch archiving manually with N threads (for example while benchmarking); instead it would require launching N screen instances / N terminal connections to the server.

@mattab commented on July 1st 2014 Member

@mgazdzik all items you've suggested have been fixed. cheers

This Pull Request was closed on July 1st 2014