
initial proposition of params to introduce to extend CoreArchiver usability #319

Closed

Conversation

mgazdzik
Contributor

No description provided.

@mgazdzik
Contributor Author

Just a sidenote - maybe the parameter responsible for concurrent segment requests could be generalized? That way we wouldn't use the full server processing power only for segments; if there are "free" threads, we could already process further idsites.
For example:

  • we have 10 concurrent threads
  • site 1 has 0 segments (we only process bare metrics) - 1 process
  • site 2 has 5 segments (we process bare metrics, then segments) - 6 processes
  • sites 3, 4, 5 have 0 segments, so we can start processing bare metrics for them as well.

Currently the threading option is only used for segments, so a single big idsite can block multi-threaded processing for a long time, while other threads could be carrying on with further idsites.

Also the current implementation might be dangerous when the concurrent segments count is big. For example, when you launch 5 separate core:archive commands, each of them can possibly launch another 3 threads, which will give 15 threads on the server instead of the desired 5.
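To make the idea above concrete, here is a rough sketch of what a single, generalized work queue could look like. This is an illustration only, not the actual CronArchive code: startArchivingRequest(), requestFinished() and $segmentsForSite are assumptions.

// Sketch: every archiving request (bare metrics or one segment) becomes a job in
// one global queue, and a fixed pool of N workers drains it, so idle slots
// immediately move on to further idsites instead of waiting on one big site.
$queue = array();
foreach ($idSites as $idSite) {
    $queue[] = array('idSite' => $idSite, 'segment' => null); // bare metrics
    foreach ($segmentsForSite[$idSite] as $segment) {
        $queue[] = array('idSite' => $idSite, 'segment' => $segment); // one job per segment
    }
}

$maxConcurrent = 10; // the single, global concurrency setting
$running = array();
while (!empty($queue) || !empty($running)) {
    // fill free slots from the queue, regardless of which idsite the job belongs to
    while (count($running) < $maxConcurrent && !empty($queue)) {
        $job = array_shift($queue);
        $running[] = startArchivingRequest($job['idSite'], $job['segment']); // hypothetical helper
    }
    // drop finished requests so their slots can be reused
    $running = array_filter($running, function ($handle) {
        return !requestFinished($handle); // hypothetical helper
    });
    usleep(100000);
}

With something like this, the 10 threads from the example stay busy on sites 3, 4 and 5 while site 2's segments are still being processed.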

private function disablePurgeIfNeeded()
{
    if ($this->startDate !== false) {
        Rules::$purgeDisabledByTests = true; // TODO: replace this hack with some generic solution
    }
}
Member

this cannot work because this flag will be set in the core:archive CLI process, and not in the actual archiving PHP process

Contributor Author

Then maybe we could introduce an additional flag to prevent outdated archives from being dropped? The idea is to prevent this from happening, especially when we're computing past, completed periods (e.g. 2-3 months ago), without affecting current data. It would be great if you could introduce such a flag or method to be triggered. Alternatively I could give it a try, but I would need a hint on where to start?
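For what it's worth, a rough sketch of where such a flag could hook in, keeping the objection above in mind: since the archiving itself runs in a separate PHP process, the option would have to travel with the archiving request rather than be set statically in the CLI process. The disableOutdatedArchivesPurge query parameter below is an assumption, not an existing Piwik parameter; only the --disable-outdated-archives-purge option and Rules::$purgeDisabledByTests come from this PR.

// In the core:archive CLI process (sketch): forward the option with the request URL
// instead of setting the static flag here, where it has no effect on the worker process.
$url = $archivingRequestUrl; // however the request URL is built in CronArchive
if ($input->getOption('disable-outdated-archives-purge')) {
    $url .= '&disableOutdatedArchivesPurge=1'; // hypothetical query parameter
}

// In the PHP process that actually archives (sketch): read the parameter there
// and only then disable purging, reusing the flag from the diff above.
if (Common::getRequestVar('disableOutdatedArchivesPurge', 0, 'int')) {
    Rules::$purgeDisabledByTests = true; // still a hack, but now set in the right process
}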

@mattab
Member

mattab commented Jun 18, 2014

Currently the threading option is only used for segments, so a single big idsite can block multi-threaded processing for a long time, while other threads could be carrying on with further idsites.

IIRC, to work around this issue we implemented the possibility to run several core:archive commands in parallel. Each process will then calculate data for a different website: http://dev.piwik.org/trac/ticket/4903

the current implementation might be dangerous when the concurrent segments count is big

+1, this is a limitation of our algorithm; maybe we can solve this problem separately? Feel free to create a ticket if it's important that we investigate 👍

@mgazdzik
Contributor Author

As for parallel threading - yes, it is possible to launch several commands. However, this is very inconvenient, as each of them calculates only one idsite but can calculate multiple segments, which makes it pretty hard to tune archiving in a consistent way (there should always be the same number of threads, regardless of whether we're going through sites or segments). I'll create a ticket on Trac to improve this part, as it will be a big step forward in terms of archiving tuning.

edit:

created a ticket on Trac: http://dev.piwik.org/trac/ticket/5363

$command->addOption('end-date', null, InputOption::VALUE_OPTIONAL, "If set, also requires start date to be defined. Will force Piwik to re-archive up till this date");
$command->addOption('force-periods', null, InputOption::VALUE_OPTIONAL, "Allows to define what periods only should be archived. Can be comma-separated like day,week,month");
$command->addOption('disable-outdated-archives-purge', null, InputOption::VALUE_OPTIONAL, "Allows to disable purging of outdated archives (i.e. today, this week etc.) if you are archiving past periods");
$command->addOption('nb-concurent-segment-requests', null, InputOption::VALUE_OPTIONAL, "Allows define how many segments do you want to process at a time");
Member

small typo: concurrent
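For context, this is roughly how the options added in this diff would be passed on the command line (illustrative only; --url is the usual core:archive option, the other option names are taken verbatim from the diff above, typo included):

./console core:archive --url=http://example.org/piwik \
    --force-periods=day,week \
    --disable-outdated-archives-purge=1 \
    --nb-concurent-segment-requests=4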

mattab added commits that referenced this pull request Jul 1, 2014
…its segments, number of requests to process in parallel

refs #319
refs #5396
@mattab
Member

mattab commented Jul 1, 2014

Also the current implementation might be dangerous when the concurrent segments count is big. For example, when you launch 5 separate core:archive commands, each of them can possibly launch another 3 threads, which will give 15 threads on the server instead of the desired 5.

In the case that several core:archive commands are triggered, maybe it's best to always use --concurrent-requests-per-website=1? This way you can be sure that each thread will process one request at a time and that the max number of threads is predictable.
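To illustrate the suggestion with the numbers from the earlier example, something like the following keeps the total at a predictable 5 concurrent requests (sketch only; the flag name is the one quoted above):

# five parallel launchers, each limited to one archiving request at a time
for i in 1 2 3 4 5; do
    ./console core:archive --url=http://example.org/piwik --concurrent-requests-per-website=1 &
done
wait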

@mgazdzik
Contributor Author

mgazdzik commented Jul 1, 2014

I think it could be a temporary workaround for the problem. Generally it will be quite problematic to manage such a cronjob with e.g. 45 core:archive commands ;) Also, one wouldn't be able to easily launch archiving manually with N threads (for example while benchmarking?); instead it would require launching N screen instances / N terminal connections to the server.

mattab added commits that referenced this pull request Jul 1, 2014
…ism. it should not be needed because this exact function call is already executed as a daily scheduled task. This will make archiving much faster and avoid the need of adding a flag to disable it :)

refs #319
refs #5396
@mattab
Member

mattab commented Jul 1, 2014

@mgazdzik all items you've suggested have been fixed. cheers
