New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor archive purging for clarity and resilience. #7377
Conversation
…tatic methods, move Rules::shouldPurgeOutdatedArchives since it is only used by ArchivePurger and move comment in said function.
…her through tracking or through CronArchive), so remove 6 hour forced time between purges in ArchivePurger::shouldPurgeOutdatedArchives.
…PurgeOutdatedArchives() so archive purging can be forced in contexts other than scheduled task running.
…r Archive\Purger.
…site entity, no longer need to depend on option existing in order to successfully purge invalidated archives.
…mand, so table size will be reduced and not just row count.
…ed lists, fix new purging command for php 5.3, and update relevant methods to not use site IDs when purging/invalidating.
…eparate class from InvalidatedReports. Since list is only used to purge during scheduled tasks, moved to CoreAdminHome plugin where related task resides. Introduced common base type core\Concurrency\DistributedList in order to avoid introducing code redundancy when splitting InvalidatedReports up.
…cessDistributedList and move to Piwik\CronArchive since CronArchive is only consumer of the list. Made the list derive from Piwik\Concurrency\DistributedList.
… from Piwik\DataAccess since it iss not a DAO.
Conflicts: core/ArchiveProcessor/Rules.php
…dminHome's scheduled tasks so Piwik core is no longer dependent on it.
…e only consumer for it is the scheduled task and add tests for task in CoreAdminHome (or rather move test from PurgerTest to new TasksTest class). Moved test setup to new fixture type.
…ndex method to list for future use.
…ests can inject mock commands.
…make asserts in RawArchiveDataWithTempAndInvalidated more generic
…hive purging. Move to a second method in Archive\Purger.
|
||
public function __construct(Model $model = null, Date $purgeCustomRangesOlderThan = null) | ||
{ | ||
$this->model = $model ?: new Model(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to allow the constructor parameter to be null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for convenience, since DI (and thus auto-wiring) isn't available everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok I wasn't sure where you were using the class. I've taken the habit of doing StaticContainer::get('The\Class')
(i.e. always go through the container) even though that's ugly, but at least that forces things to go in the right way: to remove the ugly you'd have to use even more DI
This is awesome, especially all the explanations :) |
$commandInstance = new $command; | ||
if (!$this->has($commandInstance->getName())) { | ||
$this->add($commandInstance); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment to explain that the command might be overriden in tests? Else we'll be confused by that in the future and someone might remove it.
(I reposted the comment in the diff…)
You mean like in Symfony you write an EventListener that declares which events it listens to, and you register it in the container and that's it? So that would be an alternative to listening to events in the Plugin class right? If that's what you mean then yes I think that's a good idea. However the interface wouldn't be good enough for all this to work unfortunately (for technical reasons like the fact that you can register services using a factory/closure so you can't know the interface of the object before resolving all the definitions). It would have to be done by registering it explicitly (like in Symfony with tags) e.g.: // plugin config
return [
'event.listeners' => DI\add([
DI\link('Piwik\Plugin\Foo\RequestEventListener')
]),
]; |
That makes it less useful, but it could still be used to reduce coupling... |
…already exists in the Piwik Console class.
…or ArchivePurger for clarity.
… found, not once per archive table since dates can appear twice (once for numeric table & once for blob table).
I'm missing the point? What I'm saying is that detecting entries based on an interface is extremely limited to the point where it's probably useless. I wouldn't bother going that way. Explicitly registering classes as event listeners would be necessary. |
I know, I just don't really like having to register them, but I can't see another way to go. Should still work, I think, though would need DI to be available in more places, I think? |
@mattab Ready to merge. |
Ok, I'm going to review it shortly |
I'm happy about this refactor: it cleans up some messy code and makes this process now very clear. I have some feedback:
|
It should not use the method, since using the method will make the code confusing. It doesn't matter if the request 'is authorized to archive', what matters is 'will purging more data than expected cause problems for the UI'. Using the method will lead to confusion where developers may think that somehow archive purging requires archiving, which is not the case. Also, the wording of the method, ie, 'authorize' may result in ideas about user permissions, which will further increase confusion. |
Maybe we create a method like |
This is an acceptable compromise to me. We should be careful though. Updates to |
a comment in the function |
Conflicts: plugins/SitesManager/SitesManager.php
…ssurance that the test data is correct. Use Psr logger in purging scheduled task. And use isRequestAuthorizedToArchive() scheduled task in private method that is named well.
@@ -7,10 +7,7 @@ | |||
* | |||
*/ | |||
namespace Piwik\Plugins\SitesManager; | |||
|
|||
use Piwik\Common; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this missing use statement is causing UI build to fail... http://builds-artifacts.piwik.org/ui-tests.master/10881.7/screenshot-diffs/singlediff.html?processed=../processed-ui-screenshots/UIIntegrationTest_actions_pages.png&expected=UIIntegrationTest_actions_pages.png&github=UIIntegrationTest_actions_pages.png
refactor archive purging for clarity and resilience.
Looks good! I'll release a beta and notify users to see if that fixes the issue #7181 |
Refs #7181
Purpose
There are two purposes behind the refactoring in this pull request:
Description of Purging System
Here is a description of the refactored purging system:
Users interact primarily w/ Archive\Invalidator & Archive\Purger services.
ArchiveWriter::DONE_INVALIDATED
. Then it adds the archive's site to the SitesToReprocess list and the archive's table-month to the ArchivesToPurge list.invalidate
operation.Note: I did not modify this part of the code.
valid archives are created after a site is reprocessed by CronArchive.
** Purging Scheduled Task
List of changes
Also includes this change (for testing the command by injecting a mock command into core\Console.php): a73225f
Resilience
Resilience of the system is described in three ways, how many ways each part of the system can fail, how many ways for the users to cope if a part of the system fails, and how easy it is to debug potential bugs.
failures: can fail if hit by race condition when adding to distributed lists.
On failure to add to SitesToReprocess, user can run core:archive for the site, HOWEVER, there may be no easy way for the user to find which sites require this. A query on tables for invalidated rows w/o new data & w/o site entries in distributed list is required.
On failure to add to ArchivesToPurge, user can purge manually using new command, HOWEVER, there may be no easy way for the user to find which archive tables have invalided archives. Purging however is not as big an issue, since the user will notice table size problems, and use the command to deal w/ them.
to debug: there is no way to debug the code easily at this point. Logging can be added, since invalidation does quite a lot of different stuff, but figuring out the specific type of logging is non-trivial. How can we find out what went wrong, BEFORE we tell a user to enable debug logging?
failures: can fail if it deletes more than it should.
On failure to delete correct archives, user can run core:archive, if the log data remains. If log data does not remain, there is no way for the system to cope.
to debug: also no way to debug the code easily. Need to add logging.
failures: can fail if it deletes more than it should and UI shows 'no data' or when, and can fail if scheduled tasks are not run.
On failure to delete correct archives, user can run core:archive. No immediate action in the UI, though.
On failure of task scheduler, user can manually purge to deal w/ table sizes.
to debug: First failure can be debugged in same way as Archive\Purger failure. Second failure cannot be debugged due to current task scheduler system.
Possible future TODO:
@piwik/core-team Ready for review.