Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor archive purging for clarity and resilience. #7377

Merged
merged 33 commits into from Mar 13, 2015

Conversation

diosmosis
Copy link
Member

Refs #7181

Purpose

There are two purposes behind the refactoring in this pull request:

  • clarity: to increase the ability for developers to understand the purging system, both in terms of how it is used and how it works. This involves better scoped classes, better documentation/comments and better class names.
  • resilience: to increase resilience of the system, by decoupling intra-system dependencies and providing coping methods for eventual failures.

Description of Purging System

Here is a description of the refactored purging system:

   Users               +---------------------+             +---------------------------------+          +-------------+
+-------------+        |                     |    adds to  |                                 | consumes |             |
| API         |  calls | Archive\Invalidator +------+------> SitesToReprocessDistributedList <----------+ CronArchive |
|  +          +-------->                     |      |      |     (holds site IDs)            |          |             |
|  |          |        +---------------------+      |      +----------------+----------------+          +-------------+
|  +          |                   v -updates        |                       v                                          
| Archive.php |                archive              |                    option                                        
|  +          |                  table              |                      table                                       
|  |          |                   ^ -deletes        |                       ^                                          
|  +          |  calls +---------------------+      |      +----------------+----------------+          +------------------------+
| Commands    +-------->                     |      +------>                                 | consumes |                        |
+-------------+        | Archive\Purger      |             | ArchivesToPurgeDistributedList  <----------+ Purging Scheduled Task |
                       |                     +             |     (holds year months)         |          |                        |
                       +----------^----------+             +---------------------------------+          +--------+---------------+
                                  |                                     uses                                     |
                                  +-------------------------------------------------------------------------------

Users interact primarily w/ Archive\Invalidator & Archive\Purger services.

  • Archive\Invalidator:
    • invalidate: when archives are invalidated, the invalidator changes the archive done flags to ArchiveWriter::DONE_INVALIDATED. Then it adds the archive's site to the SitesToReprocess list and the archive's table-month to the ArchivesToPurge list.
    • invalidate later: adds information identifying an archive to a distributed list (not pictured above) which is later read and its elements sent to the invalidate operation.
      Note: I did not modify this part of the code.
  • Archive\Purger:
    • purge invalidated: when purging invalidated archives is requested, the purger will look through the requested table (an archive table must be supplied) for invalidated archives that have newer, valid archives, and deletes them. Before doing this (which depends on joining the table w/ itself), a query is issued to find all site IDs in the table for invalidated archives. This is done only for performance, so the INNER JOIN query does not look through every row in the table.
      valid archives are created after a site is reprocessed by CronArchive.
    • purge others: other archive purging (temporary + custom ranges) do not depend on distributed lists. they just issue queries and delete archives.
  • CronArchive:
    • when cron archiving is run, the SitesToReprocessDistributedList is read. Every site referenced by the list is reprocessed, which should create new valid archives, thus allowing the purger to purge the invalidated archives.

** Purging Scheduled Task

  • the daily scheduled task that purges invalidated archives will read from the ArchivesToPurgeDistributedList, and purge invalidated archives from the tables indicated. Since there can be many archive tables, and most of them won't have invalidated archives, we depend on the distributed list for efficiency. That is the sole reason for the list's existence and use.

List of changes

  1. Made ArchivePurger + ArchiveInvalidator services and moved them out of core\DataAccess. Since they access & manipulate different kinds of data in different contexts, they are not simple DAOs.
  2. Moved purge skipping logic to scheduled task for purging, since it is only required there. This allows other code to force a purge if necessary.
  3. Added command core:purge-old-archive-data so users can force archive purging.
  4. Decouple invalidated archive purging from site entity. Originally, site IDs of invalidated archives were stored in the ArchivesToPurge list, now, instead of depending on values in the option table, we do a select on the archive table to find the sites first. Then we do the expensive query using the site IDs.
  5. Decouple ArchivePurger from the ArchivesToPurgeDistributedList, so the class is not itself aware of the distributed list. Nor is it required for the list to be used when purging archive data.
  6. Split InvalidatedReports class into three classes (one is a common base type). The InvalidatedReports class was used to mange two distributed lists stored in the option table. The refactor splits them up into two classes for each list (ArchivesToPurgeDistributedList & SitesToReprocessDistributedList) + a common base type for a simple distributed list (core\Concurrency\DistributedList.php).
    • List classes were moved to the systems that were their primary consumers. SitesToReprocessDistributedList was moved to core\CronArchive & ArchivesToPurgeDistributedList to CoreAdminHome plugin for the scheduled task.
    • The base type DistributedList does exactly what InvalidatedReports did before, nothing more or less. It does not introduce more logic (save one method, removeByIndex).
  7. Remove custom range archive purging from outdated archive purging, since they are not the same thing.
  8. Added lots of tests.

Also includes this change (for testing the command by injecting a mock command into core\Console.php): a73225f

Resilience

Resilience of the system is described in three ways, how many ways each part of the system can fail, how many ways for the users to cope if a part of the system fails, and how easy it is to debug potential bugs.

  • Archive\Invalidator
    • failures: can fail if hit by race condition when adding to distributed lists.

      On failure to add to SitesToReprocess, user can run core:archive for the site, HOWEVER, there may be no easy way for the user to find which sites require this. A query on tables for invalidated rows w/o new data & w/o site entries in distributed list is required.

      On failure to add to ArchivesToPurge, user can purge manually using new command, HOWEVER, there may be no easy way for the user to find which archive tables have invalided archives. Purging however is not as big an issue, since the user will notice table size problems, and use the command to deal w/ them.

    • to debug: there is no way to debug the code easily at this point. Logging can be added, since invalidation does quite a lot of different stuff, but figuring out the specific type of logging is non-trivial. How can we find out what went wrong, BEFORE we tell a user to enable debug logging?

  • Archive\Purger
    • failures: can fail if it deletes more than it should.

      On failure to delete correct archives, user can run core:archive, if the log data remains. If log data does not remain, there is no way for the system to cope.

    • to debug: also no way to debug the code easily. Need to add logging.

  • Purging Scheduled Task
    • failures: can fail if it deletes more than it should and UI shows 'no data' or when, and can fail if scheduled tasks are not run.

      On failure to delete correct archives, user can run core:archive. No immediate action in the UI, though.

      On failure of task scheduler, user can manually purge to deal w/ table sizes.

    • to debug: First failure can be debugged in same way as Archive\Purger failure. Second failure cannot be debugged due to current task scheduler system.

Possible future TODO:

  • We could add a diagnostic command to look for invalidated archives that do not have new data.
  • We might want to make it easy to backup archive tables, since they contain the useful report data. Perhaps this could augment the log purging PrivacyManager feature.
  • The ability to re-process an individual report through the UI that forces archiving for a SINGLE report (regardless of whether browser archiving is enabled), can help w/ any 'no data for this report' inconsistencies.

@piwik/core-team Ready for review.

diosmosis added 25 commits March 1, 2015 15:09
…tatic methods, move Rules::shouldPurgeOutdatedArchives since it is only used by ArchivePurger and move comment in said function.
…her through tracking or through CronArchive), so remove 6 hour forced time between purges in ArchivePurger::shouldPurgeOutdatedArchives.
…PurgeOutdatedArchives() so archive purging can be forced in contexts other than scheduled task running.
…site entity, no longer need to depend on option existing in order to successfully purge invalidated archives.
…mand, so table size will be reduced and not just row count.
…ed lists, fix new purging command for php 5.3, and update relevant methods to not use site IDs when purging/invalidating.
…eparate class from InvalidatedReports. Since list is only used to purge during scheduled tasks, moved to CoreAdminHome plugin where related task resides. Introduced common base type core\Concurrency\DistributedList in order to avoid introducing code redundancy when splitting InvalidatedReports up.
…cessDistributedList and move to Piwik\CronArchive since CronArchive is only consumer of the list. Made the list derive from Piwik\Concurrency\DistributedList.
… from Piwik\DataAccess since it iss not a DAO.
Conflicts:
	core/ArchiveProcessor/Rules.php
…dminHome's scheduled tasks so Piwik core is no longer dependent on it.
…e only consumer for it is the scheduled task and add tests for task in CoreAdminHome (or rather move test from PurgerTest to new TasksTest class). Moved test setup to new fixture type.
…make asserts in RawArchiveDataWithTempAndInvalidated more generic
…hive purging. Move to a second method in Archive\Purger.
@diosmosis diosmosis added the Needs Review PRs that need a code review label Mar 6, 2015
@diosmosis diosmosis added this to the Piwik 2.12.0 milestone Mar 6, 2015

public function __construct(Model $model = null, Date $purgeCustomRangesOlderThan = null)
{
$this->model = $model ?: new Model();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to allow the constructor parameter to be null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for convenience, since DI (and thus auto-wiring) isn't available everywhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok I wasn't sure where you were using the class. I've taken the habit of doing StaticContainer::get('The\Class') (i.e. always go through the container) even though that's ugly, but at least that forces things to go in the right way: to remove the ugly you'd have to use even more DI

@mnapoli
Copy link
Contributor

mnapoli commented Mar 8, 2015

This is awesome, especially all the explanations :)

$commandInstance = new $command;
if (!$this->has($commandInstance->getName())) {
$this->add($commandInstance);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment to explain that the command might be overriden in tests? Else we'll be confused by that in the future and someone might remove it.

(I reposted the comment in the diff…)

@mnapoli
Copy link
Contributor

mnapoli commented Mar 9, 2015

Btw, on a slightly related note, what's your opinion on making any class in Piwik potentially able to listen to events? maybe by implementing an interface and storing the object in DI...

You mean like in Symfony you write an EventListener that declares which events it listens to, and you register it in the container and that's it? So that would be an alternative to listening to events in the Plugin class right? If that's what you mean then yes I think that's a good idea.

However the interface wouldn't be good enough for all this to work unfortunately (for technical reasons like the fact that you can register services using a factory/closure so you can't know the interface of the object before resolving all the definitions). It would have to be done by registering it explicitly (like in Symfony with tags) e.g.:

// plugin config
return [
    'event.listeners' => DI\add([
        DI\link('Piwik\Plugin\Foo\RequestEventListener')
    ]),
];

@diosmosis
Copy link
Member Author

However the interface wouldn't be good enough for all this to work unfortunately (for technical reasons like the fact that you can register services using a factory/closure so you can't know the interface of the object before resolving all the definitions).

That makes it less useful, but it could still be used to reduce coupling...

diosmosis added 4 commits March 9, 2015 22:48
…already exists in the Piwik Console class.
… found, not once per archive table since dates can appear twice (once for numeric table & once for blob table).
@mnapoli
Copy link
Contributor

mnapoli commented Mar 10, 2015

That makes it less useful, but it could still be used to reduce coupling...

I'm missing the point? What I'm saying is that detecting entries based on an interface is extremely limited to the point where it's probably useless. I wouldn't bother going that way. Explicitly registering classes as event listeners would be necessary.

@diosmosis
Copy link
Member Author

I'm missing the point? What I'm saying is that detecting entries based on an interface is extremely limited to the point where it's probably useless. I wouldn't bother going that way. Explicitly registering classes as event listeners would be necessary.

I know, I just don't really like having to register them, but I can't see another way to go. Should still work, I think, though would need DI to be available in more places, I think?

@diosmosis
Copy link
Member Author

@mattab Ready to merge.

@diosmosis diosmosis removed the Needs Review PRs that need a code review label Mar 10, 2015
@mattab mattab added the Task Indicates an issue is neither a feature nor a bug and it's purely a "technical" change. label Mar 11, 2015
@mattab mattab changed the title Refs #7181 refactor archive purging for clarity and resilience. refactor archive purging for clarity and resilience. Mar 11, 2015
@mattab
Copy link
Member

mattab commented Mar 11, 2015

Ok, I'm going to review it shortly

@mattab mattab added the Needs Review PRs that need a code review label Mar 11, 2015
@mattab
Copy link
Member

mattab commented Mar 11, 2015

I'm happy about this refactor: it cleans up some messy code and makes this process now very clear.

I have some feedback:

  • overall: looks good to me!
  • can you add the new command to the changelog?
  • in the test functions in PurgeOldArchiveDataTest and in PurgerTest can you add some asserts that assert that the state is expected before the console commands executed (it makes the test easier to read as well as 'stronger' i'm sure you see what i mean)
  • similarly i'd add something like self::$fixture->assertInvalidatedArchivesNotPurged($this->january); in test_purgeInvalidatedArchives_PurgesCorrectInvalidatedArchives_AndOnlyPurgesDataForDatesAndSites_InInvalidatedReportsDistributedList
  • in purgeOutdatedArchives you deconstructed the method Rules::isRequestAuthorizedToArchive() and copied the code, but it should be using the method

@diosmosis
Copy link
Member Author

in purgeOutdatedArchives you deconstructed the method Rules::isRequestAuthorizedToArchive() and copied the code, but it should be using the method

It should not use the method, since using the method will make the code confusing. It doesn't matter if the request 'is authorized to archive', what matters is 'will purging more data than expected cause problems for the UI'. Using the method will lead to confusion where developers may think that somehow archive purging requires archiving, which is not the case. Also, the wording of the method, ie, 'authorize' may result in ideas about user permissions, which will further increase confusion.

@mattab
Copy link
Member

mattab commented Mar 11, 2015

what matters is 'will purging more data than expected cause problems for the UI'.

Maybe we create a method like Rules::willPurgeDataCauseProblemsForUI (or better named) that would then call the isRequestAuthorizedToArchive as implementation detail? I don't know 100% if this is good idea but I'm afraid of duplicating such code. Maybe it's ok though.

@diosmosis
Copy link
Member Author

Maybe we create a method like Rules::willPurgeDataCauseProblemsForUI (or better named) that would then call the isRequestAuthorizedToArchive as implementation detail? I don't know 100% if this is good idea but I'm afraid of duplicating such code. Maybe it's ok though.

This is an acceptable compromise to me. We should be careful though. Updates to isRequestAuthorizedToArchive() may not be applicable to willPurgeDataCauseProblemsForUI() so we may accidentally introduce regressions in the future. And I'm not sure how to effectively test this logic since it depends on the overall system (ie core:archive/browser archiving/whatever else in the future might participate/interfere).

@mattab
Copy link
Member

mattab commented Mar 11, 2015

we may accidentally introduce regressions in the future

a comment in the function isRequestAuthorizedToArchive could help prevent such regression, as well as the fact that when we change a method implementation, all of us do / should check the call tree and verify that it makes sense / is correct for all callers of the method to be changed

diosmosis added 3 commits March 12, 2015 02:26
Conflicts:
	plugins/SitesManager/SitesManager.php
…ssurance that the test data is correct. Use Psr logger in purging scheduled task. And use isRequestAuthorizedToArchive() scheduled task in private method that is named well.
@@ -7,10 +7,7 @@
*
*/
namespace Piwik\Plugins\SitesManager;

use Piwik\Common;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mattab pushed a commit that referenced this pull request Mar 13, 2015
refactor archive purging for clarity and resilience.
@mattab mattab merged commit ac87933 into master Mar 13, 2015
@mattab
Copy link
Member

mattab commented Mar 13, 2015

Looks good!

I'll release a beta and notify users to see if that fixes the issue #7181

@diosmosis diosmosis deleted the 7181_isolated_archive_purging branch March 13, 2015 19:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Review PRs that need a code review Task Indicates an issue is neither a feature nor a bug and it's purely a "technical" change.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants