@mattab opened this Issue on August 25th 2017 Member

In Piwik, reports are processed based on the raw data. Once an archive has been processed for a website and a specific date period, and stored in the database (during the core:archive process), it can only be invalidated (via the HTTP API, the CLI, or the InvalidateReports plugin), which forces all plugins to re-process all their respective reports in the next archiving run. This is problematic because re-processing reports for the previous 3 or 12 months can take a lot of time.
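
For context, invalidation today typically goes through the core:invalidate-report-data console command or the CoreAdminHome.invalidateArchivedReports HTTP API. A minimal sketch (exact option names may vary by version; the site ID, date range, and token are placeholders):

./console core:invalidate-report-data --dates=2017-01-01,2017-06-30 --sites=1
curl "https://matomo.example.org/index.php?module=API&method=CoreAdminHome.invalidateArchivedReports&idSites=1&dates=2017-01-01&period=month&token_auth=YOUR_TOKEN"

Either call marks the affected archives as invalid for every plugin at once, which is exactly the all-or-nothing behaviour this issue wants to relax.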

-> Plugins should be able to easily create their own reports for some of the historical data. This is useful for all plugins which don't create new raw data but re-use the existing raw data. For example, the Funnels plugin may want to process Funnels data for the past 6 months.

To make it possible for any plugin to append their reports to the existing archives, we need to make some changes to the archiving process.

Ideas:

  • Piwik core to allow an existing archive to be modified later on, so a new plugin's reports can be appended to the existing old archive
  • Piwik core to allow each plugin to force re-processing of past data, and the ability to specify how far back the data should be re-processed. (This should work for our two use cases: when using the core:archive console command in a cron, and when using browser-triggered archiving and looking at historical data.)

(not related but a bit similar to https://github.com/piwik/piwik/issues/7573 )

@mattab commented on November 29th 2018 Member

When I create Custom Reports and want historic data, I need to invalidate all historical data and re-process it. This can take a very long time and is not practical. Recently we had a problem with too much data being invalidated (reported in https://github.com/innocraft/plugin-InvalidateReports/issues/8).

So it would be great to have the ability to "update" existing archives and process the data for specific plugins (eg. custom reports) without having to re-process everything else.

@mattab commented on April 2nd 2019 Member

Note: this will also be especially important and valuable for Matomo Cloud customers, where invalidating old data is currently disabled, so users who create custom reports cannot get historical data for them. Self-hosted users can at least manually invalidate reports.

@ibril15 commented on September 3rd 2019

Hi, I'm wondering if this is still planned as part of 3.12 or 3.13? Thanks.

@tsteur commented on September 3rd 2019 Member

I reckon this will be part of Matomo 4, which we might work on after 3.12, but that remains to be seen. This is definitely an issue we want to work on! In the meantime we created a command that lets you get this data quite easily. If you have access to the command line, you can simply execute ./console customreports:archive in the Matomo directory.

Here's an example

./console customreports:archive --idsites=1 --date=last100

There are various options (eg to skip segments, or archive only specific periods), and using the latest Custom Reports you can even do this for just a specific report using --idreport=X
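
For instance, combining the command above with the report-specific option (the idreport value 5 is just a placeholder for a real report ID):

./console customreports:archive --idsites=1 --date=last100 --idreport=5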

Archiving these reports for past dates is quite fast.

@siva538 commented on September 9th 2019 Contributor

@tsteur, thanks for the update.

Is the below available in 3.11?

There are various options (eg to skip segments, or archive only specific periods), and using the latest Custom Reports you can even do this for just a specific report using --idreport=X

Thanks a lot.

@tsteur commented on September 10th 2019 Member

Yes, this works with 3.11. It's in the Custom Reports plugin.

@siva538 commented on September 10th 2019 Contributor

Thank you, that helps. Can you also confirm whether this has an equivalent web API? I believe all the console commands have equivalent web-based API calls, so this could be done from the browser instead of the console.

@tsteur commented on September 10th 2019 Member

There is no API unfortunately, as this works a bit differently from the other features.

@siva538 commented on October 4th 2019 Contributor

Hello @tsteur, sorry for getting back to you on this so late.

I validated this in version 3.11, and I don't see this parameter in the help documentation.

Here is the screenshot for your reference.

[screenshot of the console output omitted]

For command #1: 3.11 is the Matomo version

For command #2: the --idreport parameter is missing from the help documentation

For command #3: Custom Reports archiving fails when --idreport is used

We have also updated to the latest version of the Custom Reports plugin (3.1.18), just to make sure we are not on an old one.

Please let me know if I am missing something here.

Thanks a lot.

@tsteur commented on October 6th 2019 Member

@siva538 it looks like you're not using the latest version of custom reports for some reason. It should definitely have the parameter.

@siva538 commented on October 7th 2019 Contributor

@tsteur, I was finally able to get the parameter working. Thank you! It turned out to be a caching issue.

Can you please confirm whether 3.1.18 of the Custom Reports plugin is required for this, or whether it is available in 3.1.15 as well?

Thanks a lot.

@tsteur commented on October 7th 2019 Member

I think it was added in 3.1.15 from what I see. We always recommend using the latest version though.

@siva538 commented on October 9th 2019 Contributor

Thank you @tsteur

@diosmosis commented on April 22nd 2020 Member

@tsteur I'm going to base this issue's solution off of #15117, and go about it like so:

  • Core archiving
    • Allow archiving single plugins (this would be outside of the "all plugins" archive, unless one exists, in which case we modify the existing archive and remove the separate plugin-specific one)
    • Allow invalidating single plugins (must include the case where an "all plugins" archive exists, but we want to invalidate just one plugin)
    • Add tests
    • Workflow: when a plugin is activated (eg Funnels), invalidate past data (use core INI config? Bounded by a hard upper limit if cron archiving is not enabled?) for that specific plugin for N days/months in the past; archiving then picks it up (see the sketch after this list).
  • Cron archiving
    • Support archiving single-plugin archives by themselves (can't go through API.get, must go through {plugin}.get)
    • Refactor the code in #15117, remove some more code from core:archive so it can be tested, and test the new code as well
  • Browser archiving
    • If an archive request for a plugin is seen and there is an invalidated plugin-specific archive, archive just the requested plugin(s)
    • Add tests

Do you see any potential issues w/ this plan?
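
To make the workflow bullet above concrete, a hypothetical command-line flow; note that the --plugin option on core:invalidate-report-data is part of this proposal, not a flag that existed at the time:

# proposed: invalidate past data for a single plugin only
./console core:invalidate-report-data --dates=2019-10-01,2020-04-01 --sites=1 --plugin=Funnels
# the next archiving run then re-processes just that plugin's archives
./console core:archive --url=https://matomo.example.org/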

@tsteur commented on April 23rd 2020 Member

@diosmosis hard to say. I suppose we'd need specific done flags for all plugins, create an archive for each plugin, and basically no longer use a generic done flag? Or plugins could "define" whether they can retrospectively aggregate data, and we'd use a specific done%s.$pluginName flag for these archives? We might not even need this, actually.

I suppose ideally this would work for any plugin where data can be generated retrospectively. Especially interesting for Funnels and Custom Reports I guess.

The goal has to be: as soon as a new custom report is created, the system notices, either through browser archiving or cron archiving, that an archive for that specific custom report is missing, and starts archiving those reports. Technically we'd even want this when a custom report is updated; then we would likely invalidate these plugin-specific archives (but not invalidate other data). These plugins generally don't have a $plugin.get API (yet).

Regarding https://github.com/matomo-org/matomo/issues/15117: maybe it would make sense to have a separate table for archive invalidations and no longer handle invalidations in the archive table directly, though it's hard to say (I suppose we'd still need a flag indicating whether a specific archive is invalid or not). I wonder if a table like

archive_invalid(idarchive, archive_table_name, archive_name)

makes sense instead of having a done flag in the archive table, but I'm not sure that would work... we'd basically assume all archives are OK unless they appear in this table, in which case the archiver knows they need to be reprocessed and the previous archive removed. That would avoid the many reads on all the archive tables. I haven't really thought it through though.

All I can say really is how it should work generally from a user point of view. Hope this helps.

@diosmosis commented on April 23rd 2020 Member

Or plugins could "define" whether they can retrospectively aggregate data, and we'd use a specific done%s.$pluginName flag for these archives? We might not even need this, actually.

This is sort of my approach, allow invalidating individual plugin archives, then plugins would just invalidate archives and they would get picked up by core:archive.

Regarding #15117: maybe it would make sense to have a separate table for archive invalidations and no longer handle invalidations in the archive table directly, though it's hard to say (I suppose we'd still need a flag indicating whether a specific archive is invalid or not). I wonder if a table like archive_invalid(idarchive, archive_table_name, archive_name) makes sense instead of having a done flag in the archive table, but I'm not sure that would work... we'd basically assume all archives are OK unless they appear in this table, in which case the archiver knows they need to be reprocessed and the previous archive removed. That would avoid the many reads on all the archive tables. I haven't really thought it through though.

This could be a good idea... though we'd also have to query this table when querying for archive data while browser archiving is enabled, since in that case we wouldn't want to use archive data that has been invalidated.

@diosmosis commented on April 23rd 2020 Member

@tsteur what do you think about using an invalidations table like:

CREATE TABLE archive_invalidations (
    idarchive INTEGER UNSIGNED NOT NULL,
    name VARCHAR(255) NOT NULL,
    idsite INTEGER UNSIGNED NULL,
    date1 DATE NULL,
    date2 DATE NULL,
    period TINYINT UNSIGNED NULL,
    ts_invalidated DATETIME NULL,
    value DOUBLE NULL,
    PRIMARY KEY(idarchive, name),
    INDEX index_idsite_dates_period(idsite, date1, date2, period, ts_invalidated)
)

The other columns are needed in order to be able to sort the table properly w/o having to look at an archive table simultaneously. Otherwise we'd have to join on an archive table.

When browser archiving is enabled we'd have to join on this table by idarchive/name to check if an archive is invalid.
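
To illustrate that join, a sketch under the proposed schema (archive_numeric_2020_04 and the WHERE values are example placeholders):

SELECT a.idarchive, a.name, a.value
FROM archive_numeric_2020_04 a
LEFT JOIN archive_invalidations i
    ON i.idarchive = a.idarchive AND i.name = a.name
WHERE a.idsite = 1
    AND a.period = 3
    AND i.idarchive IS NULL -- keep only archives with no invalidation entry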

We could also limit the number of rows we add to this table. Eg, if there are more than 50,000 rows or something, just fail the invalidation w/ a warning asking users to run core:archive. Though this could be an issue for browser archiving... since the user might invalidate an archive and then never view it?

I guess we could do both and only add to the table if browser archiving is disabled. This would make the implementation more complicated, but might be worth it if the cost goes down?

@tsteur commented on April 23rd 2020 Member

I reckon a limit shouldn't be needed, but it would be awesome if we could show this in the UI: the number of archives that are invalidated and will need to be reprocessed, and we could even show which reports will be archived soonish.

BTW, on the name column: I suppose based on this we would then know whether we have to archive the whole site (eg done or done$segmenthash) or a specific plugin, such as done.CustomReports?

@tsteur commented on April 23rd 2020 Member

Haven't thought too much about it, but looks good. BTW, the primary key would probably be fine on idarchive alone?

@diosmosis commented on April 23rd 2020 Member

@tsteur If we allow invalidating individual reports/metrics (or just plugins), then we'd have to allow multiple idarchive/name pairs.

And there's no issue w/ still doing DONE_INVALIDATED for browser archiving? If we don't, the rows in the invalidations table could just keep building up and never be deleted.

@tsteur commented on April 23rd 2020 Member

I suppose that would be fine, considering it's currently the same behaviour (just spread across multiple tables), and we will be trying to get most users to set up cron archiving in the next few months (by improving onboarding).

This Issue was closed on August 4th 2020