
Allow plugins to generate their reports using historical data (eg. Custom Reports) #11974

Closed
mattab opened this issue Aug 25, 2017 · 21 comments · Fixed by #15889
Labels: Enhancement, Major


mattab commented Aug 25, 2017

In Piwik, reports are processed using the raw data. Once an archive has been processed for a website and a specific date period, and stored in the database (during the core:archive process), it can only be invalidated (via the HTTP API, the CLI, or the InvalidateReports plugin), which forces all plugins to re-process all of their respective reports in the next archiving run. This is problematic because re-processing reports for the previous 3 or 12 months can take a lot of time.

-> Plugins should be able to easily create their own reports from some of the historical data. This is useful for all plugins which don't create new raw data but reuse the existing raw data. For example, the Funnels plugin would like to process Funnels data for the past 6 months.

To make it possible for any plugin to append its reports to the existing archives, we need to make some changes to the archiving process.
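
For reference, this is roughly what the CLI invalidation mentioned above looks like on a self-hosted install (a sketch; the option names below match the core:invalidate-report-data command in Matomo 3.x and may differ between versions):

./console core:invalidate-report-data --sites=1 --dates=2017-01-01,2017-08-25 --periods=month

After such an invalidation, the next core:archive run re-processes every plugin's reports for those periods, which is exactly the cost this issue aims to avoid.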

Ideas:

  • Piwik core to allow an existing archive to be modified later on, so that a new plugin's reports can be appended to the existing old archive
  • Piwik core to allow each plugin to force re-processing of past data, with the ability to specify how far back the data should be re-processed (covering both of our use cases: running the core:archive console command in a cron, and browser-triggered archiving when looking at historical data)

(not related, but a bit similar to #7573)


mattab commented Nov 29, 2018

When I create custom reports and want historical data, I need to invalidate all historical data and re-process it. This can take a very long time and is not practical. Recently we had a problem with too much data being invalidated (reported in innocraft/plugin-InvalidateReports#8).

So it would be great to have the ability to "update" existing archives and process the data for specific plugins (e.g. custom reports) without having to re-process everything else.

mattab added the Major label Nov 29, 2018
mattab added the Enhancement label Apr 2, 2019

mattab commented Apr 2, 2019

Note: this will also be especially important and valuable for Matomo Cloud customers, where invalidating old data is currently disabled, so users who create custom reports cannot get historical data for those reports, whereas self-hosted users can at least manually invalidate reports.

mattab added this to the 3.12.0 milestone Apr 3, 2019
mattab changed the title from "Allow plugins to generate their reports using historical data" to "Allow plugins to generate their reports using historical data (eg. Custom Reports)" Apr 12, 2019

ibril15 commented Sep 3, 2019

Hi, I'm wondering if this is still planned as part of 3.12 or 3.13? Thanks.


tsteur commented Sep 3, 2019

I reckon this will be part of Matomo 4, which we might work on after 3.12, but that remains to be seen. This is definitely an issue we want to work on! In the meantime we created a command that lets you get this data quite easily. If you have access to the command line, you can simply execute ./console customreports:archive in the Matomo directory.

Here's an example

./console customreports:archive --idsites=1 --date=last100

There are various options (e.g. skipping segments, or archiving only specific periods), and with the latest Custom Reports plugin you can even do this for a specific report only, using --idreport=X.
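
For instance, to re-archive only a single report (the report ID 5 below is just a placeholder; use the ID of your own custom report):

./console customreports:archive --idsites=1 --date=last100 --idreport=5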

It's quite fast to archive these reports in the past.


siva538 commented Sep 9, 2019

@tsteur, thanks for the update.

Is the below available in 3.11?

> There are various options (e.g. skipping segments, or archiving only specific periods), and with the latest Custom Reports plugin you can even do this for a specific report only, using --idreport=X.

Thanks a lot.


tsteur commented Sep 10, 2019

Yes, this works with 3.11. It's in the Custom Reports plugin.


siva538 commented Sep 10, 2019

Thank you, that helps. Can you also confirm whether this has an equivalent in the web API? I believe all console commands have equivalent web-based API calls, so this could be done from the browser instead of the console.


tsteur commented Sep 10, 2019

There is no API for it, unfortunately, as it works a bit differently from the other features.


siva538 commented Oct 4, 2019

Hello @tsteur. Sorry for getting back to you late on this.

I validated this in version 3.11, and I don't see this parameter being available in the help documentation.

Here is the screenshot for your reference:

[screenshot: console output of the three commands discussed below]

For command #1: 3.11 is the Matomo version.

For command #2: the --idreport parameter is missing from the help documentation.

For command #3: Custom Reports archiving fails when --idreport is used.

We have also updated to the latest version of the Custom Reports plugin (latest available: 3.1.18), just to make sure we are not on an old one.

Please let me know if I am missing something here.

Thanks a lot.


tsteur commented Oct 6, 2019

@siva538 it looks like you're not using the latest version of the Custom Reports plugin for some reason. It should definitely have the parameter.

mattab modified the milestones: 3.13.0, 4.0.0 Oct 7, 2019

siva538 commented Oct 7, 2019

@tsteur, I was finally able to get the parameter working. Thank you! It turned out to be a caching issue.

Can you please confirm whether version 3.1.18 of the Custom Reports plugin is required for this, or whether it is available in 3.1.15 as well?

Thanks a lot.


tsteur commented Oct 7, 2019

I think it was added in 3.1.15 from what I see. We always recommend using the latest version though.


siva538 commented Oct 9, 2019

Thank you @tsteur

diosmosis commented:

@tsteur I'm going to base this issue's solution on #15117, and go about it like so:

  • Core archiving
    • Allow archiving single plugins (this would happen outside of the "all plugins" archive, unless one already exists, in which case we modify the existing archive and remove the separate plugin-specific one)
    • Allow invalidating single plugins (must include the case where an "all plugins" archive exists, but we want to invalidate just one plugin)
    • Add tests
    • Workflow: when a plugin is activated (e.g. Funnels), invalidate past data for that specific plugin for N days/months in the past (use core INI config? Bounded by a hard upper limit if cron archiving is not enabled?). Archiving then picks it up.
  • Cron archiving
    • Support archiving single plugin archives by themselves (can't go through API.get, must go through {plugin}.get)
    • Refactor the code from #15117 (Refactor cron archiving for simplicity), remove some more code from core:archive, and test the new code as well
  • Browser archiving
    • If an archive request for a plugin is seen and there is an invalidated plugin-specific archive there, archive just the requested plugin(s)
    • Add tests

Do you see any potential issues w/ this plan?
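
For illustration only, the plugin-specific invalidation step above might look something like this on the command line (the --plugin option is an assumed extension of the existing core:invalidate-report-data command, not an option that exists at this point in the thread):

./console core:invalidate-report-data --sites=1 --dates=2019-12-01,2020-04-22 --plugin=Funnels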


tsteur commented Apr 23, 2020

@diosmosis hard to say. I suppose we'd need specific done flags for all plugins, basically creating an archive for each plugin and no longer using a generic done flag? Or plugins would maybe "define" whether they can retrospectively aggregate data, and we'd use a specific done%s.$pluginName flag for those archives? We might not even need this, actually.

I suppose ideally this would work for any plugin where data can be generated retrospectively. Especially interesting for Funnels and Custom Reports I guess.

The goal has to be: as soon as a new custom report is created, the system notices, either through browser archiving or cron archiving, that an archive for that specific custom report is missing, and starts archiving those reports. I suppose technically we'd even want this feature when a custom report is updated: then we would likely invalidate those plugin-specific archives (but not invalidate other data). They don't generally have a $plugin.get API (yet).

Re #15117: maybe it would make sense to have a separate table for archive invalidations and no longer handle invalidations in the archive tables directly, but it's hard to say (I suppose we'd still need to handle a flag for whether a specific archive is invalid or not). I wonder if a table like

archive_invalid(idarchive, archive_table_name, archive_name)

makes sense instead of having a done flag in the archive table, but I'm not sure that would work... we'd basically assume all archives are OK unless they appear in this table, in which case the archiver knows they need to be reprocessed and the previous archive removed. That would avoid the many reads on all the archive tables. I haven't really thought it through, though.

All I can say really is how it should work generally from a user point of view. Hope this helps.

diosmosis commented:

> Or plugins would maybe "define" whether they can retrospectively aggregate data, and we'd use a specific done%s.$pluginName flag for those archives? We might not even need this, actually.

This is sort of my approach: allow invalidating individual plugin archives; plugins would then just invalidate archives, and those invalidations would get picked up by core:archive.

> Re #15117: maybe it would make sense to have a separate table for archive invalidations and no longer handle invalidations in the archive tables directly, but it's hard to say (I suppose we'd still need to handle a flag for whether a specific archive is invalid or not). I wonder if a table like archive_invalid(idarchive, archive_table_name, archive_name) makes sense instead of having a done flag in the archive table, but I'm not sure that would work... we'd basically assume all archives are OK unless they appear in this table, in which case the archiver knows they need to be reprocessed and the previous archive removed. That would avoid the many reads on all the archive tables. I haven't really thought it through, though.

This could be a good idea, maybe... though we'd have to query this table whenever we query for archive data while browser archiving is enabled; in that case we wouldn't want to use archive data that has been invalidated.


diosmosis commented Apr 23, 2020

@tsteur what do you think about using an invalidations table like:

CREATE TABLE archive_invalidations (
    idarchive INTEGER UNSIGNED NOT NULL,
    name VARCHAR(255) NOT NULL,
    idsite INTEGER UNSIGNED NULL,
    date1 DATE NULL,
    date2 DATE NULL,
    period TINYINT UNSIGNED NULL,
    ts_invalidated DATETIME NULL,
    value DOUBLE NULL,
    PRIMARY KEY(idarchive, name),
    INDEX index_idsite_dates_period(idsite, date1, date2, period, ts_invalidated)
)

The other columns are needed in order to be able to sort the table properly w/o having to look at an archive table simultaneously. Otherwise we'd have to join on an archive table.

When browser archiving is enabled we'd have to join on this table by idarchive/name to check if an archive is invalid.
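
A minimal sketch of that lookup, assuming the archive_invalidations schema above and one of the standard monthly archive tables (the table name archive_numeric_2020_04 is just an example):

SELECT a.idarchive, a.name, a.value
FROM archive_numeric_2020_04 a
LEFT JOIN archive_invalidations i
    ON i.idarchive = a.idarchive AND i.name = a.name
WHERE a.idsite = 1
    AND a.name LIKE 'done%'
    AND i.idarchive IS NULL; -- only consider archives that have not been invalidated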

We could also limit the number of rows we add to this table. Eg, if there are more than 50000 rows or something, just fail the invalidation w/ a warning requesting users to run core:archive. Though this could be an issue for browser archiving... since the user might invalidate an archive then never view it?

I guess we could do both and only add to the table if browser archiving is disabled. This would make the implementation more complicated, but might be worth it if the cost goes down?


tsteur commented Apr 23, 2020

I reckon a limit shouldn't be needed, but it would be awesome if we could show this in the UI, meaning the number of archives that are invalidated and will need to be reprocessed; we could even show which reports will be archived soonish.

BTW, on the name column: I suppose based on this we would then know whether we have to archive the whole site (e.g. done or done$segmenthash) or a specific plugin, such as done.CustomReports?
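
For illustration, the done-flag naming convention referenced here would let plugin-specific invalidations be selected with a simple pattern match; a sketch against the proposed table:

-- whole-site archives:      name = 'done' or 'done<segmenthash>'
-- plugin-specific archives: name = 'done.CustomReports' or 'done<segmenthash>.CustomReports'
SELECT idarchive, name
FROM archive_invalidations
WHERE name LIKE 'done%.CustomReports';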


tsteur commented Apr 23, 2020

Haven't thought too much about it, but it looks good. BTW, the primary key would probably be fine on idarchive alone?


diosmosis commented Apr 23, 2020

@tsteur If we allow invalidating individual reports/metrics (or just plugins), then we'd have to allow multiple idarchive/name pairs.

And is there no issue w/ still doing DONE_INVALIDATED for browser archiving? If we don't, the rows in the invalidations table could just keep building up or never be deleted.
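
For context, a minimal sketch of the in-place flagging referred to here (the table name is an example, and the numeric value assumes core's ArchiveWriter::DONE_INVALIDATED constant equals 4):

-- current behaviour: flag the done row in the archive table itself,
-- rather than (or in addition to) tracking it in a separate invalidations table
UPDATE archive_numeric_2020_04
SET value = 4 -- DONE_INVALIDATED
WHERE name LIKE 'done%' AND idsite = 1;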


tsteur commented Apr 23, 2020

I suppose that would be fine, considering it's currently the same behaviour (just spread across multiple tables), and we will be trying to get most users to set up cron archiving in the next few months (by improving onboarding).
