
Improve archive writing #14812

Closed
tsteur opened this issue Aug 26, 2019 · 2 comments · Fixed by #14838
tsteur (Member) commented Aug 26, 2019

Matomo provides a fast way to insert many archives at once using LOAD DATA INFILE, see https://matomo.org/faq/troubleshooting/faq_194/

This is used when calling insertBlobRecord: https://github.com/matomo-org/matomo/blob/3.12.0-b2/core/DataAccess/ArchiveWriter.php#L95

However, it only pays off when inserting multiple records at once. Most of the time we insert just one data table / archive at a time, so this performance optimisation is never actually used.

The idea would be to not insert archives directly there, but to buffer them and group-insert them once there are, eg, 10 or 20 or 50 archives. This shouldn't be a memory issue, but we may need to check that we don't hold too much data in memory.

The logic would be like this (see the sketch below):

  • Call insertBlobRecord($records)
  • Add them to an archive cache, eg $this->archivesToInsert = array_merge($this->archivesToInsert, $records)
  • Are there more than, say, 30 records in $this->archivesToInsert?
    • Yes => Actually insert them
    • No => Do not insert them
  • After archiving, when calling finalizeArchive() we would insert all remaining archives that haven't been inserted yet.

Regarding memory, we need to figure out what a good number of dataTables to keep in the cache is... I reckon 10 should be fine for sure, possibly even 50 or 100.
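Below is a minimal sketch of how this buffering could look. Only insertBlobRecord() and finalizeArchive() are names taken from this issue; the buffer field, the threshold constant, and the flush/bulk helpers are placeholders rather than Matomo's actual API:

```php
<?php
// Minimal sketch of the proposed buffering. Only insertBlobRecord() and
// finalizeArchive() come from the issue; everything else is a placeholder.
class ArchiveWriter
{
    const MAX_BUFFERED_RECORDS = 30; // exact threshold still to be determined

    private $archivesToInsert = array();

    public function insertBlobRecord($records)
    {
        // Buffer instead of writing each record straight to the database.
        $this->archivesToInsert = array_merge($this->archivesToInsert, $records);

        if (count($this->archivesToInsert) >= self::MAX_BUFFERED_RECORDS) {
            $this->flushBufferedRecords();
        }
    }

    public function finalizeArchive()
    {
        // Insert whatever is still buffered before the archive is finalized.
        $this->flushBufferedRecords();
        // ... existing finalize logic ...
    }

    private function flushBufferedRecords()
    {
        if (empty($this->archivesToInsert)) {
            return;
        }
        // The bulk path (LOAD DATA INFILE) only pays off with many rows at once.
        $this->insertBulkRecords($this->archivesToInsert); // assumed bulk helper
        $this->archivesToInsert = array();
    }
}
```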

@tsteur tsteur added the c: Performance For when we could improve the performance / speed of Matomo. label Aug 26, 2019
@tsteur tsteur added this to the 3.12.0 milestone Aug 26, 2019
tsteur (Member Author) commented Aug 26, 2019

We will want a similar improvement for numeric inserts, but here we don't need LOAD DATA INFILE; I suppose we could just use INSERT INTO xyz VALUES (...),(...),(...) and insert multiple values at once. Eg the method insertNumericRecords() currently inserts each value one after another instead of adding them all at once. Ideally, we also partially apply the cache here and insert only in groups of, say, 50 records.
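For illustration, a sketch of such a multi-row insert. The column list is abbreviated (the real archive_numeric_* tables also carry idsite, date1, date2, period and ts_archived), and $this->idArchive / getTableNumeric() are assumed helpers:

```php
<?php
// (method inside the ArchiveWriter sketch above; uses Piwik\Db)
// One INSERT with many (...) groups instead of one INSERT per record.
private function insertNumericRecords(array $numericRecords)
{
    if (empty($numericRecords)) {
        return;
    }

    $placeholders = array();
    $bindValues   = array();
    foreach ($numericRecords as $name => $value) {
        $placeholders[] = '(?, ?, ?)';
        $bindValues[]   = $this->idArchive; // assumed field
        $bindValues[]   = $name;
        $bindValues[]   = $value;
    }

    $sql = 'INSERT INTO ' . $this->getTableNumeric() // assumed helper
         . ' (idarchive, name, value) VALUES ' . implode(',', $placeholders);

    Db::query($sql, $bindValues);
}
```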

tsteur (Member Author) commented Aug 27, 2019

In insertBlobRecord() it would be good to store the compressed record in memory, not the full text. Storing up to 50 blob records in memory should be fine. For the numeric records we could also store up to 100 records in memory before actually inserting them.

If more than 50 records are passed to insertBlobRecord(), we can insert them all right away in one go (no need to split them into chunks of 50).
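A sketch combining both points, reusing the placeholder names from the sketch above; gzcompress() is standard PHP and stands in here for whatever compression the writer already applies to blob values:

```php
<?php
// (replaces insertBlobRecord() in the sketch above; names are placeholders)
public function insertBlobRecord($records)
{
    $compressed = array();
    foreach ($records as $name => $value) {
        // Keep only the compressed payload in memory, not the full text.
        $compressed[$name] = gzcompress($value);
    }

    // A batch already larger than the buffer threshold goes straight to the
    // bulk insert, without being split into chunks.
    if (count($compressed) > self::MAX_BUFFERED_RECORDS) {
        $this->insertBulkRecords($compressed);
        return;
    }

    $this->archivesToInsert = array_merge($this->archivesToInsert, $compressed);
    if (count($this->archivesToInsert) >= self::MAX_BUFFERED_RECORDS) {
        $this->flushBufferedRecords();
    }
}
```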

At some point we could almost consider caching the entire archive in memory. From what I'm seeing, the biggest archive we store is only around 10MB (a yearly archive, for a single archive). On average a regular archive is only a few hundred KB compressed and has about 60 blob entries, though some archives may have up to 100 or even more than 500 blob entries.
