@tsteur opened this Issue on August 26th 2019 Member

Matomo provides a fast way to insert many archives at once using LOAD DATA INFILE, see https://matomo.org/faq/troubleshooting/faq_194/

This works when calling insertBlobRecord(): https://github.com/matomo-org/matomo/blob/3.12.0-b2/core/DataAccess/ArchiveWriter.php#L95

However, it only works when inserting multiple records at once. Mostly, we insert only one data table / archive at a time, so this performance optimisation is never actually used.

The idea would be to not insert archives directly there, but to group-insert them once there are e.g. 10, 20 or 50 archives. This shouldn't be a memory issue, but we may need to check that we don't hold too much data in memory.

Logic would be like this:

  • Call insertBlobRecord($records)
  • Add it to an archive cache like $this->archivesToInsert = array_merge($this->archivesToInsert, $records)
  • Are there more than say 30 records in $this->archivesToInsert ?
    • Yes => Actually insert them
    • No => Do not insert them yet
  • After archiving, when calling finalizeArchive() we would insert all remaining archives that haven't been inserted yet.
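
The steps above could be sketched roughly as follows. This is only a minimal sketch, not the actual ArchiveWriter code: the class name BufferedBlobWriter, the $bulkInsert callback (standing in for whatever performs the real LOAD DATA INFILE insert) and the default threshold of 30 are assumptions for illustration.

```php
<?php
// Hypothetical sketch: class, property and callback names are illustrative
// and are not taken from Matomo's ArchiveWriter.
class BufferedBlobWriter
{
    /** @var array pending records, each one a [name, value] pair */
    private $archivesToInsert = [];

    /** @var callable receives an array of records and does the real bulk insert */
    private $bulkInsert;

    /** @var int flush once more than this many records are buffered */
    private $threshold;

    public function __construct(callable $bulkInsert, int $threshold = 30)
    {
        $this->bulkInsert = $bulkInsert;
        $this->threshold = $threshold;
    }

    public function insertBlobRecord(array $records): void
    {
        // array_merge() returns a new array, so the result must be assigned back.
        $this->archivesToInsert = array_merge($this->archivesToInsert, $records);

        if (count($this->archivesToInsert) > $this->threshold) {
            $this->flush();
        }
    }

    public function finalizeArchive(): void
    {
        // Insert whatever has not been inserted yet.
        $this->flush();
    }

    private function flush(): void
    {
        if ($this->archivesToInsert === []) {
            return;
        }
        ($this->bulkInsert)($this->archivesToInsert);
        $this->archivesToInsert = [];
    }
}
```

With a threshold of 3, for example, two calls with two records each would trigger one bulk insert of four records on the second call, and finalizeArchive() would insert anything buffered after that.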

We need to figure out, memory-wise, what a good number of dataTables to hold in the cache is. I reckon 10 should be fine for sure, possibly even 50 or 100.

@tsteur commented on August 26th 2019 Member

We will want a similar improvement for numeric inserts. Here we don't need LOAD DATA INFILE, I suppose; we could just use INSERT INTO XYZ VALUES (...), (...), (...) and insert multiple values at once. E.g. the method insertNumericRecords() currently inserts each value one after another instead of adding them all at once. Ideally, we also apply the cache here partially and insert them only in groups of say 50 records.
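
A multi-row insert for the numeric records could be built along these lines. This is a sketch under assumptions: buildMultiRowInsert() is a hypothetical helper, not an existing Matomo function, and the table and column names in the note below are made up. It only builds the SQL string and the flat list of bind values; executing them is left to whatever DB layer is in use.

```php
<?php
// Hypothetical helper for illustration only. $table and $columns are assumed
// to be trusted identifiers, since they are interpolated into the SQL as-is.
function buildMultiRowInsert(string $table, array $columns, array $rows): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('No rows to insert');
    }

    // One "(?, ?, ...)" group per row, with one placeholder per column.
    $rowPlaceholder = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';

    $sql = sprintf(
        'INSERT INTO %s (%s) VALUES %s',
        $table,
        implode(', ', $columns),
        implode(', ', array_fill(0, count($rows), $rowPlaceholder))
    );

    // Flatten the rows into a single list of bind values, in row order.
    $bind = [];
    foreach ($rows as $row) {
        if (count($row) !== count($columns)) {
            throw new InvalidArgumentException('Row does not match column count');
        }
        foreach (array_values($row) as $value) {
            $bind[] = $value;
        }
    }

    return [$sql, $bind];
}
```

For two rows and the columns idarchive, name, value this yields a statement ending in `VALUES (?, ?, ?), (?, ?, ?)` plus six bind values. To insert in groups of 50, the rows could be split with array_chunk($rows, 50) and one statement built per chunk.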

@tsteur commented on August 27th 2019 Member

In insertBlobRecord() it would be good to store the compressed record in memory, not the full text. Storing up to 50 blob records in memory should be fine. For the numeric records we could also store up to 100 records in memory before actually inserting them.

If there are more than 50 records passed to insertBlobRecord() then we can insert them right away all at once (no need to split it in chunks of 50).
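
As a rough sketch of the two points above (keep only the compressed value in memory, and insert a big batch in one go): the class name CompressedBlobBuffer and the $bulkInsert callback are hypothetical, and gzcompress() from PHP's zlib extension is used here as an assumed stand-in for whatever compression the writer really applies.

```php
<?php
// Hypothetical sketch, not the actual ArchiveWriter. Requires the zlib extension.
final class CompressedBlobBuffer
{
    const MAX_BUFFERED = 50;

    /** @var array pending [name, compressedValue] pairs */
    private $pending = [];

    /** @var callable receives all records that should be inserted at once */
    private $bulkInsert;

    public function __construct(callable $bulkInsert)
    {
        $this->bulkInsert = $bulkInsert;
    }

    public function insertBlobRecord(array $records): void
    {
        foreach ($records as [$name, $value]) {
            // Only the compressed blob is kept in memory, not the full text.
            $this->pending[] = [$name, gzcompress($value)];
        }

        // A call with more than MAX_BUFFERED records also ends up here, so it
        // is inserted right away in a single batch (no splitting into chunks).
        if (count($this->pending) >= self::MAX_BUFFERED) {
            $this->flush();
        }
    }

    public function flush(): void
    {
        if ($this->pending === []) {
            return;
        }
        ($this->bulkInsert)($this->pending);
        $this->pending = [];
    }
}
```

In this sketch anything already buffered is inserted together with the large batch, which keeps the number of insert calls down. flush() would still need to be called once at the end of archiving.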

At some point we could almost consider caching the entire archive in memory. From what I see here, the biggest archive we store is only about 10MB (a yearly archive, for one archive). On average a regular archive is only a few hundred KB compressed. On average an archive has about 60 blob entries, though some archives might have up to 100 or even more than 500 blob entries.

This Issue was closed on September 3rd 2019