@mattab opened this Issue on February 20th 2019 Member

In order to improve overall performance of Matomo (both at Tracker and Reports processing) an idea we have is to split log_* Tracker tables into two sets of tables: hot data VS cold data.

eg hot data would be data for the last 48 hours or so. Cold data would be any data older than 48 hours. As a result the "hot data" table could be 100 times smaller than the cold data (depending on data retention settings see https://matomo.org/docs/managing-your-databases-size/ )

Having tables split in this way means that:

  • Tracking requests is faster (inserting into smaller tables takes less time)
  • Archiving reports is faster (querying the same datasets but over smaller log_ tables will be faster)

There will be a few changes needed eg. need a process to move data from hot to cold, need to update SQL query generators to query the right tables as applicable (and possibly getting data from the two tables eg. Visitor log requests over multiple days), etc.

@tsteur commented on February 20th 2019 Member

It'll be also beneficial for when there are DB updates on log tables.

@tsteur commented on March 27th 2019 Member

Before working on this or as part of this issue may be good to see if horizontal MySQL partitioning can help like PARTITION BY HASH( MONTH(visit_last_action_time) ).
I think in most queries where performance matters we already do mention the visit_last_action_time and/or server_time but that would need to be checked.

Powered by GitHub Issue Mirror