Seeing roughly 2-6 times faster segment archiving using this patch.
Generally: the fewer visitors a segment matches, the faster the archiving query becomes (possibly even 100 times faster)... if all visitors match the segment, there's a chance it takes slightly longer. The more complex the segment, the bigger the speed-up as well.
If a plugin runs its archiving query directly through `$this->logAggregator->getDb()->query()` while a DB reader is configured, the archiving will fail. I reckon this is an edge case though, since the DB reader is a new feature and not yet widely used. All our plugins, including premium features, are compatible.
create temporary table: The syntax is like

```sql
create temporary table logtemp select idvisit from log_visit ... where ...
```

where the select is generated depending on the configured segment. I'm not sure if this works with older MySQL versions. I reckon so but am not sure.
In general, it works like this when a segment is defined, for archiving only (not for live queries, as there is no benefit there):
```sql
create temporary table logtemp$segmentHash
select idvisit from log_visit
where idsite = 1 and visit_last_action_time >= ? and visit_last_action_time <= ?
AND ... -- segment query appended... sometimes also more joins etc
```

Now each archiver query uses that temp table:

```sql
select $archiveSelect from logtemp$segmentHash
left join log_link_visit_action
where id_action_event_category is not null -- just an example
```

We do not execute the `idsite = 1 and visit_last_action_time >= ? and visit_last_action_time <= ?` conditions again in this query. Finally the table is dropped (it is also deleted automatically when the connection ends).
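To make the two-phase flow concrete, here is an illustrative sketch (not the actual Matomo code) using Python's sqlite3. The table and column names follow the Matomo log tables, but the schema and data are heavily simplified and made up:

```python
# Illustrative sketch only: materialise the matching visit ids once, then run
# a plain archiver query joined against the temp table, without repeating the
# site/date/segment conditions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table log_visit (
        idvisit integer, idsite integer,
        visit_last_action_time text, continent text);
    create table log_link_visit_action (idvisit integer, idaction_url integer);
    insert into log_visit values
        (1, 1, '2019-05-01 10:00:00', 'eur'),
        (2, 1, '2019-05-01 11:00:00', 'ams'),
        (3, 2, '2019-05-01 12:00:00', 'eur');
    insert into log_link_visit_action values (1, 10), (1, 11), (2, 10), (3, 12);
""")

# Step 1: execute the site/date/segment conditions once, into a temp table.
conn.execute("""
    create temporary table logtemp_seg1 as
    select idvisit from log_visit
    where idsite = 1
      and visit_last_action_time >= '2019-05-01 00:00:00'
      and visit_last_action_time <= '2019-05-01 23:59:59'
      and continent = 'eur'  -- the appended segment condition
""")

# Step 2: the archiver query only joins the temp table; none of the
# conditions above are repeated.
actions = conn.execute("""
    select count(*) from logtemp_seg1
    inner join log_link_visit_action using (idvisit)
""").fetchone()[0]
print(actions)  # 2: both actions of the single matching visit
```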
In other words:
```php
// traditionally:
$segment->getSelectQuery($select, $from, $where, ...);

// now we do:
$segmentTable = 'logtemp445454545';
'create temporary table ' . $segmentTable . $segment->getSelectQuery('idvisit', 'log_visit', 'idsite=1 and time > ? and time < ?', ...);
array_unshift($from, $segmentTable);
$noSegment->getSelectQuery($select, $from);
```

Because we execute the segment query in advance, we can always execute basically the raw archiver query (inner joined with the segment table).
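To make the unshift trick concrete, here is a hypothetical sketch (my own illustrative helper, not Matomo's real query generator) of how prepending the segment table to the FROM list turns the raw archiver query into one inner-joined with the temp table:

```python
# Hypothetical query-builder sketch; build_from_clause() and the table names
# are illustrative only, not part of Matomo.
def build_from_clause(tables):
    """Anchor the query on the first table and inner-join the rest on idvisit."""
    clause = tables[0]
    for table in tables[1:]:
        clause += f" inner join {table} using (idvisit)"
    return clause

from_tables = ["log_link_visit_action"]      # the raw archiver query's FROM
from_tables.insert(0, "logtemp445454545")    # like array_unshift($from, $segmentTable)

sql = "select count(*) from " + build_from_clause(from_tables)
print(sql)
# select count(*) from logtemp445454545 inner join log_link_visit_action using (idvisit)
```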
From what I see, it should pretty much get rid of subqueries etc., and we basically always run the original archiving query without joining random tables anymore.
FYI: For now we are not creating the temporary tables in memory (engine=memory): if the table result were larger than the allowed memory, the create table would simply fail, and we would need to execute the select another time and try to create the table on disk.
Just did a test btw where a query took 177s, and when using engine=memory the same archiver query was actually not much faster (171s).
The tests now succeed, apart from one failing test in the Bandwidth plugin that is related to this change; all the other test failures are unrelated. I will fix the Bandwidth test after merging, as otherwise things get complicated re the submodule etc.
BTW @mattab: while testing with a DB reader I just fixed another issue, where the ranking query was previously not using the reader DB.
btw @mattab, forgot to mention one thing as a side note... eg page URLs we archive on `log_link_visit_action.server_time` being within the date range, and goals we archive on `log_conversion.server_time`, but this is not 100% correct, as we don't respect the visit time, i.e. whether the visit for this action was also in the same day or not...
The segments now include the `log_visit.visit_last_action_time` condition, which is correct, so the numbers might differ slightly: with a segment Continent=EUR and a segment Continent!=EUR, the total of those two segments might not be 100% of the number without any segment.
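A small simulation (my own illustrative code, made-up data) of why the two complementary segments may not add up to the unsegmented total: an action on day 1 whose visit ends on day 2 is counted without a segment, but excluded from both segments.

```python
# Actions are archived by the action's server_time, but segments filter
# visits by visit_last_action_time. A visit whose last action falls on the
# next day is excluded from BOTH complementary segments for day 1.
DAY1 = "2019-05-01"

# idvisit -> (continent, date of visit_last_action_time)
visits = {1: ("eur", "2019-05-01"), 2: ("ams", "2019-05-01"), 3: ("eur", "2019-05-02")}
# (idvisit, date of action server_time) -- visit 3 has a day-1 action, but
# the visit itself ends on day 2
actions = [(1, DAY1), (2, DAY1), (3, DAY1)]

def archived_actions(segment=None):
    total = 0
    for idvisit, action_day in actions:
        if action_day != DAY1:
            continue
        continent, last_day = visits[idvisit]
        if segment is None:
            total += 1                      # no segment: only server_time matters
        elif last_day == DAY1 and segment(continent):
            total += 1                      # segment: the visit must also end on day 1
    return total

no_segment = archived_actions()
eur = archived_actions(lambda c: c == "eur")
not_eur = archived_actions(lambda c: c != "eur")
print(no_segment, eur, not_eur)  # 3 1 1 -- the two segments sum to 2, not 3
```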