Seeing roughly 2-6 times faster segment archiving using this patch.
Generally: the fewer visitors a segment matches, the faster the archiving query becomes (possibly even 100 times faster)... if all visitors match the segment, there's a chance it takes slightly longer. The more complex the segment, the bigger the speed-up as well.
If a plugin runs its archiving query directly through `$this->logAggregator->getDb()->query()` while a DB reader is configured, the archiving will fail. I reckon this is an edge case though, since the DB reader is a new feature and not yet widely used. All our plugins, including premium features, are compatible.
create temporary table: The syntax is like

```sql
create temporary table logtemp select idvisit from log_visit ... where ...
```

where the select is generated depending on the configured segment. I'm not sure if this works with older MySQL versions. I reckon so but am not sure.
In general, it works like this when a segment is defined, for archiving only (not for live queries, as there is no benefit there):
```sql
create temporary table logtemp$segmentHash
select idvisit from log_visit
where idsite = 1 and visit_last_action_time >= ? and visit_last_action_time <= ?
AND ... -- segment query appended... sometimes also more joins etc
```

Now each archiver query uses that temp table:

```sql
select $archiveSelect from logtemp$segmentHash
left join log_link_visit_action
where id_action_event_category is not null -- just an example
```

We do not execute the `idsite = 1 and visit_last_action_time >= ? and visit_last_action_time <= ?` conditions again in this query. Finally the table is dropped (it is also deleted automatically when the connection ends).
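To make the two-phase flow concrete, here is an illustrative sketch (not the actual Matomo code) using Python's sqlite3. The table and column names follow the Matomo log tables, but the schema and data are heavily simplified and made up:

```python
# Illustrative sketch only: materialise the matching visit ids once, then run
# a plain archiver query joined against the temp table, without repeating the
# site/date/segment conditions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table log_visit (
        idvisit integer, idsite integer,
        visit_last_action_time text, continent text);
    create table log_link_visit_action (idvisit integer, idaction_url integer);
    insert into log_visit values
        (1, 1, '2019-05-01 10:00:00', 'eur'),
        (2, 1, '2019-05-01 11:00:00', 'ams'),
        (3, 2, '2019-05-01 12:00:00', 'eur');
    insert into log_link_visit_action values (1, 10), (1, 11), (2, 10), (3, 12);
""")

# Step 1: execute the site/date/segment conditions once, into a temp table.
conn.execute("""
    create temporary table logtemp_seg1 as
    select idvisit from log_visit
    where idsite = 1
      and visit_last_action_time >= '2019-05-01 00:00:00'
      and visit_last_action_time <= '2019-05-01 23:59:59'
      and continent = 'eur'  -- the appended segment condition
""")

# Step 2: the archiver query only joins the temp table; none of the
# conditions above are repeated.
actions = conn.execute("""
    select count(*) from logtemp_seg1
    inner join log_link_visit_action using (idvisit)
""").fetchone()[0]
print(actions)  # 2: both actions of the single matching visit
```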
In other words:
```php
// traditionally:
$segment->getSelectQuery($select, $from, $where, ...);

// now we do:
$segmentTable = 'logtemp445454545';
'create temporary table ' . $segmentTable . $segment->getSelectQuery('idvisit', 'log_visit', 'idsite=1 and time > ? and time < ?', ...);
array_unshift($from, $segmentTable);
$noSegment->getSelectQuery($select, $from);
```

Because we execute the segment query in advance, we can always execute basically the raw archiver query (inner joined with the segment table).
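To make the unshift trick concrete, here is a hypothetical sketch (my own illustrative helper, not Matomo's real query generator) of how prepending the segment table to the FROM list turns the raw archiver query into one inner-joined with the temp table:

```python
# Hypothetical query-builder sketch; build_from_clause() and the table names
# are illustrative only, not part of Matomo.
def build_from_clause(tables):
    """Anchor the query on the first table and inner-join the rest on idvisit."""
    clause = tables[0]
    for table in tables[1:]:
        clause += f" inner join {table} using (idvisit)"
    return clause

from_tables = ["log_link_visit_action"]      # the raw archiver query's FROM
from_tables.insert(0, "logtemp445454545")    # like array_unshift($from, $segmentTable)

sql = "select count(*) from " + build_from_clause(from_tables)
print(sql)
# select count(*) from logtemp445454545 inner join log_link_visit_action using (idvisit)
```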
From what I see, it should pretty much get rid of subqueries etc., and we basically always run the original archiving query without joining random tables anymore.
FYI: For now we are not creating the temporary tables in memory (engine=memory): if the table result were larger than the allowed memory, the create table would simply fail, and we would need to execute the select another time and try to create the table on disk.
Just did a test btw where a query took 177s, and when using engine=memory the same archiver query was actually not much faster (171s).
The tests now succeed, apart from one failing test in the Bandwidth plugin that is related to this change; all the other test failures are unrelated. I will fix the Bandwidth test after merging, as otherwise things get complicated re the submodule etc.
BTW @mattab: while testing with a DB reader I just fixed another issue, where the ranking query was previously not using the reader DB.
btw @mattab, forgot to mention one thing as a side note... eg page URLs we archive on `log_link_visit_action.server_time` being within the date range, and goals we archive on `log_conversion.server_time`, but this is not 100% correct, as we don't respect the visit time, i.e. whether the visit for this action was also in the same day or not...
The segments now include the `log_visit.visit_last_action_time` condition, which is correct, so the numbers might differ slightly: with a segment Continent=EUR and a segment Continent!=EUR, the total of those two segments might not be 100% of the number without any segment.
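A small simulation (my own illustrative code, made-up data) of why the two complementary segments may not add up to the unsegmented total: an action on day 1 whose visit ends on day 2 is counted without a segment, but excluded from both segments.

```python
# Actions are archived by the action's server_time, but segments filter
# visits by visit_last_action_time. A visit whose last action falls on the
# next day is excluded from BOTH complementary segments for day 1.
DAY1 = "2019-05-01"

# idvisit -> (continent, date of visit_last_action_time)
visits = {1: ("eur", "2019-05-01"), 2: ("ams", "2019-05-01"), 3: ("eur", "2019-05-02")}
# (idvisit, date of action server_time) -- visit 3 has a day-1 action, but
# the visit itself ends on day 2
actions = [(1, DAY1), (2, DAY1), (3, DAY1)]

def archived_actions(segment=None):
    total = 0
    for idvisit, action_day in actions:
        if action_day != DAY1:
            continue
        continent, last_day = visits[idvisit]
        if segment is None:
            total += 1                      # no segment: only server_time matters
        elif last_day == DAY1 and segment(continent):
            total += 1                      # segment: the visit must also end on day 1
    return total

no_segment = archived_actions()
eur = archived_actions(lambda c: c == "eur")
not_eur = archived_actions(lambda c: c != "eur")
print(no_segment, eur, not_eur)  # 3 1 1 -- the two segments sum to 2, not 3
```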