Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve events and contents archiver performance when used with segments #12949

Closed
tsteur opened this issue May 21, 2018 · 1 comment
Closed
Labels
c: Performance For when we could improve the performance / speed of Matomo. worksforme The issue cannot be reproduced and things work as intended.

Comments

@tsteur
Copy link
Member

tsteur commented May 21, 2018

We could improve the performance of events, and likely other reports massively by changing the archiving queries. For example the below query could be improved to the 2nd version which on a big data set here improves the query time from 117 seconds to 6 seconds. This is because the join only needs to be evaluated for a limited result set whereas before it was needed to join the tables basically for all actions (including pageviews etc). I presume this will be always a big boost eg if there are many page views but only a few events. For the actions archiver it would be beneficial when there are many events but only a few page views for example. The same should be done for contents archiving.

The pattern is basically to first evaluate all events without having to join anything, then join the tables on the limited result set.

Please note that it makes the query about 0.3s slower for when not a segment is applied (on the tested big data set) from 4.2seconds to 4.5 seconds.

Current version:

            SELECT
                
               log_action_event_category.name as eventCategory,
               log_action_event_action.name as eventAction,
               log_action_event_name.name as eventName,

                count(distinct log_link_visit_action.idvisit) as `2`,
                count(distinct log_link_visit_action.idvisitor) as `1`,
                count(*) as `34`,

                sum(
                    case when custom_float is null
                        then 0
                        else custom_float
                    end
                ) as `35`,
                sum( case when custom_float is null then 0 else 1 end )
                    as `38`,
                min(custom_float) as `36`,
                max(custom_float) as `37`
       
            FROM
                log_link_visit_action AS log_link_visit_action LEFT JOIN log_action AS log_action_event_category ON log_link_visit_action.idaction_event_category = log_action_event_category.idaction LEFT JOIN log_action AS log_action_event_action ON log_link_visit_action.idaction_event_action = log_action_event_action.idaction LEFT JOIN log_action AS log_action_event_name ON log_link_visit_action.idaction_name = log_action_event_name.idaction LEFT JOIN log_media AS log_media ON log_media.idvisit = log_link_visit_action.idvisit 
            WHERE
                 log_link_visit_action.server_time >= '2018-05-16 22:00:00'
                   AND log_link_visit_action.server_time <= '2018-05-17 21:59:59'
                   AND log_link_visit_action.idsite = '1'
                   AND log_link_visit_action.idaction_event_category IS NOT NULL 
               AND
               log_media.watched_time > '0' 
            GROUP BY
                log_link_visit_action.idaction_event_category,
                   log_link_visit_action.idaction_event_action,
                   log_link_visit_action.idaction_name
            ORDER BY
                `2` DESC

Improved version:

     
                SELECT 
               log_action_event_category.name as eventCategory,
               log_action_event_action.name as eventAction,
               log_action_event_name.name as eventName,
               count(distinct innerQuerytest.idvisit) as `2`,
                count(distinct innerQuerytest.idvisitor) as `1`,
                count(*) as `34`,

                sum(
                    case when innerQuerytest.custom_float is null
                        then 0
                        else innerQuerytest.custom_float
                    end
                ) as `35`,
                sum( case when innerQuerytest.custom_float is null then 0 else 1 end )
                    as `38`,
                min(innerQuerytest.custom_float) as `36`,
                max(innerQuerytest.custom_float) as `37`
       
               FROM (

            SELECT
                    log_link_visit_action.idaction_event_category,
                   log_link_visit_action.idaction_event_action,
                   log_link_visit_action.idaction_name,
log_link_visit_action.idvisit,
                 log_link_visit_action.idvisitor, log_link_visit_action.custom_float
       
            FROM
                log_link_visit_action AS log_link_visit_action  
            WHERE
                 log_link_visit_action.server_time >= '2018-05-16 22:00:00'
                   AND log_link_visit_action.server_time <= '2018-05-17 21:59:59'
                   AND log_link_visit_action.idsite = '1'
                   AND log_link_visit_action.idaction_event_category IS NOT NULL 
             
            GROUP BY
                log_link_visit_action.idaction_event_category,
                   log_link_visit_action.idaction_event_action,
                   log_link_visit_action.idaction_name
          )
                innerQuerytest
                LEFT JOIN log_action AS log_action_event_category ON innerQuerytest.idaction_event_category = log_action_event_category.idaction LEFT JOIN log_action AS log_action_event_action ON innerQuerytest.idaction_event_action = log_action_event_action.idaction LEFT JOIN log_action AS log_action_event_name ON innerQuerytest.idaction_name = log_action_event_name.idaction  LEFT JOIN log_media AS log_media ON log_media.idvisit = innerQuerytest.idvisit
                WHERE  log_media.watched_time > '0' 
               
            GROUP BY
                    innerQuerytest.idaction_event_category,
                   innerQuerytest.idaction_event_action,
                   innerQuerytest.idaction_name
                  ORDER BY
                `2` DESC 
@tsteur tsteur added the c: Performance For when we could improve the performance / speed of Matomo. label May 21, 2018
@tsteur
Copy link
Member Author

tsteur commented May 21, 2018

Sorry I just noticed I had 2 group by's in there and removing the inner group by returns in the original query speed so it is not rally making anything faster by the looks.

@tsteur tsteur closed this as completed May 21, 2018
@tsteur tsteur added the worksforme The issue cannot be reproduced and things work as intended. label May 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: Performance For when we could improve the performance / speed of Matomo. worksforme The issue cannot be reproduced and things work as intended.
Projects
None yet
Development

No branches or pull requests

1 participant