@tsteur opened this Issue on December 7th 2020 Member

Say you are tracking apps and you send a userId with every tracking request. Then by default the userId will be used to generate the visitorId.

Queries like this will become very slow:

```sql
EXPLAIN EXTENDED
SELECT visit_last_action_time, visit_first_action_time, idvisitor, idvisit, user_id,
       visit_exit_idaction_url, visit_exit_idaction_name, visitor_returning,
       visitor_seconds_since_first, visitor_seconds_since_order, visitor_count_visits,
       visit_goal_buyer, location_country, location_region, location_city,
       location_latitude, location_longitude, referer_name, referer_keyword, referer_type,
       idsite, profilable, visit_entry_idaction_url, visit_total_actions,
       visit_total_interactions, visit_total_searches, referer_url, config_browser_name,
       config_client_type, config_device_brand, config_device_model, config_device_type,
       visit_total_events, visit_total_time, location_ip, location_browser_lang,
       campaign_content, campaign_group, campaign_id, campaign_keyword, campaign_medium,
       campaign_name, campaign_placement, campaign_source, last_idlink_va,
       custom_dimension_1, custom_dimension_2, custom_dimension_3, custom_dimension_4,
       custom_dimension_5, custom_dimension_6,
       custom_var_k1, custom_var_v1, custom_var_k2, custom_var_v2, custom_var_k3,
       custom_var_v3, custom_var_k4, custom_var_v4, custom_var_k5, custom_var_v5
FROM log_visit FORCE INDEX (index_idsite_idvisitor)
WHERE idsite = '4'
  AND visit_last_action_time <= '2020-12-07 20:34:48'
  AND idvisitor = '...'
ORDER BY visit_last_action_time DESC
LIMIT 1
```

Explain output: (screenshot of the EXPLAIN result, not included in this mirror)

To work around this performance issue, set [Tracker] enable_userid_overwrites_visitorid=1 in your config/config.ini.php.
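For reference, a minimal sketch of where that override would live, assuming a standard config/config.ini.php layout (the value shown is simply the one named in the workaround above):

```ini
; config/config.ini.php
[Tracker]
; workaround mentioned above for the slow idvisitor lookup
enable_userid_overwrites_visitorid = 1
```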

Ideally we'd find a way to not need this setting though. Maybe we could at least limit the time range by adding a lower bound to the WHERE clause, e.g. visit_last_action_time > (upper limit - 1 year). It won't make it much better, but it would at least be slightly faster, especially when the table has collected many hundreds of thousands or millions of visits for the same visitor over the years. In our use case, for example, we have 1.6M entries for the same idVisitor, which is roughly 30% of all visits.
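As a rough sketch (same query shape as above, column list shortened; the one-year interval is only the value floated here, not a decided default), the lower bound could look like this:

```sql
-- shortened column list; same query as above with an added lower bound
SELECT idvisit, visit_last_action_time
FROM log_visit FORCE INDEX (index_idsite_idvisitor)
WHERE idsite = '4'
  AND visit_last_action_time <= '2020-12-07 20:34:48'
  AND visit_last_action_time >  DATE_SUB('2020-12-07 20:34:48', INTERVAL 1 YEAR)
  AND idvisitor = '...'  -- '...' stands for the binary idvisitor value, elided as in the original
ORDER BY visit_last_action_time DESC
LIMIT 1
```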

We used to have this lower limit in the WHERE clause, but AFAIK we removed it for Cohorts. @diosmosis do you maybe remember? To keep cohorts useful, how far back would we need to look? 1 year? Or would it be fine to only look back 6 months? I suppose we'd ideally need to look back at least 1 year for the reports to still be correct when you look at months for cohorts.

Any other thoughts on how to make this use case faster would be appreciated.

@tsteur commented on December 7th 2020 Member

The WHERE clause with a one-year look-back might not really help much though, because most visits might be within one year (seeing such a DB here). Maybe a config setting forcing the look-back to be only a few days would be better, but this would need a new config option. Ideally it would be fixed automatically.
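To illustrate the idea only, a hypothetical sketch of such an option (tracker_visitor_lookback_days is a made-up name; no such setting exists today):

```ini
; config/config.ini.php -- hypothetical sketch only, this setting does not exist
[Tracker]
; would limit how far back the tracker searches log_visit for an existing visitor
tracker_visitor_lookback_days = 3
```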

@tsteur commented on December 7th 2020 Member

Otherwise, a solution might be a custom index.
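For instance, an index that also covers visit_last_action_time would let MySQL resolve the ORDER BY ... LIMIT 1 directly from the index instead of sorting all rows of that visitor. A sketch only, assuming the standard log_visit schema (the index name is made up):

```sql
-- hypothetical composite index: equality columns first, then the ORDER BY column
ALTER TABLE log_visit
    ADD INDEX index_idsite_idvisitor_time (idsite, idvisitor, visit_last_action_time);
```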
