New Tracker Concept: Ongoing Visits Cache #8476

Closed
5 tasks
diosmosis opened this issue Aug 3, 2015 · 0 comments
Labels
c: Performance For when we could improve the performance / speed of Matomo. c: Platform For Matomo platform changes that aren't impacting any of our APIs but improve the core itself. wontfix If you can reproduce this issue, please reopen the issue or create a new one describing it.
Milestone

Comments

@diosmosis
Member

While tracking actions, conversions or anything else for a visit, it is often necessary to get information about the current visit, such as its visit properties. Currently this is done by selecting the last known visit from the database. If additional information is needed during tracking, a VisitDimension is used, which adds a new column to the log_visit table that is selected automatically. This approach works, but has several downsides:

  • If the information is only needed during tracking and will not be used during aggregation, then the extra column in the database will be a waste of space. Since it will only be needed for ongoing visits, the information would be useless after a visit ends.
  • Not all useful information can be stored as a visit dimension. Using a dimension of any type to associate data w/ a log_ table row assumes that the information has a 1-1 correspondence with the row. This means that if you need to associate multiple values with a log_ table row (such as multiple rows of another log_ table), it is not possible.
  • Using the log_ tables as a way to cache current visit information means there will always be at least one SELECT on the table(s) per tracking request, even if the visitor has not appeared in a long time. Since there is no "time to idle" when querying the database, we cannot be sure whether a visit is ongoing w/o doing a select.

A New Hope Approach

Ongoing visit data has one all-important property: it is temporary. It is initialized when a new visit is created, modified when the visit is updated and discarded when the visit ends. It will not exist for long and is identified by a single key (the visitor ID). Thus a key-value store that acts as a cache, removing data once it is old, would be ideal and should perform well.

The caching approach would look like this:

  • First, using the current tracker request info, we create a unique ID for the visit, derived from the visit's config ID, the visitor ID, or something similar.
  • Then we try to get the current visit from the cache. If it is not there, we know there is no on-going visit, because when we save the data, we'll set a TTL to the configured visit length.
  • If there is data, we continue w/ tracking. If there is no data, we have to do a select on the log_visit table to check if the visitor is known. After that, we continue w/ tracking assuming a new visit.
  • ... normal tracking ... (all inserted logs will also be appended to in-memory current visit data)
  • The current visit data, now recorded, is set back into the cache. The configured visit length is used as the TTL.
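
The steps above can be sketched roughly as follows. This is only an illustration in Python, not Matomo code: `TTLCache`, `OngoingVisit`, `track`, the `known_visitors` set (standing in for the select on log_visit) and the 1800 second `VISIT_LENGTH` are all hypothetical names and values chosen for the example.

```python
import time
from dataclasses import dataclass, field

VISIT_LENGTH = 1800  # assumed visit length in seconds, used as the TTL


class TTLCache:
    """In-memory stand-in for a key-value store with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is older than the visit length: the visit has ended.
            del self._items[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._items[key] = (self._clock() + ttl, value)


@dataclass
class OngoingVisit:
    visitor_id: str
    returning: bool = False
    actions: list = field(default_factory=list)


def track(cache, known_visitors, visitor_id, action):
    key = f"visit:{visitor_id}"
    visit = cache.get(key)
    if visit is None:
        # Cache miss means no ongoing visit. Only now do we consult the
        # "database" (a set here) to see whether the visitor is known.
        visit = OngoingVisit(visitor_id, returning=visitor_id in known_visitors)
        known_visitors.add(visitor_id)
    # Normal tracking: inserted logs are also appended to the in-memory data.
    visit.actions.append(action)
    # Writing back resets the TTL, so the entry expires one visit length
    # after the last tracked request.
    cache.set(key, visit, ttl=VISIT_LENGTH)
    return visit
```

Because the TTL is reset on every write, a missing key is enough to conclude that no visit is ongoing, and the database is only touched when a new visit starts.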

Benefits include:

  • Being able to store whatever you want, not just simple values.
  • Not having to modify existing log tables w/ data that will not be used during aggregation.
  • Increased performance for those who want to set up Redis or another key-value store.

TODO

  • Move all visit information to new extendable value object for visits. This should include all visit dimension values, as well as connected log data, such as the list of actions. This log data should also be encapsulated in objects. Plugins must be able to extend this information non-intrusively.
  • Create a service (stored in DI) that fills the new visit object lazily. It should replace VisitorRecognizer (or change it) and by default select data directly from the log_visit table. It should use an intermediate service (i.e., OngoingVisitDataProvider) to get information about the current visit. Plugins have to be able to specify their own querying logic for this data, so perhaps there should be an array of them in DI.
  • Create the ongoing visit cache configuration option and handle it in DI by replacing the default OngoingVisitDataProvider w/ one that uses a cache.
  • Make sure visit recording logic will update the value object, and that this change is reflected in the cache, if a cache is used.
  • Test difference w/ load tests.
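
One possible shape for the lazily filled visit object and its swappable provider is sketched below, again in Python rather than Matomo's own code. Only the idea of an OngoingVisitDataProvider comes from the list above; `DatabaseVisitDataProvider`, `CachedVisitDataProvider`, `LazyVisit` and their methods are hypothetical, and expiry is left out to keep the sketch short.

```python
class DatabaseVisitDataProvider:
    """Default provider: reads visit data from the 'database' (a dict here)."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = 0  # counts lookups so the effect of caching is visible

    def find(self, visitor_id):
        self.queries += 1
        return self.rows.get(visitor_id)


class CachedVisitDataProvider:
    """Drop-in replacement that consults a cache before the inner provider."""

    def __init__(self, inner, cache):
        self.inner = inner
        self.cache = cache  # plain dict; a real cache would expire entries

    def find(self, visitor_id):
        hit = self.cache.get(visitor_id)
        if hit is not None:
            return hit
        data = self.inner.find(visitor_id)
        if data is not None:
            self.cache[visitor_id] = data
        return data


class LazyVisit:
    """Visit value object that loads its data on first access only."""

    def __init__(self, visitor_id, provider):
        self._visitor_id = visitor_id
        self._provider = provider
        self._data = None

    def _load(self):
        if self._data is None:
            self._data = dict(self._provider.find(self._visitor_id) or {})
        return self._data

    def get(self, name, default=None):
        return self._load().get(name, default)

    def set(self, name, value):
        self._load()[name] = value
```

Since both providers expose the same `find` method, a container could hand either one to `LazyVisit` depending on whether the cache option is enabled, and the visit object itself would not need to know which one it received.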
@diosmosis diosmosis added the c: Performance For when we could improve the performance / speed of Matomo. label Aug 3, 2015
@diosmosis diosmosis added this to the 3.0.0 milestone Aug 3, 2015
@diosmosis diosmosis added the c: Platform For Matomo platform changes that aren't impacting any of our APIs but improve the core itself. label Aug 3, 2015
@mattab mattab modified the milestones: Mid term, 3.0.0 Aug 13, 2015
@mattab mattab closed this as completed May 5, 2016
@mattab mattab added the wontfix If you can reproduce this issue, please reopen the issue or create a new one describing it. label May 5, 2016

2 participants