Can we queue most tracking requests into one? #15050

Closed
tsteur opened this issue Oct 24, 2019 · 8 comments
Labels
c: Performance - For when we could improve the performance / speed of Matomo.
Major - Indicates the severity or impact or benefit of an issue is much higher than normal but not critical.
not-in-changelog - For issues or pull requests that should not be included in our release changelog on matomo.org.
RFC - Indicates the issue is a request for comments where the author is looking for feedback.
wontfix - If you can reproduce this issue, please reopen the issue or create a new one describing it.

Comments

@tsteur
Member

tsteur commented Oct 24, 2019

We should investigate whether all tracking requests could go into a queue by default and be sent together in one request when possible. Currently, only a few premium features make use of the queue. We could enable it for ALL kinds of tracking requests.

We currently have a queue timeout of 2.5s. If another request is queued within those 2.5s, we currently wait another 2.5s before sending the queued requests. We could add some logic to make sure the requests are sent within 10s max, emptying the queue at that point or so. There's not really any risk of losing tracking requests now that we can use sendBeacon, and sending any queued requests on page unload is already implemented.
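A minimal sketch of that timing rule, assuming a 2.5s quiet period and a 10s cap. The class and method names are hypothetical and not taken from matomo.js; timestamps are passed in explicitly so the logic can be checked without real timers.

```typescript
// Hypothetical names throughout; this is not the matomo.js implementation.
const IDLE_MS = 2500;      // quiet period mentioned above
const MAX_WAIT_MS = 10000; // proposed upper bound on how long a request may sit in the queue

class TrackingQueue {
  private items: string[] = [];
  private firstQueuedAt = 0;
  private lastQueuedAt = 0;
  private send: (batch: string[]) => void;

  constructor(send: (batch: string[]) => void) {
    this.send = send;
  }

  // Queue a request; `now` is a timestamp in milliseconds.
  push(request: string, now: number): void {
    if (this.items.length === 0) {
      this.firstQueuedAt = now;
    }
    this.lastQueuedAt = now;
    this.items.push(request);
  }

  // Time at which the queue has to be flushed, or null when it is empty.
  // Each new request extends the quiet period, but never past the cap.
  flushDueAt(): number | null {
    if (this.items.length === 0) {
      return null;
    }
    return Math.min(this.lastQueuedAt + IDLE_MS, this.firstQueuedAt + MAX_WAIT_MS);
  }

  // Meant to be called from a timer.
  tick(now: number): void {
    const due = this.flushDueAt();
    if (due !== null && now >= due) {
      this.flush();
    }
  }

  // Sends everything that is queued and empties the queue.
  flush(): void {
    if (this.items.length === 0) {
      return;
    }
    const batch = this.items;
    this.items = [];
    this.send(batch);
  }
}
```

In a browser, `flush()` would presumably also be called on page unload, with `send` handing the batch to `navigator.sendBeacon`.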

This behaviour would only apply when queuedRequests are enabled. Currently, this is the case by default. The problem is that requests will be sent using POST and can therefore no longer be replayed using log analytics etc. This means we likely need to disable this feature by default, and users can enable this more efficient way of tracking if they don't use log analytics anyway.
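To illustrate the replay problem, here is a rough sketch. It assumes the bulk format is a JSON body with a `requests` array of query strings, and that the web server's access log records the request line but not the body, which is the usual default. Both function names are made up for this example.

```typescript
// Assumed bulk body shape: { "requests": ["?idsite=1&rec=1&...", ...] }.
function buildBulkBody(queries: string[]): string {
  const requests = queries.map((q) => (q.charAt(0) === "?" ? q : "?" + q));
  return JSON.stringify({ requests });
}

// What an access log would typically keep for each transport.
// With GET the tracking parameters are part of the logged URL;
// with POST they travel in the body and are not logged.
function accessLogLine(method: "GET" | "POST", endpoint: string, query: string): string {
  return method === "GET" ? `GET ${endpoint}?${query}` : `POST ${endpoint}`;
}

const body = buildBulkBody([
  "idsite=1&rec=1&action_name=Home",
  "idsite=1&rec=1&e_c=Video&e_a=Play",
]);
console.log(body);
console.log(accessLogLine("GET", "/matomo.php", "idsite=1&rec=1&action_name=Home"));
console.log(accessLogLine("POST", "/matomo.php", "idsite=1&rec=1&action_name=Home"));
```

The POST log line carries nothing that could be replayed, which is why the feature would conflict with log analytics replay.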

This will make tracking more efficient since fewer tracking requests will be sent and more requests will be inserted at once. It might even reduce the chance of "0 action in visitor log" #6415 since it works somewhat like QueuedTracking, but in the browser...

The only thing I'm not sure about is whether it's a problem with server side timestamps. Say we queue 1 pageview, 10 content tracking requests, and 3 events into one bulk request... will Matomo record the same server time for all requests instead of maybe knowing the exact time when something happened?

@tsteur added the RFC label Oct 24, 2019
@tsteur added this to the 4.0.0 milestone Oct 24, 2019
@mattab
Member

mattab commented Oct 24, 2019

Sounds like an impactful performance improvement 🚀

> This means we likely need to disable this feature by default and users can enable this more efficient way of tracking if they don't use log analytics anyway.

Would be interesting to know approximately what percentage of tracking requests would be POST vs GET with the new setting, compared to now.

Another solution could be to leave it enabled and communicate around Log Analytics Replay (in https://matomo.org/faq/log-analytics-tool/faq_19221/ and other places that this FAQ is linked from) that Log Analytics won't replay all requests accurately unless the setting xyz is enabled, which forces Tracking API requests to be sent as GET so that all URL parameters of all tracking requests are preserved.

Maybe people use Log Analytics replay as a backup strategy, especially around updates/maintenance of Matomo? Before Upgrades (in app + guides + faqs) we could mention to enable the setting temporarily.
Ideally it would be possible to toggle this directly in the matomo.js and not require a JS line added to the code, to make it easier to switch the setting from the UI.

We could even display an info message in the System check in the default case, e.g. "If your Matomo server has issues and fails to track the data for a few hours or a few days, some of your tracking data will be lost. Learn how to mitigate this risk." or so.

-> In the end it may indeed be easier to just leave it disabled and let people opt in to the performance improvement.

@tsteur
Member Author

tsteur commented Oct 24, 2019

It's only needed for a few people with high traffic anyway, so it would be easiest to just have it disabled by default.

> Would be interesting to know approx how much % of tracking requests would be POST vs GET with the new setting VS currently.

It's impossible to answer since it depends on how many requests each site tracks. If someone only has pageviews, they will mostly be GET requests. If they track events / content tracking etc., many of them will be POST requests. It probably even depends on the page etc.

> The only thing I'm not sure about is whether it's a problem with server side timestamps. Say we queue 1 pageview, 10 content tracking requests, and 3 events into one bulk request... will Matomo record the same server time for all requests instead of maybe knowing the exact time when something happened?

This might be a problem though. I haven't looked at how Matomo timestamps work, but my understanding is that Matomo wouldn't know exactly at which second something happened. This is why we would need to make sure to at least ping every 5-10 seconds, otherwise the timestamps become inaccurate. Alternatively, we could even group all requests for up to 30 or 60 seconds, or 5 minutes... but then we'd need some logic to restore the correct time at which each one happened.

@mattab
Member

mattab commented Oct 27, 2019

> The only thing I'm not sure about is whether it's a problem with server side timestamps. Say we queue 1 pageview, 10 content tracking requests, and 3 events into one bulk request... will Matomo record the same server time for all requests instead of maybe knowing the exact time when something happened?

AFAIK it should work fine as long as each request in the bulk request has its own cdt parameter to set the exact timestamp of that request.

@tsteur
Member Author

tsteur commented Oct 28, 2019

> AFAIK it should work fine as long as each request in the bulk request has each a cdt parameter to set the exact timestamp of that request

I don't think that works, since the browser can't reliably know the time in UTC: local clocks are often wrong.

@mattab
Member

mattab commented Oct 31, 2019

> I don't think that works since you can't know the time in UTC as local times are often wrong

True... So maybe we could have a new Tracking API for bulk requests that sets the timestamps of the requests relative to each other, like "5 seconds after the first one, 74 seconds after the first one".
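A rough sketch of how such relative timestamps could work. Everything here is hypothetical: the `offsetMs` field, the function names, and the idea that the server anchors the batch on its own receive time. No such Tracking API exists as far as this thread shows, and network latency is ignored.

```typescript
// Hypothetical shape of one queued request with a relative timestamp.
interface QueuedHit {
  query: string;
  offsetMs: number; // milliseconds since the first hit in the batch
}

// Client side: `at` is any clock reading in ms that only needs to be consistent
// within the page (e.g. performance.now()), so a wrong wall clock does not matter.
function toRelative(hits: { query: string; at: number }[]): QueuedHit[] {
  if (hits.length === 0) {
    return [];
  }
  const first = hits[0].at;
  return hits.map((h) => ({ query: h.query, offsetMs: h.at - first }));
}

// Server side: `sentOffsetMs` is the client's offset at the moment the batch was sent,
// `receivedAtMs` is the server's own clock when the batch arrived.
// Each hit happened (sentOffsetMs - offsetMs) before the batch was sent.
function resolveServerTimes(batch: QueuedHit[], sentOffsetMs: number, receivedAtMs: number): number[] {
  return batch.map((h) => receivedAtMs - (sentOffsetMs - h.offsetMs));
}

const batch = toRelative([
  { query: "action_name=Home", at: 1000 },
  { query: "e_c=Video&e_a=Play", at: 6000 },   // 5 seconds after the first one
  { query: "e_c=Video&e_a=Pause", at: 75000 }, // 74 seconds after the first one
]);
console.log(resolveServerTimes(batch, 80000, 1000000));
```

The server never has to trust the client's idea of UTC, only the differences between its clock readings.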

Btw another related improvement we could make is probably making Content Tracking more efficient. Taking your use case:

> Say we queue 1 pageview, 10 content tracking requests, and 3 events into one bulk request...

Maybe the 10 content tracking requests could also be merged into only one request, which would make the Tracker API much more efficient and save queries. All content block impressions and interactions could be merged into one request instead of often 10 or even 20. This would be a big improvement whenever Content Tracking is used and the page has blocks that become visible when scrolling. (We learnt recently of an on-premise user who had to disable Content Tracking because it caused too much load and made QueuedTracking lag several hours behind in processing.)
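As a sketch of the difference: the first function below builds one query per impression with the `c_n` / `c_p` / `c_t` content parameters, the second packs all impressions into a single request. The `c_batch` parameter is invented purely for illustration, Matomo does not define it, and the function names are hypothetical too.

```typescript
interface ContentImpression {
  name: string;   // content name (c_n)
  piece: string;  // content piece (c_p)
  target: string; // content target (c_t)
}

// One tracking request per impression.
function perImpressionQueries(idSite: number, items: ContentImpression[]): string[] {
  return items.map(
    (i) =>
      `idsite=${idSite}&rec=1` +
      `&c_n=${encodeURIComponent(i.name)}` +
      `&c_p=${encodeURIComponent(i.piece)}` +
      `&c_t=${encodeURIComponent(i.target)}`
  );
}

// Hypothetical merged form: all impressions as JSON in an invented `c_batch` parameter.
function mergedImpressionQuery(idSite: number, items: ContentImpression[]): string {
  return `idsite=${idSite}&rec=1&c_batch=${encodeURIComponent(JSON.stringify(items))}`;
}

const impressions: ContentImpression[] = [];
for (let i = 0; i < 10; i++) {
  impressions.push({ name: "block" + i, piece: "img" + i + ".png", target: "https://example.org/" + i });
}
console.log(perImpressionQueries(1, impressions).length); // number of separate requests
console.log(mergedImpressionQuery(1, impressions).length); // length of the single merged query
```

The server would still have to insert ten impressions, but it would only pay the per-request overhead once.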

@mattab added the c: Performance and Major labels Oct 31, 2019
@tsteur
Member Author

tsteur commented Oct 31, 2019

Yes, the point would really be to send as many of them together as possible within a reasonable interval. It wouldn't save crazy heaps of resources, but a few things will be cached and therefore faster, especially when tracking a lot per page view.

@mattab removed this from the 4.0.0 milestone Feb 20, 2020
@tsteur
Member Author

tsteur commented Jun 16, 2020

BTW, queueing more requests into one request has the slight disadvantage that more bulk requests will be used, meaning more transactions will be happening and there's more of a risk of deadlocks.

However, we are planning to take care of these locks soon anyway.

@tsteur
Member Author

tsteur commented Oct 20, 2020

Closing for now as it increases the chance of deadlocks.

@tsteur closed this as completed Oct 20, 2020
@tsteur added the wontfix and not-in-changelog labels Oct 20, 2020
2 participants