@tsteur opened this Issue on October 24th 2019 Member

We should investigate whether all tracking requests could go into a queue by default and be sent together in one request when possible. Currently, only a few premium features make use of the queue. We could enable it for ALL kinds of tracking requests.

We currently have a queue timeout of 2.5s. If another request is queued within those 2.5s, we currently wait another 2.5s before sending the queued requests. We could add some logic to make sure the requests are sent within 10s max, emptying the queue at that point. There's not really any risk of losing tracking requests now that we can use sendBeacon, and sending any queued requests on page unload is already implemented.
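The timing rule described above (a 2.5s debounce that restarts on every new request, plus a proposed 10s hard cap) can be sketched as follows. This is a minimal illustration only: `TrackingQueue`, `flushDeadline`, `flushIfDue` and both constants are hypothetical names, not part of matomo.js, and the clock is passed in explicitly so the logic can be followed without real timers.

```typescript
// Hypothetical sketch; none of these names exist in matomo.js.
const QUEUE_TIMEOUT_MS = 2500; // debounce window mentioned in the issue
const MAX_WAIT_MS = 10000;     // proposed upper bound before a forced flush

class TrackingQueue {
  private pending: string[] = [];
  private firstQueuedAt: number | null = null;
  private lastQueuedAt: number | null = null;

  /** Queue a request at time `nowMs` (milliseconds on any monotonic clock). */
  push(request: string, nowMs: number): void {
    if (this.firstQueuedAt === null) {
      this.firstQueuedAt = nowMs;
    }
    this.lastQueuedAt = nowMs;
    this.pending.push(request);
  }

  /** Time at which the queue should be sent, or null if nothing is queued. */
  flushDeadline(): number | null {
    if (this.firstQueuedAt === null || this.lastQueuedAt === null) {
      return null;
    }
    // Each new request pushes the deadline back by the debounce window,
    // but never beyond MAX_WAIT_MS after the first queued request.
    return Math.min(
      this.lastQueuedAt + QUEUE_TIMEOUT_MS,
      this.firstQueuedAt + MAX_WAIT_MS
    );
  }

  /** Return and clear the queued requests once the deadline has passed. */
  flushIfDue(nowMs: number): string[] {
    const deadline = this.flushDeadline();
    if (deadline === null || nowMs < deadline) {
      return [];
    }
    const batch = this.pending;
    this.pending = [];
    this.firstQueuedAt = null;
    this.lastQueuedAt = null;
    return batch;
  }
}
```

In a browser, the deadline would presumably drive a timer, and the page unload handler would send whatever is still queued via sendBeacon, as is already done today.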

This behaviour would only apply when queuedRequests are enabled. Currently, this is the case by default. The problem is that requests will be sent using POST and can therefore no longer be replayed using log analytics etc. This means we likely need to disable this feature by default and users can enable this more efficient way of tracking if they don't use log analytics anyway.

This will make tracking more efficient since fewer tracking requests will be sent and more requests will be inserted at once. It might even reduce the chances of "0 action in visitor log" (https://github.com/matomo-org/matomo/issues/6415) since it works somewhat like QueuedTracking, but in the browser...

The only thing I'm not sure about is whether it's a problem with server side timestamps. Say we queue 1 pageview, 10 content tracking requests, and 3 events into one bulk request... will Matomo record the same server time for all requests instead of maybe knowing the exact time when something happened?
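To make the timestamp concern concrete, here is a rough sketch of how several queued requests could be combined into one body in the shape the Bulk Tracking API accepts (a JSON object with a `requests` array of query strings). The helper names `toQueryString` and `buildBulkBody` are hypothetical, and the exact payload matomo.js produces may differ. The point is that the whole array arrives at the server at a single moment, with nothing in it saying when each request was queued.

```typescript
// Hypothetical helpers; only the {"requests": [...]} body shape and the
// parameter names (idsite, rec, action_name, e_c, e_a) come from the
// Matomo Tracking API.
type TrackingParams = Record<string, string | number>;

/** Encode one tracking request as a query string such as "?idsite=1&rec=1". */
function toQueryString(params: TrackingParams): string {
  const pairs = Object.keys(params).map(
    (key) => encodeURIComponent(key) + "=" + encodeURIComponent(String(params[key]))
  );
  return "?" + pairs.join("&");
}

/** Combine all queued requests into one POST body for a bulk request. */
function buildBulkBody(queued: TrackingParams[]): string {
  return JSON.stringify({ requests: queued.map(toQueryString) });
}

// Example: one pageview and one event sent together.
const exampleBody = buildBulkBody([
  { idsite: 1, rec: 1, action_name: "Home" },
  { idsite: 1, rec: 1, e_c: "Video", e_a: "Play" },
]);
```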

@mattab commented on October 24th 2019 Member

Sounds like an impactful performance improvement :rocket:

> This means we likely need to disable this feature by default and users can enable this more efficient way of tracking if they don't use log analytics anyway.

Would be interesting to know approx how much % of tracking requests would be POST vs GET with the new setting VS currently.

Another solution could be to leave it enabled and communicate, wherever Log Analytics replay is documented (https://matomo.org/faq/log-analytics-tool/faq_19221/ and other places this FAQ is linked from), that Log Analytics won't replay all requests accurately unless the setting xyz is enabled to force Tracking API requests to be sent as GET, which preserves all URL parameters of all tracking requests.

Maybe people use Log Analytics replay as a backup strategy, especially around updates/maintenance of Matomo? Before Upgrades (in app + guides + faqs) we could mention to enable the setting temporarily.
Ideally it would be possible to toggle this directly in the matomo.js and not require a JS line added to the code, to make it easier to switch the setting from the UI.

We could even display an info message in the System check in the default case, e.g. "If your Matomo server has issues and fails to track the data for a few hours or a few days, some of your tracking data will be lost. Learn how to mitigate this risk." or so.

-> In the end it may indeed be easier to just leave it disabled and let people opt in to the performance improvement.

@tsteur commented on October 24th 2019 Member

It's only needed for a few people with high traffic anyway, so it would be easiest to just have it disabled by default.

> Would be interesting to know approx how much % of tracking requests would be POST vs GET with the new setting VS currently.

It's impossible to answer since it depends on the site and how many requests it tracks. If someone only has pageviews, they will mostly be GET requests. If they track events / content tracking etc., many of them will be POSTs. It probably even depends on the page etc.

> The only thing I'm not sure about is whether it's a problem with server side timestamps. Say we queue 1 pageview, 10 content tracking requests, and 3 events into one bulk request... will Matomo record the same server time for all requests instead of maybe knowing the exact time when something happened?

This might be a problem though. I haven't looked at how Matomo timestamps work, but my understanding is that Matomo wouldn't know exactly at which second something happened. This is why we would need to make sure to at least ping every 5-10 seconds, otherwise the timestamps become inaccurate. Otherwise we could even group all requests up to 30 or 60 seconds, or 5 minutes... but then we'd need some logic to restore the correct time it happened.

@mattab commented on October 27th 2019 Member

> The only thing I'm not sure about is whether it's a problem with server side timestamps. Say we queue 1 pageview, 10 content tracking requests, and 3 events into one bulk request... will Matomo record the same server time for all requests instead of maybe knowing the exact time when something happened?

AFAIK it should work fine as long as each request in the bulk request has its own cdt parameter to set the exact timestamp of that request.

@tsteur commented on October 28th 2019 Member

> AFAIK it should work fine as long as each request in the bulk request has each a cdt parameter to set the exact timestamp of that request

I don't think that works since you can't know the time in UTC as local times are often wrong

@mattab commented on October 31st 2019 Member

> I don't think that works since you can't know the time in UTC as local times are often wrong

True. So maybe we could have a new Tracking API for bulk requests that sets the timestamps of the requests relative to each other, like "5 seconds after the first one, 74 seconds after the first one".
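The relative-timestamp idea can be sketched roughly as below. No such API exists in Matomo; `agesAtSendSec` and `reconstructServerTimesSec` are hypothetical names. As a slight variation on the suggestion above, offsets are measured back from the moment the batch is sent rather than forward from the first request, so the server can anchor them to its own receive time. The sketch assumes network latency is negligible and relies only on differences between client clock readings, which stay correct even when the client's absolute time is wrong.

```typescript
// Hypothetical sketch; this is not an existing Matomo Tracking API feature.

/** Client side: how many seconds before sending each request was queued. */
function agesAtSendSec(queuedAtClientMs: number[], sentAtClientMs: number): number[] {
  return queuedAtClientMs.map((t) => Math.round((sentAtClientMs - t) / 1000));
}

/** Server side: rebuild per-request UTC timestamps from the server's own clock. */
function reconstructServerTimesSec(serverReceivedSec: number, agesSec: number[]): number[] {
  return agesSec.map((age) => serverReceivedSec - age);
}

// Example: three requests queued 0s, 5s and 74s after the first one,
// with the batch sent 2s after the last request.
const clientBaseMs = 1000000; // arbitrary, possibly wrong, client clock value
const exampleAges = agesAtSendSec(
  [clientBaseMs, clientBaseMs + 5000, clientBaseMs + 74000],
  clientBaseMs + 76000
);
const exampleTimes = reconstructServerTimesSec(1572500000, exampleAges);
```

The reconstructed times keep the 5s and 74s spacing regardless of how far off the client clock is, since `clientBaseMs` cancels out of every difference.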

Btw another related improvement we could make is probably making Content Tracking more efficient. Taking your use case:

> Say we queue 1 pageview, 10 content tracking requests, and 3 events into one bulk request...

Maybe the 10 content tracking requests could also be merged into only one request, which would make the Tracker API much more efficient and save queries. All content block impressions and interactions could be merged into one instead of often 10 or even 20 requests. This would be a big improvement whenever Content tracking is used and the page has blocks that become visible when scrolling. (We learnt recently of an on-premise user who had to disable Content tracking because it caused too much load and made QueuedTracking lag several hours behind in processing as a result.)

@tsteur commented on October 31st 2019 Member

Yes, the point would really be to send as many as possible together in a reasonable interval. It wouldn't save crazy heaps of resources, but a few things will be cached and therefore faster, especially when tracking a lot per page view.
