Write tracking requests into a queue and don't process them immediately #6075
Comments
Started working on this, but it is not that easy to make sure it works under all circumstances. I guess we can have one queue per idsite and process them in parallel? Meaning the same user will have a different visitorId etc. on different sites, right?
That's already the case: Piwik does not do any cross-website reporting, so +1. (FYI: the only cross-website reporting is done by our plugin InterSites, which uses the config_id to figure out visitors across websites.)
Is there a use case where someone might want to enable the queue only for some websites but not for all? E.g. sites having only a few visits would still be tracked directly? Note: it makes things more complicated, but still asking ^^ Maybe supported in V2 otherwise. Maybe we will have to develop it like this anyway... we will see.
I don't think we need this for now; it is easier to assume that the whole of Piwik will use only one queue for all websites. Note: in this issue we will work on Redis support for the queue.
I implemented this functionality 2 years ago for my company. It can handle up to 30 million pageviews a day with the queue. The company policy has expired since I left, so I can help you make this functionality public :). Do I need to join this thread, or can I just implement it in my own repo and then make a pull request?
Cookies for user ID: I wrote an API for cookie sync, so each time Piwik.js makes a call it syncs the cookies with the other sites.
The last version I worked on was 1.8, so I need to read the Piwik 2.0 code before contributing :P
Hi @afterlastangel, thanks for the note. @tsteur has been actively working on it for days now, and we could definitely do with some testing, especially performance testing. You can see his work in this branch: https://github.com/piwik/piwik/tree/6075_tracker_queue_ondemand
Some ideas for https://github.com/piwik/plugin-QueuedTracking |
We do not have the visitorId at this stage. Another possibility would be to shard by siteId later, but this doesn't always help either if there is one big site and many small ones. We will have a look at this kind of stuff later. Optimizing upfront is no good ;)
@tsteur: I'm very interested in Piwik development and can contribute up to 10 hrs a week. How can we discuss what I can help with? As I remember, the Piwik JavaScript sends the _id for the visitorId; we can even use the IP address for hashing.
ping @mattab Re visitorId: we could maybe use the IP address, although it would probably have to be the anonymized one. It is planned to anonymize/randomize the IP even further, so it would not really work I guess, see #5907 (comment). There are surely some possibilities here, but I would prefer to think about it once we have a concrete case for that. Also, on mobile devices the IP can change very often, I think (which currently generates a new visit, I think).
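For anyone following along, the hashing idea discussed above could be sketched roughly like this in Python. This is only an illustration under assumptions: the function name `queue_for` and the choice of MD5 are made up for the example and are not taken from Piwik or from import_logs.py. The key could be a visitor `_id`, an (anonymized) IP address, or an idsite, with the caveats mentioned above (IPs change on mobile, one big site can dominate a shard).

```python
import hashlib


def queue_for(key: str, num_queues: int) -> int:
    """Map a stable identifier to a queue index in [0, num_queues).

    Hypothetical helper: the same key always lands in the same queue,
    so requests of one visitor are processed in order by one worker.
    """
    if num_queues < 1:
        raise ValueError("num_queues must be at least 1")
    # MD5 is used only as a cheap, stable hash here, not for security.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_queues


if __name__ == "__main__":
    # Example keys: a visitor id and a documentation-range IP address.
    for key in ("a1b2c3d4e5f60718", "203.0.113.7"):
        print(key, "->", queue_for(key, 4))
```

A stable hash (rather than Python's built-in `hash()`, which is randomized per process for strings) matters here because several tracker processes need to agree on the queue for a given key.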
Hi @tsteur, here is the exact algorithm we should use, which we currently use in import_logs.py in #6664:
So the parameters that should be used to assign visitors to a given queue are in this order of importance (first is most important)
Please note that we won't implement this in V1, in case anyone is wondering. First we need to make sure the current solution works and is stable. When needed, we can add further complexity later.
Sure makes sense, I wrote it down because you said |
Didn't think it is a good idea, that's why there is the "although" ;) No, it is all right, I asked for it... ;) I just wanted to make sure it is clear that we won't implement it right now, as a few users asked for it.
Hi @tsteur is there any work left for this issue? |
Not really for now. I only wanted to wait for the test, which seems to work now. I guess we will see how it goes after a try on a bigger environment on a cloud test instance.
Kinda similar to replaying the piwik.php logs
It would make tracking really fast if we only stored the piwik.php requests in a queue (like Redis, or actual queues such as RabbitMQ) and a tracker node then picked, say, 100 requests out of the queue whenever there are resources and processed them via bulk tracking.
So redirects via piwik.php would be very fast and there should be no problem when there is a peak.
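To make the idea concrete, here is a minimal Python sketch of the flow described above, under stated assumptions: an in-memory `deque` stands in for Redis or RabbitMQ, and the names `TrackingQueue`, `pop_batch`, `drain`, and `process_bulk` are hypothetical and not part of Piwik. The tracker endpoint only pushes the raw request; a worker later pulls up to 100 requests at a time and hands them to a bulk handler.

```python
from collections import deque
from typing import Callable, Deque, List


class TrackingQueue:
    """In-memory stand-in for a Redis list or a message queue."""

    def __init__(self) -> None:
        self._items: Deque[str] = deque()

    def push(self, raw_request: str) -> None:
        # Called by the tracking endpoint: store the request and return at once.
        self._items.append(raw_request)

    def pop_batch(self, size: int = 100) -> List[str]:
        # Remove and return up to `size` requests, oldest first.
        batch: List[str] = []
        while self._items and len(batch) < size:
            batch.append(self._items.popleft())
        return batch


def drain(queue: TrackingQueue,
          process_bulk: Callable[[List[str]], None],
          batch_size: int = 100) -> int:
    """Process queued requests in batches until the queue is empty.

    Returns the number of requests handed to `process_bulk`.
    """
    processed = 0
    while True:
        batch = queue.pop_batch(batch_size)
        if not batch:
            return processed
        process_bulk(batch)  # e.g. one bulk tracking call per batch
        processed += len(batch)


if __name__ == "__main__":
    q = TrackingQueue()
    for i in range(250):
        q.push(f"idsite=1&rec=1&url=page{i}")
    total = drain(q, lambda batch: print("processing", len(batch), "requests"))
    print("processed", total)
```

With a real backend, `push` and `pop_batch` would map onto the queue's own append and batched-read operations; during a peak the queue simply grows and the workers catch up afterwards.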