@quba opened this Issue on November 10th 2014 Contributor

There should be a way to switch from POST bulk requests to GET.

Example scenario
Users want to migrate Piwik to new infrastructure without losing data.

It won't be possible to replay the gap from the logs, because the access log will contain POST requests without the needed parameters.

@mattab commented on November 11th 2014 Member

This issue also causes partial data loss any time the log replay --replay-logs is used (learn more in the newly added FAQ about log replay).

Maybe web servers can be configured to log the POST data as well. If not, the only alternative may be to let users, for example, disable the POSTing of Content Tracking data and have it sent as GET requests instead. This would have performance implications for end users on the website. cc @tsteur maybe you have some idea about this?

@tsteur commented on November 11th 2014 Member

Not really. We can use bulk tracking with GET. I tried it with content tracking initially but ran into issues with the maximum URL length, which is configured differently everywhere. We'd have to assume a length that is most likely safe everywhere. Depending on the content tracking data, we may have to send one request per content impression; sometimes we can group two or three content impressions together. The problem is that when we have to send multiple requests we will run into the 0 visits bug, so we'd have to delay each request by about 800ms, meaning some banners might not be tracked if the user is not on the site long enough, etc. See #6415. This bug can be fixed with a queue, see #6075, but that still needs to be implemented and requires special software, e.g. Redis. Personally I'd like to work on #6075 soonish anyway, but it won't fix it for all users.
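
To make the URL-length trade-off above concrete, here is a minimal sketch (not the piwik.js implementation) of greedily grouping content impressions into as few GET URLs as fit under an assumed length limit. The 2000-character limit, the `requests` query parameter, and the helper names `build_url` and `group_impressions` are all assumptions made for illustration; they are not taken from Piwik.

```python
import json
from urllib.parse import urlencode

# Assumed "safe everywhere" limit; real limits differ per server and browser.
MAX_URL_LENGTH = 2000


def build_url(base_url, batch):
    # Hypothetical encoding: the whole batch as JSON in one query parameter.
    return base_url + "?" + urlencode({"requests": json.dumps(batch)})


def group_impressions(base_url, impressions, max_len=MAX_URL_LENGTH):
    """Greedily pack impressions into as few GET URLs as fit under max_len."""
    urls, batch = [], []
    for impression in impressions:
        candidate = batch + [impression]
        if batch and len(build_url(base_url, candidate)) > max_len:
            # Adding this impression would overflow: flush the current batch.
            urls.append(build_url(base_url, batch))
            batch = [impression]
        else:
            # Note: a single oversized impression is still sent on its own.
            batch = candidate
    if batch:
        urls.append(build_url(base_url, batch))
    return urls
```

Each extra URL produced this way is one more request that would need the delay described above, which is why a lower assumed limit directly increases the risk of untracked impressions.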

@mattab commented on November 12th 2014 Member

@quba do you have some more information? Honestly, I'm not sure what the best next step is here, besides accepting some data loss until we figure out a proper solution to this challenge.

@quba commented on November 18th 2014 Contributor

@mattab, @tsteur: maybe we should increase the priority of building a queue solution so there won't be a need to replay from logs (instead we could replay from a queue that accepts both types of requests).

@tsteur commented on November 18th 2014 Member

I am working on the queue already

@quba commented on November 18th 2014 Contributor


@mattab commented on December 18th 2014 Member

The solution for this issue will be to use the new QueuedTracking plugin. This plugin will store the requests in a Redis database, including the POST values. This will let us replay all the requests the same way they were initially sent via piwik.js. Unfortunately, we cannot make Content Tracking Log Replay work with MySQL only, as we do not store POST values and the web server log files don't log POST values.
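
As a rough illustration of the idea (not the QueuedTracking plugin's actual code), the sketch below stores each tracking request together with its POST values and replays them in arrival order. An in-memory deque stands in for Redis, and the class name `RequestQueue` and its methods are hypothetical.

```python
import json
from collections import deque


class RequestQueue:
    """In-memory stand-in for a Redis-backed queue (assumption for illustration)."""

    def __init__(self):
        self._items = deque()

    def push(self, method, get_params, post_params=None):
        # Serialize the full request so POST values survive until replay.
        record = {"method": method, "get": get_params, "post": post_params or {}}
        self._items.append(json.dumps(record))

    def replay(self, handler):
        """Feed every stored request to handler in arrival order; return the count."""
        count = 0
        while self._items:
            record = json.loads(self._items.popleft())
            # GET and POST values are merged, as the tracker would see them.
            handler({**record["get"], **record["post"]})
            count += 1
        return count
```

Because the POST body is kept with the request, a replay from such a queue does not depend on what the web server wrote to its access log.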

See the doc at: https://github.com/piwik/plugin-QueuedTracking#readme

This Issue was closed on December 18th 2014