Currently the tracking API accepts a set of GET parameters, and each request is one page view. For performance reasons, it is sometimes desirable to send multiple page views at once.
The changes to make are:
Attachment: Patch for this issue.
3134.diff.tar.gz
Attachment: New patch for this issue.
3134.diff.tar.2.gz
This is needed quickly for faster log imports.
I uploaded a patch for this issue. It provides bulk tracking & bulk API requests.
Some notes on the patch:
What do you think of my patch?
Sorry for the long delay in replying.
When bulk import is enabled, doTrack* should return "true" I think. Then we should test that the functions return true instead of not testing the output.
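As a rough illustration of that suggestion, here is a minimal Python sketch. The `BulkTracker` class and all of its method names are hypothetical stand-ins, not the PiwikTracker API: when bulk mode is on, the request is only queued, so the call reports success explicitly and a test can assert on it.

```python
class BulkTracker:
    """Hypothetical stand-in for a tracking client (not the PiwikTracker API)."""

    def __init__(self):
        self.bulk_enabled = False
        self.queued = []

    def enable_bulk_tracking(self):
        self.bulk_enabled = True

    def do_track_page_view(self, title):
        request = "?action_name=" + title
        if self.bulk_enabled:
            # Nothing is sent yet, so return True instead of a response body.
            self.queued.append(request)
            return True
        return self._send(request)

    def _send(self, request):
        # Placeholder for a single HTTP request; returns a fake response body.
        return "response for " + request


tracker = BulkTracker()
tracker.enable_bulk_tracking()
result = tracker.do_track_page_view("home")
```

A test can then check `result is True` rather than ignoring the return value.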
I uploaded a new patch, let me know what you think.
Replying to matt:
- +if (empty($_GET) && empty($_POST) && $_SERVER['REQUEST_METHOD'] == 'POST')
Piwik_Config::getInstance()->Tracker['tracking_requests_require_authentication'] = 0;
Which use case has an empty $_POST but still requires no authentication?
- Why read the POST values from php://input rather than $_POST?
When posting JSON (and not a form-encoded string), $_POST won't be populated. My code will POST JSON so urls don't get encoded (which saves space).
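A minimal sketch of the idea, written in Python (the language of the log import script). The `requests` key, the helper names and the sample URLs are assumptions for illustration and are not taken from the patch. It builds one JSON body from several page views, reads it back the way a server would read a raw request body, and compares its size with the same payload wrapped in a form-encoded field.

```python
import json
from urllib.parse import urlencode


def build_bulk_body(page_views):
    """Serialize several tracking requests into one JSON POST body."""
    # Each page view becomes the query string it would have had as a GET request.
    requests = ["?" + urlencode(view) for view in page_views]
    return json.dumps({"requests": requests})


def read_bulk_body(raw_body):
    """Decode the raw request body, since form parsing would not populate it."""
    return json.loads(raw_body)["requests"]


page_views = [
    {"idsite": 1, "rec": 1, "url": "http://example.org/"},
    {"idsite": 1, "rec": 1, "url": "http://example.org/about"},
]

json_body = build_bulk_body(page_views)
# The same payload sent as a form field gets percent-encoded a second time.
form_body = urlencode({"requests": json_body})
decoded = read_bulk_body(json_body)
```

In this sketch `form_body` is longer than `json_body` because every quote, brace and already-encoded `%` is escaped again.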
- connectDatabase() is called within the loop, but we should connect only once to mysql for performance reasons
connectDatabase will check if the DB connection is already created, and the DB is disconnected only after the loop. I've renamed the function to make this clearer.
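The connect-once behaviour can be sketched as follows, in Python rather than PHP. The class, the method name `connect_database_if_not_connected` and the fake connection are all hypothetical; the thread does not give the new function name.

```python
class Tracker:
    """Hypothetical sketch of a loop that connects lazily and only once."""

    def __init__(self, connect):
        self._connect = connect  # callable that opens a connection
        self._db = None
        self.handled = []

    def connect_database_if_not_connected(self):
        # Safe to call on every iteration: it only connects the first time.
        if self._db is None:
            self._db = self._connect()

    def disconnect_database(self):
        self._db = None

    def track_all(self, requests):
        for request in requests:
            self.connect_database_if_not_connected()
            self.handled.append(request)  # stand-in for writing to the DB
        # Disconnect only after the whole batch has been processed.
        self.disconnect_database()


opened = []


def fake_connect():
    opened.append("connection")
    return object()


tracker = Tracker(fake_connect)
tracker.track_all(["?idsite=1", "?idsite=1", "?idsite=1"])
```

Here `fake_connect` is called once for the three requests, even though the connect method is invoked inside the loop.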
- I am wondering about the sanitizing then unsanitizing of values in core/Tracker.php __construct() -- is it necessary? I thought we should just forward the raw values and the getRequestVar() calls would take care of the sanitizing? Maybe I'm missing something :)
Oops :)
- can you please confirm my note in: #3163 is correct as per your API?
Yes, that's right. And I'll do some benchmarks to see what the best number of requests to send is.
I did some performance testing of my patch:
REQUESTS SENT (RUNS)   TOTAL TIME   TIME PER REQUEST
10 (>15)               .4s          .04s
20 (>15)               .5s          .025s
30 (>15)               .6s          .02s
50 (>15)               .85s         .017s
100 (>15)              1.5s         .015s
200 (>15)              2.55s        .01275s
300 (>15)              3.55s        .0118s
400 (10-15)            4.6s         .0115s
500 (10-15)            5.9s         .0118s
1000 (5)               11.3s        .0113s
2000 (5)               23s          .0115s
5000 (5)               56.25s       .01125s
The number in parentheses is the number of times I ran the test. I was pessimistic and used the longest elapsed time I encountered. The sweet spot seems to be around 400 requests. After that it stays pretty constant (I tried 10000 once and it was one request every .0113s).
This was done going through localhost. I assume when tracking from a different machine, it would be faster just to send everything in one POST.
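To make the "sweet spot" reading reproducible, the per-request figures can be recomputed from the totals. The sketch below is in Python. The 5% tolerance and the `plateau_start` helper are assumptions chosen for illustration, not part of the original benchmark.

```python
# (requests sent, longest total time in seconds), copied from the results above
timings = [
    (10, 0.4), (20, 0.5), (30, 0.6), (50, 0.85), (100, 1.5),
    (200, 2.55), (300, 3.55), (400, 4.6), (500, 5.9),
    (1000, 11.3), (2000, 23.0), (5000, 56.25),
]

per_request = {count: total / count for count, total in timings}


def plateau_start(per_request, tolerance=0.05):
    """Smallest batch size whose per-request time is within tolerance of the best."""
    best = min(per_request.values())
    for count in sorted(per_request):
        if per_request[count] <= best * (1 + tolerance):
            return count


sweet_spot = plateau_start(per_request)
```

With these numbers and a 5% tolerance, `sweet_spot` comes out as 400, which matches the estimate above. A tighter or looser tolerance would move that point.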
capedfuzz: awesome, I'm eager to see how much faster the log import script will be. I'm also glad you chose to use JSON, it will make things easier.
Benaka, beautiful patch, please commit, this is a significant improvement to the tracking API that Log Analytics users will enjoy a lot (since performance was the only negative feedback from the log import feature which MANY users like a lot!)
(In [6524]) Fixes #3134, added bulk tracking feature and bulk request API method.
Notes:
(In [6525]) Refs #3134, forgot to add expected output for getBulkRequest test.
Thanks - could you please also send the documentation to add to: http://piwik.org/docs/tracking-api/reference/ to explain how to do bulk requests?
There will be several users: mobile app Piwik anonymous tracking, iOS SDK, Cyril with the Log Import script. A doc will be appreciated for sure :)
(In [6551]) Refs #3134 Allowing POST to trigger bulk request + requiring authentication for bulk requests
Thanks, doc is online at: bulk tracking API requests
(In [6552]) Refs #3134 Fixing bug thanks Jenkins
(In [6581]) Refs #3134, fixing build & make sure PiwikTracker throws when token auth not set and bulk track attempted.
(In [6586]) Doing general reset checks Refs #3134
I think that otherwise, if some requests were forcing the IP, the forced IP or datetime would have been enforced for the next request, since it was not reset.
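A minimal Python sketch of why the reset matters in a bulk loop. The function name, the dictionary-based requests and the use of `cip` and `cdt` as the override keys are assumptions for illustration. This is not the code from the changeset.

```python
def process_bulk(requests, default_ip):
    """Process requests one by one, resetting per-request overrides each time."""
    results = []
    for params in requests:
        # Reset first, so an override from the previous request cannot leak.
        forced_ip = None
        forced_datetime = None

        if "cip" in params:
            forced_ip = params["cip"]
        if "cdt" in params:
            forced_datetime = params["cdt"]

        results.append({
            "ip": forced_ip if forced_ip is not None else default_ip,
            "datetime": forced_datetime,
        })
    return results


results = process_bulk(
    [
        {"cip": "10.0.0.7", "cdt": "2012-06-01 10:00:00"},
        {},  # no overrides: must not inherit the values above
    ],
    default_ip="192.0.2.1",
)
```

If the two overrides were kept outside the loop and never cleared, the second request in this example would be recorded with the first request's IP and datetime.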
Hi capedfuzz,
Is this enhancement integrated in the log importer python script in piwik 1.8.3 ?
not yet
Fix for "PHP Fatal error: [] operator not supported for strings in /var/www/piwik/libs/PiwikTracker/PiwikTracker.php on line 849" when php5-curl is not installed. Refs #3134