configurable Thread-Queue size
added debug for PHP calls / Recorder usage
Reason for this change is, that the importer slows markably down if one recorder gets far more hits, so the PHP is slower than the parser, and then the Importer get's stuck in Queue.put (=>threading.acquire). ( improved our import speed from almost 1h to 12 minutes ).
Thanks for PR! What value do you set --queue-size to, to go from 1h to 12min?
we had set it to 12, but it did not use more than 5 in our case.
To be precise: we found in python cProfile, that it spend really much time in waiting for add_hits()->Queue.put .. causing the cpython process hung, so we just enable the main thread to continue working when making Thread-Queue configurable
if you like, i can deliver precise performance data next week.
Reason for this change is, that the importer slows markably down if one recorder gets far more hits, so the PHP is slower than the parser, and then the Importer get's stuck in Queue.put (=>threading.acquire).
ok thanks for explanation. Do you understand why the importer gets stuck in Queue.put?
Also, do you think it would make sense to increase from default 2 to for example 5 for all users?
yes, and thats why i am uncertain, if this is a specific problem to us. We have several very busy sites, but some IPs belong to customers, and they do alot traffic from one IP. The effect is: we have the more or less common traffic, which fills recorders evenly, and we have customers, which fill recorder for example recorder 6. if recorder 6 is full, the importer waits. if the customer ends his session, we have random traffic, until another customer fills for example recorder 11, where it gets stuck again.
increasing backlog for importers keeps the recorders busy more evenly.
so, i don't know how other big sites are equipped, how they do get along, but as the PHP Part is the slow part, keeping as much php-fpm childs as busy as possible is a big gain.
5 suits our needs now, but i dont dare to say, if thats true for ech other user - and probably its not necessary for most small sites.
Increasing Recorders does not work in this case ( same IP == same Hash ), also using much more recorders than php-processes does increase context-switches, which slows it down considerably.
Increasing maxsize induces a problem to the PHP runtime limit.
so, the only way to solve that is to modify the backlog, which is currently hardcoded (and changed by this PR).
Ok I get it now! For sure that it is a problem that many other users will experience. Because it's common to have one or several IPs with hundreds of requests...
How many --recorders
did you set, when you had --queue-size
to 5?
Maybe we could set --queue-size
to --recorders * 2
? or --recorders * 3
? This way we dont have to add a new parameter to the script. Wondering if that would work?
i found that recorders works best for me if it's * ( 1.5 - 2 )
i will retract the PR, as with the new transactional mode, the gain dropped enormous ( less than 5%).
Nice, always better to not add new code that makes harder to understand :+1:
But, I still think, it could be a good idea, to process the Queue size based on numbers of recorders (automatically) ;)