@medic123de opened this Pull Request on May 15th 2014 Contributor

Configurable thread queue size
Added debug output for PHP calls / recorder usage

@medic123de commented on May 15th 2014 Contributor

The reason for this change is that the importer slows down markedly if one recorder gets far more hits than the others: PHP is then slower than the parser, and the importer gets stuck in Queue.put (=> threading.acquire). This change improved our import time from almost 1h to 12 minutes.

@mattab commented on May 15th 2014 Member

Thanks for PR! What value do you set --queue-size to, to go from 1h to 12min?

@medic123de commented on May 16th 2014 Contributor

we had set it to 12, but it did not use more than 5 in our case.

To be precise: we found with Python's cProfile that it spent a great deal of time waiting in add_hits() -> Queue.put, which caused the CPython process to hang. Making the thread queue size configurable simply lets the main thread keep working.

if you like, i can deliver precise performance data next week.
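
For readers following along, here is a minimal sketch of the stall described above. It is not the importer's code: `run_import`, the sleep standing in for a slow PHP request, and all numbers are assumptions made for illustration, and it uses Python 3's `queue` module. It only measures how long the producing thread sits inside `Queue.put()` for a given queue size.

```python
import queue
import threading
import time


def run_import(queue_size, n_hits=20, consume_delay=0.005):
    """Return the seconds the producer spent blocked inside Queue.put()."""
    q = queue.Queue(maxsize=queue_size)

    def recorder():
        # Stand-in for a recorder sending hits to a slow PHP endpoint.
        while True:
            hit = q.get()
            if hit is None:  # sentinel: no more hits
                break
            time.sleep(consume_delay)

    worker = threading.Thread(target=recorder)
    worker.start()

    blocked = 0.0
    for hit in range(n_hits):
        start = time.perf_counter()
        q.put(hit)  # blocks while the queue is full
        blocked += time.perf_counter() - start

    q.put(None)
    worker.join()
    return blocked
```

With a small `queue_size` the producer spends most of its time waiting in `put()`; with a larger one it can keep parsing while the recorder catches up. The total work done by the recorder is the same in both cases.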

@mattab commented on June 2nd 2014 Member

> The reason for this change is that the importer slows down markedly if one recorder gets far more hits, so the PHP is slower than the parser, and then the importer gets stuck in Queue.put (=> threading.acquire).

OK, thanks for the explanation. Do you understand why the importer gets stuck in Queue.put?

Also, do you think it would make sense to increase the default from 2 to, for example, 5 for all users?

@medic123de commented on June 2nd 2014 Contributor

Yes, and that's why I am uncertain whether this is a problem specific to us. We have several very busy sites, but some IPs belong to customers who generate a lot of traffic from a single IP. The effect is: we have the more or less common traffic, which fills the recorders evenly, and we have customers who fill a single recorder, for example recorder 6. If recorder 6 is full, the importer waits. When that customer ends their session, we have random traffic again until another customer fills, for example, recorder 11, where it gets stuck again.
Increasing the backlog for importers keeps the recorders busy more evenly.

So, I don't know how other big sites are equipped or how they get along, but as the PHP part is the slow part, keeping as many php-fpm children as busy as possible is a big gain.

5 suits our needs for now, but I don't dare say whether that's true for every other user, and it's probably not necessary for most small sites.
Increasing the number of recorders does not work in this case (same IP == same hash); also, using many more recorders than PHP processes increases context switches, which slows things down considerably.
Increasing maxsize causes a problem with the PHP runtime limit.

so, the only way to solve that is to modify the backlog, which is currently hardcoded (and changed by this PR).
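
To illustrate the "same IP == same hash" point, here is a small sketch. The real hash used by the importer is not shown in this thread, so CRC32 is only a stand-in, and `recorder_for`, `distribute`, and the sample IPs are hypothetical names and data.

```python
import zlib
from collections import Counter


def recorder_for(ip, n_recorders):
    # Assumed stand-in hash: any stable hash maps one IP to one recorder.
    return zlib.crc32(ip.encode("ascii")) % n_recorders


def distribute(ips, n_recorders):
    """Count how many hits each recorder index would receive."""
    return Counter(recorder_for(ip, n_recorders) for ip in ips)


# Common traffic from many visitors, plus one customer on a single IP.
background = ["10.0.%d.%d" % (i // 250, i % 250) for i in range(1000)]
customer = ["192.0.2.7"] * 1000

load = distribute(background + customer, n_recorders=12)
busiest, hits = load.most_common(1)[0]
```

In this sketch the customer's hits all land on one recorder no matter how many recorders exist, which is why adding recorders does not relieve that one queue.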

@mattab commented on June 2nd 2014 Member

OK, I get it now! This is surely a problem that many other users will experience, because it's common to have one or several IPs with hundreds of requests...

@mattab commented on June 2nd 2014 Member

How many --recorders did you set, when you had --queue-size to 5?

Maybe we could set --queue-size to --recorders * 2, or --recorders * 3? This way we don't have to add a new parameter to the script. I wonder if that would work.

@medic123de commented on June 2nd 2014 Contributor

i found that recorders works best for me if it's * ( 1.5 - 2 )
I will retract the PR, since with the new transactional mode the gain dropped enormously (to less than 5%).

@mattab commented on June 2nd 2014 Member

Nice, it's always better not to add new code that makes things harder to understand :+1:

But I still think it could be a good idea to set the queue size automatically, based on the number of recorders ;)
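
A rough sketch of what such an automatic default might look like. `default_queue_size` and its `factor` parameter are hypothetical and not part of the script; the factor of 2 is simply one of the multipliers floated in this thread.

```python
import math


def default_queue_size(recorders, factor=2.0):
    """Hypothetical helper: derive a queue size from the recorder count."""
    if recorders < 1:
        raise ValueError("recorders must be >= 1")
    # Round up so fractional factors such as 1.5 never shrink the queue.
    return max(1, math.ceil(recorders * factor))
```

For example, `default_queue_size(4)` gives 8 and `default_queue_size(3, factor=1.5)` gives 5. Whether any such multiplier suits most installations would still need measuring.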

This Pull Request was closed on June 2nd 2014