Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

configurable Thread-Queue size #277

Closed

Conversation

medic123de
Copy link
Contributor

configurable Thread-Queue size
added debug for PHP calls / Recorder usage

added debug for PHP calls / Recorder usage
@medic123de
Copy link
Contributor Author

Reason for this change is, that the importer slows markably down if one recorder gets far more hits, so the PHP is slower than the parser, and then the Importer get's stuck in Queue.put (=>threading.acquire). ( improved our import speed from almost 1h to 12 minutes ).

@mattab
Copy link
Member

mattab commented May 15, 2014

Thanks for PR! What value do you set --queue-size to, to go from 1h to 12min?

@medic123de
Copy link
Contributor Author

we had set it to 12, but it did not use more than 5 in our case.

To be precise: we found in python cProfile, that it spend really much time in waiting for add_hits()->Queue.put .. causing the cpython process hung, so we just enable the main thread to continue working when making Thread-Queue configurable

if you like, i can deliver precise performance data next week.

@mattab
Copy link
Member

mattab commented Jun 2, 2014

Reason for this change is, that the importer slows markably down if one recorder gets far more hits, so the PHP is slower than the parser, and then the Importer get's stuck in Queue.put (=>threading.acquire).

ok thanks for explanation. Do you understand why the importer gets stuck in Queue.put?

Also, do you think it would make sense to increase from default 2 to for example 5 for all users?

@medic123de
Copy link
Contributor Author

yes, and thats why i am uncertain, if this is a specific problem to us. We have several very busy sites, but some IPs belong to customers, and they do alot traffic from one IP. The effect is: we have the more or less common traffic, which fills recorders evenly, and we have customers, which fill recorder for example recorder 6. if recorder 6 is full, the importer waits. if the customer ends his session, we have random traffic, until another customer fills for example recorder 11, where it gets stuck again.
increasing backlog for importers keeps the recorders busy more evenly.

so, i don't know how other big sites are equipped, how they do get along, but as the PHP Part is the slow part, keeping as much php-fpm childs as busy as possible is a big gain.

5 suits our needs now, but i dont dare to say, if thats true for ech other user - and probably its not necessary for most small sites.
Increasing Recorders does not work in this case ( same IP == same Hash ), also using much more recorders than php-processes does increase context-switches, which slows it down considerably.
Increasing maxsize induces a problem to the PHP runtime limit.

so, the only way to solve that is to modify the backlog, which is currently hardcoded (and changed by this PR).

@mattab
Copy link
Member

mattab commented Jun 2, 2014

Ok I get it now! For sure that it is a problem that many other users will experience. Because it's common to have one or several IPs with hundreds of requests...

@mattab
Copy link
Member

mattab commented Jun 2, 2014

How many --recorders did you set, when you had --queue-size to 5?

Maybe we could set --queue-size to --recorders * 2 ? or --recorders * 3 ? This way we dont have to add a new parameter to the script. Wondering if that would work?

@medic123de
Copy link
Contributor Author

i found that recorders works best for me if it's * ( 1.5 - 2 )
i will retract the PR, as with the new transactional mode, the gain dropped enormous ( less than 5%).

@medic123de medic123de closed this Jun 2, 2014
@mattab
Copy link
Member

mattab commented Jun 2, 2014

Nice, always better to not add new code that makes harder to understand 👍

But, I still think, it could be a good idea, to process the Queue size based on numbers of recorders (automatically) ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants