
ensure that the archive algorithm cannot be triggered multiple times for a same site/period/segment #1938

Closed
mattab opened this issue Jan 4, 2011 · 20 comments

Labels: Major (indicates the severity, impact, or benefit of an issue is much higher than normal but not critical), Task (indicates an issue is neither a feature nor a bug; it's a purely "technical" change)

@mattab
Member

mattab commented Jan 4, 2011

If the script is already running (checked via ps aux | grep archive.sh | wc -l or similar), the script echoes "archive.sh already running, exiting..." and exits.

Is there any reason the simple "ps aux" would not work on all Linux servers?

Ideally the Windows PowerShell version should do this as well, but that's optional for now.

@robocoder
Contributor

In #1698, hansfn suggests changing archive.sh to use /bin/sh instead of /bin/bash. I'm not opposed since /bin/sh is often symlinked to a variant, e.g., bash or dash.

@hansfn

hansfn commented Jan 6, 2011

"ps aux" should work on all Linux servers. (Maybe drop the "u", since it just adds unnecessary output?) However, be aware that sometimes the grep command itself will be included in the ps listing; ref http://stackoverflow.com/questions/4473610. For me, the "grep -v grep" addition normally works.

PS! You could of course use lock files: create one at the start of the script, remove it at the end and if the script is killed (using trap).

@halfdan
Member

halfdan commented Jan 7, 2011

"ps aux" does work on GNU ls but certainly not on all systems. Solaris does not use GNU ls but an ls which conforms to POSIX (see http://en.wikipedia.org/wiki/Ls). We should use "ps -A", which works on all systems I have at hand (Linux, FreeBSD/NetBSD, Solaris 10).

File locking as you described it can run into race conditions due to OS task scheduling.

@halfdan
Member

halfdan commented Jan 8, 2011

Note to self: Don't write comments in the middle of the night.

I obviously meant GNU ps. http://en.wikipedia.org/wiki/Ps_%28Unix%29
The same applies: ps on Linux does not follow the POSIX standard, so we should use "ps -A".
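A minimal, hedged sketch of the check being discussed (assuming a ps that supports the POSIX -A and -o options; this is illustrative, not the actual archive.sh code):

```shell
#!/bin/sh
# Portable single-instance check: "ps -A -o args" lists all processes with
# their command lines, and the [a] bracket trick stops grep from matching
# its own command line. Note: if this ran inside archive.sh itself, the
# current process would also match, so a real check would need to exclude
# the current PID or require a count above one.
if ps -A -o args | grep '[a]rchive\.sh' >/dev/null; then
    echo "archive.sh already running, exiting..."
    exit 0
fi
echo "no other archive.sh instance found, continuing..."
```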

@mattab
Member Author

mattab commented May 5, 2011

This is covered in: http://stackoverflow.com/questions/185451/quick-and-dirty-way-to-ensure-only-one-instance-of-a-shell-script-is-running-at-a

and http://stackoverflow.com/questions/185451/how-to-check-in-a-bash-script-if-something-is-running-and-exit-if-it-is

We just had the problem on the demo: the process was running 6 times in parallel, causing massive IO and a crash. We should fix this!

I don't like the solution with a lock file. Instead we should use the ps XXX | grep archive.sh approach.

@anonymous-matomo-user

matt asked me to check this issue in a comment to #2440

using ps|grep is really, really ugly... (my opinion as "shell expert")

for example if some other script also named archive.sh exists on the system, and is being run, maybe even by another user...

i'd strongly prefer lockfiles, and generate a warning for the user if the lockfile is locked (and of a certain age maybe)...

but there's one perfectly clean and elegant solution that beats all that cruft:

using a lock in the database.
and the job already uses a database, because it's working on the data in it (duh!).

such a lock is managed on the server, and automatically released when the connection is closed (script dies). (MySQL has GET_LOCK(name, timeout).)

the only issue here is that the archive job opens a new database connection (opened by a separate api/php invocation) for each processing step...

i would suggest rewriting the bottom part of the script in php and moving it into the piwik api as a function; then it could simply acquire a lock in the database...
(beware that this might cause issues with the php process growing due to memory leaks, but in the worst case that could be worked around by running separate php instances for the steps, like the shellscript does, while keeping the db connection/lock.)

that would also help reduce the language diversity, and the need for shell programmers, who seem to be lacking in piwik project staff... ;)

@anonymous-matomo-user

Replying to halfdan:

> File locking as you described it can run into race conditions due to OS task scheduling.

wrong! not if done correctly.

what is needed is some kind of atomic operation that can only succeed once...
of course checking for a file's existence and then creating it is not atomic.
but for example opening the file with O_EXCL is atomic, as are file locking functions...

the by far easiest alternative in a shellscript is mkdir though, which i prefer to use for locking.

```shell
if ! mkdir /tmp/mylock ; then
    echo failed to get lock
    exit 1
fi
trap "rmdir /tmp/mylock" EXIT
```

a bunch of stuff worth reading:
http://mywiki.wooledge.org/BashFAQ/045
http://mywiki.wooledge.org/ProcessManagement#How_do_I_make_sure_only_one_copy_of_my_script_can_run_at_a_time.3F
actually all of http://mywiki.wooledge.org/ProcessManagement
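The O_EXCL approach mentioned above can also be done directly in sh: with the noclobber option (set -C), an output redirection fails if the file already exists, so creating the lock file is itself the atomic check. A hedged sketch (the lock path is illustrative, not what archive.sh actually uses):

```shell
#!/bin/sh
# Atomic lock via noclobber: "set -C" makes ">" fail (O_EXCL-style) when the
# target file already exists, so creation and check happen in one step.
LOCKFILE="/tmp/archive-demo-$$.lock"
if ( set -C; : > "$LOCKFILE" ) 2>/dev/null; then
    trap 'rm -f "$LOCKFILE"' EXIT   # release the lock however the script ends
    echo "lock acquired"
else
    echo "failed to get lock"
    exit 1
fi
```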

@anonymous-matomo-user

i updated my rework of the script for #2440 to use a lockfile.
http://issues.piwik.org/attachments/2440/archive.sh

@mattab
Member Author

mattab commented Jun 20, 2011

I agree that the DB lock would be good to have, but like you say it is nice to do it in the script also ;-)

Thanks for implementing the lock file! Maybe when a lock file is detected, the error message could say "If no other process is running, you can manually delete the lock file at: $URL" to help users know what to do next if for some reason the lock wasn't deleted (maybe the server crashed?).
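A hypothetical sketch of that suggestion (the lock path and exact wording are assumptions, not the actual archive.sh implementation):

```shell
#!/bin/sh
# Illustrative only: mkdir-based lock with a hint telling the user how to
# recover from a stale lock (e.g. after a server crash).
LOCKDIR="/tmp/piwik-archive-demo-$$.lock"
if ! mkdir "$LOCKDIR" 2>/dev/null; then
    echo "archive.sh already running, exiting..."
    echo "If no other process is running, you can manually delete the lock file at: $LOCKDIR"
    exit 1
fi
trap 'rmdir "$LOCKDIR"' EXIT   # clean up the lock when the script exits
echo "lock acquired at $LOCKDIR"
```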

Otherwise it's great. I think the next step for improving the script would be to multithread the requests to make use of the cores of the system ;)

See also #2440

@anonymous-matomo-user

Replying to matt:

> I think the next step for improving the script would be, multithread the requests to make use of the cores of a system ;)

that would be possible to add, with a little restructuring...
but is it really a good idea?
we just added a lock to prevent multiple instances, and now we want to re-introduce them inside the single instance?
has it been verified that this actually improves performance?
i don't know the internals of the "archiving" process, but would assume that it runs a bunch of aggregate queries over the log data and caches the result...? does running those in parallel really improve performance, or will they just slow each other down due to table contention?
i'd suggest you open a new issue for that feature and do some testing first...

@mattab
Member Author

mattab commented Jul 9, 2011

Running in parallel will increase performance, because the data queried will be different, and all the time spent in PHP processing can be multithreaded very efficiently. It might not work well for very large websites, but will make a huge difference in the use case of a piwik with thousands of small websites to process.

@cbay
Contributor

cbay commented Jul 12, 2011

For the record, using a multithreaded archive.sh (from #2563), performance increases almost linearly. I have an 8-core machine and more than 20,000 sites in my Piwik.

Having a lock on archive.sh doesn't seem to be an issue to me: the multithreading is handled by xargs running inside archive.sh.
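That pattern can be sketched roughly like this (the site IDs and the echoed command are illustrative stand-ins for the real per-site archive requests; note that -P is a widespread GNU/BSD xargs extension, not strict POSIX):

```shell
#!/bin/sh
# Toy sketch of per-site parallelism with xargs: up to 4 worker processes,
# one site ID each. The echo stands in for the real per-site archive request.
printf '%s\n' 101 102 103 104 |
  xargs -P 4 -I '{}' sh -c 'echo "archiving site {}"'
```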

@robocoder
Contributor

I don't know if this will help or hinder, but in [5065], we are now using get_lock/release_lock around the archive processing.

@mattab
Member Author

mattab commented Aug 10, 2011

vipsoft, Re: [5065]

The release lock name should, I think, also contain the segment ($this->segment), since you can archive several segments for the same site/period.

Adding here as a note to check before release.

@robocoder
Contributor

re: lock name. Feel free to add segment to the name.

@mattab
Member Author

mattab commented Aug 12, 2011

(In [5102]) Refs #2327

@tsteur
Member

tsteur commented Jul 7, 2015

FYI: I think I've seen this here in UI tests
[screenshot: archive]

@mattab
Member Author

mattab commented Jul 7, 2015 via email

@tsteur
Member

tsteur commented Jul 8, 2015

Is it a different issue than this one?

@mattab
Member Author

mattab commented Jul 8, 2015

It's the same issue, but because it has regressed we usually create a new issue

This issue was closed.

7 participants