Share idsites sequence when forcing archiving of all sites #7614
Comments
Hi @mgazdzik, if we can make the archive command smarter so that we don't even need the new parameter, it would be really great to go in this direction 👍
Sounds interesting. Can you explain in which situations you have to add the parameter? If we can list each such use case, we can brainstorm how to improve the archive console command so that it is smart and always archives the data when it is expected to.
Hi @mattab, also having this param could benefit us by letting us manage archiving more flexibly.
So as you can see, there are at least two cases where we could manage archiving better and split the work to be done using just params. Also, do you think it would be possible to move this param to the 2.13.0 milestone?
It makes sense @mgazdzik, moved this needed request to 2.13.0 👍
That allows running a second archive process which will reuse the same (shared) site list.
See the pull request here: #7682
#7682 has been merged.
It would be a great feature to be able to run archiving for all sites while sharing the idsites sequence across multiple threads. Currently, when we force archiving for all sites using --force-all-websites, the process holds the site list internally. This makes it impossible to share the sequence with other threads. There could be an additional option such as --share-idsites-sequence. This would also apply when forcing a defined list of idsites, so that they are split across all threads.
Another perk of this would be the ability to reduce the time consumed on computing which sites:
This especially affects big instances that are archived once per day, where we want to archive all sites anyway. In our observations, establishing the list of idsites can currently take 15 to 60 minutes.
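To make the request concrete, here is a minimal Python sketch of the idea of a shared idsites sequence: the site list is built once, and several workers claim IDs from it so that no site is processed twice. This is only an illustration under stated assumptions. It is not the archiver's actual code, it uses in-process threads rather than separate archive processes, and every name in it (build_shared_sequence, archive_worker, run_archivers, archive_site) is hypothetical.

```python
import queue
import threading


def build_shared_sequence(site_ids):
    """Build the site list once and put it into a thread-safe queue."""
    shared = queue.Queue()
    for site_id in site_ids:
        shared.put(site_id)
    return shared


def archive_worker(name, shared, archive_site, processed, lock):
    """Claim site IDs from the shared queue until none are left."""
    while True:
        try:
            site_id = shared.get_nowait()
        except queue.Empty:
            return  # another worker already claimed the remaining sites
        archive_site(site_id)  # hypothetical callback doing the real work
        with lock:
            processed.append((name, site_id))


def run_archivers(site_ids, archive_site, worker_count=3):
    """Start several workers that all consume the same shared sequence."""
    shared = build_shared_sequence(site_ids)
    processed = []
    lock = threading.Lock()
    workers = [
        threading.Thread(
            target=archive_worker,
            args=(f"worker-{i}", shared, archive_site, processed, lock),
        )
        for i in range(worker_count)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return processed


if __name__ == "__main__":
    done = run_archivers(range(1, 11), lambda site_id: None)
    print(sorted(site_id for _, site_id in done))
```

Because each ID is removed from the queue when it is claimed, the workers split the list between them without needing to recompute it.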
Also, maybe the cron archiver could be smart enough to detect that we want to process all sites anyway (e.g. the last archiving was 24 hours ago or so), so that we don't need to play with params?
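The detection idea could look roughly like the following Python sketch. It assumes the archiver records when the last full run finished. The function name should_archive_all_sites and the 24-hour default are hypothetical and only mirror the example given in this issue, not any existing archiver behavior.

```python
from datetime import datetime, timedelta


def should_archive_all_sites(last_full_run, now, threshold=timedelta(hours=24)):
    """Return True when enough time has passed that all sites should be processed.

    last_full_run is None when no previous run was recorded (assumption).
    """
    if last_full_run is None:
        return True
    return now - last_full_run >= threshold


if __name__ == "__main__":
    now = datetime(2015, 4, 2, 12, 0)
    print(should_archive_all_sites(datetime(2015, 4, 1, 12, 0), now))
```

With such a check, the expensive step of establishing which sites need archiving could be skipped whenever the answer is "all of them".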
Please let me know your thoughts