It would be a great feature to be able to archive all sites while sharing the list of idsites across multiple threads. Currently, when we force archiving for all sites using
--force-all-websites, the process holds the site list internally. This makes it impossible to share that sequence with other threads. This could be an additional option, such as
--share-idsites-sequence. It would also allow an explicitly forced set of idsites to be split across all threads.
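To illustrate the idea, here is a minimal sketch of several workers draining one shared queue of site IDs instead of each holding its own copy of the list. It is only an illustration in Python, not the archiver's actual code: `run_shared_queue` and `archive_site` are hypothetical names, and a real implementation would probably need to keep the shared sequence somewhere that separate archiver processes can all reach, rather than in an in-process queue.

```python
import queue
import threading


def run_shared_queue(site_ids, num_workers, archive_site):
    """Drain one shared queue of site IDs with several worker threads.

    archive_site is a hypothetical callback standing in for the real
    per-site archiving work. Returns a dict mapping each worker index
    to the list of site IDs that worker processed.
    """
    pending = queue.Queue()
    for site_id in site_ids:
        pending.put(site_id)

    processed = {worker_id: [] for worker_id in range(num_workers)}

    def worker(worker_id):
        while True:
            try:
                # Each site ID is handed out exactly once, to whichever
                # worker asks for the next one first.
                site_id = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to archive
            archive_site(site_id)
            processed[worker_id].append(site_id)

    threads = [
        threading.Thread(target=worker, args=(worker_id,))
        for worker_id in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return processed
```

With this shape, a slow site only delays the worker that picked it up; the other workers keep pulling the remaining idsites from the same sequence.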
Another perk would be the ability to reduce the time spent computing which sites:
Please let me know your thoughts.
If we can make the archive command smarter so that we don't even need the new parameter, it would be really great to go in that direction :+1:
Also, maybe the cron archiver could be smart enough to detect when we want to process all sites anyway (e.g. the last archiving run was 24 hours ago or so), so we don't need to play with params?
Sounds interesting. Can you explain in which situations you currently have to add the parameter
--force-all-websites? Maybe there are also other situations where you need to manually add any of the other parameters.
If we can list each such use case, we can brainstorm how to improve the archive console command so that it is smart and always archives the data when expected.
Currently we don't use --force-all-websites apart from some manual archiving runs, or on rare occasions when some mechanism fails for some reason (e.g. processing all sites after midnight). One reason is that we cannot attach many threads to a single archiving queue of all sites.
Having this param would also let us manage archiving more flexibly.
Consider the case where we have lots of sites and the initial checks take 1-2 hours.
It is of course possible to force specific idsites to be processed, but that's not really an option when an instance has 30k+ websites.
Also, there's currently no option to run multi-threaded archiving for all sites, even though that is probably the heaviest archiving run possible.
Changing the default behaviour to share idsites wouldn't be good either, as we may also have the following case:
So, as you can see, there are at least two cases where we could manage archiving better and split the work using just params.
Please let me know if I should elaborate on the described use cases.
Also, do you think it would be possible to move this param to the 2.13.0 milestone?
It makes sense @mgazdzik - moved this needed request to 2.13.0 :+1: