I received a request by email to share how I do my batch
authority control linking in Evergreen, so I thought I'd write a blog
post to explain it.
In order to batch authority control on Evergreen, you will need three
pieces of software:
- authority_control_fields_batcher.pl: You can get
  authority_control_fields_batcher.pl in my evergreen_utilities
  repository.
- disbatcher.pl: disbatcher.pl is available from here.
- authority_control_fields.pl: This program comes with Evergreen, and
  recent installations should put it in your /openils/bin/ directory.
Run authority_control_fields_batcher.pl and direct the output to a
file:
authority_control_fields_batcher.pl > batches
This will produce a file that you can use with disbatcher.pl. This
file will have entries that will run authority_control_fields.pl over
all of the undeleted bibs in your Evergreen database in batches of
10,000. If you want different options, you should read the comments
in authority_control_fields_batcher.pl.
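To get a feel for how many entries the batches file will contain, the arithmetic can be sketched in shell. This is only a rough illustration: the helper names and the record count are hypothetical, and the "rounds" figure assumes batches run in strict lockstep, which the real tools may not do.

```shell
# Hypothetical helper: number of batches needed to cover a given
# number of records, rounding up so a partial batch still counts.
batch_count() {
    records=$1
    size=$2
    echo $(( (records + size - 1) / size ))
}

# Hypothetical helper: number of rounds needed if a fixed number of
# batches ran at once in strict lockstep (a simplifying assumption).
round_count() {
    total=$1
    simultaneous=$2
    echo $(( (total + simultaneous - 1) / simultaneous ))
}

# Made-up figures: 1,000,000 undeleted bibs, batches of 10,000,
# 8 batches running at a time.
n=$(batch_count 1000000 10000)
echo "batches: $n"
echo "rounds: $(round_count "$n" 8)"
```

With those made-up figures, this prints 100 batches and 13 rounds. Substitute your own bib count to estimate the size of your run.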
Next, schedule disbatcher.pl to run via at or cron with some
appropriate options:
disbatcher.pl -s 7200 -n 8 -f /full/path/to/batches -v
You will likely need different options, depending on your system and
on where you are running this (see below).
If you just want to start it now and don't care to specify any extra
options, you could just run the following. Remember not to log out
until it finishes, or use a screen session:
authority_control_fields_batcher.pl | disbatcher.pl
Again, you will likely want to specify some options, particularly to
disbatcher.pl.
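If you would rather not keep a screen session around, one alternative (not what I use, just a sketch) is to start the job with nohup so it survives a logout. The run_detached helper and the log file name below are hypothetical.

```shell
# Hypothetical helper: run a command so that it ignores hangups,
# sending its output to a log file. Prints the background process ID
# so you can check on it later.
run_detached() {
    log_file=$1
    shift
    nohup "$@" > "$log_file" 2>&1 &
    echo $!
}
```

Assuming both scripts are on your PATH, you might call it as `run_detached authority.log sh -c 'authority_control_fields_batcher.pl | disbatcher.pl'` and then watch authority.log for progress.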
I run this on my workstation in the MVLC Central Site offices. I can
do this because I use Ubuntu GNU/Linux and have installed the OpenSRF
and OpenILS libraries and configured them to communicate with our
production installation. If you don't have a GNU/Linux workstation,
then you could run this on your utility server. If you don't have a
utility server, then you could run this directly on your production
server. However, in that case, you may be no better off than just
running authority_control_fields.pl over your entire database without
batching. I found this to be the case when running it on my
development virtual machine image.
Ideally, you want to run this when your system isn't that busy.
Nights and weekends seem to work well for us. Determining the best
time to run the batches requires a bit of experimentation. I started
by running just 4 batches in total, all simultaneously, by editing the
input file to include only the first four lines of output from
authority_control_fields_batcher.pl. When that went well, I upped the
number to 8 the next night. After that run, I decided to run all of
the remaining batches, 8 at a time, until they finished. This last
run starts on a Saturday night. I have not actually run that last
batch yet, so I won't know how it worked until tomorrow, but I suspect
it should finish by 9:00 pm on Sunday night. I'll post a follow-up
blog on Monday to share how it went.
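The trial run described above, where only the first few lines of the batches file are used, can be done with head and tail instead of editing the file by hand. This is a minimal sketch: the split_batches helper and the .trial and .rest suffixes are my own invention for illustration.

```shell
# Hypothetical helper: split a batches file into a small trial file
# holding the first COUNT lines and a file holding the remainder.
split_batches() {
    input=$1
    count=$2
    # The first COUNT lines go to the trial file.
    head -n "$count" "$input" > "${input}.trial"
    # Everything from line COUNT+1 onward goes to the remainder file.
    tail -n "+$((count + 1))" "$input" > "${input}.rest"
}
```

Running `split_batches batches 4` would leave the first four entries in batches.trial and the rest in batches.rest, so you can point disbatcher.pl at the small file first and at the remainder on a later night.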