MVLC: 2012

Wednesday, October 3, 2012

Backstage Authority Update

This post is to announce some improvements to the software we use to import records from Backstage Library Works. The new features include the ability to run authority_control_fields.pl on updated bibs and a "rerun" option to allow you to run the software again in the event of a failure.

If you have found this software useful, then you might want to check out the latest changes with git and see what the improvements are.

Monday, September 17, 2012

Authority Control: After Action Report

The run of authority_control_fields.pl that I started at 7:00 pm on Saturday ran through Sunday and finished at 4:35 am this (Monday) morning. At 33 hours and 35 minutes, it took a little longer than I had hoped, but it finished well within the bounds of what I needed.

For those of you following along at home, there were some cleanup issues this morning.

The output contained 1,699 lines about what appeared to be bib records that were missing subfield codes in various tags, mostly 400, 410 and 670. These lines were typically surrounded by messages about wide characters in warn.

I checked all of the reported bib records and the one thing that they all had in common was that they did not contain the datafield that was supposedly missing subfield entries. I mentioned this on IRC and Galen Charlton suggested that it could be bad authorities.

So, I modified my copy of authority_control_fields.pl to add print("$rec_id : $auth_id\n"); on or about line 461. This way it would print all of the bibliographic record ids and the matching authority record ids. I then wrote a script to take the list of bibs, run this modified authority_control_fields.pl on them, and capture the output to a file. This script ran each of the bad records individually using the --record parameter of authority_control_fields.pl. This run mysteriously produced no error output and all of the bibs now appear to be linked to authorities.
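The per-record rerun described above can be sketched roughly as follows. This is not the script I actually used, just a minimal illustration: the ACF variable and the rerun_records function name are made up for this example, and the input file is assumed to hold one bib id per line.

```shell
#!/bin/sh
# ACF is a hypothetical variable name; it defaults to the path used elsewhere on this blog.
ACF="${ACF:-/openils/bin/authority_control_fields.pl}"

# rerun_records IDS_FILE OUTPUT_FILE
# Runs the linker once per bib id and captures stdout and stderr in OUTPUT_FILE.
rerun_records() {
    while IFS= read -r id; do
        # Skip blank lines in the id list.
        [ -n "$id" ] || continue
        "$ACF" --record="$id"
    done < "$1" > "$2" 2>&1
}

# Example (file names are placeholders):
# rerun_records bad_bibs.txt rerun_output.txt
```

Running one record per invocation is slow, but it keeps the output for each bib separate enough to spot which one, if any, complains.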

I then sorted the authority ids from the output and uniquified the list. After dumping the MARCXML of those authorities to a file and going over it, I found that none of them looked bad.
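Pulling the unique authority ids out of that output is a one-liner. The sketch below assumes the output lines look like "bib_id : auth_id", which is what the print statement mentioned earlier produces; the function name is just for illustration.

```shell
#!/bin/sh
# unique_auth_ids: reads "bib_id : auth_id" lines on stdin and
# prints each authority id once, in ascending numeric order.
unique_auth_ids() {
    # Lines that do not split into exactly two fields (warnings, etc.) are ignored.
    awk -F' : ' 'NF == 2 { print $2 }' | sort -n -u
}

# Example (file names are placeholders):
# unique_auth_ids < rerun_output.txt > auth_ids.txt
```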

Galen called this a "heisenbug" since the behavior seems to change as you observe it. However, I think the strange output may be due to some difference in the environment when I run jobs via at. I normally use the UTF-8 character set, and this may not be set in the environment when at runs a job.
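If the environment really is the culprit, one way to rule it out is to set the locale explicitly for the job instead of relying on whatever at passes along. This is only a guess at a workaround, not something I have verified fixes the warnings. The function name, the UTF8_LOCALE variable, and the en_US.UTF-8 default are all assumptions; use whatever UTF-8 locale your system actually has.

```shell
#!/bin/sh
# run_with_utf8 COMMAND [ARGS...]
# Runs the command with LC_ALL forced to a UTF-8 locale.
# UTF8_LOCALE is a hypothetical override; en_US.UTF-8 is an assumed default.
run_with_utf8() {
    env LC_ALL="${UTF8_LOCALE:-en_US.UTF-8}" "$@"
}

# Example:
# run_with_utf8 /openils/bin/authority_control_fields.pl --record=12345
```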

The upshot of the above is, if you get errors when running your batched authority_control_fields.pl jobs, then run it again on the errored records. This may just fix those.

Saturday, September 15, 2012

HOWTO: Batch Authority Control

I received a request in email to share how I am doing my batch authority control linking in Evergreen, so I thought I'd write a blog post to explain it.

In order to batch authority control on Evergreen, you will need three pieces of software:

authority_control_fields_batcher.pl: You can get authority_control_fields_batcher.pl in my evergreen_utilities repository.
disbatcher.pl: disbatcher.pl is available from here.
authority_control_fields.pl: This program comes with Evergreen and recent installations should put it in your /openils/bin/ directory.

Run authority_control_fields_batcher.pl and direct the output to a file:

authority_control_fields_batcher.pl > batches

This will produce a file that you can use with disbatcher.pl. This file will have entries that will run authority_control_fields.pl over all of the undeleted bibs in your Evergreen database in batches of 10,000. If you want different options, you should read the comments in authority_control_fields_batcher.pl.
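To give an idea of what the batches file contains, here is a rough sketch of the batching idea. It is not the code in authority_control_fields_batcher.pl, which gets its ids from the database; this version assumes you already have a sorted list of bib ids, one per line, and the function name is made up.

```shell
#!/bin/sh
# make_batches SIZE
# Reads sorted bib ids on stdin and prints one command line per batch of
# SIZE ids, using the first and last id of each batch as the range.
make_batches() {
    awk -v size="$1" -v cmd="/openils/bin/authority_control_fields.pl" '
        (NR - 1) % size == 0 { start = $1 }   # first id of a new batch
        { last = $1 }
        NR % size == 0 { printf "%s --start_id=%s --end_id=%s\n", cmd, start, last }
        END {
            # Emit the final, partially filled batch if there is one.
            if (NR % size != 0)
                printf "%s --start_id=%s --end_id=%s\n", cmd, start, last
        }
    '
}

# Example (file names are placeholders):
# make_batches 10000 < bib_ids.txt > batches
```

Because the ranges come from actual ids rather than fixed arithmetic, gaps left by deleted bibs do not produce undersized batches.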

Next, you should schedule disbatcher.pl to run via at or cron with some appropriate options:

disbatcher.pl -s 7200 -n 8 -f /full/path/to/batches -v

Depending on your system and where you are running this (see below), you will likely need different options.

If you just want to start it now and don't care to specify any extra options, you could just run the following. Remember not to log out until it finishes, or use a screen session:

authority_control_fields_batcher.pl | disbatcher.pl

Again, you will likely want to specify some options, particularly to disbatcher.pl.

I run this on my workstation in the MVLC Central Site offices. I can do this because I use Ubuntu GNU/Linux and have installed the OpenSRF and OpenILS libraries and configured them to communicate with our production installation. If you don't have a GNU/Linux workstation, then you could run this on your utility server. If you don't have a utility server, then you could run this directly on your production server. However, in that case, you may be no better off than just running authority_control_fields.pl over your entire database without batching. I found this to be the case when running it on my development virtual machine image.

Ideally, you want to run this when your system isn't that busy. Nights and weekends seem to work well for us. Determining the best time to run the batches requires a bit of experimentation. I started by running just 4 batches, all simultaneously, by editing the input file to include only the first four lines of output from authority_control_fields_batcher.pl. When that went well, I upped the number to 8 the next night. After that run, I decided to run all of the remaining files in batches of 8 until they finished. This last run starts on a Saturday night. I have not actually run that last batch yet, so I won't know how it worked until tomorrow, but I suspect it should finish by 9:00 pm on Sunday night. I'll post a follow-up blog on Monday to share how it went.
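Rather than editing the batches file by hand for a trial run, you can split it with head and tail. A small sketch, with a made-up function name and made-up .trial and .rest suffixes:

```shell
#!/bin/sh
# split_batches FILE COUNT
# Writes the first COUNT lines of FILE to FILE.trial and the rest to FILE.rest.
split_batches() {
    head -n "$2" "$1" > "$1.trial"
    tail -n "+$(( $2 + 1 ))" "$1" > "$1.rest"
}

# Example: try 4 batches tonight, keep the remainder for later.
# split_batches batches 4
```

You can then point disbatcher.pl at the trial file first and at the rest file once you are happy with the timing.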

Color Me Impressed: Another Authority Control Report

The batch of 8 files all processed within two hours last night. (See the output below.) That's a full 20 minutes ahead of the 4 that ran on Thursday. I chalk the improved performance up to there being less happening on the servers on a Friday night.
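For anyone who wants to check the arithmetic, the elapsed times reported by time (1:59:44 for this run, 2:21:18 for Thursday's) can be compared in seconds. The helper name is only for illustration.

```shell
#!/bin/sh
# to_seconds H:MM:SS
# Converts an elapsed time as printed by time(1) into seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Difference between Thursday's run of 4 and Friday's run of 8, in seconds.
diff=$(( $(to_seconds 2:21:18) - $(to_seconds 1:59:44) ))
echo "$(( diff / 60 )) minutes $(( diff % 60 )) seconds"
```

That works out to 21 minutes 34 seconds in favor of the run of 8.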

Given that level of success, I plan to run the remaining batches starting tonight at 7:00 pm. I'll have it run 8 at a time and give it the full list of remaining commands. It should finish sometime Sunday night or early Monday morning.

My next blog post will explain how I'm doing this, so watch this space.

Output:

dispatched: /openils/bin/authority_control_fields.pl --start_id=11 --end_id=21439
1 of 8 running
dispatched: /openils/bin/authority_control_fields.pl --start_id=21441 --end_id=43928
2 of 8 running
dispatched: /openils/bin/authority_control_fields.pl --start_id=43931 --end_id=65785
3 of 8 running
dispatched: /openils/bin/authority_control_fields.pl --start_id=65791 --end_id=86020
4 of 8 running
dispatched: /openils/bin/authority_control_fields.pl --start_id=86026 --end_id=102506
5 of 8 running
dispatched: /openils/bin/authority_control_fields.pl --start_id=102507 --end_id=119262
6 of 8 running
dispatched: /openils/bin/authority_control_fields.pl --start_id=119263 --end_id=136363
7 of 8 running
dispatched: /openils/bin/authority_control_fields.pl --start_id=136364 --end_id=152938
8 of 8 running
1 of 8 processed
7 of 8 running
2 of 8 processed
6 of 8 running
3 of 8 processed
5 of 8 running
4 of 8 processed
4 of 8 running
5 of 8 processed
3 of 8 running
6 of 8 processed
2 of 8 running
7 of 8 processed
1 of 8 running
8 of 8 processed
0.01user 0.00system 1:59:44elapsed 0%CPU (0avgtext+0avgdata 14480maxresident)k
752inputs+8outputs (0major+1308minor)pagefaults 0swaps

Friday, September 14, 2012

More issa changes.

After some thinking and some other bib-related work, I've decided to make issa create new copies as pre-cat bibs like it should have to begin with. Since there is an installed base (however small) of issa users, this new feature will be optional, but turned on by default, so it will be there for any new installations. If an existing installation wants to use the new feature, then they'll need to update their issa code and add the option in their configuration file. I'll explain how it works and how to activate the feature once I've actually coded the solution.

Authority Control Linking: Results

The first batch of authority control fields linking went well last night. It finished in two hours and twenty-one minutes. Here's the report that I received in email:

dispatched: /openils/bin/authority_control_fields.pl --start_id=11 --end_id=21439
1 of 4 running
dispatched: /openils/bin/authority_control_fields.pl --start_id=21441 --end_id=43928
2 of 4 running
dispatched: /openils/bin/authority_control_fields.pl --start_id=43931 --end_id=65785
3 of 4 running
dispatched: /openils/bin/authority_control_fields.pl --start_id=65791 --end_id=86020
4 of 4 running
4 of 4 running
4 of 4 running
1 of 4 processed
3 of 4 running
2 of 4 processed
2 of 4 running
3 of 4 processed
1 of 4 running
4 of 4 processed
0.00user 0.01system 2:21:18elapsed 0%CPU (0avgtext+0avgdata 14480maxresident)k
0inputs+8outputs (0major+1189minor)pagefaults 0swaps

Tonight, we'll try running 8 batches, all 8 at once, to see if that takes longer or just as long. Depending on the results of tonight's test, we may just run the rest through starting Saturday night, or we'll continue running batches each night.

Thursday, September 13, 2012

Authority Control Linking

This post is more for MVLC member libraries' staff than for the community at large, which is a bit of a switch for us. This blog is meant to be for the benefit of our members as much as it is for the benefit of the community.

Starting tonight, September 13, 2012, MVLC central site staff will run the script for linking authorities with bibs in Evergreen, authority_control_fields.pl. We plan to run it on batches of 10,000 bibs at a time with up to four batches running simultaneously. We will do just four batches on the first night to see how long that takes. Depending on the results, we may bump the number up to 8 or 16 batches per night, or adjust the number of simultaneously running batches downward.

Depending upon how many batches we can successfully complete in a night, this will take us anywhere from six to twenty-two days to complete.
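The estimate is just a ceiling division of the number of batches by the number of batches per night. The sketch below assumes roughly 88 batches in total; that figure is inferred from the twenty-two day estimate at four batches per night and is not a number stated in this post.

```shell
#!/bin/sh
# nights_needed TOTAL_BATCHES BATCHES_PER_NIGHT
# Integer ceiling division: a partial night still counts as a night.
nights_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# 88 is an assumed total, see the note above.
nights_needed 88 4    # four batches per night
nights_needed 88 16   # sixteen batches per night
```

With that assumed total, four per night gives 22 nights and sixteen per night gives 6, which matches the range quoted above.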

While I don't expect this to have any impact on production performance whatsoever, we are still running this at night as a precaution.

If there's any interest in the comments, I'll post updates as this progresses.

Saturday, September 1, 2012

All issa, all the time

Yes, another post concerning issa. Now that other sites are using it, some unexpected situations have come up. The latest changes to the code will fail more gracefully if your configuration file still says to use a stat cat entry on copies created by issa, but neither that stat cat entry nor the corresponding stat cat exists.

Thursday, August 9, 2012

YAFFI: Yet-Another-Feature-For-issa

Hot on the heels of yesterday's changes (arguably fixes) for issa, we have today's new feature to trumpet: Force Holds.

issa can now be configured to do force holds on copies sent from outside your Evergreen system to fill your patrons' ILL requests. This is done simply by assigning the COPY_HOLDS_FORCE permission to the group you created for your issa circulator. If you're setting up a new issa installation from scratch, then setup.pl will ask you if you want to use this feature. Answering Y (or anything beginning with the letter y) will assign the permission and thus turn the feature on. Answering N (or anything beginning with the letter n) will not assign the permission and thus not turn the feature on.

Turning this feature on has the intended effect of setting the holdable flag to false on all new copies created by the issa software. This will prevent staff at your member libraries from placing holds on these copies. issa will then issue force holds on these copies for your patrons, thus getting around the holdable flag being false on its own copies.

We've had a few cases of staff (and even a patron or two!) placing holds on issa's copies when they weren't supposed to. This new feature is intended to prevent that and save us a little cleanup and headache in the future.

Enjoy!

Wednesday, August 8, 2012

More issa fun

The issa connector software for Evergreen ILS and the URSA ILL package got another update today.

This update adds holds cancellation and removal of transits from Evergreen when URSA deletes a copy as a result of the copy being returned. This is just more "making sure" in the case that staff don't follow the proper work flow when returning ILL items.

If you have been using issa and have had questions from staff about hanging transits or holds with VC items, then you might want to update.

issa is also available on the MVLC public git repository.

Tuesday, July 17, 2012

Updated issa

I made an update to the issa software yesterday so that it looks to see if a copy is checked out before deleting it when the copy is returned to the virtual catalog. If a copy to be deleted is checked out, it is checked in before being deleted.

The above change should cut back on the number of tickets that we get about virtual catalog copies.

Wednesday, May 23, 2012

Added xact_finisher.pl to MVLC's Evergreen Utilities

I just added a new script to MVLC's Evergreen Utilities repository.

xact_finisher.pl is a Perl script to set the xact_finish on "hanging" circulations. There was a bug that did not set xact_finish properly on circulations when a copy was lost and/or billings were modified. This script is intended to fix these by setting the xact_finish on their circulations.

It takes a patron and a copy barcode as arguments and sets the xact_finish on "open" circulations on that combination.

It currently does not do any safety checks. It assumes that the user knows what they are doing.

MVLC's Evergreen Utilities

One of the initial reasons for setting up this blog was to share with the community many of the extra scripts and tools that MVLC has developed for working with Evergreen. Most of these are publicly available on our git server in various repositories and branches. This post is going to be about the current state of our evergreen_utilities repository.

This repository contains some handy utility scripts for use with Evergreen. They are written against master and kept relatively up to date with changes there. They may work with earlier releases of Evergreen, but we make no guarantees. In many cases they will need extensive modifications to work with anything but the most recent releases or master.

SMP: What is it and why do you care?

SMP stands for Symmetric Multiprocessing, a feature of most modern multi-core and multi-processor computers that allows multiple CPUs to work together in running a single properly-coded application. This can allow an application to work faster and more efficiently, such as by allowing it to process one piece of data while it is receiving the next.

Most programs don't need this functionality. Evergreen, for example, is split into a large number of smaller programs. The entire system can thus be spread not only across multiple CPUs, but across a large number of servers.

Where SMP comes in handy for Evergreen is in the communication between all of these smaller programs. They all communicate with each other over XMPP, which is currently handled by ejabberd.

Testing, 1, 2, 3!

The discussions in IRC this morning revolve around release notes and the new XulRunner branch. I’m chiming in to talk about the latter.

This branch needs testing and lots of it. We here at MVLC have been testing it on our development/test servers, even with production data, but we’ve not used it in production nor put it through all of its paces. Specifically, we developers don’t spend a lot of time in cataloging, acquisitions, serials, etc. We’ve kicked the tires, and basic circulation still works. Some of the more advanced features have gotten a workout, but nothing systematic so far.

So, what’s in it for me, you ask? Why should I care about a “new XulRunner branch” and what does that mean for Evergreen?

Glad you asked. The new XulRunner branch updates the compatibility of Evergreen with newer versions of XulRunner, the underlying technology used to create the Evergreen staff client. The staff client is currently only compatible with older, out-of-date versions of XulRunner (1.9.x and 3.6.x). These old versions have some known problems that are fixed in later versions. The new XulRunner branch actually makes the Evergreen staff client compatible with later versions and still maintains compatibility with the older versions of XulRunner. (There is a catch to that compatibility, however. You can’t use different XulRunner versions with the same server because of changes in the OPAC pages. You have to use all the same XulRunner version with all of your clients.)

So, what does using a newer version of XulRunner get you? For starters, the community gets a greater longevity out of the current staff client. This lowers the pressure on the developers to come up with something new. End users will notice an improvement in performance. We’ve not actually measured the difference that using a newer XulRunner makes, but screen refreshes were noticeably faster when using XulRunner 11.0 compared to 1.9. Additionally, the client should freeze up less and use less RAM. Newer XulRunner versions may open possibilities for easier development of new features if we can take advantage of the technologies being added to XulRunner. The list goes on.

I know, I had you at “noticeably faster,” and now you want to know how to install it and try it out. Well, I’m now ready to tell you how, at least if you have a Debian or Ubuntu test server set up. Oh, and don’t worry. If you’ve installed Evergreen from a tarball (.tar.gz file), then you should be able to follow these instructions.

ACHTUNG! MINEN!

DO NOT attempt the following on your production server. You will have a lot of very unhappy users if you do. I assume you have a test server set up with its own test database where you can try the following steps. I also only outline the basic steps needed to install and test the new XulRunner branch. I skip over things like running upgrade scripts on the database if necessary.

CAVEAT LECTOR.

Before you do anything else, you’ll want to make sure that you have some essentials in place. Most of what you need should be available from the last time that you installed Evergreen, but some needed packages may not necessarily be there. To make sure that you have everything, run the following commands as the root user or via sudo:

apt-get install build-essential 
apt-get install git-core 
apt-get install zip 
apt-get install nsis

The above will ensure that you have the basic build tools (compiler, make, and friends) installed, as well as any other required packages for building programs from source code that may be missing on a default installation. (Not likely that you’ll be missing any, but it is always good to make sure they are all there.) It also installs the git program that you will need to fetch the latest Evergreen source code, and makes sure that the zip and nsis packages, needed by the client installer, are present.
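If you would like to confirm that the tools actually landed on your path before going further, a quick check along these lines will do. The function name is made up, and the list of commands in the example is my own pick (makensis is the program that the nsis package provides).

```shell
#!/bin/sh
# missing_tools NAME...
# Prints the name of each command that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example: no output means everything was found.
# missing_tools gcc make git zip makensis
```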

Next, you’ll want to clone the public repositories onto your server. I usually do this as the opensrf user in the opensrf user’s home directory:

git clone git://git.evergreen-ils.org/Evergreen.git 
cd Evergreen 
git remote add working git://git.evergreen-ils.org/working/Evergreen.git 
git fetch working

The above will clone the Evergreen repository, check out the master development branch of Evergreen, and make the working repository available to you. You will also have read-only access to the repositories, so there is no worry about messing something up. If you do, you can simply delete your local directory and follow the above steps again, or if you know some more about git, you can fix it yourself. Using these commands, there is absolutely no danger of doing harm to the public repositories, so feel free to experiment.

Now, you’re ready to merge the new XulRunner branch’s code into master. Let’s make a new branch to do the merge in so we don’t mess up the copy of master that was checked out for us automatically:

git checkout -b new_xulrunner origin/master

The above command literally tells git to make a new branch named “new_xulrunner” that is based on the master branch of the origin repository and to make that new branch active, which means that you are now using that new branch. Origin is typically the repository that you cloned when creating your own local copy, but it can be changed with the proper commands.

Now, we can merge the new XulRunner branch from the working repository into our new branch:

git merge working/collab/tsbere/new_xulrunner

Hopefully, that merges cleanly without reporting any conflicts. If that is the case, then you can proceed to the installation of Evergreen according to the instructions in the README file. You will need to run autoreconf -i before you can run configure.
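If the merge does report conflicts and you would rather back out than resolve them, git can list the conflicted files and put your branch back the way it was. A small sketch follows; the merge_or_abort name is made up, while the git commands themselves are standard.

```shell
#!/bin/sh
# merge_or_abort BRANCH
# Tries the merge; on failure, lists any conflicted files and aborts the
# merge so the current branch is left as it was before.
merge_or_abort() {
    if git merge "$1"; then
        echo "merged cleanly"
    else
        # Files with unresolved conflicts have the "U" (unmerged) status.
        git diff --name-only --diff-filter=U
        # If the merge failed for another reason (such as an unknown branch
        # name), there is nothing to abort and git will say so.
        git merge --abort
        return 1
    fi
}

# Example:
# merge_or_abort working/collab/tsbere/new_xulrunner
```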

After you’ve done the make install and changed ownership of the /openils directory but before you restart Apache, you will want to build the actual staff client with the following commands, assuming that you are still in your Evergreen clone directory:

cd Open-ILS/xul/staff_client 
make rigbeta 
make rebuild devbuild 
make updates-client

If you don’t do the above as root or with sudo, then it may fail. I always like to change the ownership on my source directory after doing make install when I also change ownership of the /openils directory:

sudo chown -R opensrf:opensrf ./ /openils/

If you do run the commands as root, then you’ll likely need to run the above chown command again.

Assuming that everything has gone as expected, you can now start your OpenSRF services and restart Apache. In addition to having a working Evergreen installation with the new XulRunner branch installed, you should also have a new URL available on your server where you can download the new XulRunner client:

http://YOUR_SERVER_NAME/updates/manualupdate.html

You can go there, download the new client, and you should be able to log in to your test server with the new client as usual.

Once you’ve logged in, you should test the functions that you care the most about. You’ll be relieved when you add a bibliographic record, create a new copy, and can circulate it to a patron that you’ve also just added. If you have some existing data and you’ve run the proper upgrade scripts (beyond the scope of this document), then try the new client out with some of your existing work flows. Be sure to report any problems you encounter by adding comments on the Launchpad Bug or by sending an email to the developers' list.

Monday, May 14, 2012

git.mvlcstaff.org

It is, of course, no secret that Merrimack Valley Library Consortium has made quite a few enhancements and additions to Evergreen-ILS available to the community by including them in the Evergreen core. My first series of posts is intended to introduce some of our lesser known offerings to the community. Many of these have not been publicized or perhaps just mentioned in IRC or an email here and there.

Let us start out by mentioning that MVLC runs a public git mirror of Evergreen at http://git.mvlcstaff.org/. Here, you will not only find copies of the public Evergreen git repositories, but you’ll also find some of MVLC’s development branches, including aborted experiments. Thomas Berezansky and Jason Stephenson often put their work in progress code on one of the repositories here before putting the code on the public working repository for review.

In addition to various Evergreen specific repositories and branches, you will find mirrors of the OpenSRF and SIPServer code repositories. Thomas and Jason also have “personal” repositories for OpenSRF and SIPServer where they do their development work before pushing things that are ready for review to the working repositories at git.evergreen-ils.org.

The above might be of interest if you want to follow the latest in bleeding edge development from Merrimack Valley Library Consortium. You should probably avoid Thomas’s and Jason’s named repositories if you are looking for production-ready code.

That said, feel free to use Evergreen/ILS.git, Evergreen/OpenSRF.git, and Evergreen/SIPServer.git as mirrors of the main repositories available at git.evergreen-ils.org. These repositories are updated automatically within a few minutes of something new being added to the main, community repositories.

As you browse git.mvlcstaff.org you will encounter some other interesting sounding repositories and branches. I plan to provide detailed coverage of each of these in future posts.
