Monday, December 15, 2008

Authority Control?

Here's a question from a non-librarian: how do we (could we) do authority-control-type-operations across a lot of different systems/services? Is there already an Authority Control Service application for this sort of thing? So that when we deposit something into MINDS@UW, the "Author" field can be something other than a free-text string, and can hook into the same authority control system that is used in all of the other places that need it?

I have this feeling that good authority control ties into FRBR and faceting in ways that could make things easier for us in the future.


Dorothea said...

There is no such service -- at least, not one robust enough to cover all our use cases.

Sending you an article under separate cover. :)

Edie Dixon said...

I don't know much about services, but I do know that librarians take great pains in doing authority control. We use thesauri like the Library of Congress Subject Headings, the Medical Subject Headings (MeSH), and the Library of Congress name and series headings. And we have other thesauri, too, for special subjects. These thesauri are, really, hierarchical lists, and authority control is about matching on those lists. So, since we have these kinds of lists available to us, couldn't we use them for a service -- and not just for books or printed things? Why couldn't we use those same common, well-known lists for objects?

Mike said...

Edie, are these thesauri in hardcopy or digital form? I guess what I'm envisioning is a general service that stores all of those hierarchical lists, and maybe allows for the creation of new ones, or annotations on old ones, or something like that; you could then call out to that service wherever you're writing interface code, so drop-down lists (or whatever, I'm old) on the MINDS@UW submission forms could match similar drop-down lists in a cataloging client. You could even envision a peered environment, where different institutions each maintain their own customized versions of the lists, with each institution able to easily query out to someone else to essentially answer the question, "ah, I don't know what I should use for this authority-controlled field, I wonder what other folks are doing." If that makes any kind of sense at all -- not a cataloger, not even a librarian, just a fanboy.
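To make the idea a little more concrete, here is a rough sketch in Python of what such a service might look like. Everything in it is an assumption made up for illustration: the class names AuthorityList and AuthorityService, the suggest method, and the in-memory storage are hypothetical, and nothing here reflects an existing system. It only shows the shape of the idea: hierarchical lists held in one place, a lookup that a drop-down could call, and a fallback that asks peer institutions when nothing local matches.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AuthorityList:
    """One controlled list: each term maps to its broader (parent) term, or None."""
    name: str
    broader: dict = field(default_factory=dict)

    def add(self, term: str, parent: Optional[str] = None) -> None:
        self.broader[term] = parent

    def narrower(self, term: str) -> list:
        # Terms whose parent is the given term, i.e. one level down the hierarchy.
        return sorted(t for t, p in self.broader.items() if p == term)


@dataclass
class AuthorityService:
    """Hypothetical service holding several lists, plus peers it can query."""
    lists: dict = field(default_factory=dict)
    peers: list = field(default_factory=list)

    def suggest(self, list_name: str, prefix: str) -> list:
        # Look locally first; this is what a drop-down on a form would call.
        local = self.lists.get(list_name)
        hits = []
        if local is not None:
            wanted = prefix.lower()
            hits = sorted(t for t in local.broader if t.lower().startswith(wanted))
        if hits:
            return hits
        # Nothing local: ask peers in order ("what are other folks doing?").
        # Assumes the peer graph has no cycles; a real service would guard against that.
        for peer in self.peers:
            hits = peer.suggest(list_name, prefix)
            if hits:
                return hits
        return []
```

A submission form and a cataloging client that both call suggest on the same service would show the same terms, which is the whole point. The peer fallback is deliberately naive; whether peers should be consulted only on a miss, or merged into every answer, is a design choice I haven't thought through.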

Edie Dixon said...

The LC names, subjects and the MeSH stuff -- they are all available as files, accessible, I think (Mark would know) to anyone who wants them. We have subsets that we get every month for MadCat. And there are other thesauri available that we could use for these services -- specialized language constructs, if you will. We'd probably want to decide which ones are most appropriate (a person's name might be one service, a subject a different one) for whatever interface we're writing -- a search form, for example, or an institutional database of objects about teaching.

Dorothea said...

The problem, Edie, is that much library authority control is predicated on incomplete sources. Catalogs, for example, don't put someone under authority control until they've authored a book. For my purposes, I need to control names of people who have never written a book, and likely never will.

Steve said...

I think there is a tendency to assume that authority control will solve problems that it was not intended to solve. Take the extreme case examples:

* Dorothea has pointed out on numerous occasions that you will be hard pressed to find users that actually employ the language of LC Subject Headings
* I like to point out that natural language has its own curious failures, such as the popularity of "me" as a common tag in the open Web.

Looking to authority control only makes sense when you have a problem domain put in context and clearly scoped. Unfortunately we always seem to look to authority control to solve the problem of context and scope when in fact the reverse (predefining scope/context) is a prerequisite for authority control to work.

Dorothea said...

Let's make sure that we draw a line between SUBJECT authority control and NAME authority control. :) Speaking purely selfishly, in what I do I'm much more interested in the latter than the former.

I've put a postprint of the article I sent Mike in MINDS@UW, if anyone else is interested. Sorry about the clumsy handling of figures.

Steve said...

Dorothea, for the sake of argument: Why? ;)

Given that names in IT boil down to the same thing as subject headings (strings of characters), they are subject to the same kinds of problems: disambiguation, standardization, et cetera.

Both still depend on context to reliably do their job: point to things in the real world, whether those things are people or branches of history.

Dorothea said...

Well, sure. I differentiate them because in my context, name authority control is a more tractable problem (which is not to say it's especially tractable!) that promises more immediately-useful results.

It may be worth pointing out that there's movement afoot to move authority control in BOTH contexts away from string-matching toward something more indirect and (let us hope) more useful. The thing is, the efforts are pretty much entirely in parallel -- there is no one effort for both name and subject authority control.

I talk about this in the name-authority-control context in my article, but the OCLC VIAF service, which gives people a URL independent of their name string, is as good an example as any.
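A small Python sketch of the difference between matching on a name string and pointing at an identifier. This is not VIAF's data model or API; the PersonRecord and NameAuthority classes, the normalize function, and the "example:" identifiers are all hypothetical, invented only to illustrate the idea. The deposit stores an opaque identifier, and name strings become lookup aids that can change or collide without breaking anything.

```python
from dataclasses import dataclass, field


@dataclass
class PersonRecord:
    identifier: str          # opaque and stable; hypothetical format
    preferred: str           # display form, free to change later
    variants: set = field(default_factory=set)


def normalize(name: str) -> str:
    # Crude standardization: lowercase, drop commas and periods, collapse spaces.
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())


class NameAuthority:
    def __init__(self):
        self.records = {}    # identifier -> PersonRecord
        self.index = {}      # normalized name string -> set of identifiers

    def register(self, record: PersonRecord) -> None:
        self.records[record.identifier] = record
        for name in {record.preferred, *record.variants}:
            self.index.setdefault(normalize(name), set()).add(record.identifier)

    def candidates(self, name: str) -> list:
        # A string may match several people; a human (or context) picks one,
        # and the deposit then stores the identifier, not the string.
        return sorted(self.index.get(normalize(name), set()))
```

Two people who share the string "J. Smith" come back as two candidates instead of being silently merged, and changing someone's preferred name leaves every deposit that points at their identifier untouched. It also works for people who have never written a book, since nothing here depends on a catalog record existing.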

The best example of this in action for subject authority control unfortunately just went away, but let us hope that the LoC brings it back in some form.

Mike said...

Another cut at the problem, based on the comments plus some whiteboard scribbling yesterday.