# umich_solr_library_filters

This package contains a number of solr analysis filters to transform library
data at index and/or query time in a solr index.

By putting transformations into your solr install, you can guarantee that the
same changes happen at both index and query time, so the same input will
always match. Otherwise, you're stuck trying to make sure you're using
identical normalization routines in your indexing code and in the application
code that builds your searches, which can be problematic.

For an example of a configuration that uses these, check out
[my solr6 config](https://github.com/billdueber/solr6_test_conf).

## Using and/or Building

To use the .jar in [releases](https://github.com/mlibrary/umich_solr_library_filters/releases), just stick it
into a place where solr will find it, and
create appropriate `fieldType` definitions, as shown below.

The jarfile provided is built against solr 6.x. To build it yourself, check out this repo inside the `solr/contrib/` directory and run `ant dist-contrib` from `solr/`. You'll still need to copy the jarfile to wherever you need it.

## LCCN Normalizer

The `LCCNNormalizer` will, as you might expect, normalize LCCNs (typically
found in MARC field `010`) according to the
[algorithm on the Library of Congress site](http://www.loc.gov/marc/lccn-namespace.html#syntax).

This filter presumes that whatever you're sending it is supposed to be an LCCN;
if you give it something else, what comes out the other end may or may not make
any sense. (This is primarily because it's hard to just look at a string and
determine if it's supposed to be an LCCN.)

```xml







```
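To make the normalization concrete, here is a minimal Python sketch of the steps listed on the Library of Congress page linked above. It is not the filter's Java source; the function name `normalize_lccn` is made up for illustration, and, like the filter, it assumes the input really is an LCCN and does no validation.

```python
def normalize_lccn(raw: str) -> str:
    """Sketch of the LoC normalization steps (hypothetical helper, not the filter's code)."""
    # Step 1: remove all blanks.
    value = raw.replace(" ", "")
    # Step 2: if there is a forward slash, drop it and everything to its right.
    value = value.split("/", 1)[0]
    # Step 3: if there is a hyphen, remove it and left-fill the part that
    # followed it with zeros until that part is six characters long.
    if "-" in value:
        prefix, serial = value.split("-", 1)
        value = prefix + serial.zfill(6)
    return value


if __name__ == "__main__":
    for sample in ["n78-890351", "85-2 ", "75-425165//r75", " 79139101 /AC/r932"]:
        print(repr(sample), "->", normalize_lccn(sample))
```

Because nothing is validated, a non-LCCN string comes back in whatever shape those three steps leave it, which is the "may or may not make any sense" caveat above.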

## ISBN Normalizer

The `ISBNNormalizer` will take a token, attempt to extract something that
looks like a valid ISBN out of it, turn it into an ISBN-13 if need be, and
index the resulting 13-digit string.

A token that doesn't seem to contain an ISBN produces nothing in the index at
all, so if you want to be able to look for random strings that come out of
your `020` fields, you'll need to use an additional field for that.

Note how we limit to strings of exactly length 13, since that's all we should get.

```xml








```
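The extract-and-convert behavior described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the filter's actual logic: the names `normalize_isbn` and `isbn13_check_digit` are hypothetical, the matching rule (ignore hyphens and whitespace, then take the first standalone run of 13 digits, or 9 digits plus a digit or `X`) is a guess at what "looks like a valid ISBN" means, and no check digits are verified on input. The ISBN-10 to ISBN-13 conversion itself is the standard one: prefix `978`, keep the first nine digits, and recompute the check digit.

```python
import re

# Assumption: a standalone run of 13 digits, or 9 digits followed by a digit or X.
ISBN_PATTERN = re.compile(r"(?<!\d)(\d{13}|\d{9}[\dXx])(?![\dXx])")


def isbn13_check_digit(first12: str) -> str:
    """Compute the ISBN-13 check digit (weights alternate 1, 3) for 12 digits."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def normalize_isbn(token: str):
    """Return a 13-digit ISBN string, or None when nothing ISBN-like is found."""
    # Ignore hyphens and whitespace inside the token before matching.
    compact = re.sub(r"[\s-]", "", token)
    match = ISBN_PATTERN.search(compact)
    if match is None:
        return None  # mirrors "will not index anything at all"
    candidate = match.group(1)
    if len(candidate) == 13:
        return candidate
    # ISBN-10: drop the old check digit, add the 978 prefix, recompute.
    first12 = "978" + candidate[:9]
    return first12 + isbn13_check_digit(first12)


if __name__ == "__main__":
    for sample in ["0-306-40615-2", "080442957X (pbk.)", "no isbn here"]:
        print(repr(sample), "->", normalize_isbn(sample))
```

Every non-`None` result is exactly 13 characters long, which is why a length restriction on the indexed token is a reasonable safety net.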

## LC Shelf Key

This is a simple wrapper around the [solrmarc code for generating a sortable
shelfkey from an LC Classification Number](https://code.google.com/p/solrmarc/source/browse/trunk/lib/solrmarc/src/org/solrmarc/callnum/LCCallNumber.java). It's useful as a sort key and (at least a bit) for string matching, although that last part is notoriously iffy.

You can see the original solrmarc-centric examples in the code; I'm just wrapping it up as a filter.

```xml





```