https://github.com/mlibrary/umich_solr_library_filters
Turn library-type data into something more useful in solr
- Host: GitHub
- URL: https://github.com/mlibrary/umich_solr_library_filters
- Owner: mlibrary
- Created: 2015-02-03T20:34:35.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2023-10-18T17:40:21.000Z (over 2 years ago)
- Last Synced: 2024-04-15T02:57:19.041Z (about 2 years ago)
- Language: Java
- Homepage:
- Size: 168 KB
- Stars: 8
- Watchers: 4
- Forks: 2
- Open Issues: 5
Metadata Files:
- Readme: README.md
- Changelog: CHANGES.md
# umich_solr_library_filters
This package contains a number of solr analysis filters to transform library
data at index and/or query time in a solr index.
By putting transformations into your solr install, you can guarantee that the
same changes happen at both index and query time, so the same input will
always match. Otherwise, you're stuck trying to make sure you're using
identical normalization routines in your indexing code and your
search-collecting application code, which can be problematic.
For an example of a configuration that uses these, check out
[my solr6 config](https://github.com/billdueber/solr6_test_conf).
## Using and/or Building
To use the .jar in [releases](https://github.com/mlibrary/umich_solr_library_filters/releases), just stick it
into a place where solr will find it, and
create appropriate `fieldType` definitions, as shown below.
The jarfile provided is built against solr 6.x. To build it yourself, check out this repo inside the `solr/contrib/` directory and run `ant dist-contrib` from `solr/`. You'll still need to copy the resulting jarfile to wherever you need it.
## LCCN Normalizer
The `LCCNNormalizer` will, as you might expect, normalize LCCNs (typically
found in MARC field `010`) according to the
[algorithm on the Library of Congress site](http://www.loc.gov/marc/lccn-namespace.html#syntax).
This filter presumes that whatever you're sending it is supposed to be an LCCN;
if you give it something else, what comes out the other end may or may not make
any sense. (This is primarily because it's hard to just look at a string and
determine if it's supposed to be an LCCN.)
```xml
```
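To give a feel for what the normalization does, here is a minimal Java sketch of the steps described on the Library of Congress page (remove blanks, drop a forward slash and everything after it, then remove the hyphen and left-pad the part after it with zeros to six characters). This is an illustration only, not the code in this package: the class name `LccnSketch` and the method `normalize` are made up for the example, and it does no validation of its input.

```java
public class LccnSketch {

    /** Normalize a string that is assumed to be an LCCN (no validation is done). */
    static String normalize(String raw) {
        // Step 1: remove all blanks.
        String s = raw.replaceAll("\\s+", "");

        // Step 2: drop a forward slash and everything to the right of it.
        int slash = s.indexOf('/');
        if (slash >= 0) {
            s = s.substring(0, slash);
        }

        // Step 3: remove the hyphen and left-fill what follows it with zeros to length 6.
        int hyphen = s.indexOf('-');
        if (hyphen >= 0) {
            String prefix = s.substring(0, hyphen);
            String suffix = s.substring(hyphen + 1);
            StringBuilder zeros = new StringBuilder();
            for (int i = suffix.length(); i < 6; i++) {
                zeros.append('0');
            }
            s = prefix + zeros + suffix;
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(normalize("n78-890351"));     // n78890351
        System.out.println(normalize("85-2 "));          // 85000002
        System.out.println(normalize("75-425165//r75")); // 75425165
    }
}
```

Because the same steps run on both the indexed value and the query string, `85-2` and `85000002` end up as the same token.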
## ISBN Normalizer
The `ISBNNormalizer` will take a token, attempt to extract something that
looks like a valid ISBN out of it, turn it into an ISBN-13 if need be, and
index the resulting 13-digit string.
A token that doesn't seem to contain an ISBN produces no indexed value at all,
so if you want to be able to look for random strings that come out of your
`020` fields, you'll need to use an additional field for that.
Note how we limit to strings of exactly length 13, since that's all we should get.
```xml
```
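The extract-and-convert idea can be sketched in plain Java as follows. This is a hedged illustration, not the filter's actual implementation: the class `IsbnSketch`, the method `toIsbn13`, and the regular expression are assumptions made for the example, and the sketch does not verify the check digit of the incoming ISBN. It strips hyphens, looks for a run of 13 digits or of 9 digits plus a digit or `X`, and converts a 10-character match by prefixing `978` and recomputing the ISBN-13 check digit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IsbnSketch {

    // 13 digits, or 9 digits followed by a digit or X, not embedded in a longer run.
    private static final Pattern ISBN_LIKE =
        Pattern.compile("(?<!\\d)(\\d{13}|\\d{9}[\\dXx])(?![\\dXx])");

    /** Return a 13-digit ISBN found in the token, or null if nothing ISBN-like is present. */
    static String toIsbn13(String token) {
        String cleaned = token.replace("-", "");
        Matcher m = ISBN_LIKE.matcher(cleaned);
        if (!m.find()) {
            return null; // nothing to index
        }
        String candidate = m.group(1);
        if (candidate.length() == 13) {
            return candidate;
        }
        // ISBN-10: drop its check character, prefix 978, recompute the check digit.
        String body = "978" + candidate.substring(0, 9);
        int sum = 0;
        for (int i = 0; i < 12; i++) {
            int digit = body.charAt(i) - '0';
            sum += (i % 2 == 0) ? digit : 3 * digit; // weights alternate 1, 3, 1, 3, ...
        }
        int check = (10 - sum % 10) % 10;
        return body + check;
    }

    public static void main(String[] args) {
        System.out.println(toIsbn13("0-306-40615-2 (pbk.)")); // 9780306406157
        System.out.println(toIsbn13("not an isbn"));          // null
    }
}
```

Returning `null` here stands in for the filter emitting no token, which is why a non-ISBN value in an `020` field leaves nothing in the index for this field.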
## LC Shelf Key
This is a simple wrapper around the [solrmarc code for generating a sortable
shelfkey from an LC Classification Number](https://code.google.com/p/solrmarc/source/browse/trunk/lib/solrmarc/src/org/solrmarc/callnum/LCCallNumber.java). It's useful as a sort key and (at least a bit) for string matching, although that last part is notoriously iffy.
You can see the original solrmarc-centric examples in the code; I'm just wrapping it up as a filter.
```xml
```