https://github.com/caltechlibrary/cait

Caltech Archives Integration Tools - A collection of tools, utilities and services for integrating ArchivesSpace with other systems.

# cait

[cait](https://github.com/caltechlibrary/cait) is a set of utilities written in the [Go](http://golang.org) language that work with and augment the [ArchivesSpace](http://archivesspace.org) API.

+ cait - a command line utility for ArchivesSpace interaction (basic CRUD operations and export)
+ cait-genpages - a simple static page generator based on exported ArchivesSpace content
+ cait-indexpages - for indexing exported JSON structures with [Bleve](https://github.com/blevesearch/bleve)
+ cait-servepages - a web service providing public search services and content browsing

## Requirements

+ A working deployment of ArchivesSpace
+ Go 1.8 or later to compile
+ 3rd party Go packages
    + [Bleve](https://github.com/blevesearch/bleve) by [Blevesearch](http://blevesearch.com), Apache License, Version 2.0
+ Caltech Library's Go packages
    + [cait](https://github.com/caltechlibrary/cait), Caltech Library's ArchivesSpace integration tools

## Compiling

If you already have [Go](https://golang.org) set up and installed, compiling the utilities is straightforward.

1. Clone the git repository for the project
2. "Go get" the 3rd party libraries
3. Compile
4. Set up the necessary environment variables for using the utilities

Here's a typical example of setting things up.

```bash
go get github.com/blevesearch/bleve/...
git clone git@github.com:caltechlibrary/cait.git
cd cait
mkdir -p $HOME/bin
export PATH=$HOME/bin:$PATH
go build -o $HOME/bin/cait cmds/cait/cait.go
go build -o $HOME/bin/cait-genpages cmds/cait-genpages/cait-genpages.go
go build -o $HOME/bin/cait-indexpages cmds/cait-indexpages/cait-indexpages.go
go build -o $HOME/bin/cait-servepages cmds/cait-servepages/cait-servepages.go
```

At this point your command line utilities should be ready to use in the *$HOME/bin* directory. You are now ready to set up your environment variables.

## Setting up your environment

The command line tools and services are configured via environment variables. Below is an example of setting things up under Bash running on your favorite Unix-like system.

```bash
#!/bin/bash
#
# setup.sh - sets the environment variables for the cait project.
# Source this file before using cait, cait-genpages, cait-indexpages or cait-servepages.
#

#
# Local Development setup
#
export CAIT_API_URL=http://localhost:8089
export CAIT_USERNAME=admin
export CAIT_PASSWORD=admin
export CAIT_DATASET=dataset
export CAIT_SITE_URL=http://localhost:8501
export CAIT_HTDOCS=htdocs
export CAIT_BLEVE=htdocs.bleve
export CAIT_TEMPLATES=templates/default

```

As a one-time setup step, create the directories matching your configuration.

```bash
#!/bin/bash
#
# Create the necessary directory structure
#
mkdir -p "$CAIT_DATASET"
mkdir -p "$CAIT_HTDOCS"
mkdir -p "$CAIT_TEMPLATES"
```

Assuming Bash, and that you've saved the environment settings as _etc/cait.bash_, you can
source the file from your shell prompt by typing

```
. etc/cait.bash
```

### Setting up a dev box

I run ArchivesSpace in a Vagrant box for development use. You can find details to set that up at [github.com/caltechlibrary/archivesspace_vagrant](https://github.com/caltechlibrary/archivesspace_vagrant). I usually run the [cait](https://github.com/caltechlibrary/cait) tools locally. You can see an example workflow in the document [EXPORT-IMPORT.md](EXPORT-IMPORT.md).

## Utilities

### _cait_

This command is a general purpose tool for fetching ArchivesSpace data from the
ArchivesSpace REST API, saving or modifying that data, and querying the
locally captured output of the API.

Currently _cait_ supports operations on repositories, subjects, agents, accessions and digital_objects.

These are the common actions that can be performed:

+ create
+ list (individually or all IDs)
+ update (can use a file instead of the command line, see the -input option)
+ delete
+ export (useful for integrating into static websites or batch processing via scripts)

Here's an example session of using the _cait_ command line tool on the repository object.

```shell
. setup.sh # Source my setup file so I can get access to the API
cait repository create '{"uri":"/repositories/3","repo_code":"My Archive","name":"My Archive"}' # Create an archive called My Archive
cait repository list # list the repositories; for this example we'll use repository ID 3
cait repository list '{"uri":"/repositories/3"}' # Show only the JSON for repository ID 3
cait repository list '{"uri":"/repositories/3"}' > repo3.json # Save the output to the file repo3.json
cait repository update -input repo3.json # After editing repo3.json, save your changes back to ArchivesSpace
cait repository export '{"uri":"/repositories/3"}' # export the repository metadata to $CAIT_DATASET/repositories/3.json
cait repository delete '{"uri":"/repositories/3"}' # remove repository ID 3
```

The same pattern applies to subject, agent, accession and digital_object.
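Because every object type follows the same create/list/update/delete/export pattern, a client mainly needs to know which ArchivesSpace endpoint backs each type. The sketch below is one hypothetical mapping, not taken from cait's source: the paths follow the public ArchivesSpace REST API, where repository-scoped records live under /repositories/:id and paginated list endpoints accept `all_ids=true`, while treating `agent` as person agents only is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// endpoint describes a hypothetical ArchivesSpace list endpoint.
type endpoint struct {
	path      string // "%d" is replaced by the repository ID
	paginated bool   // paginated endpoints need a query such as all_ids=true
}

// endpoints maps cait object names to list endpoints (illustrative only).
var endpoints = map[string]endpoint{
	"repository":     {"/repositories", false},
	"subject":        {"/subjects", true},
	"agent":          {"/agents/people", true}, // assumption: person agents only
	"accession":      {"/repositories/%d/accessions", true},
	"digital_object": {"/repositories/%d/digital_objects", true},
}

// listURL builds the URL that lists every ID of the given object type.
func listURL(base, object string, repoID int) (string, error) {
	ep, ok := endpoints[object]
	if !ok {
		return "", fmt.Errorf("unsupported object type %q", object)
	}
	path := ep.path
	if strings.Contains(path, "%d") {
		path = fmt.Sprintf(path, repoID)
	}
	u := strings.TrimRight(base, "/") + path
	if ep.paginated {
		u += "?all_ids=true"
	}
	return u, nil
}

func main() {
	u, err := listURL("http://localhost:8089", "accession", 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(u) // http://localhost:8089/repositories/2/accessions?all_ids=true
}
```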

The _cait_ command uses the following environment variables:

+ CAIT_API_URL, the URL of the ArchivesSpace API (e.g. http://localhost:8089 in v1.4.2)
+ CAIT_USERNAME, the username used to access the ArchivesSpace API
+ CAIT_PASSWORD, the password used to access the ArchivesSpace API
+ CAIT_DATASET, the directory for exported content
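The first three variables are enough to open an authenticated session. The ArchivesSpace backend issues a session token from `POST /users/:username/login` (with the password as a form parameter) and expects it back in the `X-ArchivesSpace-Session` header on later requests. The sketch below shows that handshake in Go using only the standard library; the helper names `loginURL`, `login`, `newAuthedRequest` and `fail` are hypothetical, and cait's own client may be structured differently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"os"
	"strings"
)

// loginURL builds the ArchivesSpace login endpoint for a user.
func loginURL(apiURL, username string) string {
	return strings.TrimRight(apiURL, "/") + "/users/" + url.PathEscape(username) + "/login"
}

// login posts the password and returns the session token from the JSON reply.
func login(apiURL, username, password string) (string, error) {
	resp, err := http.PostForm(loginURL(apiURL, username), url.Values{"password": {password}})
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("login failed: %s", resp.Status)
	}
	var reply struct {
		Session string `json:"session"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&reply); err != nil {
		return "", err
	}
	return reply.Session, nil
}

// newAuthedRequest builds a GET request carrying the session token.
func newAuthedRequest(apiURL, session, path string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, strings.TrimRight(apiURL, "/")+path, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-ArchivesSpace-Session", session)
	return req, nil
}

// fail reports err and exits; a convenience for this sketch's main.
func fail(err error) {
	fmt.Fprintln(os.Stderr, err)
	os.Exit(1)
}

func main() {
	api := os.Getenv("CAIT_API_URL")
	session, err := login(api, os.Getenv("CAIT_USERNAME"), os.Getenv("CAIT_PASSWORD"))
	if err != nil {
		fail(err)
	}
	req, err := newAuthedRequest(api, session, "/repositories")
	if err != nil {
		fail(err)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fail(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		fail(fmt.Errorf("GET /repositories: %s", resp.Status))
	}
	// Decode loosely; each repository record is printed by URI and name.
	var repos []map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&repos); err != nil {
		fail(err)
	}
	for _, r := range repos {
		fmt.Println(r["uri"], r["name"])
	}
}
```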

### _cait-genpages_

This command generates static webpages from exported ArchivesSpace content.

It relies on the following environment variables:

+ CAIT_DATASET, where you've exported your ArchivesSpace content
+ CAIT_HTDOCS, where you want to write your static pages
+ CAIT_TEMPLATES, the templates to use (this defaults to templates/default, but you probably want custom templates for your site)

The typical process would use _cait_ to export all your content and then run _cait-genpages_ to generate your website content.

```
cait archivesspace export # this takes a while
cait-genpages # this is faster
```

Assuming the default settings, you'll see new webpages in your local *htdocs* directory.

### _cait-indexpages_

This command creates [bleve](http://blevesearch.com) indexes for use by _cait-servepages_.

Currently _cait-indexpages_ operates on JSON content exported with _cait_. It expects
a specific directory structure with each individual JSON blob named after its
numeric ID with the extension .json. For example, htdocs/repositories/2/accession/1.json
corresponds to accession ID 1 in repository 2.

_cait-indexpages_ depends on two environment variables:

+ CAIT_HTDOCS, the root directory where the JSON blobs and HTML files are saved
+ CAIT_BLEVE, the name of the Bleve index (created or maintained)

### _cait-servepages_

_cait-servepages_ provides both a static web server and a web search service.

Currently _cait-servepages_ uses the Bleve indexes created with _cait-indexpages_. It also
uses the search page and results templates defined in CAIT_TEMPLATES.

It uses the following environment variables:

+ CAIT_HTDOCS, the htdocs root of the website
+ CAIT_BLEVE, the Bleve index used to drive the search service
+ CAIT_TEMPLATES, templates for the search service as well as browsable static pages
+ CAIT_SITE_URL, the URL you want to run the search service on (e.g. http://localhost:8501)

Assuming the default setup, you could start the service like this:

```
cait-servepages
```

Or you could add a startup script to /etc/init.d/ as appropriate.

## Setting up a production box

The basic production environment would export the contents of ArchivesSpace nightly, regenerate the webpages, re-index them and finally restart the _cait-servepages_ service.

The script in *scripts/nightly-update.sh* shows these steps based on the configuration in *etc/setup.sh*. This script is suitable for running from a cron job under Linux/Unix/Mac OS X.