https://github.com/dartmouth-dltg/aspace_sitemap

ArchivesSpace plugin to create a sitemap for the PUI

# ArchivesSpace Sitemap Generation for the PUI

## Getting started

Download and unpack the latest release of the plugin into your
ArchivesSpace plugins directory:

```
$ curl ...
$ cd /path/to/archivesspace/plugins
$ unzip ...
```

Add the plugin name to the list of enabled plugins in `config/config.rb`:

```
AppConfig[:plugins] = ['some_plugin','aspace_sitemap']
```
### Note
Slug-related options are not available on ArchivesSpace versions older than v2.6.0.

Institutions with large numbers of published objects may need to increase the memory allotted to the application. See http://archivesspace.github.io/archivesspace/user/tuning-archivesspace/

Sitemap generation relies on the Solr index for some checks related to unpublished ancestors, so the job should only be run after the indexer
completes its first full index round.

## What does it do?
The plugin adds a new background job that generates a sitemap for the PUI: one or more sitemap files plus a sitemap index.
The file(s) can be downloaded and placed on a server of your choice for submission
to the search engine(s) of your choice, or written to the local filesystem and served at `{pui_host}/sitemap-index.xml`.
There are two configuration options.
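
To make the "sitemap index" idea concrete, here is a minimal sketch of what such an index looks like when built by hand, following the sitemaps.org protocol. It is not the plugin's code: the helper names (`xml_escape`, `sitemap_index_xml`), the host, and the file names `sitemap-1.xml` and `sitemap-2.xml` are assumptions for illustration, and the plugin's real file names may differ.

```ruby
require 'time'

# Hypothetical helper: escape the five XML special characters.
XML_ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&apos;'
}.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Build a sitemap index that points at each individual sitemap file.
# base_url and file_names are illustrative values, not plugin settings.
def sitemap_index_xml(base_url, file_names, lastmod: Time.now.utc)
  entries = file_names.map do |name|
    loc = "#{base_url.chomp('/')}/#{name}"
    "  <sitemap>\n    <loc>#{xml_escape(loc)}</loc>\n" \
      "    <lastmod>#{lastmod.iso8601}</lastmod>\n  </sitemap>"
  end

  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    #{entries.join("\n")}
    </sitemapindex>
  XML
end

puts sitemap_index_xml('https://pui.example.edu', ['sitemap-1.xml', 'sitemap-2.xml'])
```

Search engines are given only the index URL; they follow each `<loc>` in it to find the individual sitemap files.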

## Configuration

Configure the plugin by editing your `config.rb` file with the
following entries, modified as appropriate. If you are submitting the sitemap via the tools provided by Google or Bing, you will need to set the following.

1) Google requires verification that you own the site.
One way is by a verification meta tag.
```
# set the meta tag from Google to verify site ownership
AppConfig[:google_verification_meta_tag] = "your_verification_meta_tag"
```
2) Bing also requires verification that you own the site.
One way is by a verification meta tag.
```
# set the meta tag from Bing to verify site ownership
AppConfig[:bing_verification_meta_tag] = "your_verification_meta_tag"
```
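
For orientation, the sketch below shows the meta tags that Google and Bing look for in a page's `<head>` when verifying ownership. The tag names `google-site-verification` and `msvalidate.01` are the ones those services use; everything else is an assumption. In particular, `verification_meta_tags` is a hypothetical helper, and this sketch treats each value as the bare verification token. Whether the plugin expects the token or the full tag in `config.rb` is not stated here, so check the plugin's rendered output.

```ruby
# Hypothetical helper, not part of the plugin: render the verification tags
# from bare tokens. Assumes each token is the value of the tag's content attribute.
def verification_meta_tags(google_token: nil, bing_token: nil)
  tags = []
  tags << %(<meta name="google-site-verification" content="#{google_token}" />) if google_token
  tags << %(<meta name="msvalidate.01" content="#{bing_token}" />) if bing_token
  tags
end

puts verification_meta_tags(google_token: 'your_google_token', bing_token: 'your_bing_token')
```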
## How to Use
For users with access to `Background Jobs`, there is a new entry in the `Create Jobs` menu called `ArchivesSpace PUI Sitemap`. Once selected, the job asks for several inputs:

1. What types of objects to include in the sitemap. At least one is required.
2. The update frequency. For most institutions, yearly is probably fine.
3. Use human readable slugs. Slugs generated by the user or the application will be used in the `<loc>` field if they are available. (v2.6.0+)
4. Write to local filesystem. The generated sitemaps are stored in `AppConfig[:data_directory]/pui_sitemaps`
and also placed at the root of the PUI webspace, i.e. `{pui_host}/sitemap-index.xml`. The robots.txt file in the PUI is updated to include the sitemap entry. On startup, any existing sitemaps are copied to the
PUI webroot and the robots.txt file is updated if there are existing sitemap files. Uncheck this option and fill in the sitemap index base URL entry (below) if you want to host the sitemaps on an external server.
5. The sitemap index base URL. This is the location where you will be hosting the sitemaps. It is ignored if write to local filesystem (above) is selected.
6. The limit on the number of entries per sitemap file. You should be able to leave this at the default of 50000.
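
The inputs above map fairly directly onto the structure of the generated files. The following is a rough sketch, not the plugin's implementation, of how an update frequency and a per-file entry limit shape the output, and of how a `Sitemap:` line can be added to robots.txt without duplicating it. The helper names (`build_sitemaps`, `with_sitemap_entry`) and example URLs are hypothetical. The default of 50000 matches the maximum number of URLs that the sitemaps.org protocol allows in a single sitemap file.

```ruby
# Hypothetical helper: escape the five XML special characters.
XML_ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&apos;'
}.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Split URLs into <urlset> documents holding at most `limit` entries each.
# Returns an array of XML strings, one per sitemap file.
def build_sitemaps(urls, changefreq: 'yearly', limit: 50_000)
  urls.each_slice(limit).map do |chunk|
    entries = chunk.map do |url|
      "  <url>\n    <loc>#{xml_escape(url)}</loc>\n" \
        "    <changefreq>#{changefreq}</changefreq>\n  </url>"
    end

    <<~XML
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      #{entries.join("\n")}
      </urlset>
    XML
  end
end

# Return robots.txt content with a Sitemap line added, unless already present.
def with_sitemap_entry(robots_txt, index_url)
  line = "Sitemap: #{index_url}"
  return robots_txt if robots_txt.each_line.any? { |l| l.strip == line }

  body = (robots_txt.empty? || robots_txt.end_with?("\n")) ? robots_txt : "#{robots_txt}\n"
  "#{body}#{line}\n"
end

# Example with made-up URLs and a tiny limit so the split is easy to see.
urls = (1..5).map { |i| "https://pui.example.edu/repositories/2/resources/#{i}" }
files = build_sitemaps(urls, changefreq: 'yearly', limit: 2)
puts "#{files.length} sitemap files"
puts with_sitemap_entry("User-agent: *\n", 'https://pui.example.edu/sitemap-index.xml')
```

With five URLs and a limit of two, the sketch produces three files, which is why a sitemap index is needed to tie them together.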

## Notes
1. The 'priority' attribute is not used in the sitemap since there is no mechanism in place to mark
objects in the staff interface. Given the large number of objects that are typically
published, it seems unlikely that 'priority' would be widely used. Google has also indicated that the priority attribute is not used by their algorithm.
2. The option to use human readable slug URLs is somewhat risky, since these slugs are based on **changeable** metadata, so URLs in a generated sitemap can become outdated when that metadata is edited.

Joshua Shaw
Digital Library Technologies Group
Dartmouth College Library