{"id":20635744,"url":"https://github.com/psu-libraries/psulib_traject","last_synced_at":"2026-02-15T22:02:55.737Z","repository":{"id":33566407,"uuid":"151454326","full_name":"psu-libraries/psulib_traject","owner":"psu-libraries","description":"Penn State University Libraries' Blacklight Catalog Traject Indexer","archived":false,"fork":false,"pushed_at":"2025-04-08T12:12:02.000Z","size":1471,"stargazers_count":2,"open_issues_count":15,"forks_count":1,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-04-08T13:24:55.637Z","etag":null,"topics":["jruby","libraries","marc","ruby","solr"],"latest_commit_sha":null,"homepage":"","language":"mIRC Script","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/psu-libraries.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-10-03T17:40:34.000Z","updated_at":"2025-03-05T19:02:27.000Z","dependencies_parsed_at":"2023-02-13T23:46:18.303Z","dependency_job_id":"d5c34689-38ff-4e01-bed2-a131841c08c7","html_url":"https://github.com/psu-libraries/psulib_traject","commit_stats":null,"previous_names":[],"tags_count":59,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/psu-libraries%2Fpsulib_traject","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/psu-libraries%2Fpsulib_traject/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/psu-libraries%2Fpsulib_traject/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/psu-libraries%2Fpsulib
_traject/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/psu-libraries","download_url":"https://codeload.github.com/psu-libraries/psulib_traject/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249155757,"owners_count":21221654,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["jruby","libraries","marc","ruby","solr"],"created_at":"2024-11-16T15:07:03.058Z","updated_at":"2026-02-15T22:02:55.732Z","avatar_url":"https://github.com/psu-libraries.png","language":"mIRC Script","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Maintainability](https://api.codeclimate.com/v1/badges/f877d0681e38deb0f3c8/maintainability)](https://codeclimate.com/github/psu-libraries/psulib_traject/maintainability)\n[![Test Coverage](https://api.codeclimate.com/v1/badges/f877d0681e38deb0f3c8/test_coverage)](https://codeclimate.com/github/psu-libraries/psulib_traject/test_coverage)\n\n# psulib_traject\n\n## Dependencies\n\n### Java\nTo run JRuby you will need java version 21 or higher.\n\n    $ java --version\n      openjdk version \"21.0.8\" 2025-07-15\n      OpenJDK Runtime Environment Homebrew (build 21.0.8)\n      OpenJDK 64-Bit Server VM Homebrew (build 21.0.8, mixed mode, sharing)\n\n### Ruby\nFollow these instructions to [install JRuby](https://github.com/psu-libraries/psulib_traject/wiki/Install-JRuby) if you\ndo not already have it.\n\n    $ ruby --version\n      jruby 10.0.2.0\n\n## Development Setup\n\n[Make sure you have ssh keys established on your 
machine](https://help.github.com/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent/#generating-a-new-ssh-key)\n\n[Make sure you have Docker installed and running](https://docs.docker.com/install/)\n\nClone the application and install its dependencies:\n\n    $ git clone git@github.com:psu-libraries/psulib_traject.git\n    $ cd psulib_traject\n    $ bundle install\n\n## Configuration\n\nFor local development, you can change the settings by adding configuration files. Git ignores these files.\n\n### Solr\n\nCreate two files, `config/settings.local.yml` and `config/settings/test.local.yml`, and add the following lines to each:\n\n    solr:\n      url: http://localhost:8983/solr/\n      port: 8983\n\nChange the URL and port number if you want to use a different port.\nYou will also need to set environment variables for the Solr username and password.\n\n### Traject\n\nUnder JRuby, traject uses multiple threads; tune the thread count to your system. In\n`config/settings.local.yml` add:\n\n    hathi_overlap_path: spec/fixtures/hathitrust/overlap.tsv\n    processing_thread_pool: 5\n\n## Build an Index\n\nStart Solr via the Docker container:\n\n    $ bundle exec rake docker:up\n\nThis downloads and configures Solr if it is not already present; otherwise it starts the existing container again.\nIf you need to reconfigure Solr:\n\n    $ bundle exec rake docker:clean\n    $ bundle exec rake docker:conf\n\nConvert MARC records and import them into Solr:\n\n    $ bundle exec traject -c config/traject.rb solr/sample_data/sample_psucat.mrc\n\n## Traject in debug mode\n\nFor testing purposes, you can run traject with the `--debug-mode` flag to\ndisplay the output on the console without pushing the data to Solr.\n\n    $ bundle exec traject --debug-mode -c config/traject.rb solr/sample_data/sample_psucat.mrc\n\n## HathiTrust ETAS data\n\nThe HathiTrust access level can be recorded in `ht_access_ss`. 
The indexer expects an overlap report TSV from HathiTrust\nat `ConfigSettings.hathi_overlap_path`. This file should be the latest overlap report from HathiTrust.\n\nBecause the monthly overlap file lives in a restricted area that can only be accessed by signing in to Box at UMich,\n`overlap.tsv` must be updated manually before indexing whenever a new overlap report is available. This can be done by\ncopying the file with `scp` to the location specified in `ConfigSettings.hathi_overlap_path`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpsu-libraries%2Fpsulib_traject","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpsu-libraries%2Fpsulib_traject","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpsu-libraries%2Fpsulib_traject/lists"}