{"id":13819954,"url":"https://github.com/sul-dlss/pre-assembly","last_synced_at":"2025-06-24T08:03:52.763Z","repository":{"id":19284870,"uuid":"22521770","full_name":"sul-dlss/pre-assembly","owner":"sul-dlss","description":"Rails app - prepares objects for assembly workflow and allows discovery report","archived":false,"fork":false,"pushed_at":"2025-06-16T20:56:51.000Z","size":9841,"stargazers_count":1,"open_issues_count":9,"forks_count":2,"subscribers_count":15,"default_branch":"main","last_synced_at":"2025-06-16T21:43:15.028Z","etag":null,"topics":["application","infrastructure","rails-ui"],"latest_commit_sha":null,"homepage":"https://consul.stanford.edu/display/chimera/Automated+Accessioning+and+Object+Remediation+%28pre-assembly+and+assembly%29","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sul-dlss.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2014-08-01T18:07:56.000Z","updated_at":"2025-06-16T20:53:59.000Z","dependencies_parsed_at":"2023-12-18T18:53:35.745Z","dependency_job_id":"b0c4643e-564b-4d4f-968d-aef0ddb69f23","html_url":"https://github.com/sul-dlss/pre-assembly","commit_stats":null,"previous_names":[],"tags_count":222,"template":false,"template_full_name":null,"purl":"pkg:github/sul-dlss/pre-assembly","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sul-dlss%2Fpre-assembly","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sul-dlss%2Fpre-assembly/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sul-dls
s%2Fpre-assembly/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sul-dlss%2Fpre-assembly/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sul-dlss","download_url":"https://codeload.github.com/sul-dlss/pre-assembly/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sul-dlss%2Fpre-assembly/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261632031,"owners_count":23187268,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["application","infrastructure","rails-ui"],"created_at":"2024-08-04T08:00:55.404Z","updated_at":"2025-06-24T08:03:52.749Z","avatar_url":"https://github.com/sul-dlss.png","language":"Ruby","funding_links":[],"categories":["Happy Exploring 🤘"],"sub_categories":[],"readme":"# Pre-Assembly\n\n[![CircleCI](https://circleci.com/gh/sul-dlss/pre-assembly/tree/main.svg?style=svg)](https://circleci.com/gh/sul-dlss/pre-assembly/tree/main)\n[![codecov](https://codecov.io/github/sul-dlss/pre-assembly/graph/badge.svg?token=8DsPm8gAGZ)](https://codecov.io/github/sul-dlss/pre-assembly)\n[![GitHub version](https://badge.fury.io/gh/sul-dlss%2Fpre-assembly.svg)](https://badge.fury.io/gh/sul-dlss%2Fpre-assembly)\n\nThis is a Ruby implementation of services needed to prepare objects to be\nassembled and then accessioned into the SUL digital library.\n\n## Basics\n\n`Pre-Assembly` is a Rails web-app at https://sul-preassembly-prod.stanford.edu/. 
There is a link in the upper right to \"Usage Instructions\" which goes to the GitHub wiki pages: https://github.com/sul-dlss/pre-assembly/wiki.\n\n## Deployment\n\nDeploy the web app version in the usual Capistrano manner:\n\n```bash\ncap stage deploy\ncap prod deploy\n```\n\nSee the `Capfile` for more info.\n\n## Setting up code for local development\n\nClone project:\n\n```bash\ngit clone git@github.com:sul-dlss/pre-assembly.git\ncd pre-assembly\n```\n\n## Prerequisites\n\n### Get needed gems\n\n```bash\nbundle install\n```\n\n## Development/Test\n\nThe pre-assembly app requires Redis and Postgres for local development and testing. In order to run the tests or run the\nwebapp locally, you will need to start these dependencies via `docker compose`:\n\n```bash\ndocker compose up -d\n```\n\n### Prepare for testing\n\n```bash\n# Makes sure the DB is ready, assets are built, and JavaScript dependencies are installed\nbin/rake db:prepare test:prepare\n```\n\n### Install exiftool\n\nYou need `exiftool` on your system in order to successfully run all of the tests.\n\nOn RHEL, download the latest version from: http://www.sno.phy.queensu.ca/~phil/exiftool\n\n```bash\ntar -xf Image-ExifTool-#.##.tar.gz\ncd Image-ExifTool-#.##\nperl Makefile.PL\nmake test\nsudo make install\n```\n\nOn macOS, use `homebrew` to install:\n\n```bash\nbrew install exiftool\n```\n\n### Running tests\n\n```bash\n# Start redis and postgres in the background, then run the test suite\ndocker compose up -d\nbundle exec rspec\n```\n\n## Local development\n\nJust the usual:\n\n```bash\nbin/dev\n```\n\nWhen running the application in development mode, it will use a default sunet_id (`'tmctesterson'`) for\nits sessions. 
To override that behavior and specify an alternate user, you can manually specify the `REMOTE_USER`\nenvironment variable at startup, like so:\n\n```bash\nREMOTE_USER=ima_user bin/dev\nrdbg -A # run in a separate terminal window if you want a separate debugger window\n```\n\nBecause the application looks for user info in an environment variable, and because local dev environments don't have\nan Apache module setting that environment variable per request based on headers from Webauth/Shibboleth, dev always\nsets a single value in that env var at start time. So laptop dev instances only allow one fake login at a time.\n\n### Globus client gem\n\nThe Globus client gem needs to be configured to work in stage/qa during development.  You will need to get the client_id/secrets/config from vault for the pre-assembly application and add them to your `config/settings.local.yml`, matching the Globus config setup shown in `config/settings.yml`.\n\n## Post Accessioning Reports\n\nUse [Argo](https://argo.stanford.edu/).\n\n## Manifests\n\nManifests are a way of indicating which objects you will be accessioning. A\nmanifest file is a UTF-8 encoded CSV file and works for projects which have\none file per object (where container = one file), or projects with many\nfiles per object (where container = folder).\n\n**WARNING**: if you export from Microsoft Excel, you may not get a properly\nformatted UTF-8 CSV file. You should open any CSV that has been exported from\nExcel in a text editor (e.g. Atom, Sublime, or TextMate) and re-save it as\nproper UTF-8.\n\nThere are a few columns in the manifest, with two required:\n\n- `container`: container name (either filename or folder name) -- **required**\n- `druid`: druid of object -- **required**\n- `sourceid`: source ID\n- `label`: label\n\nThe druids should include the \"druid:\" prefix (e.g. 
\"druid:oo000oo0001\" instead of \"oo000oo0001\").\n\nThe first line of the manifest is a header and specifies the column names.\nColumn names should not have spaces and it is easiest if they are all lower\ncase. These columns are used to indicate which file goes\nwith the object. If the container column specifies a filename, it should be\nrelative to the manifest file itself. You can have additional columns in your\nmanifest which can be used to create descriptive metadata for each object. See\nthe section below for more details on how this works.\n\nThe druid column **must** be called `\"druid\"`.\n\nSee an example manifest file [`manifest.csv`](spec/fixtures/multimedia/manifest.csv).\n\nNote that there is a second (optional) type of file manifest which is used to further describe\nthe structure of each individual object, such as the exact files to be included.  This is\nonly required in advanced cases where you need to provide additional metadata about each\nfile in the object.  For more information about the file manifest, see https://github.com/sul-dlss/pre-assembly/wiki/Accessioning-images-with-captions-(labels)\n\n## Accession of Specific Objects\n\nUsing a manifest:\n\n1.  Create a new manifest with only the objects you need accessioned.\n2.  Create a new project config YAML file referencing the new manifest and\n    write to a new progress log file.\n3.  
Run pre-assembly.\n\n## Preparing maps content\n\nThis script stages content from Rumsey or other similar formats into a folder structure ready for accessioning.\nIt is only known to be used by the Maps Accessioning team (Rumsey Map Center).\nFull documentation of how it is used is here (which needs to be updated if this script moves):\nhttps://consul.stanford.edu/pages/viewpage.action?pageId=146704638\n\nThe script iterates through each row in the supplied CSV manifest, finds files, generates contentMetadata and symlinks to the new location.\nNote: filenames must match exactly (no leading 0s) but can be in any sub-folder.\n\nRun with:\n\n```\nRAILS_ENV=production bin/prepare_content INPUT_CSV_FILE.csv FULL_PATH_TO_CONTENT FULL_PATH_TO_STAGING_AREA [--no-object-folders] [--report] [--content-metadata] [--content-metadata-style map]\n```\n\ne.g.:\n\n```\nRAILS_ENV=production bin/prepare_content.rb /maps/ThirdParty/Rumsey/Rumsey_Batch1.csv /maps/ThirdParty/Rumsey/content /maps/ThirdParty/Rumsey [--no-object-folders] [--report] [--content-metadata] [--content-metadata-style map]\n```\n\nThe first parameter is the input CSV (with columns labeled \"Object\", \"Image\", and \"Label\"; image is the filename, object is the object identifier which can be turned into a folder).\nThe second parameter is the full path to the content folder that will be searched (i.e. the base content folder).\nNote: files will be searched iteratively through all sub-folders of the base content folder.\nThe third parameter is optional and is the full path to a folder to stage (i.e. 
symlink) content to; if not provided, it will use the same path as the CSV file and append \"staging\".\n\n- If you set the `--report` switch, it will only produce the output report; it will not symlink any files.\n- If you set the `--content-metadata` switch, it will only generate content metadata for each object using the log file for successfully found files, assuming you also have columns in your input CSV labeled \"Druid\", \"Sequence\" and \"Label\".\n- If you set the `--no-object-folders` switch, then all symlinks will be flat in the staging directory (i.e. no object level folders); this requires all filenames to be unique across objects. If left off, object folders will be created to store symlinks.\n\nNote that file extensions do not matter when matching.\n\n## Data Model\n\nPre-Assembly has a fairly simple [data model](db/schema.rb) based on four types of objects:\n\n* *User*: a SUL staff person who is able to log in, based on configuration in [Puppet](https://github.com/sul-dlss/puppet/blob/production/hieradata/node/sul-preassembly-prod.stanford.edu.eyaml)\n* *BatchContext*: includes details about a particular type of batch load, including where the data lives, who created it, the type of batch load, etc. This is also known as a \"Project\" in the user interface.\n* *JobRun*: represents a specific batch load run using information from the BatchContext. These jobs are picked up by an asynchronous Sidekiq job when requested by the user. The job can be a full run, which submits the data to the [dor-services-app](https://github.com/sul-dlss/dor-services-app) API, or simply a \"discovery report\" which checks that the data and configuration look correct.\n* *GlobusDestination*: represents a user and created_at timestamp for creating a Globus directory that will be associated with a BatchContext.\n\nA *User* can have multiple *BatchContext*s and a *BatchContext* can have multiple *JobRun*s. 
When a user chooses to run or rerun a job in the user interface, a new *JobRun* is created using the same *BatchContext*.\n\n## Reset Process (for QA/Stage)\n\n### Steps\n\n1. [Reset the database](https://github.com/sul-dlss/DeveloperPlaybook/blob/main/best-practices/db_reset.md)\n2. Delete the staging directory: `rm -fr /dor/assembly/*`\n3. Delete the job artifacts output directory: `rm -fr /dor/preassembly/*`\n4. To test, run the `preassembly_*_spec.rb` integration tests.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsul-dlss%2Fpre-assembly","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsul-dlss%2Fpre-assembly","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsul-dlss%2Fpre-assembly/lists"}