{"id":18818399,"url":"https://github.com/smithsonian/osprey","last_synced_at":"2025-04-13T23:32:44.490Z","repository":{"id":38972461,"uuid":"155895233","full_name":"Smithsonian/Osprey","owner":"Smithsonian","description":"Dashboard that displays the file validation results in mass digitization projects. Digitization Program Office, OCIO, Smithsonian.","archived":false,"fork":false,"pushed_at":"2025-03-13T11:33:19.000Z","size":12160,"stargazers_count":8,"open_issues_count":15,"forks_count":3,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-27T13:45:47.484Z","etag":null,"topics":["digitization","digitization-workflows","mass-digitization","museum-collections","python3"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Smithsonian.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-11-02T16:47:46.000Z","updated_at":"2025-03-13T11:33:23.000Z","dependencies_parsed_at":"2023-02-12T02:18:31.387Z","dependency_job_id":"9b34f855-2a17-44e0-a6c4-d469306d7adb","html_url":"https://github.com/Smithsonian/Osprey","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Smithsonian%2FOsprey","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Smithsonian%2FOsprey/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Smithsonian%2FOsprey/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/Smithsonian%2FOsprey/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Smithsonian","download_url":"https://codeload.github.com/Smithsonian/Osprey/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248796904,"owners_count":21163050,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["digitization","digitization-workflows","mass-digitization","museum-collections","python3"],"created_at":"2024-11-08T00:16:36.712Z","updated_at":"2025-04-13T23:32:40.521Z","avatar_url":"https://github.com/Smithsonian.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Osprey\r\n\r\nOsprey is a system that checks the images produced by vendors in mass\r\ndigitization projects by the Collections Digitization program of the\r\nDigitization Program Office, OCIO, Smithsonian.\r\n\r\n![DPO Logo](https://github.com/Smithsonian/Osprey/assets/2302171/fa136270-943d-47f3-8a86-2eb6660b2913)\r\n\r\nhttps://dpo.si.edu/\r\n\r\nThe system checks that the files pass a number of tests and displays\r\nthe results in a web dashboard. 
This allows the vendor, the\r\nproject manager, and the unit to monitor the progress and detect\r\nproblems early.\r\n\r\n## Osprey Dashboard\r\n\r\nThis repo hosts the code for the dashboard, which presents the progress in each project and highlights any issues in the files.\r\n\r\n\u003cimg width=\"1203\" alt=\"Main Dashboard\" src=\"https://github.com/user-attachments/assets/a8578e8d-f788-45cf-93f3-398b4e63bc5d\"\u003e\r\n\r\n\u003cimg width=\"1106\" alt=\"Example Project\" src=\"https://github.com/user-attachments/assets/4e201d5b-cee3-4d3b-b893-cb1a1fd3f946\"\u003e\r\n\r\n## File Checks\r\n\r\nThe [Osprey Worker](https://github.com/Smithsonian/Osprey_Worker/) runs on Linux and updates the dashboard via an API (see below). The Worker can be configured to run one or more of these checks:\r\n\r\n * unique_file - The file name is unique within the project\r\n * raw_pair - A paired raw file exists in a subfolder (*e.g.* separate tifs and raws (.eip/.iiq) subfolders)\r\n * jhove - The file is a valid image according to [JHOVE](https://jhove.openpreservation.org/)\r\n * tifpages - The TIF file contains no embedded thumbnail and no more than one image\r\n * magick - The file is a valid image according to [ImageMagick](https://imagemagick.org/)\r\n * tif_compression - The TIF file is LZW-compressed to save disk space\r\n\r\nOther file checks can be added; documentation on how to do so is still to be written.\r\n\r\n## Setup\r\n\r\nThe app is written in Python with Flask and requires a MySQL database. 
Install and populate the database according to the instructions in [database/tables.sql](https://github.com/Smithsonian/Osprey_Misc/tree/main/database).\r\n\r\nTo install the required environment and modules to the default location (`/var/www/app`):\r\n\r\n```bash\r\nmkdir /var/www/app\r\ncd /var/www/app\r\npython3 -m venv venv\r\nsource venv/bin/activate\r\npython3 -m pip install --upgrade pip\r\npython3 -m pip install -r requirements.txt\r\n```\r\n\r\nThen, test the app by running the main file:\r\n\r\n```bash\r\n./app.py\r\n```\r\n\r\nor:\r\n\r\n```bash\r\npython3 app.py\r\n```\r\n\r\nwhich will start the service at `http://localhost:5000/`.\r\n\r\nUpdate permissions:\r\n\r\n```bash\r\ndeactivate\r\nsudo chown -R apache:apache /var/www/app\r\n```\r\n\r\nSet up apache2/httpd as described in the [web_server](web_server) folder.\r\n\r\n## API\r\n\r\nThe application includes an API that requires a key, sent via POST in the `api_key` form field.\r\n\r\n### Python example\r\n\r\n```python\r\nimport requests\r\n# Placeholders: set your API key, the dashboard base URL, and a project alias\r\nKEY, API_URL, PROJECT_ALIAS = 'your_api_key', 'https://osprey.example.org', 'project_alias'\r\npayload = {'api_key': KEY}\r\nr = requests.post('{}/api/projects/{}'.format(API_URL, PROJECT_ALIAS), data=payload)\r\nr.raise_for_status()  # raise an HTTPError on 4xx/5xx responses\r\n```\r\n\r\n### API Routes\r\n\r\nThese routes are available:\r\n\r\n * `/api/`: List the available routes in JSON\r\n * `/api/projects/`: Get the list of projects in the system\r\n * `/api/projects/\u003cproject_alias\u003e`: Get the details of a project by its `project_alias`\r\n    * `project_alias`: String alias of the project\r\n    * `project_id`: ID of the project (integer)\r\n    * `folders`: Folders in this project\r\n    * `project_unit`: SI Unit\r\n    * `project_type`: Production or Pilot\r\n    * `project_status`: Status of the project (*e.g.* ongoing, paused, completed)\r\n    * `project_area`: Discipline area of the project\r\n    * `project_description`: Description of the project, goals, and collection digitized\r\n    * `project_checks`: Checks that run on all files in the project\r\n    * `project_postprocessing`: Post-project 
steps tracked in the system\r\n    * `project_manager`: PM of the project\r\n    * `project_method`: Method used for digitization\r\n    * `project_start`: Date when the project started digitization\r\n    * `project_end`: Date when the digitization ended\r\n    * `project_stats`: Main statistics of the project\r\n    * `reports`: Data reports in this project\r\n * `/api/folders/\u003cfolder_id\u003e`: Get the details of a folder and the list of files\r\n    * `folder`: Name of the folder\r\n    * `folder_id`: ID of this folder (integer)\r\n    * `folder_date`: Date when the folder was created by the vendor\r\n    * `no_files`: Number of files in the folder\r\n    * `project_id`: ID of the project (integer)\r\n    * `project_alias`: String alias of the project\r\n    * `delivered_to_dams`: Status of the folder regarding delivery to the DAMS\r\n    * `qc_status`: QC status of the folder\r\n    * `files`: Files in this folder, including their `file_id`\r\n * `/api/files/\u003cfile_id\u003e`: Get the details of a file by its `file_id`\r\n    * `file_id`: ID of the file in the system (integer)\r\n    * `file_name`: Filename\r\n    * `dams_uan`: DAMS UAN\r\n    * `exif`: EXIF metadata\r\n    * `file_checks`: Checks run on the file and their results\r\n    * `file_postprocessing`: Post-processing steps tracked for the file\r\n    * `folder_id`: ID of the folder containing the file\r\n    * `links`: Links to other systems related to this image\r\n    * `md5_hashes`: MD5 hashes of files related to this image, usually a TIF and a RAW\r\n    * `preview_image`: If not null, a link to an external rendering of the image\r\n * `/api/reports/\u003creport_id\u003e/`: Get the data from a project report\r\n\r\n## Components\r\n\r\nThe system has two related repos:\r\n\r\n * [Osprey Worker](https://github.com/Smithsonian/Osprey_Worker/) - A Python tool that runs a series of checks on folders. 
Results are sent to the dashboard via an HTTP API to be saved to the database.\r\n * [Osprey Misc](https://github.com/Smithsonian/Osprey_Misc/) - Database and scripts.\r\n\r\n## License\r\n\r\nAvailable under the Apache License 2.0. Consult the [LICENSE](LICENSE) file for details.\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmithsonian%2Fosprey","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsmithsonian%2Fosprey","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmithsonian%2Fosprey/lists"}