{"id":42334981,"url":"https://github.com/datacite/sashimi","last_synced_at":"2026-04-23T14:02:06.731Z","repository":{"id":30334521,"uuid":"106835541","full_name":"datacite/sashimi","owner":"datacite","description":"DataCite Usage Reports API","archived":false,"fork":false,"pushed_at":"2025-03-10T14:22:39.000Z","size":2345,"stargazers_count":6,"open_issues_count":12,"forks_count":2,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-09-11T10:20:24.654Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://api.datacite.org/reports","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datacite.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2017-10-13T14:35:40.000Z","updated_at":"2025-03-10T13:37:44.000Z","dependencies_parsed_at":"2024-04-15T09:04:42.158Z","dependency_job_id":null,"html_url":"https://github.com/datacite/sashimi","commit_stats":null,"previous_names":[],"tags_count":53,"template":false,"template_full_name":null,"purl":"pkg:github/datacite/sashimi","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datacite%2Fsashimi","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datacite%2Fsashimi/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datacite%2Fsashimi/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datacite%2Fsashimi/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datacite","download_url":"https://codeload.github.com/datacite/sashimi/tar.gz/refs/heads/m
aster","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datacite%2Fsashimi/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28814579,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-27T12:25:15.069Z","status":"ssl_error","status_checked_at":"2026-01-27T12:25:05.297Z","response_time":168,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-27T14:15:08.027Z","updated_at":"2026-01-27T14:15:08.585Z","avatar_url":"https://github.com/datacite.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DataCite Usage Reports API\n\n[![Build Status](https://travis-ci.org/datacite/sashimi.svg?branch=master)](https://travis-ci.org/datacite/sashimi) [![Docker Build Status](https://img.shields.io/docker/build/datacite/sashimi.svg)](https://hub.docker.com/r/datacite/sashimi/) [![Maintainability](https://api.codeclimate.com/v1/badges/a0d15834af2cdc24e22f/maintainability)](https://codeclimate.com/github/datacite/sashimi/maintainability) [![Test Coverage](https://api.codeclimate.com/v1/badges/a0d15834af2cdc24e22f/test_coverage)](https://codeclimate.com/github/datacite/sashimi/test_coverage)\n\nSashimi is an api-only application for Usage Reports in [SUSHI](https://www.niso.org/schemas/sushi) format. 
Sashimi expects SUSHI-formatted reports on ingestion and returns collections of SUSHI reports for consumption.\n\nThe application closely follows the [RESEARCH_DATA_SUSHI specification](https://app.swaggerhub.com/apis/COUNTER/researchdata-sushi_1_0_api/1.0.0#/).\n\n![](https://c1.staticflickr.com/1/21/31470457_3680ff198e_b.jpg)\n\n## Service Documentation\n\nDataCite provides an API for usage reports as a service. For the API reference and how-to guides, please visit:\n\n* [API Reference](https://support.datacite.org/v1.1/reference#usage-reports)\n* [How To Guide for Usage Reports API](https://support.datacite.org/docs/usage-reports-api-guide)\n\n\nThe rest of the README deals with the technical description and usage of the software in this repository.\n\n## Installation\n\nUsing Docker:\n\n```\ndocker run -p 8075:80 datacite/sashimi\n```\n\nYou can now point your browser to `http://localhost:8075` and use the application.\n\n\n## Development\n\nWe use RSpec for unit and acceptance testing:\n\n```\nbundle exec rspec\n```\n\nTo run the Rails console:\n\n```\nDISABLE_SPRING=true bundle exec rails console\n```\n\n## Technical Description\n\n### Resource components\n\nMajor resource components supported by the Data Usage API are:\n\n- reports\n- hearthbeat\n\nThese can be used on their own:\n\n| resource      | description                       |\n|:--------------|:----------------------------------|\n| `/reports`      | returns a list of all reports in the Hub |\n| `/hearthbeat` | returns the service status |\n\n### Resource components and identifiers\n\nResource components can be used in conjunction with identifiers to retrieve the metadata for a specific report.\n\n| resource                    | description                       |\n|:----------------------------|:----------------------------------|\n| `/reports/{report-uid}`           |   returns metadata for the specified Report. The report UID is a UUID according to RFC 4122. 
|\n\n## Depositing Reports\n\nTo add a report, send a POST request with JSON content and include `Content-Type: application/json` and `Accept: application/json` in the headers. You also need to include a JSON Web Token (JWT) for authentication. For example:\n\n```shell\ncurl -H \"Content-Type: application/json\" -H \"Accept: application/json\" -H \"X-Authorization: Bearer {YOUR_JWT}\" -X POST https://api.datacite.org/reports\n```\n\n```json\n{\n  \"report-header\": {\n    \"report-name\": \"dataset report\",\n    \"report-id\": \"dsr\",\n    \"release\": \"rd1\",\n    \"created\": \"2016-09-08t22:47:31z\",\n    \"created-by\": \"dataone\",\n    \"reporting-period\": {\n      \"begin-date\": \"2018-05-01\",\n      \"end-date\": \"2018-05-30\"\n    },\n    \"report-filters\": [\n      {\n        \"name\": \"begin-date\",\n        \"value\": \"2015-01\"\n      }\n    ],\n    \"report-attributes\": [\n      {\n        \"name\": \"exclude-monthly-details\",\n        \"value\": \"true\"\n      }\n    ],\n    \"exceptions\": [\n      {\n        \"code\": 3040,\n        \"severity\": \"warning\",\n        \"message\": \"partial data returned.\",\n        \"help-url\": \"string\",\n        \"data\": \"usage data has not been processed for all requested months.\"\n      }\n    ]\n  },\n  \"report-datasets\": [\n    {\n      \"dataset-title\": \"lake erie fish community data\",\n      \"dataset-id\": [\n        {\n          \"type\": \"doi\",\n          \"value\": \"0931-865\"\n        }\n      ],\n      \"dataset-contributors\": [\n        {\n          \"type\": \"name\",\n          \"value\": \"john smith\"\n        }\n      ],\n      \"dataset-dates\": [\n        {\n          \"type\": \"pub-date\",\n          \"value\": \"2002-01-15\"\n        }\n      ],\n      \"dataset-attributes\": [\n        {\n          \"type\": \"dataset-version\",\n          \"value\": \"vor\"\n        }\n      ],\n      \"platform\": \"dataone\",\n      
\"publisher\": \"dataone\",\n      \"publisher-id\": [\n        {\n          \"type\": \"orcid\",\n          \"value\": \"1234-1234-1234-1234\"\n        }\n      ],\n      \"data-type\": \"dataset\",\n      \"yop\": \"2010\",\n      \"access-method\": \"regular\",\n      \"performance\": [\n        {\n          \"period\": {\n            \"begin-date\": \"2015-01-01\",\n            \"end-date\": \"2015-01-31\"\n          },\n          \"instance\": [\n            {\n              \"metric-type\": \"total-dataset-requests\",\n              \"count\": 21\n            }\n          ]\n        }\n      ]\n    }\n  ]\n}\n\n```\n\nAdditionally, you can use a PUT call with a new report, providing your own 'report_id'. This will create a report, provided the report follows the SUSHI schema and the 'report_id' is a UUID.\n\n## Key and Values\n\nThe allowed and recommended characters for URL-safe naming of parameters are defined in the format spec. To also standardize parameter names, the following (more restrictive) rules are recommended:\n\n- Parameter names SHOULD start and end with the characters “a-z” (U+0061 to U+007A)\n- Parameter names SHOULD contain only the characters “a-z” (U+0061 to U+007A), “0-9” (U+0030 to U+0039), and the hyphen minus (U+002D HYPHEN-MINUS, “-”) as separator between multiple words.\n\n## Report Storage\n\nReports are stored in an S3 bucket using ActiveStorage. We store them there rather than in MySQL because reports can get rather big, as mentioned in the [COUNTER documentation](https://groups.niso.org/workrooms/sushi/start/clients).\n\n## Register a large Usage Report\n\nUsage reports can get very large, and we use two approaches to handle them. The first approach is compression and the second is subsetting. Large reports need to be divided and compressed. 
We have set an upper limit of 50,000 datasets per report.\n\nIn both cases, you need to add this exception in the report header:\n\n```json\n\"exceptions\": [{\n  \"code\": 69,\n  \"severity\": \"warning\",\n  \"message\": \"Report is compressed using gzip\",\n  \"help-url\": \"https://github.com/datacite/sashimi\",\n  \"data\": \"usage data needs to be uncompressed\"\n}]\n```\n\n### Sending compressed reports\n\nWe suggest compressing any report that is larger than 10MB. Here is a Ruby example of report compression:\n\n```ruby\nrequire 'json'\nrequire 'stringio'\nrequire 'zlib'\n\n# Read a JSON report from disk and return it as a gzip-compressed string.\ndef compress(file)\n  report = File.read(file)\n  gzip = Zlib::GzipWriter.new(StringIO.new)\n  # Re-serialize the report to strip insignificant whitespace before compressing.\n  string = JSON.parse(report).to_json\n  gzip \u003c\u003c string\n  # Closing the writer flushes the gzip footer and returns the underlying StringIO.\n  gzip.close.string\nend\n```\n\nWhen sending compressed reports, use `application/gzip` as the Content-Type and `gzip` as the Content-Encoding. For example:\n\n```ruby\nrequire 'maremma'\n\n# Named REPORTS_URL so that Ruby's built-in URI module is not overwritten.\nREPORTS_URL = 'https://api.datacite.org/reports'\n\ndef post_file(file)\n  headers = {\n    content_type: 'application/gzip',\n    content_encoding: 'gzip',\n    accept: 'application/json'\n  }\n\n  body = compress(file)\n\n  # The JWT is read from the TOKEN environment variable.\n  Maremma.post(REPORTS_URL, data: body,\n    bearer: ENV['TOKEN'],\n    headers: headers,\n    timeout: 100)\nend\n```\n\nThe equivalent curl call would be:\n\n```shell\n$ curl -H \"Content-Type: application/gzip\" -H \"Content-Encoding: gzip\" -H \"X-Authorization: Bearer {YOUR-JSON-WEB-TOKEN}\" -X POST https://api.datacite.org/reports/ --data-binary @usage-report-compressed\n```\n\n### Send Usage Report in subsets\n\nTo create a report with more than 50,000 records, keep making POST requests with the same report-header. This will create subsets of the report. For example:\n\n```shell\nPOST /reports\nPOST /reports\nPOST /reports\n```\n\nTo update an existing compressed report, make a PUT request followed by as many POST requests with the same report-header as you need. 
For example:\n\n```shell\nPUT /reports/{report-id}\nPOST /reports\nPOST /reports\n```\n\n### Metadata Validation\n\nThe validation of the metadata in the reports is a two-step process. First, the controller checks for the presence of fields. Then schema validation is performed before saving the report. We use json-schema validation for this.\n\n## Queries\n\nOnly very basic querying is supported, and only for fields in the header of the reports. For more complex querying we suggest using the DataCite Event Data Service.\n\n## Pagination\n\nPagination follows the JSON:API specification.\n\nFollow along via [GitHub Issues](https://github.com/datacite/sashimi/issues).\n\n### Note on Patches/Pull Requests\n\n* Fork the project\n* Write tests for your new feature or a test that reproduces a bug\n* Implement your feature or make a bug fix\n* Do not mess with Rakefile, version or history\n* Commit, push and make a pull request. Bonus points for topical branches.\n\n## License\n\n**Sashimi** is released under the [MIT License](https://github.com/datacite/sashimi/blob/master/LICENSE).\n\nTurned off builds temporarily.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatacite%2Fsashimi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatacite%2Fsashimi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatacite%2Fsashimi/lists"}