{"id":42334930,"url":"https://github.com/datacite/shiba-inu","last_synced_at":"2026-01-27T14:15:03.530Z","repository":{"id":46729087,"uuid":"137197967","full_name":"datacite/shiba-inu","owner":"datacite","description":"Pipeline for DOI Resolution Logs procesing","archived":false,"fork":false,"pushed_at":"2023-03-28T02:19:11.000Z","size":262,"stargazers_count":7,"open_issues_count":1,"forks_count":6,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-09-11T10:23:14.103Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datacite.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-06-13T10:05:12.000Z","updated_at":"2023-12-23T15:01:41.000Z","dependencies_parsed_at":"2023-01-21T20:42:42.425Z","dependency_job_id":null,"html_url":"https://github.com/datacite/shiba-inu","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/datacite/shiba-inu","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datacite%2Fshiba-inu","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datacite%2Fshiba-inu/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datacite%2Fshiba-inu/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datacite%2Fshiba-inu/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datacite","download_url":"https://codeload.github.com/datacite/shiba-inu/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/datacite%2Fshiba-inu/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28814576,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-27T12:25:15.069Z","status":"ssl_error","status_checked_at":"2026-01-27T12:25:05.297Z","response_time":168,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-27T14:15:02.126Z","updated_at":"2026-01-27T14:15:03.521Z","avatar_url":"https://github.com/datacite.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Pipeline for DOI Resolution Logs Processing\n\n[![Build Status](https://travis-ci.org/datacite/shiba-inu.svg?branch=master)](https://travis-ci.org/datacite/shiba-inu)\n![Docker Build Status](https://img.shields.io/docker/build/datacite/shiba-inu.svg)\n[![Test Coverage](https://api.codeclimate.com/v1/badges/107d556dafb28c85d261/test_coverage)](https://codeclimate.com/github/datacite/shiba-inu/test_coverage)\n[![Maintainability](https://api.codeclimate.com/v1/badges/107d556dafb28c85d261/maintainability)](https://codeclimate.com/github/datacite/shiba-inu/maintainability)\n\nShiba-Inu is a pipeline for processing DOI resolution logs according to the [Code of practice for research data usage metrics](https://doi.org/10.7287/peerj.preprints.26505v1). 
It is based on Logstash.\n\n![The Shiba Inu is the smallest of the six original and distinct spitz breeds of dog from Japan.](https://i.imgur.com/ueW0Leo.jpg)\n\n## Installation\n\nRequirements:\n\n- An Elasticsearch instance\n- Single-line logs containing DOI names\n\nYou can run the logs processor using Docker. You will need to set the following environment variables:\n\n```\nES_HOST=http://elasticsearch:9200\nES_INDEX=resolutions\nINPUT_DIR=/usr/share/logstash/tmp/DataCite-access.log-201805\nOUTPUT_DIR=/usr/share/logstash/tmp/output.json\nLOGSTASH_HOST=localhost:9600\n\nS3_MERGED_LOGS_BUCKET=/usr/share/logstash/monthly_logs\nS3_RESOLUTION_LOGS_BUCKET=/usr/share/logstash/\nELASTIC_PASSWORD=changeme\nLOGS_TAG=[Resolution Logs]\n\nHUB_TOKEN=eyJhbGciOiJSUzI1NiJ9\nHUB_URL=https://api.test.datacite.org\n```\n\nThen run the container:\n\n```shell\ndocker run -p 8090:9200 datacite/shiba-inu\n```\n\nAlternatively, you can use docker-compose to run the log processor without providing your own Elasticsearch instance:\n\n```shell\ndocker-compose up\n```\n\n## Usage logs\n\nYour logs need to fulfill two requirements:\n\n- They must be single-line logs.\n- They MUST include the following fields:\n  - doi =\u003e DOI name\n  - occurred_at =\u003e timestamp (ISO8601)\n  - clientip =\u003e IP address (IPv4 or IPv6)\n  - user_agent =\u003e user agent\n\nYou need to describe the format of your log lines as a pattern for the Logstash grok filter (see the grok filter documentation). Enter this pattern in the file `/vendor/docker/log_configuration.tmpl`. 
\n\nFor example, consider log files in the following format:\n\n```text\n46.229.168.146 HTTP:HDL \"2018-09-30 23:40:39.132Z\" 1 1 3ms 10.5277/ppmp1850 \"300:10.admin/codata\" \"\" \"Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8) Gecko/20051111 Firefox/1.5\"\n131.180.162.29 HTTP:HDL \"2018-09-30 23:40:42.731Z\" 1 1 71ms 10.4233/uuid:9798fb4a-9201-4efa-b324-3e50bbdc7ca5 \"300:10.admin/codata\" \"\" \"\"\n131.180.162.29 HTTP:HDL \"2018-09-30 23:40:44.846Z\" 1 100 111ms 10.4233/uuid:a92fc858-da92-4339-8f80-b608aaa09741 \"\" \"\" \"\"\n```\n\nFor these, you would need the following grok pattern:\n\n```logstash\n\"^%{IP:clientip} (?\u003chandle\u003e(HTTP:HDL)) %{QS:occurred_at} %{INT:ld} %{INT:resp_code} (?\u003cms\u003e((.+ms))) %{DOI:doi} %{QS:server} %{QS:something} %{QS:user_agent}\"\n```\n\n## How to create reports\n\nThere are three basic steps to create a report:\n\n1. Copy your usage logs to `/usage_logs`.\n2. Trigger the logs processing.\n3. Generate the report.\n\n### 1. Copying the usage logs\n\nThe logs processor only processes logs on a monthly basis, provided either as a single file or as a set of ordered files. You therefore need to either merge all your logs into a single file or rename them so that they sort in order. Log files must be placed in `/usage_logs`.\n\n### 2. Trigger the logs processing\n\nThe logs processor starts automatically as soon as new logs arrive in the logs folder.\n\n### 3. Generate the report\n\nUsage reports can be generated locally, or pushed and/or streamed to the MDC Hub. The `kishu` client can generate a report in any of these ways. To run `kishu`, you need to be inside the Logstash Docker container. 
The `kishu` client does not need any parameters describing the report to generate (e.g. the month); it automatically generates the report from whatever is currently in the logs processor pipeline.\n\nInside the container, first set up the Ruby environment with RVM:\n\n```shell\nsource /usr/local/rvm/scripts/rvm\nrvm user gemsets\n```\n\nTo generate a usage report in JSON format following the Code of Practice for Usage Metrics, use the following command. The report is written to the `/reports` folder.\n\n```shell\nbundle exec kishu sushi generate_report --created_by {YOUR DATACITE CLIENT ID}\n```\n\nTo generate a usage report in JSON format following the Code of Practice for Usage Metrics and push it to the MDC Hub, use the following command:\n\n```shell\nbundle exec kishu sushi push_report --created_by {YOUR DATACITE CLIENT ID}\n```\n\nTo stream a usage report in JSON format following the Code of Practice for Usage Metrics to the MDC Hub, use the following command. Only use this option for reports covering more than 50,000 datasets or larger than 10 MB. All reports streamed to the MDC Hub are compressed.\n\n```shell\nbundle exec kishu sushi stream --created_by {YOUR DATACITE CLIENT ID} --schema resolution --aggs_size 200 --report_size 90000\n```\n\nFurther information about the streaming parameters can be found in the [kishu](https://github.com/datacite/kishu) client repository.\n\n## Development\n\nWe use RSpec for unit and acceptance testing:\n\n```shell\nruby -S bundle exec rspec\n```\n\nFollow along via [GitHub Issues](https://github.com/datacite/shiba-inu/issues).\n\n### Note on Patches/Pull Requests\n\n* Fork the project\n* Write tests for your new feature or a test that reproduces a bug\n* Implement your feature or make a bug fix\n* Do not mess with Rakefile, version or history\n* Commit, push and make a pull request. 
Bonus points for topical branches.\n\n## License\n**shiba-inu** is released under the [MIT License](https://github.com/datacite/shiba-inu/blob/master/LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatacite%2Fshiba-inu","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatacite%2Fshiba-inu","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatacite%2Fshiba-inu/lists"}