{"id":21620804,"url":"https://github.com/miku/span","last_synced_at":"2026-03-17T16:18:35.088Z","repository":{"id":26798658,"uuid":"30257097","full_name":"miku/span","owner":"miku","description":"Span formats.","archived":false,"fork":false,"pushed_at":"2026-03-12T13:51:40.000Z","size":58565,"stargazers_count":16,"open_issues_count":3,"forks_count":7,"subscribers_count":3,"default_branch":"master","last_synced_at":"2026-03-12T19:28:41.953Z","etag":null,"topics":["bibliographic","code4lib","json","metadata","xml"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/miku.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2015-02-03T18:10:44.000Z","updated_at":"2026-03-12T14:07:11.000Z","dependencies_parsed_at":"2025-10-24T01:22:44.287Z","dependency_job_id":"e0c914e9-37ef-48a8-aec8-3e1047873df4","html_url":"https://github.com/miku/span","commit_stats":null,"previous_names":[],"tags_count":203,"template":false,"template_full_name":null,"purl":"pkg:github/miku/span","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miku%2Fspan","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miku%2Fspan/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miku%2Fspan/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miku%2Fspan/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/owners/miku","download_url":"https://codeload.github.com/miku/span/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miku%2Fspan/sbom","scorecard":{"id":646847,"data":{"date":"2025-08-11","repo":{"name":"github.com/miku/span","commit":"7de824675640c36e9ef802cb5d47ef81e8168982"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.8,"checks":[{"name":"Maintained","score":10,"reason":"20 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 10","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires 
human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file 
detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: GNU General Public License v3.0: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Signed-Releases","score":0,"reason":"Project has not signed or included provenance with any releases.","details":["Warn: release artifact v0.2.8 not signed: https://api.github.com/repos/miku/span/releases/231400054","Warn: release artifact v0.2.7 not signed: https://api.github.com/repos/miku/span/releases/231122105","Warn: release artifact v0.2.6 not signed: https://api.github.com/repos/miku/span/releases/230519687","Warn: release artifact v0.2.5 not signed: https://api.github.com/repos/miku/span/releases/213341611","Warn: release artifact v0.2.4 not signed: https://api.github.com/repos/miku/span/releases/205359789","Warn: release artifact v0.2.8 does not have provenance: https://api.github.com/repos/miku/span/releases/231400054","Warn: release artifact v0.2.7 does not have provenance: https://api.github.com/repos/miku/span/releases/231122105","Warn: release artifact v0.2.6 does not have provenance: https://api.github.com/repos/miku/span/releases/230519687","Warn: release artifact v0.2.5 does not have provenance: https://api.github.com/repos/miku/span/releases/213341611","Warn: release artifact v0.2.4 does not have provenance: 
https://api.github.com/repos/miku/span/releases/205359789"],"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}}]},"last_synced_at":"2025-08-21T12:21:49.262Z","repository_id":26798658,"created_at":"2025-08-21T12:21:49.262Z","updated_at":"2025-08-21T12:21:49.262Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30626982,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-17T14:16:03.965Z","status":"ssl_error","status_checked_at":"2026-03-17T14:16:03.380Z","response_time":56,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bibliographic","code4lib","json","metadata","xml"],"created_at":"2024-11-24T23:12:58.024Z","updated_at":"2026-03-17T16:18:35.070Z","avatar_url":"https://github.com/miku.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Span\n\n![](docs/ticino_26242_sm.gif)\n\nSpan started as a single tool to convert [Crossref\nAPI](https://www.crossref.org/services/metadata-delivery/rest-api/) data into a\n[VuFind](https://github.com/vufind-org/vufind)/[SOLR\nformat](https://github.com/finc/index/blob/master/schema.xml) as used in\n[finc](https://finc.info). An [intermediate\nrepresentation](https://github.com/ubleipzig/intermediateschema) for article\nmetadata is used for normalizing various input formats.\n[Go](https://golang.org/) was chosen as the implementation language because it\nis easy to deploy and has concurrency support built into the language. 
A basic\nscatter-gather design made it possible to process millions of records fast.\n\n[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)\n\nWhile span has a few independent tools (like fetching or compacting\n[crossref](https://www.crossref.org/) feeds), it is mostly used inside\n[siskin](https://github.com/ubleipzig/siskin), a set of tasks to build an\naggregated index.\n\n## Installation\n\n    $ go install github.com/miku/span/cmd/...@latest\n\nSpan has frequent [releases](https://github.com/miku/span/releases), although\nnot all versions will be packaged as deb or rpm.\n\n## Background\n\nInitial import *Tue Feb 3 19:11:08 2015*, a single `span` command. In March\n2015, `span-import` and `span-export` appeared. There were some rudimentary\ncommands for dealing with holding files of various formats. In early 2016, a\nlicensing tool was briefly named `span-label` before becoming `span-tag`. In\nSummer 2016, `span-check`, `span-deduplicate`, `span-redact` were added; a first\nman page followed later. In Summer 2017, `span-deduplicate` was gone; the\nDOI-based deduplication was split up between the blunt but fast\n[groupcover](https://github.com/miku/groupcover) and the generic\n`span-update-labels`. A new `span-oa-filter` helped to mark open-access\nrecords. In Winter 2017, a `span-freeze` was added to allow for fixed\nconfiguration across dozens of files. 
The `span-crossref-snapshot` tool\nreplaced a sequence of luigi tasks responsible for creating a snapshot of\ncrossref data (the process has been summarized in [a\ncomment](https://github.com/datahq/awesome-data/issues/29#issuecomment-405089255)).\nIn Summer 2018, three new tools were added: `span-compare` for generating index\ndiffs for index update tickets, `span-review` for generating reports based on\nSOLR queries and `span-webhookd` for triggering index reviews and ticket\nupdates through GitLab. During development, new input and output formats\nhave been added. The parallel processing of records has been streamlined with\nthe help of a small library called\n[parallel](https://github.com/miku/parallel). Since Winter 2017, the\n[zek](https://github.com/miku/zek) struct generator has taken care of the initial\nscreening of sources serialized as XML, making the process of mapping new data\nsources easier.\n\nSince about 2018 (0.1.211), the span tools have seen mostly small fixes and\nadditions. Notably, since 2021, the scripts previously used to fetch daily\nmetadata updates from [crossref](https://api.crossref.org) have been put into a\nstandalone tool, `span-crossref-sync`, which merely adds some retry logic and\nconsistent file naming to the API harvest. In 2024, `span-webhookd`,\n`span-check`, `span-review` and `span-tagger` were removed. A [faster crossref snapshot\ntool](https://github.com/miku/span/blob/29d0af845102464475e1d3b9aba779895847c32e/cmd/span-crossref-fast-snapshot/main.go) was implemented in 2025.\n\n## Documentation\n\nSee: manual [source](https://github.com/miku/span/blob/master/docs/span.md).\n\n## Performance\n\nIn the best case, no complete processing of the data should take more than two\nhours or run slower than 20000 records/s. The most expensive part currently\nseems to be the JSON\n[serialization](https://raw.githubusercontent.com/miku/span/master/docs/span-import.0.1.253.png),\nbut we keep JSON for now for the sake of readability. 
Experiments with faster\nJSON serializers and msgpack have been encouraging; faster serialization\nshould be the next measure to improve performance.\n\nMost tools that work on lines will try to use as many workers as CPU cores.\nExcept for `span-tag` - which needs to keep all holdings data in memory - all\ntools work well in a low-memory environment.\n\nMore cores can help (but returns may diminish): On a 64-core [2021\nXeon](https://ark.intel.com/content/www/de/de/ark/products/215274/intel-xeon-gold-6326-processor-24m-cache-2-90-ghz.html),\nwe find that e.g. `span-export` can process (decompression, deserialization,\nconversion, serialization, compression) on average 130000 JSON documents/s. The\nfinal pipeline stage (from normalized data to deduplicated and indexable data)\nseems to take about three hours.\n\n![](docs/htop.png)\n\n## Integration\n\nThe span tools are used in various tasks in siskin (which contains all\norchestration code). All span tools work fine standalone, and most will accept\ninput from stdin as well, allowing for one-off things like:\n\n```shell\n$ metha-cat http://oai.web | span-import -i name | span-tag -c amsl | span-export | solrbulk\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmiku%2Fspan","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmiku%2Fspan","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmiku%2Fspan/lists"}