{"id":17202866,"url":"https://github.com/kermitt2/biblio-glutton","last_synced_at":"2025-10-07T11:15:44.591Z","repository":{"id":35882699,"uuid":"100077276","full_name":"kermitt2/biblio-glutton","owner":"kermitt2","description":"A high performance bibliographic information service: https://biblio-glutton.readthedocs.io","archived":false,"fork":false,"pushed_at":"2024-09-14T07:59:21.000Z","size":7661,"stargazers_count":135,"open_issues_count":31,"forks_count":17,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-03-03T08:25:57.382Z","etag":null,"topics":["bibliographical-references","disambiguation","doi","hal","metadata-api","openaccess","pubmed","reference-matching"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kermitt2.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-08-11T23:44:32.000Z","updated_at":"2025-02-23T11:29:55.000Z","dependencies_parsed_at":"2022-08-30T02:52:44.263Z","dependency_job_id":"5317d780-3fa5-45c2-a98b-5222befdc594","html_url":"https://github.com/kermitt2/biblio-glutton","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kermitt2%2Fbiblio-glutton","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kermitt2%2Fbiblio-glutton/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kermitt2%2Fbiblio-glutton/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/repositories/kermitt2%2Fbiblio-glutton/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kermitt2","download_url":"https://codeload.github.com/kermitt2/biblio-glutton/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243495506,"owners_count":20299922,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bibliographical-references","disambiguation","doi","hal","metadata-api","openaccess","pubmed","reference-matching"],"created_at":"2024-10-15T02:16:14.876Z","updated_at":"2025-10-07T11:15:39.570Z","avatar_url":"https://github.com/kermitt2.png","language":"Java","funding_links":[],"categories":["Citation and metadata extraction"],"sub_categories":[],"readme":"# biblio-glutton\n\n[![License](http://img.shields.io/:license-apache-blue.svg)](http://www.apache.org/licenses/LICENSE-2.0.html)\n[![SWH](https://archive.softwareheritage.org/badge/origin/https://github.com/kermitt2/biblio-glutton/)](https://archive.softwareheritage.org/browse/origin/?origin_url=https://github.com/kermitt2/biblio-glutton)\n\nA framework dedicated to scientific bibliographic information. It includes:\n\n- a bibliographical reference matching service: from an input such as a raw bibliographical reference and/or a combination of key metadata, the service will return the disambiguated bibliographical object with in particular its DOI and a set of metadata aggregated from Crossref and other sources, \n- a fast metadata look-up service: from a \"strong\" identifier such as DOI, PMID, etc. 
the service will return a set of metadata aggregated from Crossref and other sources,\n- various mappings between DOI, PMID, PMC, ISTEX ID and ark, integrated in the bibliographical service,\n- Open Access resolver: Integration of Open Access links via the Unpaywall dataset from Impactstory,\n- Gap and daily updates for Crossref resources (via the Crossref REST API), so that your glutton data service always stays in sync with Crossref,\n- MeSH classes mapping for PubMed articles.\n\nbiblio-glutton should be very handy if you need to run and scale a local full \"Crossref\" database and API, to aggregate Crossref, PubMed and other common bibliographical records and to match a large number of bibliographical records or raw bibliographical reference strings.\n\nThe framework is designed both for speed (with several thousand requests per second for look-up) and matching accuracy. It can be [scaled](https://github.com/kermitt2/biblio-glutton#architecture) horizontally as needed and can provide high availability. \n\nBenchmarking against the Crossref REST API is presented [below](https://github.com/kermitt2/biblio-glutton#matching-accuracy). \n\nIn the Glutton family, the following complementary tools are available for taking advantage of Open Access resources: \n\n* [biblio-glutton-extension](https://github.com/kermitt2/biblio-glutton-extension): A browser extension (Firefox \u0026 Chrome) for providing bibliographical services, like dynamically identifying Open Access resources on web pages and providing contextual citation services.\n\n* [biblio-glutton-harvester](https://github.com/kermitt2/biblio-glutton-harvester): A robust, fault-tolerant Python utility for efficiently harvesting (multi-threaded) a large Open Access collection of PDFs (Unpaywall, PubMed Central), with the possibility to upload content to Amazon S3.\n\nCurrent stable version of biblio-glutton is `0.3`. 
Working version is `0.4-SNAPSHOT`.\n\n## Documentation\n\nThe full documentation is available [here](https://biblio-glutton.readthedocs.io/en/latest/), including an evaluation of the bibliographical reference matching and some expected runtime information.\n\n## How to cite\n\nIf you want to cite this work, please refer to the present GitHub project, together with the [Software Heritage](https://www.softwareheritage.org/) project-level permanent identifier and do please indicate any author name. For example, with BibTeX:\n\n```bibtex\n@misc{biblio-glutton,\n    title = {biblio-glutton},\n    url = {https://github.com/kermitt2/biblio-glutton},\n    publisher = {GitHub},\n    year = {2018--2024},\n    archivePrefix = {swh},\n    eprint = {1:dir:a5a4585625424d7c7428654dbe863837aeda8fa7}\n}\n```\n\n## Main authors and contact\n\n- Patrice Lopez ([@kermitt2](https://github.com/kermitt2), patrice.lopez@science-miner.com)\n\n- Luca Foppiano ([@lfoppiano](https://github.com/lfoppiano))\n\n## License\n\nDistributed under [Apache 2.0 license](http://www.apache.org/licenses/LICENSE-2.0). \n\nIf you contribute to this project, you agree to share your contribution following this license. \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkermitt2%2Fbiblio-glutton","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkermitt2%2Fbiblio-glutton","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkermitt2%2Fbiblio-glutton/lists"}