{"id":13819062,"url":"https://github.com/inukshuk/anystyle","last_synced_at":"2025-05-16T02:06:32.033Z","repository":{"id":1622473,"uuid":"2302592","full_name":"inukshuk/anystyle","owner":"inukshuk","description":"Fast citation reference parsing","archived":false,"fork":false,"pushed_at":"2025-05-11T12:32:49.000Z","size":18108,"stargazers_count":1117,"open_issues_count":66,"forks_count":94,"subscribers_count":31,"default_branch":"main","last_synced_at":"2025-05-11T13:24:36.567Z","etag":null,"topics":["bibliography","citation-styles","conditional-random-fields","machine-learning","parser","ruby","science"],"latest_commit_sha":null,"homepage":"https://anystyle.io","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/inukshuk.png","metadata":{"files":{"readme":"README.md","changelog":"HISTORY.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2011-08-31T16:24:21.000Z","updated_at":"2025-05-11T12:32:52.000Z","dependencies_parsed_at":"2023-01-12T15:01:13.981Z","dependency_job_id":"f57ba251-7120-4e4a-bd18-6b85fc0c5a35","html_url":"https://github.com/inukshuk/anystyle","commit_stats":{"total_commits":624,"total_committers":11,"mean_commits":56.72727272727273,"dds":0.0625,"last_synced_commit":"4038bedc0ab8d4bab5970376813d78e7d8d10e68"},"previous_names":[],"tags_count":56,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/inukshuk%2Fanystyle","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/inukshuk%2Fanystyle/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/inukshuk
%2Fanystyle/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/inukshuk%2Fanystyle/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/inukshuk","download_url":"https://codeload.github.com/inukshuk/anystyle/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254453645,"owners_count":22073616,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bibliography","citation-styles","conditional-random-fields","machine-learning","parser","ruby","science"],"created_at":"2024-08-04T08:00:38.955Z","updated_at":"2025-05-16T02:06:32.005Z","avatar_url":"https://github.com/inukshuk.png","language":"Ruby","readme":"AnyStyle\n========\n[![CI](https://github.com/inukshuk/anystyle/actions/workflows/ci.yml/badge.svg)](https://github.com/inukshuk/anystyle/actions/workflows/ci.yml)\n[![Coverage Status](https://coveralls.io/repos/github/inukshuk/anystyle/badge.svg)](https://coveralls.io/github/inukshuk/anystyle)\n\nAnyStyle is a fast and smart parser of bibliographic references.\nOriginally inspired by [parsCit][] and [FreeCite][],\nAnyStyle uses machine learning algorithms\nand aims to make it easy to train models\nwith data that's relevant to you.\n\n\nUsing AnyStyle on the command line\n----------------------------------\n    $ [sudo] gem install anystyle-cli\n    $ anystyle --help\n    $ anystyle help find\n    $ anystyle help parse\n\nSee [anystyle-cli][] for more details.\n\n\nUsing AnyStyle in Ruby\n----------------------\nInstall the `anystyle` gem.\n\n    $ [sudo] gem 
install anystyle\n\nNow you can use the static Parser and Finder instances\nby calling the `AnyStyle.parse` or `AnyStyle.find` methods.\nFor example:\n\n```ruby\nrequire 'anystyle'\n\npp AnyStyle.parse 'Derrida, J. (1967). L’écriture et la différence (1 éd.). Paris: Éditions du Seuil.'\n#-\u003e [{\n#  :author=\u003e[{:family=\u003e\"Derrida\", :given=\u003e\"J.\"}],\n#  :date=\u003e[\"1967\"],\n#  :title=\u003e[\"L’écriture et la différence\"],\n#  :edition=\u003e[\"1\"],\n#  :location=\u003e[\"Paris\"],\n#  :publisher=\u003e[\"Éditions du Seuil\"],\n#  :language=\u003e\"fr\",\n#  :scripts=\u003e[\"Common\", \"Latin\"],\n#  :type=\u003e\"book\"\n#}]\n```\n\nYou can also create your own\n`AnyStyle::Parser` or `AnyStyle::Finder` with custom options.\n\n\nUsing AnyStyle on the web\n-------------------------\nAnyStyle is available at [anystyle.io][].\n\nThe web application is [open source][]\nand you're welcome to host your own instance!\n\n[anystyle-cli]: https://github.com/inukshuk/anystyle-cli\n[anystyle.io]: https://anystyle.io\n[open source]: https://github.com/inukshuk/anystyle.io\n[parsCit]: http://aye.comp.nus.edu.sg/parsCit/\n[FreeCite]: http://freecite.library.brown.edu/\n\n\nImproving results for your data\n===============================\nTraining\n--------\nYou can train custom Finder and Parser models.\nTo do this, you need to prepare your own data sets for training.\nYou can create your own data from scratch\nor build on AnyStyle's default sets.\nThe default parser model uses the [core][] data set.\nAnd though the finder model sources aren't available in their entirety,\ndue to copyright restrictions,\nyou can find several [tagged documents][] here.\n\nWhen you have compiled a data set for training,\nyou will be ready to create your own model:\n\n    $ anystyle train training-data.xml custom.mod\n\nThis will save your new model as `custom.mod`.\nTo use your model instead of AnyStyle's default,\nuse the `-P` or `--parser-model` flag and, 
respectively,\n`-F` or `--finder-model` to use a custom finder model.\nFor instance, the command below\nwill parse a file `bib.txt` with the custom model\nand print the result to STDOUT in JSON format:\n\n    $ anystyle -P custom.mod -f json parse bib.txt -\n\nWhen training your own models, it's good practice\nto check their quality using a second data set.\nFor example, to check your custom model\nusing AnyStyle's manually curated [gold][] data set:\n\n    $ anystyle -P x.mod check ./res/parser/gold.xml\n    Checking gold.xml.................   1 seq  0.06%   3 tok  0.01%  3s\n\nThis command prints sequence and token error rates.\nHere, sequence errors are the number of references\ntagged differently by the parser\nas compared to the curated input;\nthe number of token errors\nis the total number of words in these references.\nIn the example above, one reference was wrong\n(out of 1,700 at the time),\nbecause a total of three words had a different tag.\n\nWhen working with training data,\nit's a good idea to use the `Wapiti::Dataset` API in Ruby:\nit supports standard set operators\nand makes it easy to combine or compare data sets.\n\n[core]: https://github.com/inukshuk/anystyle/blob/master/res/parser/core.xml\n[gold]: https://github.com/inukshuk/anystyle/blob/master/res/parser/gold.xml\n[tagged documents]: https://github.com/inukshuk/anystyle/blob/master/res/finder\n\n\nNatural Languages used in AnyStyle\n----------------------------------\nThe [core][] data set contains the manually marked-up references\nwhich comprise AnyStyle's default parser model.\nIf your references include non-English documents,\nthe distribution of natural languages in this corpus is relevant.\n\n| Language                | n   |\n|-------------------------|-----|\n| ENGLISH                 | 965 |\n| FRENCH                  | 54  |\n| GERMAN                  | 26  |\n| ITALIAN                 | 11  |\n| Others                  | 9   |\n|                         |     |\n| Not reliably 
determined | 449 |\n| (but mainly English)    |     |\n\n(Measured using [cld][] and AnyStyle version 1.3.13)\n\nThere is a strong prevalence of English-language documents with the\nconventions used in English-language bibliographies,\nwith some representation of other European languages.\nThe languages used reflect those used in scientific publishing\nas well as the maintainers' competencies.\nIf you are working with documents in languages other than English,\nyou might consider training the model with some examples\nin the relevant languages.\n\nAnyStyle works with references written in any Latin script,\nincluding most European languages,\nlanguages such as Indonesian and Malaysian,\nas well as romanized Arabic, Chinese and Japanese.\nIt also supports non-Latin alphabets such as Cyrillic,\nalthough no examples of these appear in the default training sets.\nLanguages written in syllabaries or complex symbols\nwhich don't use white space to separate tokens\naren't compatible with AnyStyle's approach:\nthis includes Chinese, Japanese, Arabic, and Indian languages. 
\n\n[cld]: https://github.com/jtoy/cld\n\n\nDictionary Adapters\n-------------------\nDuring the statistical analysis of reference strings,\nAnyStyle relies on a large feature dictionary;\nby default, AnyStyle creates a persistent Ruby hash\nin the folder of the `anystyle-data` Gem.\nThis uses up about 2MB of disk space\nand keeps the entire dictionary in memory.\nIf you prefer a smaller memory footprint,\nyou can use AnyStyle's GDBM dictionary.\nGDBM bindings are part of the Ruby standard library\nand supported on all platforms,\nthough you may need to install GDBM before installing Ruby.\n\nIf you don't want to use the persistent Ruby hash or GDBM,\nyou can store your dictionary in memory or use Redis.\nThe best way to change the default dictionary adapter\nis by adjusting AnyStyle's default configuration\n(when using the static parser instances\nyou must set the default before using the parser):\n\n    AnyStyle::Dictionary.defaults[:adapter] = :ruby\n    #-\u003e Use a persistent Ruby hash;\n    #-\u003e slower start-up than GDBM but no extra dependency\n\n    AnyStyle::Dictionary.defaults[:adapter] = :hash\n    #-\u003e Use in-memory dictionary; slow start-up but uses no space on disk\n\n    require 'anystyle/dictionary/gdbm'\n    AnyStyle::Dictionary.defaults[:adapter] = :gdbm\n\nTo use Redis, install the `redis` and `redis/namespace` (optional) Gems\nand configure AnyStyle to use the Redis adapter:\n\n    AnyStyle::Dictionary.defaults[:adapter] = :redis\n\n    # Adjust the Redis-specific configuration\n    require 'anystyle/dictionary/redis'\n    AnyStyle::Dictionary::Redis.defaults[:host] = 'localhost'\n    AnyStyle::Dictionary::Redis.defaults[:port] = 6379\n\n\nAbout AnyStyle\n==============\nContributing\n------------\nThe AnyStyle source code is hosted on [GitHub][].\nYou can check out a copy of the latest code using Git:\n\n    $ git clone https://github.com/inukshuk/anystyle.git\n\nIf you've found a bug or have a question,\nplease [report the issue][] 
or,\nfor extra credit, clone the AnyStyle repository,\nwrite a failing example, fix the bug and submit a pull request.\n\n[GitHub]: https://github.com/inukshuk/anystyle/\n[report the issue]: https://github.com/inukshuk/anystyle/issues\n\n\nCredits\n-------\nAnyStyle is a volunteer effort and you're encouraged to join!\nOver the years the main contributors have been:\n\n* [Alex Fenton](https://github.com/a-fent)\n* [Sylvester Keil](https://github.com/inukshuk)\n* [Johannes Krtek](https://github.com/flachware)\n* [Ilja Srna](https://github.com/namyra)\n\n\nLicense\n-------\nCopyright 2011-2023 Sylvester Keil. All rights reserved.\n\nAnyStyle is distributed under a BSD-style license.\nSee [LICENSE](./LICENSE) for details.\n","funding_links":[],"categories":["Ruby","Happy Exploring 🤘"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finukshuk%2Fanystyle","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finukshuk%2Fanystyle","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finukshuk%2Fanystyle/lists"}