<img src="./citron/public/img/citron_logo.png" alt="Citron logo" align="right">

# Citron #

Citron is an experimental quote extraction and attribution system created by [BBC R&D](https://www.bbc.co.uk/rd), based on a [paper](https://aclanthology.org/D13-1101/) and a [dataset](https://aclanthology.org/L16-1619/) developed by the School of Informatics at the University of Edinburgh.

It can be used to extract quotes from text documents, attributing them to the appropriate speaker and resolving pronouns where necessary. It supports direct quotes (with quotation marks), indirect quotes (without quotation marks) and mixed quotes (which have both direct and indirect parts). Note that the output can contain a significant number of errors and omissions, so extracted quotes should be checked against the input text.

You can run Citron using the [pre-trained model](./models/en_2021-11-15) or [train your own model](./scripts/train). You can also [evaluate its performance](./scripts/evaluate).

Training and evaluating models requires data in [Citron's Annotation Format](./docs/data_format.md). Citron provides [pre-processing scripts](./scripts/preprocess) to extract suitable data from the [PARC 3.0 Corpus of Attribution Relations](https://aclanthology.org/L16-1619/). Alternatively, you can create your own data using the [Citron Annotator](./scripts/annotator) app.

Technical details and potential applications are discussed in ["Quote Extraction and Analysis for News"](./docs/DSJM_2018_paper_1.pdf).

## Installation ##
Requires Python 3.7.2 or above. The package versions shown below should be installed when using the [pre-trained model](./models/en_2021-11-15).

- [Install scikit-learn (1.0.*)](https://scikit-learn.org/stable/install.html)
- [Install spaCy (3.*) and download a model](https://spacy.io/usage) (e.g. "en_core_web_sm")
- Download the source code: `git clone git@github.com:bbc/citron.git`

Then, from the citron root directory:

    python3 -m pip install -r requirements.txt

Then, from a Python 3 interpreter, download the NLTK names corpus:

    import nltk
    nltk.download("names")

## Usage ##

Scripts to run Citron are available in the [bin/](./bin/) directory.

All scripts require the citron root directory to be on the PYTHONPATH:

    $ export PYTHONPATH=$PYTHONPATH:/path/to/citron_root_directory

### Run the Citron REST API and demonstration server ###

    $ citron-server
        --model-path   Path to Citron model directory
        --logfile      Path to logfile                   (Optional)
        --port         Port for the Citron API           (Optional: default is 8080)
        -v             Verbose mode                      (Optional)

### Run Citron on the command-line ###

    $ citron-extract
        --model-path     Path to Citron model directory
        --input-file     Path to input file                (Optional: otherwise read from stdin)
        --output-file    Path to output file               (Optional: otherwise write to stdout)
        -v               Verbose mode                      (Optional)

### Use Citron in Python ###

    from citron.citron import Citron
    from citron import utils

    # Path to a Citron model directory, e.g. the pre-trained model
    # (this relative path assumes you run from the citron root directory)
    model_path = "./models/en_2021-11-15"
    # Example input text to analyse
    text = 'The minister said: "We will publish the report next week."'

    nlp = utils.get_parser()
    citron = Citron(model_path, nlp)
    doc = nlp(text)
    quotes = citron.get_quotes(doc)

## Issues and Questions ##
Issues can be reported on the [issue tracker](https://github.com/bbc/citron/issues) and questions can be raised on the [discussion board](https://github.com/bbc/citron/discussions/categories/q-a).

## Contributing ##

Contributions are welcome. Please refer to the [contributing guidelines](./CONTRIBUTING.md).

## License ##

Licensed under the [Apache License, Version 2.0](./LICENSE).

The [pre-trained model](./models/en_2021-11-15) is separately licensed under the Creative Commons [Attribution-NonCommercial-ShareAlike 4.0 International licence](./models/en_2021-11-15/CC_BY-NC-SA_4.0.txt) and the [VerbNet 3.0 license](./models/en_2021-11-15/verbnet-license.3.0.txt).

## Contact ##

For more information please contact: [chris.newell@bbc.co.uk](mailto:chris.newell@bbc.co.uk)

Copyright 2021 British Broadcasting Corporation.
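Citron distinguishes direct quotes (in quotation marks), indirect quotes (without them) and mixed quotes (with both parts). The sketch below illustrates those definitions only. It labels an already-identified quote span by counting its quotation marks. `classify_quote` is a hypothetical helper and is not part of Citron's API. Citron itself relies on trained models and does not classify quotes this way.

```python
# Hypothetical helper illustrating the three quote types Citron supports.
# NOT Citron's method: it only inspects quotation marks in a span that has
# already been identified as quote content.
OPEN_MARKS = {'"', "\u201c"}    # straight and left curly double quote
CLOSE_MARKS = {'"', "\u201d"}   # straight and right curly double quote
ALL_MARKS = OPEN_MARKS | CLOSE_MARKS


def classify_quote(span: str) -> str:
    """Label a quote span as 'direct', 'indirect' or 'mixed'."""
    # Ignore surrounding whitespace and trailing punctuation outside the marks
    s = span.strip().rstrip(".,;:!?")
    n_marks = sum(ch in ALL_MARKS for ch in s)
    if n_marks == 0:
        return "indirect"  # no quotation marks at all
    if n_marks == 2 and s[0] in OPEN_MARKS and s[-1] in CLOSE_MARKS:
        return "direct"    # the whole span sits inside one pair of marks
    return "mixed"         # quoted and unquoted material together
```

A real system must also find the span boundaries and the speaker, which is the hard part that Citron's models address.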
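Extracted quotes should be checked against the input text. A cheap first pass is to flag any quote whose text does not occur verbatim in the document. This sketch makes two assumptions. First, you have already pulled the quote text out of the results of `get_quotes` as plain strings; the structure of those results is not described here. Second, each quote is a span of the input. `unverified_quotes` is a hypothetical helper.

```python
import re
from typing import List


def _normalise(s: str) -> str:
    # Collapse runs of whitespace so line breaks in the input don't cause misses
    return re.sub(r"\s+", " ", s).strip()


def unverified_quotes(text: str, quotes: List[str]) -> List[str]:
    """Return the quotes whose text does not appear verbatim in `text`."""
    haystack = _normalise(text)
    return [q for q in quotes if _normalise(q) not in haystack]
```

A quote that passes this check can still be attributed to the wrong speaker, so any quote it returns, and any attribution you rely on, still deserves a manual look.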
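To drive the command-line extractor from another Python program, one option is to build its argument list from the documented flags and run it with `subprocess`. This sketch assumes that the `bin/` scripts are on your `PATH` and that the PYTHONPATH is set as described under Usage. `build_extract_command` is a hypothetical helper that uses only the flags listed for `citron-extract`.

```python
from typing import List, Optional


def build_extract_command(model_path: str,
                          input_file: Optional[str] = None,
                          output_file: Optional[str] = None,
                          verbose: bool = False) -> List[str]:
    """Build an argv list for citron-extract from its documented flags."""
    cmd = ["citron-extract", "--model-path", model_path]
    if input_file is not None:
        cmd += ["--input-file", input_file]    # if omitted, input is read from stdin
    if output_file is not None:
        cmd += ["--output-file", output_file]  # if omitted, output goes to stdout
    if verbose:
        cmd.append("-v")
    return cmd
```

For example, `subprocess.run(build_extract_command("./models/en_2021-11-15"), input=text, capture_output=True, text=True, check=True)` would pass `text` on stdin, because `--input-file` is omitted. It would then capture whatever `citron-extract` writes to stdout.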