{"id":15284151,"url":"https://github.com/bramvanroy/astred","last_synced_at":"2025-04-12T23:21:33.190Z","repository":{"id":54901016,"uuid":"243038556","full_name":"BramVanroy/astred","owner":"BramVanroy","description":"An easy-to-use library to linguistically compare one sentence and its words to another, in the same language or a different one. For instance useful for comparing a translation with the original text, to find differences and similarities between two different translations, or to see how a machine translation differs from a reference translation.","archived":false,"fork":false,"pushed_at":"2021-11-27T13:34:32.000Z","size":263,"stargazers_count":22,"open_issues_count":1,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-06T08:43:57.110Z","etag":null,"topics":["alignment","linguistics","nlp","parallel-corpus","parsing","spacy","stanza","translation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BramVanroy.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION","codeowners":null,"security":null,"support":null}},"created_at":"2020-02-25T15:47:17.000Z","updated_at":"2024-12-12T05:26:06.000Z","dependencies_parsed_at":"2022-08-14T06:10:37.066Z","dependency_job_id":null,"html_url":"https://github.com/BramVanroy/astred","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BramVanroy%2Fastred","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BramVanroy%2Fastred/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/BramVanroy%2Fastred/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BramVanroy%2Fastred/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BramVanroy","download_url":"https://codeload.github.com/BramVanroy/astred/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248644181,"owners_count":21138564,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alignment","linguistics","nlp","parallel-corpus","parsing","spacy","stanza","translation"],"created_at":"2024-09-30T14:50:07.319Z","updated_at":"2025-04-12T23:21:33.168Z","avatar_url":"https://github.com/BramVanroy.png","language":"Python","readme":"Easily compare two word-aligned sentences with ASTrED\n=====================================================\n\nExample notebooks\n-----------------\n\nA couple of example notebooks exist, each with a different degree of automation for the initialisation of the aligned object.\nOnce an aligned object has been created, the functionality is identical.\n\n- `High automation`_: *automate all the things*. 
Tokenisation, parsing, and word alignment are done automatically\n  [`Try on Colab \u003chttps://colab.research.google.com/github/BramVanroy/astred/blob/master/examples/full-auto.ipynb\u003e`__]\n- `Normal automation`_: the typical scenario where you have tokenised and aligned text that is not parsed yet\n  [`Try on Colab \u003chttps://colab.research.google.com/github/BramVanroy/astred/blob/master/examples/automatic-parsing.ipynb\u003e`__]\n- `No automation`_: full-manual mode, where you provide all the required information, including dependency labels\n  and heads [`Try on Colab \u003chttps://colab.research.google.com/github/BramVanroy/astred/blob/master/examples/full-manual.ipynb\u003e`__]\n- `Monolingual`_: in this example we rely on spaCy to compare two English sentences and calculate semantic similarity\n  between aligned words [`Try on Colab \u003chttps://colab.research.google.com/github/BramVanroy/astred/blob/master/examples/monolingual.ipynb\u003e`__]\n\n.. _High automation: examples/full-auto.ipynb\n.. _Normal automation: examples/automatic-parsing.ipynb\n.. _No automation: examples/full-manual.ipynb\n.. _Monolingual: examples/monolingual.ipynb\n\nInstallation\n------------\n\nRequires Python 3.7 or higher. To keep the overhead low, a default parser is NOT installed. Currently both `spaCy`_ and\n`stanza`_ are supported and you can choose which one to use. Stanza is recommended for bilingual research (because all\nof its models are guaranteed to use Universal Dependencies), but spaCy can be used as well. The latter is especially\nuseful for monolingual comparisons, or if you are not interested in the linguistic comparisons and only require word\nreordering metrics.\n\nA pre-release is available on PyPI. You can install it with pip as follows:\n\n.. code-block:: bash\n\n    # Install with stanza (recommended)\n    pip install astred[stanza]\n    # ... or install with spacy\n    pip install astred[spacy]\n    # ... 
or install with both and decide later\n    pip install astred[parsers]\n\nIf you want to use spaCy, you have to `install`_ the required models manually; this step cannot be\nautomated.\n\n.. _spaCy: https://spacy.io/\n.. _stanza: https://github.com/stanfordnlp/stanza\n.. _install: https://spacy.io/usage/models\n\nAutomatic Word Alignment\n------------------------\n\nAutomatic word alignment is supported by using a modified version of `Awesome Align`_ under the hood. This is a neural\nword aligner that uses transfer learning with multilingual models to do word alignment. It does require\nsome manual installation work. Specifically, you need to install the :code:`astred_compat` branch from `this fork`_.\nIf you are using pip, you can run the following command:\n\n.. code-block:: bash\n\n    pip install git+https://github.com/BramVanroy/awesome-align.git@astred_compat\n\nAwesome Align requires PyTorch, like :code:`stanza` above.\n\nIf it is installed, you can initialize :code:`AlignedSentences` without providing word alignments. Those will be added\nautomatically behind the scenes. See `this example notebook`_ [`Try on Colab \u003chttps://colab.research.google.com/github/BramVanroy/astred/blob/master/examples/full-auto.ipynb\u003e`__] for more.\n\n.. code-block:: python\n\n\tfrom astred import AlignedSentences, Sentence\n\n\t# Tokenise and parse each sentence from raw text\n\tsent_en = Sentence.from_text(\"I like eating cookies\", \"en\")\n\tsent_nl = Sentence.from_text(\"Ik eet graag koekjes\", \"nl\")\n\n\t# Word alignments do not need to be added on init:\n\taligned = AlignedSentences(sent_en, sent_nl)\n\nKeep in mind, however, that automatic alignment will never have the same quality as manual alignment. Use with caution!\nI highly suggest reading `the paper`_ of Awesome Align to see whether it is a good pick for you.\n\n.. _Awesome Align: https://github.com/neulab/awesome-align\n.. _this fork: https://github.com/BramVanroy/awesome-align/tree/astred_compat\n.. _this example notebook: examples/full-auto.ipynb\n.. 
_the paper: https://arxiv.org/abs/2101.08231\n\nLicense\n-------\nLicensed under Apache License Version 2.0. See the LICENSE file attached to this repository.\n\nCitation\n--------\nPlease cite our `papers`_ if you use this library.\n\nVanroy, B., De Clercq, O., Tezcan, A., Daems, J., \u0026 Macken, L. (2021). Metrics of syntactic equivalence to assess \ntranslation difficulty. In M. Carl (Ed.), *Explorations in empirical translation process research* (Vol. 3, pp. 259–294).\nCham, Switzerland: Springer International Publishing. https://doi.org/10.1007/978-3-030-69777-8_10\n\n.. code-block::\n\n\t@incollection{vanroy2021metrics,\n\t    title = {Metrics of syntactic equivalence to assess translation difficulty},\n\t    booktitle = {Explorations in empirical translation process research},\n\t    author = {Vanroy, Bram and De Clercq, Orph{\\'e}e and Tezcan, Arda and Daems, Joke and Macken, Lieve},\n\t    editor = {Carl, Michael},\n\t    year = {2021},\n\t    series = {Machine {{Translation}}: {{Technologies}} and {{Applications}}},\n\t    volume = {3},\n\t    pages = {259--294},\n\t    publisher = {{Springer International Publishing}},\n\t    address = {{Cham, Switzerland}},\n\t    isbn = {978-3-030-69776-1},\n\t    url = {https://link.springer.com/chapter/10.1007/978-3-030-69777-8_10},\n\t    doi = {10.1007/978-3-030-69777-8_10}\n\t}\n\nVanroy, B., Schaeffer, M., \u0026 Macken, L. (2021). Comparing the Effect of Product-Based Metrics on the Translation Process. *Frontiers in Psychology*, 12. https://doi.org/10.3389/fpsyg.2021.681945\n\n.. 
code-block::\n\n\t@article{vanroy2021comparing,\n\t    publisher = {Frontiers},\n\t    author = {Vanroy, Bram and Schaeffer, Moritz and Macken, Lieve},\n\t    title = {Comparing the effect of product-based metrics on the translation process},\n\t    year = {2021},\n\t    journal = {Frontiers in Psychology},\n\t    volume = {12}, \n\t    issn = {1664-1078}, \n\t    url = {https://www.frontiersin.org/article/10.3389/fpsyg.2021.681945},\n\t    doi = {10.3389/fpsyg.2021.681945}, \n\t}\n\n\n.. _papers: CITATION\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbramvanroy%2Fastred","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbramvanroy%2Fastred","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbramvanroy%2Fastred/lists"}