{"id":13741431,"url":"https://github.com/cidles/pyannotation","last_synced_at":"2026-01-10T13:39:00.914Z","repository":{"id":1622362,"uuid":"2302173","full_name":"cidles/pyannotation","owner":"cidles","description":"PyAnnotation is a Python Library to access and manipulate linguistically annotated corpus files.","archived":false,"fork":false,"pushed_at":"2012-09-04T10:52:23.000Z","size":772,"stargazers_count":16,"open_issues_count":0,"forks_count":1,"subscribers_count":12,"default_branch":"master","last_synced_at":"2024-08-03T04:07:58.401Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://www.cidles.eu/ltll/poio-pyannotation","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cidles.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGES","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2011-08-31T15:04:29.000Z","updated_at":"2021-04-08T19:28:58.000Z","dependencies_parsed_at":"2022-08-31T10:53:21.311Z","dependency_job_id":null,"html_url":"https://github.com/cidles/pyannotation","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cidles%2Fpyannotation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cidles%2Fpyannotation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cidles%2Fpyannotation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cidles%2Fpyannotation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cidles","download_url":"https://codeload.github.com/cidles/pyannotation/tar.gz/refs/heads/
master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224774798,"owners_count":17367795,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T04:00:59.126Z","updated_at":"2026-01-10T13:39:00.859Z","avatar_url":"https://github.com/cidles.png","language":"Python","readme":"Python Linguistic Annotation Library\n====================================\nPyAnnotation is a Python library to access and manipulate linguistically\nannotated corpus files. The only file format currently supported is Elan\nXML; support for Kura XML and Toolbox files is planned for future\nreleases. A Corpus Reader API is provided to support statistical analysis\nwith the Natural Language Toolkit.\nThe software is licensed under the GNU General Public License.\n\n\nREQUIREMENTS\n============\nYou need to install the following packages:\n\n- Python: http://python.org/download\n- If you want to process data with NLTK: http://www.nltk.org/download\n\n\nINSTALLATION\n============\nTo install PyAnnotation on Windows, just run the .exe file you downloaded\nand follow the instructions of the setup process.\nTo install PyAnnotation on Linux, Unix and other platforms, unpack the\ndownloaded file and run \"setup.py\" from the command line. 
Change to the directory\ninto which you downloaded the package and unpack it::\n\n  $ tar xzf pyannotation-x.y.z.tar.gz\n  $ cd pyannotation-x.y.z\n\nThen, to install the package locally into your Python environment (you may\nneed root privileges)::\n\n  $ python setup.py install\n\nThe installation process will give you feedback and should finish without\nerrors.\n\n\nBASIC USAGE\n===========\nHere are a few examples of what you can do with PyAnnotation. All the\nexamples process Elan files stored in one directory; here it is\n\"example_data\", which is part of the package you downloaded. The package\nalso contains a sample script \"example1.py\" that runs all the commands\npresented here, so you can just run \"python example1.py\" and see all the\nresults at once on your own computer. Start a Python interpreter (note that\nthe examples below use Python 2 syntax)::\n\n  $ python\n  Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)\n  [GCC 4.3.3] on linux2\n  Type \"help\", \"copyright\", \"credits\" or \"license\" for more information.\n\nFirst, import the corpus reader module:\n\n\u003e\u003e\u003e import pyannotation.corpusreader\n\nThen create a corpus reader and load a file into your corpus. The second\nargument to the addFile method is the file type (here an .eaf file):\n\n\u003e\u003e\u003e cr = pyannotation.corpusreader.GlossCorpusReader()\n\u003e\u003e\u003e cr.addFile(\"example_data/turkish.eaf\", pyannotation.data.EAF)\n\nTo get all sentences with their tags that have a gloss \"ANOM\" (here, a tag\nis a tree-like structure of morphemes and their glosses):\n\n\u003e\u003e\u003e result = [s for s in cr.tagged_sents() for (word, tag) in s\n...             for (morpheme, gloss) in tag\n...             
if 'ANOM' in gloss and s not in locals()['_[1]']]\n\u003e\u003e\u003e print result\n[[('eve', [('ev', ['home']), ('e', ['DIR'])]), ('geldi\\xc4\\x9fimde', ...\n\n(The expression locals()['_[1]'] accesses the list that the comprehension is\ncurrently building, a CPython 2-only trick used here to skip duplicate\nsentences.)\n\nOnly the sentences of the result:\n\n\u003e\u003e\u003e sents = [[w for (w, t) in s] for s in result]\n\u003e\u003e\u003e print sents\n[['eve', 'geldi\\xc4\\x9fimde', 'ya\\xc4\\x9fmur',  ...\n\nA word list from the result:\n\n\u003e\u003e\u003e tagged_words = [(w, t) for s in result for (w, t) in s]\n\u003e\u003e\u003e print tagged_words\n[('eve', [('ev', ['home']), ('e', ['DIR'])]), ('geldi\\xc4\\x9fimde', ...\n\nA list of morphemes and their tags from the result:\n\n\u003e\u003e\u003e tagged_morphemes = [(m, g) for s in result for (w, t) in s for (m, g) in t]\n\u003e\u003e\u003e print tagged_morphemes\n[('ev', ['home']), ('e', ['DIR']), ('gel', ['come']), ('di\\xc4\\x9f', ...\n\nAnother query: find all sentences that contain a certain word (here \"home\")\nin their translation:\n\n\u003e\u003e\u003e import re\n\u003e\u003e\u003e result2 = [(s, translations)\n...            for (s, translations) in cr.tagged_sents_with_translations()\n...            for t in translations if re.search(r\"\\bhome\\b\", t)]\n\u003e\u003e\u003e print result2\n[([('d\\xc3\\xbcn', [('d\\xc3\\xbcn', ['yesterday'])]), ('ak\\xc5\\x9fam', ...\n\nAnd, last but not least, use your Elan corpus with NLTK. An example that\ngets the concordance for the word \"bir\" (Turkish for \"one\"):\n\n\u003e\u003e\u003e import nltk.text\n\u003e\u003e\u003e text = nltk.text.Text(cr.words())\n\u003e\u003e\u003e text.concordance('bir') # find the concordance for Turkish \"bir\"\nBuilding index...\nDisplaying 2 of 2 matches:\n daha rahat ederdim çünkü içimden bir ses yeter artık çalışma derken bi\nir ses yeter artık çalışma derken bir diğer ses de çalışmam gerektiğin\n\n\nJust try out for yourself what else you can do with the data. 
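The duplicate filter in the "ANOM" query above relies on the CPython 2-only locals()['_[1]'] trick. Here is a minimal sketch of the same query in Python 3, using mock data in place of cr.tagged_sents() (the nested structure follows the output shown above; the data values and the helper name sents_with_gloss are illustrative, not part of PyAnnotation):

```python
# Mock tagged sentences in the structure shown above: a sentence is a
# list of (word, tag) pairs; each tag is a list of (morpheme, glosses).
tagged_sents = [
    [("eve", [("ev", ["home"]), ("e", ["DIR"])]),
     ("geldi", [("gel", ["come"]), ("di", ["ANOM"])])],
    [("bir", [("bir", ["one"])])],
]

def sents_with_gloss(sents, target):
    """Collect sentences containing a morpheme glossed as `target`,
    skipping duplicates explicitly instead of via locals()['_[1]']."""
    result = []
    for s in sents:
        if s in result:          # explicit duplicate check
            continue
        if any(target in glosses
               for (word, tag) in s
               for (morpheme, glosses) in tag):
            result.append(s)
    return result

print(sents_with_gloss(tagged_sents, "ANOM"))
```

An explicit loop with a seen-list keeps the dedup behaviour without depending on comprehension internals.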
PyAnnotation's\ncorpus reader for .eaf files provides the following access methods for data::\n\n  corpus.morphemes()\n  corpus.words()\n  corpus.sents()\n  corpus.sents_with_translations()\n\n  corpus.tagged_morphemes()\n  corpus.tagged_words()\n  corpus.tagged_sents()\n  corpus.tagged_sents_with_translations()\n\nMore documentation is available at:\n\nhttp://www.cidles.eu/doc/pyannotation/index.html\n\n\nSITE\n====\nThe website of this project is:\n\nhttp://www.cidles.eu/ltll/poio-pyannotation\n","funding_links":[],"categories":["Software"],"sub_categories":["Utilities"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcidles%2Fpyannotation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcidles%2Fpyannotation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcidles%2Fpyannotation/lists"}