{"id":13740191,"url":"https://github.com/kornai/4lang","last_synced_at":"2025-05-08T19:36:48.497Z","repository":{"id":25809158,"uuid":"29248144","full_name":"kornai/4lang","owner":"kornai","description":"Concept dictionary","archived":false,"fork":false,"pushed_at":"2024-04-04T20:49:58.000Z","size":4785,"stargazers_count":37,"open_issues_count":52,"forks_count":13,"subscribers_count":10,"default_branch":"master","last_synced_at":"2024-08-04T04:06:17.262Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kornai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2015-01-14T14:33:02.000Z","updated_at":"2024-02-07T14:43:13.000Z","dependencies_parsed_at":"2024-04-17T21:42:44.994Z","dependency_job_id":"9c46e2fb-415c-413f-ae06-a3380757d974","html_url":"https://github.com/kornai/4lang","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kornai%2F4lang","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kornai%2F4lang/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kornai%2F4lang/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kornai%2F4lang/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kornai","download_url":"https://codeload.github.com/kornai/4lang/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_
count":224765420,"owners_count":17366117,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T04:00:44.110Z","updated_at":"2024-11-15T10:30:21.333Z","avatar_url":"https://github.com/kornai.png","language":"Python","funding_links":[],"categories":["Software","Python","Datasets"],"sub_categories":["Utilities","Linguistic resources"],"readme":"## 4lang\n\nThis repository provides\n- the `4lang` concept dictionary, which contains manually written concept\n  definitions (see the [description of the fields of the TSV file](https://github.com/kornai/4lang/wiki/Fields-in-the-concept-dictionary))\n- the `text_to_4lang` module, which creates concept graph representations from running text\n- the `dict_to_4lang` module, which builds more of these definitions from human-readable dictionaries\n\n\n### Dependencies\n\n#### pymachine\nOur tools require an installation of [pymachine](http://github.com/kornai/pymachine), our implementation of Eilenberg machines.\n\n#### hunmorph\nFor lemmatization, `4lang` uses the `hunmorph` tool. On most UNIX-based systems you can use [these pre-compiled executables and models](http://people.mokk.bme.hu/~recski/4lang/huntools_binaries.tgz); just extract them in your `4lang` directory. On 64-bit systems you may have to install the `libc6-i386` package for the `hundisambig` binary to work.\n\n__NOTE__: All remaining dependencies are required only for building `4lang` graphs, so if you only want to use the graphs we provide (e.g. 
for the machine similarity component of our [Semeval STS system](https://github.com/juditacs/semeval/)), you can skip the rest of this section and go straight to [Downloading pre-compiled graphs](#downloading-pre-compiled-graphs).\n\n#### Stanford Parser, CoreNLP, jython\nFor parsing dictionary definitions, `4lang` requires the [Stanford Dependency Parser](http://nlp.stanford.edu/software/lex-parser.shtml#Download). Additionally, `text_to_4lang` requires the [Stanford CoreNLP](http://nlp.stanford.edu/software/corenlp.shtml#Download) toolkit for parsing and coreference resolution, while `dict_to_4lang` requires [jython](http://www.jython.org/downloads.html) for customized parsing via the Stanford Parser API. Both tools require a copy of the RNN-based parser model for English, which is distributed with the Stanford Parser.\n\nCurrently, `text_to_4lang` also requires the [corenlp-server](https://github.com/kowey/corenlp-server) package. Download the repository and follow the instructions in its README to build the package and start the server (`mvn package; mvn exec:java -D server`); the `text_to_4lang` module will then be able to connect to it.\n\nAfter downloading and installing these tools, edit the `stanford` and `corenlp` sections of the default configuration file `conf/default.cfg` so that the relevant fields point to your installation of each tool and to your copy of the `englishRNN.ser.gz` model (more on config files below).\n\n### Downloading pre-compiled graphs\nWe provide [serialized machine graphs](http://sandbox.hlt.bme.hu/~recski/4lang/machines.tgz) built from `4lang` definitions as well as from the English Wiktionary (using the `dict_to_4lang` module). 
Unpacking this archive in your `4lang` directory will place the graphs in `data/machines`, the default location for compiled machine graphs.\n\n### Environment variables\nThe locations of your installations of the above third-party tools, as well as of `4lang` itself, must be specified via environment variables. These variables must always be set; to avoid obscure bugs, there are no fallback values. Here is an example of a `.bashrc` file setting all required variables:\n\n```\nexport FOURLANGPATH=/home/recski/projects/4lang\nexport JYTHONPATH=/home/recski/projects/jython/jython/bin/jython\nexport STANFORDPATH=/home/recski/projects/stanford_dp\nexport MAGYARLANCPATH=/home/recski/projects/4lang/magyarlanc\nexport HUNTOOLSBINPATH=/home/recski/sandbox/huntools_binaries\n```\n\nNote that `JYTHONPATH` must point directly to the jython binary, not to a directory, since different jython installations may have different directory structures.\n\n### Usage\n\n#### Semeval STS\nTo use `4lang` from our [Semeval STS system](https://github.com/juditacs/semeval/), edit the `4langpath` and `hunmorph_path` attributes in your semeval config file so that they point to your `4lang` directory and to the downloaded `hunmorph` binaries, respectively.\n\n#### dict_to_4lang and text_to_4lang\n\nTo run each module on a small test dataset, run\n\n```\npython src/dict_to_4lang.py\npython src/text_to_4lang.py\n```\n\nBoth tools can be configured by editing a copy of [conf/default.cfg](conf/default.cfg). Run\n\n```\npython src/dict_to_4lang.py MY_CONFIG_FILE\n```\nto build `4lang`-style definitions from a monolingual dictionary such as Wiktionary or Longman, or\n\n```\ncat INPUT_FILE | python src/text_to_4lang.py MY_CONFIG_FILE\n```\nto create concept graphs from running English text.\n\n\n### The config file\n\n### Contact\nThis repository is maintained by Gábor Recski. Questions, suggestions, bug reports, etc. 
are very welcome and can be sent by email to recski at aut bme hu.\n\n### Publications\nIf you use the `4lang` module, please cite:\n\n```\n@inproceedings{Kornai:2015a,\n    author    = {Kornai, Andr\\'as  and  \\'{A}cs, Judit  and  Makrai, M\\'{a}rton  and  Nemeskey, D\\'{a}vid M\\'{a}rk  and  Pajkossy, Katalin  and  Recski, G\\'{a}bor},\n    title     = {Competence in lexical semantics},\n    booktitle = {Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics},\n    month     = {June},\n    year      = {2015},\n    address   = {Denver, Colorado},\n    publisher = {Association for Computational Linguistics},\n    pages     = {165--175},\n    url       = {ht\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkornai%2F4lang","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkornai%2F4lang","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkornai%2F4lang/lists"}