{"id":44902909,"url":"https://github.com/orenmel/context2vec","last_synced_at":"2026-03-02T15:00:29.413Z","repository":{"id":72773408,"uuid":"63874881","full_name":"orenmel/context2vec","owner":"orenmel","description":null,"archived":false,"fork":false,"pushed_at":"2018-10-08T14:07:54.000Z","size":40,"stargazers_count":215,"open_issues_count":16,"forks_count":60,"subscribers_count":17,"default_branch":"master","last_synced_at":"2023-10-20T22:13:14.279Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/orenmel.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2016-07-21T14:09:47.000Z","updated_at":"2023-10-20T22:13:24.162Z","dependencies_parsed_at":"2023-05-13T07:45:19.645Z","dependency_job_id":null,"html_url":"https://github.com/orenmel/context2vec","commit_stats":null,"previous_names":[],"tags_count":1,"template":null,"template_full_name":null,"purl":"pkg:github/orenmel/context2vec","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/orenmel%2Fcontext2vec","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/orenmel%2Fcontext2vec/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/orenmel%2Fcontext2vec/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/orenmel%2Fcontext2vec/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/orenmel","download_url":"https://codeload.github.com/orenmel/context2vec/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/orenmel%2Fcontext2vec/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30007043,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-02T14:08:50.421Z","status":"ssl_error","status_checked_at":"2026-03-02T14:08:50.037Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-17T22:01:18.262Z","updated_at":"2026-03-02T15:00:29.407Z","avatar_url":"https://github.com/orenmel.png","language":"Python","readme":"# The context2vec toolkit\n\nWith this code you can:\n* Use our pre-trained models to represent sentential contexts of target words and target words themselves with low-dimensional vector representations.\n* Learn your own context2vec models with your choice of a learning corpus and hyperparameters.\n\nPlease cite the following paper if using the code:\n\n**context2vec: Learning Generic Context Embedding with Bidirectional LSTM**  \nOren Melamud, Jacob Goldberger, Ido Dagan. 
CoNLL, 2016 [[pdf]](http://u.cs.biu.ac.il/~melamuo/publications/context2vec_conll16.pdf).\n\n## Requirements\n\n* Python 3.6\n* Chainer 4.2 ([chainer](http://chainer.org/))\n* NLTK 3.0 ([NLTK](http://www.nltk.org/)) - optional (only required for the AWE baseline and MSCC evaluation)\n\nNote: Release 1.0 includes the original code that was used in the context2vec paper and has different dependencies (Python 2.7 and Chainer 1.7).\n\n## Installation\n\n* Download the code\n* ```python setup.py install```\n\n## Quick-start\n\n* Download pre-trained context2vec models from [[here]](http://u.cs.biu.ac.il/~nlp/resources/downloads/context2vec/)\n* Unzip a model into MODEL_DIR\n* Run:\n```\npython context2vec/eval/explore_context2vec.py MODEL_DIR/MODEL_NAME.params\n\u003e\u003e this is a [] book\n```\n* This will embed the entire sentential context 'this is a \\_\\_ book' and output the top-10 target words whose embeddings are closest to that of the context.\n* Use this as sample code to help you integrate context2vec into your own application.\n\n## Training a new context2vec model\n\n* CORPUS_FILE needs to contain your learning corpus with one sentence per line and tokens separated by spaces.\n* Run:\n```\npython context2vec/train/corpus_by_sent_length.py CORPUS_FILE [max-sentence-length]\n```\n* This will create a directory CORPUS_FILE.DIR that will contain your preprocessed learning corpus.\n* Run:\n```\npython context2vec/train/train_context2vec.py -i CORPUS_FILE.DIR -w WORD_EMBEDDINGS -m MODEL -c lstm --deep yes -t 3 --dropout 0.0 -u 300 -e 10 -p 0.75 -b 100 -g 0\n```\n* This will create a WORD_EMBEDDINGS.targets file with your target word embeddings, a MODEL file, and a MODEL.params file. Put all of these in the same directory MODEL_DIR and you're done.\n* See the usage documentation for all run-time parameters.\n\nNOTE:\n* The current code lowercases all corpus words.\n* Use of a GPU and mini-batching is highly recommended to achieve good training speeds.\n\n### Avoiding exploding gradients\n\nSome users have reported that this configuration can cause exploding gradients [(see issue #6)](https://github.com/orenmel/context2vec/issues/6). One option is to lower the learning rate by reducing the Adam optimizer's alpha from its default of 0.001, e.g. by specifying `-a 0.0005`. As an extra safety measure, you can also enable gradient clipping, e.g. `-gc 5`; note that 5 is simply a commonly used threshold, not a carefully tuned value.\n\n## Evaluation\n\n### Microsoft Sentence Completion Challenge (MSCC)\n\n* Download the train and test datasets from [[here]](https://www.microsoft.com/en-us/research/project/msr-sentence-completion-challenge/).\n* Split the test files into dev and test if you wish to do development tuning.\n* Download the pre-trained context2vec model for MSCC from [[here]](http://u.cs.biu.ac.il/~nlp/resources/downloads/context2vec/).\n* Alternatively, train your own model as follows:\n\t- Run ```context2vec/eval/mscc_text_tokenize.py INPUT_FILE OUTPUT_FILE``` for every INPUT_FILE in the MSCC train set.\n\t- Concatenate all output files into one large learning corpus file.\n\t- Train a model as explained above.\n* Run:\n```\npython context2vec/eval/sentence_completion.py Holmes.machine_format.questions.txt Holmes.machine_format.answers.txt RESULTS_FILE MODEL_NAME.params\n```\n\n### Senseval-3\n\n* Download the 'English lexical sample' train and test datasets from [[here]](http://web.eecs.umich.edu/~mihalcea/senseval/senseval3/data.html).\n* Download the senseval scorer script (scorer2) from [[here]](http://web.eecs.umich.edu/~mihalcea/senseval/senseval3/scoring/scorer2.c) and build it.\n* Train your own context2vec model or use one of the pre-trained models provided.\n* For development runs do:\n```\npython context2vec/eval/wsd/wsd_main.py EnglishLS.train EnglishLS.train RESULTS_FILE MODEL_NAME.params 1\n```\n```\nscorer2 RESULTS_FILE EnglishLS.train.key EnglishLS.sensemap\n```\n* For test runs do:\n```\npython context2vec/eval/wsd/wsd_main.py EnglishLS.train EnglishLS.test RESULTS_FILE MODEL_NAME.params 1\n```\n```\nscorer2 RESULTS_FILE EnglishLS.test.key EnglishLS.sensemap\n```\n\n### Lexical Substitution\n\nThe code for the lexical substitution evaluation is included in a separate repository [[here]](https://github.com/orenmel/lexsub).\n\n## Known issues\n\n* All words are converted to lowercase.\n* Using a GPU and/or mini-batches is not supported at test time.\n\n## License\n\nApache 2.0\n","funding_links":[],"categories":["Uncategorized"],"sub_categories":["Uncategorized"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Forenmel%2Fcontext2vec","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Forenmel%2Fcontext2vec","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Forenmel%2Fcontext2vec/lists"}