{"id":24985584,"url":"https://github.com/castorini/castor","last_synced_at":"2025-03-17T16:10:40.136Z","repository":{"id":74010550,"uuid":"85839745","full_name":"castorini/castor","owner":"castorini","description":"PyTorch deep learning models for text processing","archived":false,"fork":false,"pushed_at":"2019-04-08T00:07:28.000Z","size":1179,"stargazers_count":176,"open_issues_count":28,"forks_count":56,"subscribers_count":19,"default_branch":"master","last_synced_at":"2025-03-02T13:11:20.343Z","etag":null,"topics":["deep-learning"],"latest_commit_sha":null,"homepage":"http://castor.ai/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/castorini.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-03-22T14:46:05.000Z","updated_at":"2025-01-23T05:48:06.000Z","dependencies_parsed_at":null,"dependency_job_id":"28a27b1a-0c59-4ff2-8c86-4433a227ed9c","html_url":"https://github.com/castorini/castor","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/castorini%2Fcastor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/castorini%2Fcastor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/castorini%2Fcastor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/castorini%2Fcastor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/castorini","download_url":"https://codeload.github.com/castorini/castor/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"ht
tps://github.com","kind":"github","repositories_count":244066180,"owners_count":20392406,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning"],"created_at":"2025-02-04T10:53:21.634Z","updated_at":"2025-03-17T16:10:40.117Z","avatar_url":"https://github.com/castorini.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Castor\n\nThis is the common repo for deep learning models implemented in PyTorch by the Data Systems Group at the University of Waterloo.\n\n## Models\n\n### Predictions Over One Input Text Sequence\n\nMoved to https://github.com/castorini/hedwig\n\n### Predictions Over Two Input Text Sequences\n\nFor paraphrase detection, question answering, etc.\n\n+ [SM-CNN](./sm_cnn/): Siamese CNN for ranking texts [(Severyn and Moschitti, SIGIR 2015)](https://dl.acm.org/citation.cfm?id=2767738)\n+ [MP-CNN](./mp_cnn/): Multi-Perspective CNN [(He et al., EMNLP 2015)](http://anthology.aclweb.org/D/D15/D15-1181.pdf)\n+ [NCE](./nce/): Noise-Contrastive Estimation for answer selection applied on SM-CNN and MP-CNN [(Rao et al., CIKM 2016)](https://dl.acm.org/citation.cfm?id=2983872)\n+ [VDPWI](./vdpwi): Very-Deep Pairwise Word Interaction NNs for modeling textual similarity [(He and Lin, NAACL 2016)](http://www.aclweb.org/anthology/N16-1108)\n+ [IDF Baseline](./idf_baseline/): IDF overlap between question and candidate answers\n\nEach model directory has a `README.md` with further details.\n\n## Setting up PyTorch\n\n**If you are an internal Castor contributor using GPU machines in the lab, follow the instructions 
[here](./docs/internal-instructions.md).**\n\nCastor is designed for Python 3.6 and [PyTorch](https://pytorch.org/) 0.4.\nPyTorch recommends [Anaconda](https://www.anaconda.com/distribution/) for managing your environment.\nWe'd recommend creating a custom environment as follows:\n\n```\n$ conda create --name castor python=3.6\n$ source activate castor\n```\n\nAnd installing the packages as follows:\n\n```\n$ conda install pytorch torchvision -c pytorch\n```\n\nOther Python packages we use can be installed via pip:\n\n```\n$ pip install -r requirements.txt\n```\n\nThe code depends on data from NLTK (e.g., stopwords), so you'll have to download it. Run the Python interpreter and type the commands:\n\n```python\n\u003e\u003e\u003e import nltk\n\u003e\u003e\u003e nltk.download()\n```\n\nFinally, run the following inside the `utils` directory to build the `trec_eval` tool for evaluating certain datasets.\n\n```bash\n$ ./get_trec_eval.sh\n```\n\n## Data and Pre-Trained Models\n\n**If you are an internal Castor contributor using GPU machines in the lab, follow the instructions [here](./docs/internal-instructions.md).**\n\nTo take full advantage of the code here, clone these two other repos:\n\n+ [`Castor-data`](https://git.uwaterloo.ca/jimmylin/Castor-data): embeddings, datasets, etc.\n+ [`Castor-models`](https://git.uwaterloo.ca/jimmylin/Castor-models): pre-trained models\n\nOrganize your directory structure as follows:\n\n```\n.\n├── Castor\n├── Castor-data\n└── Castor-models\n```\n\nFor example (using HTTPS):\n\n```bash\n$ git clone https://github.com/castorini/Castor.git\n$ git clone https://git.uwaterloo.ca/jimmylin/Castor-data.git\n$ git clone https://git.uwaterloo.ca/jimmylin/Castor-models.git\n```\n\nAfter cloning the Castor-data repo, you need to unzip embeddings and run data pre-processing scripts. 
You can choose\nto follow instructions under each dataset and embedding directory separately, or just run the following script in Castor-data\nto do all of the steps for you:\n\n```bash\n$ ./setup.sh\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcastorini%2Fcastor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcastorini%2Fcastor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcastorini%2Fcastor/lists"}