{"id":23681495,"url":"https://github.com/hs094/sciatica","last_synced_at":"2026-01-03T02:30:15.661Z","repository":{"id":266508011,"uuid":"898457566","full_name":"hs094/Sciatica","owner":"hs094","description":"Sciatica is a powerful semantic search engine designed for academic literature exploration. This tool leverages cutting-edge transformer models to deliver precise and contextually relevant search results.","archived":false,"fork":false,"pushed_at":"2025-01-21T13:39:00.000Z","size":75027,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-21T14:32:25.013Z","etag":null,"topics":["albert-model","bert-model","deep-learning","information-retrieval","machine-learning","nlp","research-tools","roberta-model","specter-model","streamlit-webapp","transformer-models"],"latest_commit_sha":null,"homepage":"https://sciatica-research.streamlit.app/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hs094.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-04T12:35:16.000Z","updated_at":"2025-01-21T13:39:04.000Z","dependencies_parsed_at":null,"dependency_job_id":"92cf5b8c-4c61-4d2a-ac20-72f1b6da1af9","html_url":"https://github.com/hs094/Sciatica","commit_stats":null,"previous_names":["hs094/sciatica"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hs094%2FSciatica","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hs094%
2FSciatica/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hs094%2FSciatica/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hs094%2FSciatica/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hs094","download_url":"https://codeload.github.com/hs094/Sciatica/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239728257,"owners_count":19687319,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["albert-model","bert-model","deep-learning","information-retrieval","machine-learning","nlp","research-tools","roberta-model","specter-model","streamlit-webapp","transformer-models"],"created_at":"2024-12-29T18:39:33.047Z","updated_at":"2026-01-03T02:30:15.613Z","avatar_url":"https://github.com/hs094.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# \u003cdiv align=\"center\"\u003eSCIATICA\u003c/div\u003e\n\n\u003e Repository for End term submission for Information Retrieval course (CS60092) offered in Spring semester 2023, Department of CSE, IIT Kharagpur.\n\n\u003c!-- PROJECT LOGO --\u003e\n\u003cbr /\u003e\n\u003cdiv align=\"center\"\u003e\n    \u003c!-- \u003cimg width=\"200\" src=\"https://user-images.githubusercontent.com/86282911/230894496-b9402384-bf0a-4bf7-afbf-2207aa2d31be.png\"\u003e\n   --\u003e\n  \u003cp align=\"center\"\u003e\n    \u003ci\u003eResearch for research papers\u003c/i\u003e\n    \u003cbr /\u003e\n    \u003cbr /\u003e\n    \u003cimg 
src=\"https://img.shields.io/badge/Colab-F9AB00?style=for-the-badge\u0026logo=googlecolab\u0026color=525252\" /\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Jupyter-F37626.svg?\u0026style=for-the-badge\u0026logo=Jupyter\u0026logoColor=white\" /\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Python-FFD43B?style=for-the-badge\u0026logo=python\u0026logoColor=blue\"/\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Streamlit-FF4B4B?style=for-the-badge\u0026logo=Streamlit\u0026logoColor=white\"\u003e\n    \u003cbr /\u003e\n    \u003cbr /\u003e\n    \u003ca href=\"https://github.com/outer-rim/Sciatica/issues\"\u003eReport Bug\u003c/a\u003e\n    ·\n    \u003ca href=\"https://github.com/outer-rim/Sciatica/issues\"\u003eRequest Feature\u003c/a\u003e\n  \u003c/p\u003e\n\u003c/div\u003e\n\n\n\u003c!-- TABLE OF CONTENTS --\u003e\n\u003cdetails\u003e\n  \u003csummary\u003eTable of Contents\u003c/summary\u003e\n  \u003col\u003e\n    \u003cli\u003e\n      \u003ca href=\"#about-the-project\"\u003eAbout The Project\u003c/a\u003e\n    \u003c/li\u003e\n    \u003cli\u003e\n      \u003ca href=\"#getting-started\"\u003eGetting Started\u003c/a\u003e\n      \u003cul\u003e\n        \u003cli\u003e\u003ca href=\"#directory-structure\"\u003eDirectory structure\u003c/a\u003e\u003c/li\u003e\n        \u003c!-- \u003cul\u003e\n          \u003cli\u003e\u003ca href=\"#chromium-based-browsers\"\u003eChromium Based Browsers\u003c/a\u003e\u003c/li\u003e\n          \u003cli\u003e\u003ca href=\"#firefox\"\u003eFirefox\u003c/a\u003e\u003c/li\u003e\n        \u003c/ul\u003e --\u003e\n      \u003c/ul\u003e\n    \u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#colab-notebooks\"\u003eColab Notebooks\u003c/a\u003e\u003c/li\u003e\n    \u003c!-- \u003cli\u003e\u003ca href=\"#contact\"\u003eContact\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#acknowledgments\"\u003eAcknowledgments\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca 
href=\"#miscelleneous\"\u003eMiscelleneous\u003c/a\u003e\u003c/li\u003e     --\u003e\n  \u003c/ol\u003e\n\u003c/details\u003e\n\n\n\u003c!-- ABOUT THE PROJECT --\u003e\n## About The Project\n\nThis project attempts to implement and improve on the work of Sheshera Mysore, Tim O'Gorman, Andrew McCallum, and Hamed Zamani, titled `CSFCube - A Test Collection of Computer Science Papers for Faceted Query by Example`.\n\nThe dataset can be found [here](https://github.com/iesl/CSFCube)\n\nThe paper describing the dataset can be accessed [here](https://arxiv.org/abs/2103.12906)\n\nDemo video:\n\nTeam members:\n\n- Ashwani Kumar Kamal - 20CS10011\n- Hardik Pravin Soni - 20CS30023\n- Shiladitya De - 20CS30061\n- Sourabh Soumyakanta Das - 20CS30051\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n\n\n\u003c!-- GETTING STARTED --\u003e\n## Getting Started\n\nThe minimal setup needed to get the application running:\n\n```shell\nconda env create -f environment.yaml\nconda activate sciatica-env\nstreamlit run deploy.py\n```\n\n### Directory Structure\n\n- Any `.ipynb` files that need to be run must be placed in the root directory, which contains the `/data` and `/Results` directories.\n\n- The `data` directory contains the CSFCube dataset\n\n```shell\n.\n├── abstracts-csfcube-preds.json\n├── abstracts-csfcube-preds.jsonl\n├── abstracts-csfcube-preds-no-unicode.jsonl\n├── evaluation_splits.json\n├── test-pid2anns-csfcube-background.json\n├── test-pid2anns-csfcube-method.json\n├── test-pid2anns-csfcube-result.json\n└── test-pid2pool-csfcube.json\n```\n\n- The `Results` directory contains the embeddings generated from the models used\n\n```shell\n.\n├── alberta\n│   ├── all.json\n│   ├── background.json\n│   ├── method.json\n│   ├── result.json\n│   ├── test-pid2pool-csfcube-alberta-background-ranked.json\n│   ├── test-pid2pool-csfcube-alberta-method-ranked.json\n│   └── 
test-pid2pool-csfcube-alberta-result-ranked.json\n├── allenai_specter\n│   ├── all.json\n│   ├── background.json\n│   ├── method.json\n│   ├── result.json\n│   ├── test-pid2pool-csfcube-allenai_specter-background-ranked.json\n│   ├── test-pid2pool-csfcube-allenai_specter-method-ranked.json\n│   └── test-pid2pool-csfcube-allenai_specter-result-ranked.json\n├── all_mpnet_base_v2\n│   ├── all.json\n│   ├── background.json\n│   ├── method.json\n│   ├── result.json\n│   ├── test-pid2pool-csfcube-all_mpnet_base_v2-background-ranked.json\n│   ├── test-pid2pool-csfcube-all_mpnet_base_v2-method-ranked.json\n│   └── test-pid2pool-csfcube-all_mpnet_base_v2-result-ranked.json\n├── bert_nli\n│   ├── all.json\n│   ├── background.json\n│   ├── method.json\n│   ├── result.json\n│   ├── test-pid2pool-csfcube-bert_nli-background-ranked.json\n│   ├── test-pid2pool-csfcube-bert_nli-method-ranked.json\n│   └── test-pid2pool-csfcube-bert_nli-result-ranked.json\n├── bert_pp\n│   ├── all.json\n│   ├── background.json\n│   ├── method.json\n│   ├── result.json\n│   ├── test-pid2pool-csfcube-bert_pp-background-ranked.json\n│   ├── test-pid2pool-csfcube-bert_pp-method-ranked.json\n│   └── test-pid2pool-csfcube-bert_pp-result-ranked.json\n├── distilbert_nli\n│   ├── all.json\n│   ├── background.json\n│   ├── method.json\n│   ├── result.json\n│   ├── test-pid2pool-csfcube-distilbert_nli-background-ranked.json\n│   ├── test-pid2pool-csfcube-distilbert_nli-method-ranked.json\n│   └── test-pid2pool-csfcube-distilbert_nli-result-ranked.json\n└── ensemble\n    ├── test-pid2pool-csfcube-ensemble-background-ranked.json\n    ├── test-pid2pool-csfcube-ensemble-method-ranked.json\n    └── test-pid2pool-csfcube-ensemble-result-ranked.json\n```\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n\n## Colab Notebooks\n\n- [Base Model](https://colab.research.google.com/drive/1PdvyNnlwA4eUyS6pTumNFI6OTbFY-GYc?usp=sharing)\n\nThis notebook contains the code for 
generating embeddings from the base models. Avoid running it, as it takes a long time. The embeddings are already provided in the Google Drive of IR Submission Files.\n\n- [Fine Tuning DistilBERT (Grid Search)](https://colab.research.google.com/drive/1uax2mYhE2dTxSsZAx1jPaiy7TkTHNP7P?usp=sharing)\n\nThis notebook fine-tunes the DistilBERT model. The results are already present in it. Avoid running it, as it takes a long time.\n\n- [Ensembling models](https://colab.research.google.com/drive/1rOpEhRB_eAXny721EhDHkQ4l9kOkhhZ6?usp=sharing)\n\nRun each cell of this Jupyter notebook. In the second-to-last cell, change the queries as desired, then run that cell and the one after it to get the results.\n\n- [IR Submission Files (Google Drive)](https://drive.google.com/drive/folders/1Wz8Pjxzp_hn4axTpviRWurgP6N57BzcK?usp=sharing)\n\n\nApart from all this, we are also submitting a zip of the local copies and reports of the .ipynb files, which can be run locally. \n[Note] Please change the file directory strings in the notebooks appropriately to avoid any errors.\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhs094%2Fsciatica","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhs094%2Fsciatica","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhs094%2Fsciatica/lists"}