{"id":16978023,"url":"https://github.com/singhpratyush/index-search-query","last_synced_at":"2025-04-12T01:36:24.958Z","repository":{"id":102721216,"uuid":"105985434","full_name":"singhpratyush/index-search-query","owner":"singhpratyush","description":"Inverted Index, Query Formulation and Ranking from Scratch in Python","archived":false,"fork":false,"pushed_at":"2018-04-24T05:43:33.000Z","size":23,"stargazers_count":10,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-25T21:21:34.368Z","etag":null,"topics":["indexing","multithreading","pipenv","python","query","query-building","ranking","searching","stemming"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/singhpratyush.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-10-06T09:01:08.000Z","updated_at":"2025-01-21T03:57:05.000Z","dependencies_parsed_at":"2023-11-01T22:15:07.448Z","dependency_job_id":null,"html_url":"https://github.com/singhpratyush/index-search-query","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/singhpratyush%2Findex-search-query","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/singhpratyush%2Findex-search-query/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/singhpratyush%2Findex-search-query/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/singhpratyush%2Findex-search-query/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/singhpratyush","download_url":"https://codeload.github.com/singhpratyush/index-search-query/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248504848,"owners_count":21115211,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["indexing","multithreading","pipenv","python","query","query-building","ranking","searching","stemming"],"created_at":"2024-10-14T01:30:44.104Z","updated_at":"2025-04-12T01:36:24.932Z","avatar_url":"https://github.com/singhpratyush.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Index Search Query\n\nInverted Index, Query Formulation and Ranking from Scratch in Python.\n\n\u003e Part of Information Retrieval Lab (Autumn 2017-18)\n\n## Part 1: The Inverted Index\n\n### Dataset\n\nThe dataset is taken from the `FIRE 2011` corpus and can be downloaded from [here](http://www.isical.ac.in/~fire/data/docs/adhoc/en.docs.2011.tar.gpg). It contains articles from two different magazines. The code for handling these files lives in the [`magazine_index`](magazine_index) package.\n\n### Usage\n\nTo recursively index all files in a directory, run:\n\n```bash\n$ python lab_1.py path/to/files\n```\n\nThis creates an inverted index and saves it to a file called `index.bin`. 
If the file already exists, you can load it directly by running the script without arguments:\n\n```bash\n$ python lab_1.py\nLoading index from \"index.bin\"\n\u003cIndex documents=392577 words=105314026\u003e\n...\n```\n\n### Using a pre-built index\n\nSince indexing can take a long time, the following pre-built index files are available. Rename one to `index.bin` to use it directly:\n\n| Name | Link | Size | Comments |\n|------|------|------|----------|\n| `index.bin` | [LINK](https://drive.google.com/open?id=0BxDMRh_L_8pOUzlZQ0JJMUtYd1E) | 478 MB | Full index, 392k documents |\n| `index.bin.bak1` | [LINK](https://drive.google.com/file/d/0BxDMRh_L_8pObWU0ZkE1NHBTUUU/view?usp=sharing) | 374 MB | 303k documents |\n| `index.bin.bak` | [LINK](https://drive.google.com/file/d/0BxDMRh_L_8pOYmRKU0I5MWJhbG8/view?usp=sharing) | 36 MB | 25.8k documents |\n\n### Example\n\n```bash\n$ python lab_1.py\nLoading index from \"index.bin\"\n\u003cIndex documents=303290 words=83225120\u003e\nPlease start entering words to get top 5 documents containing them (CTRL+C to exit) -\nEnter word: market\n[('1100110_calcutta_story_11965855.utf8', 58), ('1070603_calcutta_story_7858507.utf8', 31), ('1100326_opinion_story_12251777.utf8', 30), ('1050912_frontpage_story_5227346.utf8', 30), ('1040406_opinion_story_2948544.utf8', 29)]\nEnter word: delhi\n[('1080422_sports_ipl.utf8', 30), ('1031223_opinion_story_2710457.utf8', 28), ('1090225_sports_story_10587273.utf8', 22), ('1090812_sports_story_11351508.utf8', 21), ('1100223_sports_story_12140507.utf8', 21)]\nEnter word: messi\n[('1100612_sports_story_12557276.utf8', 27), ('1100527_sports_story_12492679.utf8', 17), ('1100619_sports_story_12582889.utf8', 17), ('1090529_calcutta_story_11031479.utf8', 16), ('1100613_frontpage_story_12560387.utf8', 12)]\n```\n\n\n\n## Part 2: Ranking of Documents\n\n### Usage\n\nYou can use the pre-built index here.\n\n```bash\n$ python lab_2.py index.bin\nLoading index from index.bin\nEnter query: 
programming\nen.15.66.21.2008.5.9 : 97.3490637742472\nen.3.347.409.2010.2.2 : 79.64923399711134\nen.3.373.142.2007.6.8 : 53.09948933140756\nen.3.406.410.2007.11.18 : 48.6745318871236\nen.2.296.350.2010.1.20 : 48.6745318871236\nen.3.393.372.2007.9.23 : 44.24957444283963\nen.3.321.344.2009.7.31 : 44.24957444283963\nen.3.373.299.2007.6.11 : 44.24957444283963\nen.15.109.486.2009.4.1 : 44.24957444283963\nen.3.393.75.2007.9.24 : 44.24957444283963\n```\n\n\n---\n\n## Development\n\nThis project uses `pipenv` to manage dependencies. Install it with:\n\n```bash\n$ sudo -H pip install pipenv\n```\n\nTo install the project's dependencies, run:\n\n```bash\n$ pipenv install\n```\n\nTo enter a virtualenv shell, run:\n\n```bash\n$ pipenv shell\n```\n\nThis spawns a new shell in which all dependencies are available.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsinghpratyush%2Findex-search-query","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsinghpratyush%2Findex-search-query","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsinghpratyush%2Findex-search-query/lists"}