{"id":16515371,"url":"https://github.com/njmarko/googolplex-pdf-search","last_synced_at":"2025-10-28T04:32:59.357Z","repository":{"id":84214208,"uuid":"503149864","full_name":"njmarko/googolplex-pdf-search","owner":"njmarko","description":"Python program for searching pdf text, ranking the results and exporting highlighted search results in pdf. Uses trie structure, stack, heap, page graph. Converts queries to postfix notation. Allows for logical expressions and phrases. Offers did you mean functionality.","archived":false,"fork":false,"pushed_at":"2024-08-28T13:29:22.000Z","size":6514,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-01T11:41:23.480Z","etag":null,"topics":["autocomplete","datastructures-algorithms","didyoumean","graph","heap","pdf-generation","pdf-highlighter","pdf-search","postfix-evaluation","stack","trie"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/njmarko.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-06-13T23:55:06.000Z","updated_at":"2024-08-28T13:29:25.000Z","dependencies_parsed_at":"2023-05-24T00:15:20.673Z","dependency_job_id":null,"html_url":"https://github.com/njmarko/googolplex-pdf-search","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/njmarko%2Fgoogolplex-pdf-search","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/njmarko%2Fgoogolplex-pdf-search/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/njmarko%2Fgoogolplex-pdf-search/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/njmarko%2Fgoogolplex-pdf-search/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/njmarko","download_url":"https://codeload.github.com/njmarko/googolplex-pdf-search/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238597388,"owners_count":19498396,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["autocomplete","datastructures-algorithms","didyoumean","graph","heap","pdf-generation","pdf-highlighter","pdf-search","postfix-evaluation","stack","trie"],"created_at":"2024-10-11T16:16:53.256Z","updated_at":"2025-10-28T04:32:50.776Z","avatar_url":"https://github.com/njmarko.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# googolplex-pdf-search\nPython program for searching pdf text, ranking the results and exporting highlighted search results in pdf. Uses trie structure, stack, heap, page graph. Converts queries to postfix notation. Allows for logical expressions and phrases. Offers did you mean functionality.\n\n## Required libraries\n- PyMuPDF\n- didyoumean.py\n\n## How to install and run the program\n\n1. Create a virtual environment in the project directory:\n```virtualenv venv```\n2. Activate the virtual environment:\n\n    2.1. For Windows:\n```venv\\Scripts\\activate```\n\n    2.2. 
For Linux:\n```source venv/bin/activate```\n3. Install the required libraries:\n```pip install -r requirements.txt```\n4. Run the program:\n```python main.py```\n5. All in one command:\n\n    5.1. For Linux\n   ```virtualenv venv \u0026\u0026 source venv/bin/activate \u0026\u0026 pip install -r requirements.txt \u0026\u0026 python main.py```\n\n    5.2. For Windows (if using PowerShell)\n```virtualenv venv; venv\\Scripts\\Activate; pip install -r requirements.txt; python main.py```\n\n## Application screenshots\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg alt=\"signal-visualization\" align=\"center\" width=\"100%\" src=\"https://user-images.githubusercontent.com/34657562/173469590-357ebc6b-b0ca-4c3b-963d-bec9cd4d6dc7.png\" /\u003e\n  \u003cp align=\"center\"\u003eIllustration 1 - Loading bar.\u003c/p\u003e\n\u003c/div\u003e\n\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg alt=\"signal-visualization\" align=\"center\" width=\"100%\" src=\"https://user-images.githubusercontent.com/34657562/173468613-f2f6bb18-5223-451e-af30-8d871cf65e85.png\" /\u003e\n  \u003cp align=\"center\"\u003eIllustration 2 - Autocomplete feature.\u003c/p\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg alt=\"signal-visualization\" align=\"center\" width=\"100%\" src=\"https://user-images.githubusercontent.com/34657562/173472886-1813ea3c-9586-4493-96a9-dc35c05c9af1.png\" /\u003e\n  \u003cp align=\"center\"\u003eIllustration 3 - Did you mean functionality.\u003c/p\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg alt=\"signal-visualization\" align=\"center\" width=\"100%\" src=\"https://user-images.githubusercontent.com/34657562/173468848-4131545b-40a8-421a-9fa1-3f89857bb679.png\" /\u003e\n  \u003cp align=\"center\"\u003eIllustration 4 - Third page of results for the search query \"graph\".\u003c/p\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg alt=\"signal-visualization\" align=\"center\" width=\"100%\" 
src=\"https://user-images.githubusercontent.com/34657562/173469035-59f6df96-20f0-4589-bed3-3994fbd2f9eb.png\" /\u003e\n  \u003cp align=\"center\"\u003eIllustration 5 - Complex logical query with OR, AND and grouping with brackets.\u003c/p\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg alt=\"signal-visualization\" align=\"center\" width=\"100%\" src=\"https://user-images.githubusercontent.com/34657562/173469726-42ab96b6-4e57-43d1-b01c-c83c5faa0347.png\" /\u003e\n  \u003cp align=\"center\"\u003eIllustration 6 - Complex logical query with negation (NOT) and grouping with brackets.\u003c/p\u003e\n\u003c/div\u003e\n\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg alt=\"signal-visualization\" align=\"center\" width=\"100%\" src=\"https://user-images.githubusercontent.com/34657562/173472600-e7f8c197-b63f-4b48-a34f-17cb6dd86d47.png\" /\u003e\n  \u003cp align=\"center\"\u003eIllustration 7 - Phrase search for \"skip list\" using double quotes.\u003c/p\u003e\n\u003c/div\u003e\n\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg alt=\"signal-visualization\" align=\"center\" width=\"100%\" src=\"https://user-images.githubusercontent.com/34657562/173472562-6d656436-4575-48df-bc06-58c551ec9314.png\" /\u003e\n  \u003cp align=\"center\"\u003eIllustration 8 - Generated PDF with highlighted search query \"skip list\".\u003c/p\u003e\n\u003c/div\u003e\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnjmarko%2Fgoogolplex-pdf-search","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnjmarko%2Fgoogolplex-pdf-search","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnjmarko%2Fgoogolplex-pdf-search/lists"}