{"id":21239173,"url":"https://github.com/cutupdev/pdf-ask-system","last_synced_at":"2025-03-15T03:40:27.703Z","repository":{"id":222873646,"uuid":"758166901","full_name":"cutupdev/pdf-ask-system","owner":"cutupdev","description":"This repository is the question-answering system for PDF files. You can upload pdf file to this system and can ask about the content.","archived":false,"fork":false,"pushed_at":"2024-02-16T17:43:10.000Z","size":435,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-21T19:22:32.309Z","etag":null,"topics":["embeddings","indexing","openai","python"],"latest_commit_sha":null,"homepage":"https://sharly.ai/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cutupdev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-02-15T18:52:26.000Z","updated_at":"2024-04-18T17:56:40.000Z","dependencies_parsed_at":null,"dependency_job_id":"66304dfa-f1f7-45e9-959f-a29b02fdd1c8","html_url":"https://github.com/cutupdev/pdf-ask-system","commit_stats":null,"previous_names":["catlover75926/pdf-ask-system","harmonitech/pdf-ask-system","cutupdev/pdf-ask-system"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cutupdev%2Fpdf-ask-system","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cutupdev%2Fpdf-ask-system/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cutupdev%2Fpdf-ask-system/releases","manifests_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/cutupdev%2Fpdf-ask-system/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cutupdev","download_url":"https://codeload.github.com/cutupdev/pdf-ask-system/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243680974,"owners_count":20330154,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["embeddings","indexing","openai","python"],"created_at":"2024-11-21T00:42:19.292Z","updated_at":"2025-03-15T03:40:27.685Z","avatar_url":"https://github.com/cutupdev.png","language":"Python","readme":"# Ask my PDF\n\n\n\nThank you for your interest in my application. Please be aware that this is only a **Proof of Concept system** and may contain bugs or unfinished features.\n\n\n\n### Ask my PDF - Question answering system built on top of GPT3\n\n\n\n🎲 The primary use case for this app is to assist users in answering  questions about board game rules based on the instruction manual. While  the app can be used for other tasks, helping users with board game rules is particularly meaningful to me since I'm an avid fan of board games  myself. 
Additionally, this use case is relatively harmless, even in cases where the model may experience hallucinations.\n\n\n\n📄 The app implements the following academic papers:\n\n- [In-Context Retrieval-Augmented Language Models](https://arxiv.org/abs/2302.00083) aka **RALM**\n\n- [Precise Zero-Shot Dense Retrieval without Relevance Labels](https://arxiv.org/abs/2212.10496) aka **HyDE** (Hypothetical Document Embeddings)\n\n\n\n### Installation\n\n\n\n1. Clone the repo:\n\n   `git clone https://github.com/catlover75926/pdf-ask-system`\n\n2. Install dependencies:\n\n   `pip install -r ask-my-pdf/requirements.txt`\n\n3. Run the app:\n\n   `cd ask-my-pdf/src`\n   \n   `run.sh` or `run.bat`\n\n\n\n### High-level documentation\n\n\n\n#### RALM + HyDE\n\n![RALM + HyDE](docs/ralm_hyde.jpg)\n\n\n\n#### RALM + HyDE + context\n\n![RALM + HyDE + context](docs/ralm_hyde_wc.jpg)\n\n\n\n### Environment variables used for configuration\n\n\n\n##### General configuration:\n\n- **STORAGE_SALT** - cryptographic salt used when deriving user/folder name and encryption key from API key, hexadecimal notation, 2-16 characters\n\n- **STORAGE_MODE** - index storage mode: S3, LOCAL, DICT (default)\n\n- **STATS_MODE** - usage stats storage mode: REDIS, DICT (default)\n\n- **FEEDBACK_MODE** - user feedback storage mode: REDIS, NONE (default)\n\n- **CACHE_MODE** - embeddings cache mode: S3, DISK, NONE (default)\n\n  \n\n##### Local filesystem configuration (storage / cache):\n\n- **STORAGE_PATH** - directory path for index storage\n\n- **CACHE_PATH** - directory path for embeddings cache\n\n  \n\n##### S3 configuration (storage / cache):\n\n- **S3_REGION** - region code\n\n- **S3_BUCKET** - bucket name (storage)\n\n- **S3_SECRET** - secret key\n\n- **S3_KEY** - access key\n\n- **S3_URL** - URL\n\n- **S3_PREFIX** - object name prefix\n\n- **S3_CACHE_BUCKET** - bucket name (cache)\n\n- **S3_CACHE_PREFIX** - object name prefix (cache)\n\n  \n\n##### Community version related options:\n\n- 
**OPENAI_KEY** - API key used for the default user\n- **COMMUNITY_DAILY_USD** - default user's daily budget\n- **COMMUNITY_USER** - default user's code\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcutupdev%2Fpdf-ask-system","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcutupdev%2Fpdf-ask-system","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcutupdev%2Fpdf-ask-system/lists"}