{"id":50710869,"url":"https://github.com/shu-vro/learn-agent","last_synced_at":"2026-06-09T15:03:13.705Z","repository":{"id":351597692,"uuid":"1209901611","full_name":"shu-vro/learn-agent","owner":"shu-vro","description":"A rag system to talk about Attention is all you need","archived":false,"fork":false,"pushed_at":"2026-05-27T19:05:25.000Z","size":5145,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-27T20:21:40.344Z","etag":null,"topics":["agent","docling","langchain","llm"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shu-vro.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-13T22:34:42.000Z","updated_at":"2026-04-26T15:40:48.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/shu-vro/learn-agent","commit_stats":null,"previous_names":["shu-vro/learn-agent"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/shu-vro/learn-agent","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shu-vro%2Flearn-agent","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shu-vro%2Flearn-agent/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shu-vro%2Flearn-agent/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shu-vro%2Flearn-agent/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shu-vro","download_url":"https://codeload.github.com/shu-vro/learn-agent/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shu-vro%2Flearn-agent/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34112225,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-09T02:00:06.510Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","docling","langchain","llm"],"created_at":"2026-06-09T15:03:12.742Z","updated_at":"2026-06-09T15:03:13.699Z","avatar_url":"https://github.com/shu-vro.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Multimodal RAG over Research Papers\n\nThis project builds a local multimodal RAG pipeline over one or more papers using Docling ingestion, Qdrant retrieval, and Ollama generation.\n\nDefault paper sources:\n\n- https://arxiv.org/pdf/1706.03762\n- https://arxiv.org/pdf/2603.15031\n\nIt uses:\n\n- Docling for PDF parsing, markdown extraction, and artifact/image generation\n- Qdrant (hybrid dense + sparse retrieval via `langchain-qdrant`)\n- `Octen/Octen-Embedding-0.6B` for dense embeddings\n- Ollama `gemma4:e2b` for response generation\n- Optional formula OCR (`pix2tex` or Ollama vision, controlled by `--equation-ocr-lib`)\n\n## Project Structure\n\n- `src/module/upload_docs.py`: ingestion workflow from source PDF(s) into Qdrant\n- `src/module/rag_agent.py`: retrieval + strict context-grounded QA/chat agent\n- `src/vector_store/qdrant_store.py`: Qdrant client, collection helpers, hybrid vector store\n- `src/lib/docling_lib.py`: Docling conversion, chunking, and artifact extraction\n- `main.py`: CLI entrypoint (`ingest`, `ask`, `chat`)\n\n## Prerequisites\n\n1. Install dependencies:\n\n```bash\nuv sync\n```\n\n2. Make sure Qdrant is running (default: `localhost:6333`).\n\nExample local run:\n\n```bash\ndocker run --rm -p 6333:6333 qdrant/qdrant\n```\n\n3. Make sure Ollama is running.\n\n4. Pull required Ollama model(s):\n\n```bash\nollama pull gemma4:e2b\n```\n\n## Build the Index\n\n```bash\nuv run main.py ingest --rebuild\n```\n\nThis ingests documents into the Qdrant collection (default: `store`) and writes extracted artifacts under `data/artifacts`.\n\n## Ask a Single Question\n\n```bash\nuv run main.py ask \"What is the core idea of scaled dot-product attention?\"\n```\n\nOptional: force re-ingestion before asking.\n\n```bash\nuv run main.py ask \"What is the core idea of scaled dot-product attention?\" --rebuild\n```\n\n## Start Interactive Chat\n\n```bash\nuv run main.py chat\n```\n\nOptional: force re-ingestion before chat.\n\n```bash\nuv run main.py chat --rebuild\n```\n\n## Current Agent Behavior\n\n- Answers are constrained to retrieved context; if information is missing, the agent explicitly says it could not find it in indexed context.\n- Responses are streamed token-by-token in the terminal.\n- Source lines are printed after each answer (`type`, `source`, `page`, `image`).\n- `chat` mode keeps in-memory conversation state and enables `SummarizationMiddleware` (trigger: 500 tokens, keep last 2 messages).\n- Usage metadata is printed in `ask` mode per call and aggregated at the end of `chat` mode.\n\n## Useful Options\n\n- Ingest multiple sources:\n\n```bash\nuv run main.py ingest --source https://arxiv.org/pdf/1706.03762 --source https://arxiv.org/pdf/2603.15031\n```\n\n- Set retrieval depth:\n\n```bash\nuv run main.py ask \"Summarize encoder-decoder attention\" --top-k 8\n```\n\n- Disable all vision enrichment:\n\n```bash\nuv run main.py ingest --rebuild --no-vision\n```\n\n- Disable only image descriptions:\n\n```bash\nuv run main.py ingest --rebuild --no-image-description\n```\n\n- Disable only formula transcription:\n\n```bash\nuv run main.py ingest --rebuild --no-formula-transcription\n```\n\n- Select formula OCR backend:\n\n```bash\nuv run main.py ingest --rebuild --equation-ocr-lib llm\n```\n\nNote: current retrieval in `rag_agent` reads from the default collection (`store`) during `ask/chat`.\n\n## Help\n\n```bash\nuv run main.py --help\n```\n\n```\nusage: main.py [-h] [--source SOURCES] [--collection-name COLLECTION_NAME] [--artifacts-dir ARTIFACTS_DIR]\n               [--embedding-model EMBEDDING_MODEL] [--llm-model LLM_MODEL] [--vision-model VISION_MODEL] [--top-k TOP_K]\n               [--no-vision] [--no-image-description] [--no-formula-transcription] [--equation-ocr-lib {local,llm}]\n               {ingest,ask,chat} ...\n\nMultimodal RAG over Attention Is All You Need using Docling + Qdrant + Ollama.\n\npositional arguments:\n  {ingest,ask,chat}\n    ingest              Ingest the paper and write vectors to Qdrant.\n    ask                 Ask one question to the RAG agent.\n    chat                Run an interactive RAG chat session.\n\noptions:\n  -h, --help            show this help message and exit\n  --source SOURCES      Paper source URL or local file path. Repeat this flag to ingest multiple papers.\n  --collection-name COLLECTION_NAME, --index-dir COLLECTION_NAME\n                        Qdrant collection name for indexed paper documents.\n  --artifacts-dir ARTIFACTS_DIR\n                        Directory for extracted markdown and images.\n  --embedding-model EMBEDDING_MODEL\n                        SentenceTransformer embedding model name.\n  --llm-model LLM_MODEL\n                        Ollama text generation model for QA.\n  --vision-model VISION_MODEL\n                        Ollama vision model used for image descriptions.\n  --top-k TOP_K         Number of retrieved chunks for each question.\n  --no-vision           Disable all vision features (image descriptions and formula transcription).\n  --no-image-description\n                        Disable image descriptions while keeping other vision features enabled.\n  --no-formula-transcription\n                        Disable formula LaTeX transcription from formula images.\n  --equation-ocr-lib {local,llm}\n                        Formula OCR backend for LaTeX transcription (local=pix2tex, llm=Ollama vision).\n```\n\n\u003e [!CAUTION]\n\u003e This project is a work in progress and may contain incomplete features, bugs, or suboptimal implementations. It is intended for educational and experimental purposes only. Use at your own risk.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshu-vro%2Flearn-agent","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshu-vro%2Flearn-agent","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshu-vro%2Flearn-agent/lists"}