{"id":31746762,"url":"https://github.com/kinyugo/madin","last_synced_at":"2026-05-17T15:36:32.497Z","repository":{"id":315223530,"uuid":"1055154327","full_name":"Kinyugo/madin","owner":"Kinyugo","description":"An Agentic Framework for Document Retrieval","archived":false,"fork":false,"pushed_at":"2025-09-23T16:16:08.000Z","size":736,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-01T04:31:04.682Z","etag":null,"topics":["agentic-rag","ai","ai-agents","llm","rag"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Kinyugo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-11T21:11:19.000Z","updated_at":"2025-09-23T16:15:50.000Z","dependencies_parsed_at":"2025-09-17T12:00:29.293Z","dependency_job_id":"15138f84-ed72-4874-92ae-68dc1ba279d4","html_url":"https://github.com/Kinyugo/madin","commit_stats":null,"previous_names":["kinyugo/madin"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Kinyugo/madin","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kinyugo%2Fmadin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kinyugo%2Fmadin/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kinyugo%2Fmadin/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kinyugo%2F
madin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Kinyugo","download_url":"https://codeload.github.com/Kinyugo/madin/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kinyugo%2Fmadin/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279001540,"owners_count":26083102,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-09T02:00:07.460Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agentic-rag","ai","ai-agents","llm","rag"],"created_at":"2025-10-09T13:18:31.116Z","updated_at":"2025-10-09T13:18:32.486Z","avatar_url":"https://github.com/Kinyugo.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# madin: An Agentic Framework for Document Retrieval\n\nmadin is a Python framework for building sophisticated, agentic retrieval systems over structured and unstructured documents.\n\nAt its core, madin represents documents as structured trees, enabling LLM-powered agents to perform nuanced, fine-grained searches that go beyond simple vector-based retrieval.\n\n## Key Features\n\n- **Agentic Tree Search:** The primary retrieval mechanism is an agentic tree search, allowing an LLM-powered agent to navigate the document's hierarchical structure to find precise answers.\n\n- **Intelligent Content Chunking:** The 
document processing pipeline intelligently chunks content from documents. This is a crucial step for enabling a hybrid retrieval strategy, as it creates the document segments used for the initial vector search.\n\n- **Rich Metadata Enrichment:** The document processing pipeline automatically enriches the nodes of the document tree with metadata, such as keywords and named entities. This provides the agent with valuable context for more effective and fine-grained tree traversal.\n\n- **Flexible LLM Agent Support:** The framework is built on `pydantic-ai`, allowing for easy integration with any supported Large Language Model (LLM).\n\n- **Internal Structuring of Documents:** madin can process both structured (like Markdown) and unstructured documents by imposing an internal tree-like structure, making any document amenable to agentic tree search.\n\n## Getting Started\n\n### Installation\n\nTo get started with madin, clone the repository and install the dependencies using `uv`.\n\n```bash\ngit clone https://github.com/Kinyugo/madin.git\ncd madin\nuv sync --all-extras\nuv pip install -e .\n```\n\n## Usage\n\nThe core workflow of madin involves processing a document into a structured tree, and then using an agent to retrieve information from it. 
Here is a complete example of how to perform agentic retrieval on a single document.\n\n```python\nimport asyncio\nimport textwrap\n\nimport logfire\nfrom dotenv import load_dotenv\n\nfrom madin import (\n    AgentConfig,\n    Document,\n    DocumentProcessingConfig,\n    build_document_retrieval_agent,\n    flat_document_to_tree,\n    get_document_node_by_id,\n    process_document,\n    redact_document,\n    retrieve_documents,\n)\n\n# --- Configuration ---\n\n# Load environment variables from a .env file (e.g., for API keys)\nload_dotenv(\"path/to/.env\")\n\n# Configure Logfire for observability\nlogfire.configure()\nlogfire.instrument_pydantic_ai()\n\n# Define the model to be used by the agents for all tasks\nAGENT_MODEL = \"openai:gpt-5-mini\"\n\n\nasync def main() -\u003e None:\n    \"\"\"Runs the main madin demonstration workflow.\"\"\"\n    print(\"🚀 Starting the madin example workflow...\")\n\n    # --- 1. Create and Process a Document ---\n    markdown_content = textwrap.dedent(\n        \"\"\"\n        madin: An Agentic Framework for Document Retrieval\n\n        ### Introduction\n        madin is a Python framework for building sophisticated, agentic retrieval\n        systems over structured and unstructured documents. 
It excels at understanding\n        document hierarchy to provide precise answers.\n\n        Key Features\n        - **Agentic Tree Search**: Navigates the document structure like a human would, leading to more context-aware results.\n        - **Hybrid Retrieval**: Combines semantic search with structured traversal for scalability and accuracy.\n        - **Intelligent Content Chunking**: Dynamically breaks down content based on its semantic meaning and structural importance.\n        \"\"\"\n    )\n    doc = Document(id=\"madin-framework-doc\", raw_content=markdown_content)\n\n    print(\"\\n[Step 1/3] Processing document into a structured tree...\")\n    processing_config = DocumentProcessingConfig(\n        agent_config=AgentConfig(\n            document_content_analysis=AGENT_MODEL,\n            document_structure_editing=AGENT_MODEL,\n            node_content_analysis=AGENT_MODEL,\n            node_content_chunking=AGENT_MODEL,\n        ),\n    )\n    processing_result = await process_document(doc, processing_config)\n    processed_document = flat_document_to_tree(processing_result.document)\n    print(\"Processed document structure:\")\n    print(\n        redact_document(processed_document).model_dump_json(\n            indent=2, exclude_none=True, exclude_unset=True\n        )\n    )\n\n    # --- 2. Build a Retrieval Agent ---\n    print(\"\\n[Step 2/3] Building the retrieval agent...\")\n    retrieval_agent = build_document_retrieval_agent(model=AGENT_MODEL)\n\n    # --- 3. Retrieve Relevant Content ---\n    query = \"What is agentic tree search?\"\n    print(f\"\\n[Step 3/3] Retrieving content for query: '{query}'\")\n\n    retrieval_results = await retrieve_documents(\n        agent=retrieval_agent, documents=[processed_document], query=query\n    )\n\n    print(\"\\n✨ Retrieval Complete! 
Relevant Content Found:\")\n    if not retrieval_results.results or not retrieval_results.results[0].node_ids:\n        print(\"   - No relevant information found for the query.\")\n    else:\n        # Loop through results and print the content of each retrieved node\n        for result in retrieval_results.results:\n            for node_id in result.node_ids:\n                node = get_document_node_by_id(processed_document, node_id)\n                if node:\n                    print(\"-\" * 50)\n                    print(f\"📄 Node ID: {node.id}\")\n                    print(\"Content:\")\n                    print(textwrap.indent(node.content, \"   \u003e \"))\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\nFor a more comprehensive example, including demonstrations of hybrid retrieval strategies, see [notebooks/madin.ipynb](notebooks/madin.ipynb).\n\n## Project Structure\n\n- `madin/`: The main Python library containing all core logic for document processing, schemas, and retrieval algorithms.\n- `notebooks/`: Jupyter notebooks for demonstrating and evaluating retrieval strategies.\n- `example_data/`: Sample data for running the examples.\n- `pyproject.toml`: Project configuration and dependencies.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkinyugo%2Fmadin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkinyugo%2Fmadin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkinyugo%2Fmadin/lists"}