{"id":27813747,"url":"https://github.com/endevsols/long-trainer","last_synced_at":"2026-02-26T10:50:26.081Z","repository":{"id":213327853,"uuid":"728764970","full_name":"ENDEVSOLS/Long-Trainer","owner":"ENDEVSOLS","description":"Introducing LongTrainer, a sophisticated extension of the LangChain framework designed specifically for managing multiple bots and providing isolated, context-aware chat sessions. Ideal for developers and businesses looking to integrate complex conversational AI into their systems, LongTrainer simplifies the deployment and customization of LLMs.","archived":false,"fork":false,"pushed_at":"2024-12-17T13:17:34.000Z","size":862,"stargazers_count":9,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-06-24T04:04:54.920Z","etag":null,"topics":["gpt","langchain","langchain-python","llm-training","longtrainer","openai","rag"],"latest_commit_sha":null,"homepage":"https://endevsols.github.io/Long-Trainer/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ENDEVSOLS.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-12-07T16:37:26.000Z","updated_at":"2025-05-05T21:22:32.000Z","dependencies_parsed_at":"2024-05-05T16:27:54.173Z","dependency_job_id":"e4d965cc-6f26-484b-aa11-f64d9596aa49","html_url":"https://github.com/ENDEVSOLS/Long-Trainer","commit_stats":null,"previous_names":["endevsols/long-trainer"],"tags_count":25,"template":false,"template_full_name":null,"purl":"pkg:github/ENDEVSOLS/Lo
ng-Trainer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ENDEVSOLS%2FLong-Trainer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ENDEVSOLS%2FLong-Trainer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ENDEVSOLS%2FLong-Trainer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ENDEVSOLS%2FLong-Trainer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ENDEVSOLS","download_url":"https://codeload.github.com/ENDEVSOLS/Long-Trainer/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ENDEVSOLS%2FLong-Trainer/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266822300,"owners_count":23989824,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-24T02:00:09.469Z","response_time":99,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gpt","langchain","langchain-python","llm-training","longtrainer","openai","rag"],"created_at":"2025-05-01T12:02:14.738Z","updated_at":"2026-02-26T10:50:26.063Z","avatar_url":"https://github.com/ENDEVSOLS.png","language":"Python","funding_links":["https://opencollective.com/longtrainer"],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg 
src=\"https://github.com/ENDEVSOLS/Long-Trainer/blob/master/assets/longtrainer-logo.png?raw=true\" alt=\"LongTrainer Logo\"\u003e\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003eLongTrainer 1.2.0 — Production-Ready RAG Framework\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eMulti-tenant bots, streaming, tools, and persistent memory — all batteries included.\u003c/strong\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://pypi.org/project/longtrainer/\"\u003e\n    \u003cimg src=\"https://img.shields.io/pypi/v/longtrainer\" alt=\"PyPI Version\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://pepy.tech/project/longtrainer\"\u003e\n    \u003cimg src=\"https://static.pepy.tech/badge/longtrainer\" alt=\"Total Downloads\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://pepy.tech/project/longtrainer\"\u003e\n    \u003cimg src=\"https://static.pepy.tech/badge/longtrainer/month\" alt=\"Monthly Downloads\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/ENDEVSOLS/Long-Trainer/stargazers\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/stars/ENDEVSOLS/Long-Trainer?style=flat\" alt=\"GitHub Stars\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/ENDEVSOLS/Long-Trainer/actions/workflows/ci.yml\"\u003e\n    \u003cimg src=\"https://github.com/ENDEVSOLS/Long-Trainer/actions/workflows/ci.yml/badge.svg\" alt=\"CI\"\u003e\n  \u003c/a\u003e\n  \u003cimg src=\"https://img.shields.io/pypi/pyversions/longtrainer\" alt=\"Python Versions\"\u003e\n  \u003ca href=\"https://github.com/ENDEVSOLS/Long-Trainer/blob/master/LICENSE\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/license/ENDEVSOLS/Long-Trainer\" alt=\"License\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://opencollective.com/longtrainer\"\u003e\n    \u003cimg src=\"https://img.shields.io/opencollective/all/longtrainer?label=sponsors\" alt=\"Open Collective\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp 
align=\"center\"\u003e\n  \u003ca href=\"https://endevsols.github.io/Long-Trainer/\"\u003eDocumentation\u003c/a\u003e •\n  \u003ca href=\"#quick-start-\"\u003eQuick Start\u003c/a\u003e •\n  \u003ca href=\"#features-\"\u003eFeatures\u003c/a\u003e •\n  \u003ca href=\"#migration-from-034\"\u003eMigration from 0.3.4\u003c/a\u003e •\n  \u003ca href=\"#support-the-project-\"\u003eSponsor\u003c/a\u003e\n\u003c/p\u003e\n\n---\n\n## What is LongTrainer?\n\nLongTrainer is a **production-ready RAG framework** that turns your documents into intelligent, multi-tenant chatbots — with **5 lines of code**.\n\nBuilt on top of LangChain, LongTrainer handles the hard parts that every production RAG system needs: **multi-bot isolation, persistent MongoDB memory, FAISS vector search, streaming responses, custom tool calling, chat encryption, and vision support** — so you don't have to wire them together yourself.\n\n### Why LongTrainer over raw LangChain / LlamaIndex?\n\n| Problem | LangChain / LlamaIndex | LongTrainer |\n|---|---|---|\n| Multi-bot management | DIY — manage state per bot | Built-in: `initialize_bot_id()` → isolated bots |\n| Persistent chat memory | Wire MongoDB/Redis yourself | Built-in: MongoDB-backed, encrypted, restorable |\n| Document ingestion | Assemble loaders + splitters | One-liner: `add_document_from_path(path, bot_id)` |\n| Streaming responses | Implement `astream` yourself | `get_response(stream=True)` yields chunks |\n| Custom tool calling | Define tools, build agent | `add_tool(my_tool)` — plug and play |\n| Web search augmentation | Find and integrate search | Built-in toggle: `web_search=True` |\n| Vision chat | Complex multi-modal setup | `get_vision_response()` — pass images |\n| Self-improving from chats | Not a concept | `train_chats()` feeds Q\u0026A back into KB |\n| Encryption at rest | DIY | `encrypt_chats=True` — Fernet out of the box |\n\n---\n\n## Installation\n\n```bash\npip install longtrainer\n```\n\n**With agent/tool-calling support 
(optional):**\n\n```bash\npip install longtrainer[agent]\n```\n\n### System Dependencies\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eLinux (Ubuntu/Debian)\u003c/strong\u003e\u003c/summary\u003e\n\n```bash\nsudo apt install libmagic-dev poppler-utils tesseract-ocr qpdf libreoffice pandoc\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003emacOS\u003c/strong\u003e\u003c/summary\u003e\n\n```bash\nbrew install libmagic poppler tesseract qpdf libreoffice pandoc\n```\n\u003c/details\u003e\n\n---\n\n## Quick Start 🚀\n\n### 1. Zero-Code CLI \u0026 API Server (New in 1.2.0!)\n\nManage bots, chat, and run a production API directly from your terminal—no Python required.\n\n#### A. Interactive Terminal Chat\n```bash\n# 1. Initialize a new project and generate longtrainer.yaml\nlongtrainer init\n\n# 2. Create a new bot\nlongtrainer bot create --prompt \"You are a helpful assistant.\"\n\n# 3. Add a document (PDF, link, etc.)\nlongtrainer add-doc \u003cbot_id\u003e /path/to/document.pdf\n\n# 4. Start chatting!\nlongtrainer chat \u003cbot_id\u003e\n```\n\n#### B. FastAPI REST Server\nStart a production-ready API server backed by your LongTrainer bots:\n```bash\nlongtrainer serve\n```\n\nThis starts a FastAPI server running on `http://localhost:8000` with **16 REST endpoints**, including:\n- `/health`\n- `/bots` (CRUD)\n- `/bots/{id}/documents/path` (Ingest files)\n- `/bots/{id}/chats` (Create sessions)\n- `/bots/{id}/chats/{chat_id}` (Chat and Streaming)\n\nVisit `http://localhost:8000/docs` to see the auto-generated Swagger UI and test the API directly!\n\n### 2. 
Python SDK — Default RAG Mode\n\n```python\nfrom longtrainer.trainer import LongTrainer\nimport os\n\nos.environ[\"OPENAI_API_KEY\"] = \"sk-...\"\n\n# Initialize\ntrainer = LongTrainer(mongo_endpoint=\"mongodb://localhost:27017/\")\nbot_id = trainer.initialize_bot_id()\n\n# Add documents (PDF, DOCX, CSV, HTML, MD, TXT, URLs, YouTube, Wikipedia)\ntrainer.add_document_from_path(\"path/to/your/data.pdf\", bot_id)\n\n# Create bot and start chatting\ntrainer.create_bot(bot_id)\nchat_id = trainer.new_chat(bot_id)\n\n# Get response\nanswer, sources = trainer.get_response(\"What is this document about?\", bot_id, chat_id)\nprint(answer)\n```\n\n### Streaming Responses\n\n```python\n# Stream tokens in real-time\nfor chunk in trainer.get_response(\"Summarize the key points\", bot_id, chat_id, stream=True):\n    print(chunk, end=\"\", flush=True)\n```\n\n### Async Streaming\n\n```python\n# Run this inside an async function (for example, one started with asyncio.run())\nasync for chunk in trainer.aget_response(\"Explain the methodology\", bot_id, chat_id):\n    print(chunk, end=\"\", flush=True)\n```\n\nIn agent mode, `AgentBot` automatically routes questions to tools such as web search when necessary.\n\n### 🌟 NEW: Dynamic ZERO CODE Tools\nLongTrainer V2 now integrates LangChain's dynamic tool ecosystem **natively**:\n```python\ntrainer.create_bot(\n    \"agent-id\",\n    agent_mode=True,\n    tools=[\"tavily_search_results_json\", \"wikipedia\", \"arxiv\", \"PythonREPLTool\", \"yahoo_finance_news\"]\n)\n```\n\nLongTrainer dynamically imports and initializes any string-named tool available through `langchain.agents.load_tools` on the backend.\n\nYou can still register custom tools explicitly, either globally or per bot:\n```python\nfrom langchain.tools import tool\n\n@tool\ndef get_weather(location: str) -\u003e str:\n    \"\"\"Return the weather for a location (placeholder body for illustration).\"\"\"\n    return f\"Weather lookup for {location} is not implemented.\"\n```\n\n### Agent Mode — With Custom Tools\n\n```python\nfrom longtrainer.tools import web_search\nfrom langchain_core.tools import tool\n\n# Add built-in web search tool\ntrainer.add_tool(web_search, bot_id)\n\n# Add your own custom tool\n@tool\ndef 
calculate(expression: str) -\u003e str:\n    \"\"\"Evaluate a math expression.\"\"\"\n    # eval() is for demonstration only; never call it on untrusted input\n    return str(eval(expression))\n\ntrainer.add_tool(calculate, bot_id)\n\n# Create bot in agent mode\ntrainer.create_bot(bot_id, agent_mode=True)\nchat_id = trainer.new_chat(bot_id)\n\nresponse, _ = trainer.get_response(\"What is 42 * 17?\", bot_id, chat_id)\nprint(response)\n```\n\n### Vision Chat\n\n```python\nvision_id = trainer.new_vision_chat(bot_id)\nresponse, sources = trainer.get_vision_response(\n    \"Describe what you see in this image\",\n    image_paths=[\"photo.jpg\"],\n    bot_id=bot_id,\n    vision_chat_id=vision_id,\n)\nprint(response)\n```\n\n### Per-Bot Customization\n\n```python\nfrom langchain_openai import ChatOpenAI, OpenAIEmbeddings\nfrom longtrainer.tools import web_search\n\n# Each bot can have its own LLM, embeddings, and retrieval config\ntrainer.create_bot(\n    bot_id,\n    llm=ChatOpenAI(model=\"gpt-4o-mini\", temperature=0.2),\n    embedding_model=OpenAIEmbeddings(model=\"text-embedding-3-small\"),\n    num_k=5,                    # retrieve 5 docs per query\n    prompt_template=\"You are a helpful legal assistant. 
{context}\",\n    agent_mode=True,            # enable tool calling\n    tools=[web_search],\n)\n```\n\n---\n\n## Features ✨\n\n### Core\n- ✅ **Dual Mode:** RAG (LCEL chain) for simple Q\u0026A, Agent (LangGraph) for tool calling\n- ✅ **Streaming Responses:** Sync and async streaming out of the box\n- ✅ **Custom Tool Calling:** Add any LangChain `@tool` — web search, document reader, or your own\n- ✅ **Multi-Bot Management:** Isolated bots with independent sessions, data, and configs\n- ✅ **Persistent Memory:** MongoDB-backed chat history, fully restorable\n- ✅ **Chat Encryption:** Fernet encryption for stored conversations\n\n### Document Ingestion\n- ✅ **Standard Formats:** PDF, DOCX, CSV, HTML, Markdown, TXT\n- ✅ **Web \u0026 Crawling:** `add_document_from_link()`, `add_document_from_query()`, `add_document_from_crawl()`\n- ✅ **Cloud \u0026 Enterprise:** S3 (`add_document_from_aws_s3`), Google Drive (`add_document_from_google_drive`), Confluence (`add_document_from_confluence`)\n- ✅ **Structured Data:** Local Directory (`add_document_from_directory`), JSON \u0026 JQ (`add_document_from_json`), GitHub Repo (`add_document_from_github`)\n- ✅ **Dynamic Integrations:** Inject ANY LangChain document loader class dynamically via `add_document_from_dynamic_loader()`\n\n### RAG Pipeline \u0026 Vector DBs\n- ✅ **Vector Databases:** FAISS, Pinecone, Chroma, Qdrant, **PGVector, MongoDB Atlas, Milvus, Elasticsearch, Weaviate**\n- ✅ **Multi-Query Ensemble Retrieval:** Generates alternative queries for better recall\n- ✅ **Self-Improving Memory:** `train_chats()` feeds past Q\u0026A back into the knowledge base\n\n### Customization\n- ✅ **Per-bot LLM** — use different models for different bots\n- ✅ **Per-bot Embeddings** — custom embedding models per bot\n- ✅ **Per-bot Retrieval Config** — custom `num_k`, `chunk_size`, `chunk_overlap`\n- ✅ **Custom Prompt Templates** — full control over system prompts\n- ✅ **Vision Chat** — GPT-4 Vision support with image understanding\n\n### 
Works with All LangChain-Compatible LLMs\n\n- ✅ OpenAI (default)\n- ✅ Anthropic\n- ✅ Google VertexAI / Gemini\n- ✅ AWS Bedrock\n- ✅ HuggingFace\n- ✅ Groq\n- ✅ Together AI\n- ✅ Ollama (local models)\n- ✅ Any `BaseChatModel` implementation\n\n---\n\n## API Reference\n\n### `LongTrainer` — Main Class\n\n```python\ntrainer = LongTrainer(\n    mongo_endpoint=\"mongodb://localhost:27017/\",\n    llm=None,                # default: ChatOpenAI(model=\"gpt-4o-2024-08-06\")\n    embedding_model=None,    # default: OpenAIEmbeddings()\n    prompt_template=None,    # custom system prompt\n    max_token_limit=32000,   # conversation memory limit\n    num_k=3,                 # docs to retrieve per query\n    chunk_size=2048,         # text splitter chunk size\n    chunk_overlap=200,       # text splitter overlap\n    ensemble=False,          # enable multi-query ensemble retrieval\n    encrypt_chats=False,     # enable Fernet encryption\n    encryption_key=None,     # custom encryption key (auto-generated if None)\n)\n```\n\n### Key Methods\n\n| Method | Description |\n|---|---|\n| `initialize_bot_id()` | Create a new bot, returns `bot_id` |\n| `create_bot(bot_id, ...)` | Build the bot from loaded documents |\n| `load_bot(bot_id)` | Restore an existing bot from MongoDB + FAISS |\n| `new_chat(bot_id)` | Start a new chat session, returns `chat_id` |\n| `get_response(query, bot_id, chat_id, stream=False)` | Get response (or stream) |\n| `aget_response(query, bot_id, chat_id)` | Async streaming response |\n| `add_document_from_path(path, bot_id)` | Ingest a file |\n| `add_document_from_link(links, bot_id)` | Ingest URLs / YouTube links |\n| `add_tool(tool, bot_id)` | Register a tool for a bot |\n| `remove_tool(tool_name, bot_id)` | Remove a tool |\n| `list_tools(bot_id)` | List registered tools |\n| `train_chats(bot_id)` | Self-improve from chat history |\n| `new_vision_chat(bot_id)` | Start a vision chat session |\n| `get_vision_response(query, image_paths, bot_id, vision_chat_id)` | Vision 
response |\n\n---\n\n## Migration from 0.3.4\n\nLongTrainer 1.0.0 is a major upgrade with breaking changes:\n\n| 0.3.4 | 1.0.0 |\n|---|---|\n| `ConversationalRetrievalChain` | LCEL chain (`RAGBot`) or LangGraph agent (`AgentBot`) |\n| `requirements.txt` + `setup.py` | `pyproject.toml` (UV/pip compatible) |\n| No streaming | `stream=True` or `aget_response()` |\n| No tool calling | `add_tool()` + `agent_mode=True` |\n| `langchain.memory` | `langchain_core.chat_history` |\n| Fixed LLM for all bots | Per-bot LLM, embeddings, and config |\n\n**Upgrade path:**\n```bash\npip install --upgrade longtrainer\n```\n\nThe core API (`initialize_bot_id`, `create_bot`, `new_chat`, `get_response`) remains the same — existing code should work with minimal changes. The main difference is `get_response()` now returns `(answer, sources)` instead of `(answer, sources, web_sources)`.\n\n---\n\n## Support the Project 💖\n\nLongTrainer is free and open-source. If it's useful to you, consider sponsoring its development:\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://opencollective.com/longtrainer\"\u003e\n    \u003cimg src=\"https://opencollective.com/longtrainer/donate/button@2x.png?color=blue\" width=\"300\" alt=\"Donate to LongTrainer\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\nYour sponsorship helps fund:\n- 🚀 New features (CLI, API server, evaluation tools)\n- 🐛 Bug fixes and maintenance\n- 📖 Documentation and tutorials\n- 🧪 CI/CD infrastructure\n\n---\n\n## Citation\n\n```\n@misc{longtrainer,\n  author = {Endevsols},\n  title = {LongTrainer: Production-Ready RAG Framework},\n  year = {2024},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https://github.com/ENDEVSOLS/Long-Trainer}},\n}\n```\n\n## License\n\n[MIT License](LICENSE)\n\n## Contributing\n\nWe welcome contributions! 
See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fendevsols%2Flong-trainer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fendevsols%2Flong-trainer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fendevsols%2Flong-trainer/lists"}