{"id":29227210,"url":"https://github.com/royerlab/litemind","last_synced_at":"2025-08-18T10:23:51.221Z","repository":{"id":280252378,"uuid":"937867444","full_name":"royerlab/litemind","owner":"royerlab","description":"litemind is a Python library designed to empower developers to build sophisticated conversational agents and tools. It provides a flexible and elegant API for interacting with Large Language Models (LLMs) from various providers, along with a powerful agentic AI framework that supports multimodal inputs and outputs.","archived":false,"fork":false,"pushed_at":"2025-07-03T02:19:30.000Z","size":74856,"stargazers_count":16,"open_issues_count":1,"forks_count":3,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-07-03T03:26:52.251Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/royerlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-02-24T03:19:33.000Z","updated_at":"2025-07-03T02:19:33.000Z","dependencies_parsed_at":null,"dependency_job_id":"181322ba-fefd-4f9d-afdf-ecb775deba54","html_url":"https://github.com/royerlab/litemind","commit_stats":null,"previous_names":["royerlab/litemind"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/royerlab/litemind","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/royerlab%2Flitemind","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/royerlab%2Flitemind/tags","release
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/royerlab%2Flitemind/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/royerlab%2Flitemind/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/royerlab","download_url":"https://codeload.github.com/royerlab/litemind/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/royerlab%2Flitemind/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263296638,"owners_count":23444499,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-03T09:10:14.888Z","updated_at":"2025-08-18T10:23:51.206Z","avatar_url":"https://github.com/royerlab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# litemind\n\n[![PyPI version](https://badge.fury.io/py/litemind.svg)](https://pypi.org/project/litemind/)\n[![License: BSD-3-Clause](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](./LICENSE)\n[![Downloads](https://static.pepy.tech/badge/litemind)](https://pepy.tech/project/litemind)\n[![GitHub stars](https://img.shields.io/github/stars/royerlab/litemind.svg?style=social\u0026label=Star)](https://github.com/royerlab/litemind)\n\n---\n\n## Summary\n\n**Litemind** is a powerful, extensible, and user-friendly Python library for building next-generation multimodal, agentic AI applications. 
It provides a unified, high-level API for interacting with a wide range of Large Language Model (LLM) providers (OpenAI, Anthropic, Google Gemini, Ollama, and more), and enables the creation of advanced agents that can reason, use tools, access external knowledge, and process multimodal data (text, images, audio, video, tables, documents, and more).\n\nLitemind's philosophy is to make advanced agentic and multimodal AI accessible to all Python developers, with a focus on clarity, composability, and extensibility. Whether you want to build a simple chatbot, a research assistant, or a complex workflow that leverages retrieval-augmented generation (RAG), tool use, and multimodal reasoning, Litemind provides the building blocks you need.\n\n---\n\n## Features\n\n- **Unified API**: Seamlessly interact with multiple LLM providers (OpenAI, Anthropic, Gemini, Ollama, etc.) through a single, consistent interface.\n- **Agentic Framework**: Build agents that can reason, use tools, maintain conversations, and augment themselves with external knowledge.\n- **Multimodal Support**: Native support for text, images, audio, video, tables, documents, and more, both as inputs and outputs.\n- **Tool Integration**: Easily define and add custom Python functions as tools, or use built-in tools (web search, MCP protocol, etc.).\n- **Augmentations (RAG)**: Integrate vector databases and retrieval-augmented generation to ground agent responses in external knowledge.\n- **Automatic Model Feature Discovery**: Automatically select models based on required features (e.g., image input, tool use, reasoning).\n- **Extensible Media Classes**: Rich, type-safe representations for all supported media types.\n- **Comprehensive Conversion**: Automatic conversion between media types for maximum model compatibility.\n- **Command-Line Tools**: CLI utilities for code generation, repo export, and model feature scanning.\n- **Callback and Logging System**: Fine-grained logging and callback hooks for monitoring 
and debugging (powered by [Arbol](http://github.com/royerlab/arbol)).\n- **Robust Testing**: Extensive test suite covering all major features and edge cases.\n- **BSD-3-Clause License**: Open source and ready for both academic and commercial use.\n\n---\n\n## Installation\n\nLitemind requires Python 3.9 or newer.\n\nInstall the latest release from PyPI:\n\n```bash\npip install litemind\n```\n\nFor development (with all optional dependencies):\n\n```bash\ngit clone https://github.com/royerlab/litemind.git\ncd litemind\npip install -e \".[dev,rag,whisper,documents,tables,videos,audio,remote,tasks]\"\n```\n\n---\n\n## Basic Usage\n\nBelow are several illustrative examples of the agent-level API. Each example is self-contained and demonstrates a different aspect of Litemind's agentic capabilities.\n\n### 1. Basic Agent Usage\n\n```python\nfrom litemind import OpenAIApi\nfrom litemind.agent.agent import Agent\n\n# Initialize the OpenAI API\napi = OpenAIApi()\n\n# Create an agent\nagent = Agent(api=api, model_name=\"o3-high\")\n\n# Add a system message to guide the agent's behavior\nagent.append_system_message(\"You are a helpful assistant.\")\n\n# Ask a question\nresponse = agent(\"What is the capital of France?\")\n\nprint(\"Simple Agent Response:\", response)\n# Output: Simple Agent Response: [*assistant*:\n# The capital of France is Paris.\n# ]\n```\n\n---\n\n### 2. 
Agent with Tools\n\n```python\nfrom litemind import OpenAIApi\nfrom litemind.agent.agent import Agent\nfrom litemind.agent.tools.toolset import ToolSet\nfrom datetime import datetime\n\n# Define a function to get the current date\ndef get_current_date() -\u003e str:\n    \"\"\"Fetch the current date\"\"\"\n    return datetime.now().strftime(\"%Y-%m-%d\")\n\napi = OpenAIApi()\ntoolset = ToolSet()\ntoolset.add_function_tool(get_current_date)\n\nagent = Agent(api=api, toolset=toolset)\nagent.append_system_message(\"You are a helpful assistant.\")\n\nresponse = agent(\"What is the current date?\")\nprint(\"Agent with Tool Response:\", response)\n# Output: Agent with Tool Response: [*assistant*:\n# The current date is 2025-05-02.\n# ]\n```\n\n---\n\n### 3. Agent with Tools and Augmentation (RAG)\n\n```python\nfrom litemind import OpenAIApi\nfrom litemind.agent.agent import Agent\nfrom litemind.agent.tools.toolset import ToolSet\nfrom litemind.agent.augmentations.information.information import Information\nfrom litemind.agent.augmentations.vector_db.in_memory_vector_db import InMemoryVectorDatabase\nfrom litemind.media.types.media_text import Text\n\ndef get_current_date() -\u003e str:\n    from datetime import datetime\n    return datetime.now().strftime(\"%Y-%m-%d\")\n\napi = OpenAIApi()\ntoolset = ToolSet()\ntoolset.add_function_tool(get_current_date, \"Fetch the current date\")\n\nagent = Agent(api=api, toolset=toolset)\n\n# Create a vector database augmentation\nvector_augmentation = InMemoryVectorDatabase(name=\"test_augmentation\")\n\n# Add sample Information items to the augmentation\ninformations = [\n    Information(Text(\"Igor Bolupskisty was a German-born theoretical physicist who developed the theory of indelible unitarity.\"),\n                metadata={\"topic\": \"physics\", \"person\": \"Bolupskisty\"}),\n    Information(Text(\"The theory of indelible unitarity revolutionized our understanding of space, time and photons.\"),\n                
metadata={\"topic\": \"physics\", \"concept\": \"unitarity\"}),\n    Information(Text(\"Quantum unitarity is a fundamental theory in physics that describes nature at the nano-atomic scale as it pertains to Pink Hamsters.\"),\n                metadata={\"topic\": \"physics\", \"concept\": \"quantum unitarity\"}),\n]\n\nvector_augmentation.add_informations(informations)\nagent.add_augmentation(vector_augmentation)\nagent.append_system_message(\"You are a helpful assistant.\")\n\nresponse = agent(\"Tell me about Igor Bolupskisty's theory of indelible unitarity. Also, what is the current date?\")\nprint(\"Agent with Tool and Augmentation Response:\", response)\n# Output: [*assistant*:\n# Igor Bolupskisty was a German-born theoretical physicist known for developing the theory of indelible unitarity...\n# Today's date is May 2, 2025.\n# ]\n```\n\n---\n\n### 4. More Complex Example: Multimodal Inputs, Tools, and Augmentations\n\n```python\nfrom litemind import OpenAIApi\nfrom litemind.agent.agent import Agent\nfrom litemind.agent.tools.toolset import ToolSet\nfrom litemind.agent.augmentations.vector_db.in_memory_vector_db import InMemoryVectorDatabase\nfrom litemind.agent.augmentations.information.information import Information\nfrom litemind.media.types.media_image import Image\n\ndef get_current_date() -\u003e str:\n    from datetime import datetime\n    return datetime.now().strftime(\"%Y-%m-%d\")\n\napi = OpenAIApi()\ntoolset = ToolSet()\ntoolset.add_function_tool(get_current_date, \"Fetch the current date\")\n\nagent = Agent(api=api, toolset=toolset)\n\n# Create a vector database and add a multimodal Information item (an image) to it\nvector_augmentation = InMemoryVectorDatabase(name=\"multimodal_augmentation\")\nvector_augmentation.add_informations([\n    Information(Image(\"https://upload.wikimedia.org/wikipedia/commons/thumb/3/3e/Einstein_1921_by_F_Schmutzer_-_restoration.jpg/456px-Einstein_1921_by_F_Schmutzer_-_restoration.jpg\"),\n          
      metadata={\"topic\": \"physics\", \"person\": \"Einstein\"}),\n])\nagent.add_augmentation(vector_augmentation)\nagent.append_system_message(\"You are a helpful assistant.\")\n\nresponse = agent(\"Describe the person in the image and tell me today's date.\")\nprint(\"Multimodal Agent Response:\", response)\n# Output: [*assistant*:\n# The image shows Albert Einstein, a renowned physicist...\n# Today's date is May 2, 2025.\n# ]\n```\n\n---\n\n**Note:** Wherever a `model_features` argument is accepted (for example, when creating an `Agent`), it can be given as a list of strings, a single string, or enum values (see `ModelFeatures.normalise`). For example, `model_features=[\"textgeneration\", \"tools\"]` or `model_features=ModelFeatures.TextGeneration`.\n\n---\n\n## Concepts\n\n### Main Classes\n\n- **Agent**: The core class representing an agentic AI entity. It manages conversation state, toolsets, and augmentations, and interacts with the API.\n- **ToolSet**: A collection of tools (Python functions or agent tools) that the agent can use.\n- **AugmentationSet**: A collection of augmentations (e.g., vector databases) for retrieval-augmented generation (RAG).\n- **Information**: Represents a knowledge chunk (text, image, etc.) with metadata, used in augmentations.\n- **Media Classes**: Typed representations for all supported media (Text, Image, Audio, Video, Table, Document, etc.).\n- **API Classes**: Abstractions for LLM providers (OpenAIApi, AnthropicApi, GeminiApi, OllamaApi, CombinedApi, etc.).\n\n### API Layers\n\n- **Agentic API**: The high-level, agent-oriented API (as shown above). This is the recommended way to build complex, interactive, multimodal, and tool-using agents.\n- **Wrapper API**: Lower-level, direct access to LLM provider APIs (e.g., `api.generate_text(...)`, `api.generate_image(...)`). Use this for fine-grained control or when you don't need agentic features.\n\n**Difference:** The agentic API manages conversation, tool use, augmentation, and multimodal context automatically. 
The wrapper API is stateless and does not manage agent state or tool use.\n\n---\n\n## Multi-modality\n\nLitemind supports **multimodal inputs and outputs** natively. This means you can send images, audio, video, tables, and documents to models that support them, and receive rich outputs.\n\n### Model Features\n\n- **Model features** describe what a model can do (e.g., text generation, image input, or tool use).\n- You can request features as enums, strings, or lists (e.g., `ModelFeatures.TextGeneration`, `\"textgeneration\"`, or `[\"textgeneration\", \"tools\"]`).\n- Litemind will automatically select the best model that supports the requested features.\n\n### Requesting Features\n\n```python\nfrom litemind import OpenAIApi\nfrom litemind.agent.agent import Agent\nfrom litemind.apis.model_features import ModelFeatures\n\napi = OpenAIApi()\n\n# As enums\nagent = Agent(api=api, model_features=[ModelFeatures.TextGeneration, ModelFeatures.Image])\n\n# As strings\nagent = Agent(api=api, model_features=[\"textgeneration\", \"image\"])\n```\n\n### Media Classes\n\n- All multimodal data is represented by dedicated media classes (e.g., `Text`, `Image`, `Audio`, `Video`, `Table`, `Document`).\n- These classes provide type safety, conversion utilities, and serialization.\n- When building messages, add media with the corresponding `Message` methods (e.g., `Message.append_image(...)`, `Message.append_audio(...)`).\n\n---\n\n## More Examples\n\nBelow are some advanced usage examples. For even more, see the [EXAMPLES.md](EXAMPLES.md) file.\n\n### 1. Using the CombinedApi\n\n```python\nfrom litemind.apis.combined_api import CombinedApi\nfrom litemind.agent.agent import Agent\n\napi = CombinedApi()\nagent = Agent(api=api)\nagent.append_system_message(\"You are a helpful assistant.\")\nresponse = agent(\"What is the tallest mountain in the world?\")\nprint(response)\n```\n\n---\n\n### 2. 
Agent That Can Execute Python Code\n\n```python\nfrom litemind import OpenAIApi\nfrom litemind.agent.agent import Agent\nfrom litemind.agent.tools.toolset import ToolSet\n\ndef execute_python_code(code: str) -\u003e str:\n    # WARNING: exec() runs arbitrary code; only use this in a trusted or sandboxed environment.\n    try:\n        exec_globals = {}\n        exec(code, exec_globals)\n        # Return only the variables the code defined (skip the injected __builtins__ entry)\n        variables = {k: v for k, v in exec_globals.items() if not k.startswith(\"__\")}\n        return str(variables)\n    except Exception as e:\n        return f\"{type(e).__name__}: {e}\"\n\napi = OpenAIApi()\ntoolset = ToolSet()\ntoolset.add_function_tool(execute_python_code, \"Execute Python code and return the variables it defines.\")\n\nagent = Agent(api=api, toolset=toolset)\nagent.append_system_message(\"You are a Python code executor.\")\nresponse = agent(\"What is the result of 2 + 2 in Python?\")\nprint(response)\n```\n\n---\n\n### 3. Using the Wrapper API for Structured Outputs\n\n```python\nfrom litemind import OpenAIApi\nfrom pydantic import BaseModel\n\nclass WeatherResponse(BaseModel):\n    temperature: float\n    condition: str\n    humidity: float\n\napi = OpenAIApi()\nmessages = [\n    {\"role\": \"system\", \"content\": \"You are a weather bot.\"},\n    {\"role\": \"user\", \"content\": \"What is the weather like in Paris?\"}\n]\nresponse = api.generate_text(messages=messages, response_format=WeatherResponse)\nprint(response)\n# Example output (values are illustrative, not live weather data):\n# [WeatherResponse(temperature=22.5, condition='sunny', humidity=45.0)]\n```\n\n---\n\n### 4. 
Agent with a Tool That Generates an Image Using the Wrapper API\n\n```python\nfrom litemind import OpenAIApi\nfrom litemind.agent.agent import Agent\nfrom litemind.agent.tools.toolset import ToolSet\n\ndef generate_cat_image() -\u003e str:\n    api = OpenAIApi()\n    image = api.generate_image(positive_prompt=\"A cute fluffy cat\", image_width=512, image_height=512)\n    # Save or display the image as needed\n    return \"Cat image generated!\"\n\napi = OpenAIApi()\ntoolset = ToolSet()\ntoolset.add_function_tool(generate_cat_image, \"Generate a cat image.\")\n\nagent = Agent(api=api, toolset=toolset)\nagent.append_system_message(\"You are an assistant that can generate images.\")\nresponse = agent(\"Please generate a cat image for me.\")\nprint(response)\n```\n\n---\n\n**More examples can be found in the [EXAMPLES.md](EXAMPLES.md) file.**\n\n---\n\n## Command Line Tools\n\nLitemind provides several command-line tools for code generation, repository export, and model feature scanning.\n\n### Usage\n\n```bash\nlitemind codegen --api openai -m gpt-4o --file README\nlitemind export --folder-path . --output-file exported.txt\nlitemind scan --api openai gemini --models gpt-4o models/gemini-1.5-pro\n```\n\n### .codegen.yml Format\n\nTo use the `codegen` tool, create a `.codegen` folder in the root of your repository and add one or more `*.codegen.yml` files. 
Example:\n\n```yaml\nfile: README.md\nprompt: |\n  Please generate a detailed, complete and informative README.md file for this repository.\nfolder:\n  path: .\n  extensions: [\".py\", \".md\", \".toml\", \"LICENSE\"]\n  excluded: [\"dist\", \"build\", \"litemind.egg-info\"]\n```\n\n- `file` is the output file to generate.\n- The `folder` section sets the root `path`, the file `extensions` to include, and the paths to skip (`excluded`).\n- The `prompt` is the instruction for the agent.\n- You can have multiple `*.codegen.yml` files for different outputs.\n\n---\n\n## Caveats and Limitations\n\n- **Error Handling**: Some error handling, especially in the wrapper API, could be improved.\n- **Token Management**: There is no built-in mechanism for managing token usage or quotas.\n- **API Key Management**: API keys are read from environment variables; consider more secure solutions for production.\n- **Performance**: There is no explicit caching or async support yet; large RAG databases may impact performance.\n- **Failing Tests**: See the \"Code Health\" section for test status.\n- **Streaming**: Not all models/providers support streaming responses.\n- **Model Coverage**: Not all models support all features (e.g., not all support images, tools, or reasoning).\n- **Security**: Always keep your API keys secure and do not commit them to version control.\n\n---\n\n## Code Health\n\n- **Unit Tests**: The test suite covers a wide range of functionality, including text generation, image generation, audio, RAG, tools, multimodal inputs, and more.\n- **Test Results**: No failures were reported in the latest run (`test_reports/` is empty).\n- **Total Tests**: Hundreds of tests across all modules.\n- **Assessment**: The codebase is robust and well-tested. 
Any failures would be non-critical unless otherwise noted in `test_report.md` or `ANALYSIS.md`.\n\n---\n\n## API Keys\n\nLitemind reads the API keys for hosted LLM providers from the following environment variables:\n\n- **OpenAI**: `OPENAI_API_KEY`\n- **Anthropic (Claude)**: `ANTHROPIC_API_KEY`\n- **Google Gemini**: `GOOGLE_GEMINI_API_KEY`\n- **Ollama**: (local server, no key required by default)\n\n### Setting API Keys\n\n#### Linux / macOS\n\nAdd the following lines to your `~/.bashrc`, `~/.zshrc`, or `~/.profile`:\n\n```bash\nexport OPENAI_API_KEY=\"sk-...\"\nexport ANTHROPIC_API_KEY=\"sk-ant-...\"\nexport GOOGLE_GEMINI_API_KEY=\"...\"\n```\n\nThen reload the file you edited (for example, for Bash):\n\n```bash\nsource ~/.bashrc\n```\n\n#### Windows\n\nSet environment variables in the Command Prompt or PowerShell:\n\n```cmd\nsetx OPENAI_API_KEY \"sk-...\"\nsetx ANTHROPIC_API_KEY \"sk-ant-...\"\nsetx GOOGLE_GEMINI_API_KEY \"...\"\n```\n\n`setx` stores the variables permanently, but they only take effect in newly opened terminals, not the current one. Alternatively, add them to your system environment variables via the Control Panel.\n\n#### In Python (not recommended for production)\n\n```python\nimport os\nos.environ[\"OPENAI_API_KEY\"] = \"sk-...\"\n```\n\n---\n\n## Roadmap\n\n- [x] Set up a README with a quick-start guide.\n- [x] Set up continuous integration and PyPI deployment.\n- [x] Improve document conversion (page-by-page text and video interleaving + whole-page images)\n- [x] Clean up structured output with tool usage\n- [x] Implement streaming callbacks\n- [x] Improve folder/archive conversion: add ASCII folder tree\n- [x] Reorganise media files used for testing into a single media folder\n- [x] Improve logging with Arbol, with an option to turn it off.\n- [x] Use specialised libraries for document type identification\n- [x] Clean up the document conversion code.\n- [x] Add support for adding nD images to messages.\n- [x] Automatic feature support discovery for models (which models support images as input, reasoning, etc.)\n- [x] Add support for OpenAI's new Responses API.\n- [x] Add support for built-in tools: web search and MCP protocol.\n- [ ] Add 
web UI functionality for agents using Reflex.\n- [ ] Adapt the temporal sampling in video conversion to the video length (short videos should have more frames).\n- [ ] RAG ingestion code for arbitrary digital objects: folders, PDFs, images, URLs, etc.\n- [ ] Add more support for the MCP protocol beyond built-in API support.\n- [ ] Use the faster pybase64 for base64 encoding/decoding.\n- [ ] Handle the token size of messages sent to models.\n- [ ] Improve vendor API robustness (e.g., retrying calls on server errors).\n- [ ] Improve and unify exception handling.\n- [ ] Implement a 'brainstorming' mode for text generation, possibly with API fusion.\n\n---\n\n## Contributing\n\nContributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.\n\n- Fork the repository and create a feature branch.\n- Write clear, tested, and well-documented code.\n- Run `pytest` and ensure all tests pass.\n- Use [Black](https://github.com/psf/black), [isort](https://pycqa.github.io/isort/), [flake8](https://flake8.pycqa.org/), and [mypy](http://mypy-lang.org/) for code quality.\n- Submit a pull request and describe your changes.\n\n---\n\n## License\n\nBSD-3-Clause. See [LICENSE](LICENSE) for details.\n\n---\n\n## Logging\n\nLitemind uses [Arbol](http://github.com/royerlab/arbol) for logging. You can deactivate logging by setting `Arbol.passthrough = True` in your code.\n\n---\n\n\u003e _This README was generated with the help of AI._\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froyerlab%2Flitemind","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Froyerlab%2Flitemind","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froyerlab%2Flitemind/lists"}