{"id":26913687,"url":"https://github.com/pierrebrunelle/pixelbot","last_synced_at":"2026-05-06T20:34:53.274Z","repository":{"id":262728132,"uuid":"888167945","full_name":"pierrebrunelle/pixelbot","owner":"pierrebrunelle","description":"A context-aware Discord bot with semantic search and conversational memory. Uses Pixeltable + OpenAI for human-like responses","archived":false,"fork":false,"pushed_at":"2024-11-15T02:41:56.000Z","size":50,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-11-15T03:25:05.651Z","etag":null,"topics":["chatbot","context-aware","discord-bot","discord-py","infinite-memory","npl","openai","pixeltable","rag","semantic-search","vector-embeddings"],"latest_commit_sha":null,"homepage":"https://discord.com/application-directory/1304932122611552346","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pierrebrunelle.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-13T23:44:14.000Z","updated_at":"2024-11-15T02:50:33.000Z","dependencies_parsed_at":"2024-11-15T03:25:12.490Z","dependency_job_id":"a93098d4-f4e7-48aa-968d-81bc1f4c0dee","html_url":"https://github.com/pierrebrunelle/pixelbot","commit_stats":null,"previous_names":["pierrebrunelle/discot-bot","pierrebrunelle/pixelbot"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pierrebrunelle%2Fpixelbot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pierrebrunelle%2
Fpixelbot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pierrebrunelle%2Fpixelbot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pierrebrunelle%2Fpixelbot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pierrebrunelle","download_url":"https://codeload.github.com/pierrebrunelle/pixelbot/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246672004,"owners_count":20815312,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatbot","context-aware","discord-bot","discord-py","infinite-memory","npl","openai","pixeltable","rag","semantic-search","vector-embeddings"],"created_at":"2025-04-01T16:38:49.661Z","updated_at":"2026-05-06T20:34:53.266Z","avatar_url":"https://github.com/pierrebrunelle.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1\u003e\n  \u003cimg src=\"static/image/Pixelbot.png\" alt=\"Pixelbot Logo\" width=\"100\" style=\"vertical-align:middle; margin-right: 10px;\"\u003e\n  Pixelbot: Multimodal AI Agent with Pixeltable\n\u003c/h1\u003e\n\n---\n\n\u003cdiv\u003e\n\n[![License](https://img.shields.io/badge/License-Apache%202.0-0530AD.svg)](https://opensource.org/licenses/Apache-2.0)\n[![My Discord (1306431018890166272)](https://img.shields.io/badge/💬-Discord-%235865F2.svg)](https://discord.gg/QPyqFYx2UN)\n\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n \u003ctable\u003e\n   \u003ctr\u003e\n     \u003ctd align=\"center\" width=\"49%\"\u003e\n       
\u003cimg src=\"static/image/screenshot.png\" alt=\"Application Screenshot 1\" width=\"100%\"/\u003e\n     \u003c/td\u003e\n     \u003ctd align=\"center\" width=\"49%\"\u003e\n       \u003cimg src=\"static/image/screenshot-2.png\" alt=\"Application Screenshot 2\" width=\"100%\"/\u003e\n     \u003c/td\u003e\n   \u003c/tr\u003e\n  \u003c/table\u003e\n\u003c/div\u003e\n\nThis application demonstrates **Pixelbot**, a multimodal AI agent built using [Pixeltable](https://github.com/pixeltable/pixeltable), open-source AI data infrastructure. The agent can process and reason about various data types (documents, images, videos, audio), use external tools, generate images, maintain a chat history, and draw on a persistent memory bank.\n\nThe frontend is built with Flask and vanilla JavaScript (styled with Tailwind CSS).\n\n## Key Features\n\n- **Unified Multimodal Data Management**: Ingests and manages text, PDFs, images (JPG, PNG), videos (MP4), and audio files (MP3, WAV).\n- **Declarative AI Pipelines**: Uses Pixeltable's computed columns and views to declaratively define workflows for data processing, embedding generation, AI model inference, and more.\n- **Semantic Search**: Implements vector search across:\n    - Document chunks (`sentence-transformers`)\n    - Images \u0026 Video Frames (`CLIP`)\n    - Audio \u0026 Video Transcripts (`sentence-transformers`)\n    - Chat History (`sentence-transformers`)\n    - Memory Bank items (`sentence-transformers`)\n- **AI Model Integration**:\n    - **Reasoning \u0026 Tool Use**: Anthropic Claude 3.5 Sonnet\n    - **Audio Transcription**: OpenAI Whisper\n    - **Image Generation**: OpenAI DALL-E 3\n    - **Follow-up Suggestions**: Mistral Small (`mistral-small-latest`)\n- **External Tools**: Integrates with external APIs for real-time information (e.g., NewsAPI, Yahoo Finance via `yfinance`, and DuckDuckGo Search).\n- **Chat History**: Stores conversation history persistently and makes it searchable.\n- 
**Memory Bank**: Allows saving and semantically searching through important text snippets or code blocks.\n- **Image Generation**: Generates images based on user prompts using DALL-E 3.\n- **Responsive UI**: A web interface built with Flask, Tailwind CSS, and JavaScript for interacting with the agent, managing files, viewing history, and configuring settings.\n- **Centralized Configuration**: Uses `config.py` to manage model IDs, default system prompts, LLM parameters, and other settings.\n\n## Architecture Overview\n\nPixeltable orchestrates data flow, model execution, and state management for the whole application. When a user submits a query:\n\n1.  The query is inserted into the main `agents.tools` Pixeltable table.\n2.  Pixeltable automatically evaluates the table's computed columns in dependency order:\n    -   Semantic searches run across indexed documents, images, video frames, audio and video transcripts, the memory bank, and chat history.\n    -   An initial LLM call (Claude) analyzes the prompt and determines whether any external tools are needed.\n    -   If tools are required, Pixeltable executes the relevant User-Defined Functions (UDFs) via `pxt.invoke_tools`.\n    -   Context from searches and tool outputs is assembled.\n    -   Recent chat history is retrieved.\n    -   A final LLM call (Claude) synthesizes the context, history, and original prompt to generate the main response.\n    -   A follow-up LLM call (Mistral) generates potential next questions.\n    -   The user query and assistant response are logged to the `agents.chat_history` table.\n3.  
The final answer, relevant context (images/video frames), and follow-up suggestions are returned to the UI.\n\nImage generation requests follow a similar pattern, inserting into `agents.image_generation_tasks` and using a computed column to call the DALL-E API.\n\n```mermaid\ngraph TD\n    %% User Interaction\n    User([User]) --\u003e|Query| ToolsTable[agents.tools Table]\n    User --\u003e|Save Memory| MemoryBankTable[agents.memory_bank Table]\n    User --\u003e|Upload File| SourceTables[\"Source Data Tables\\n(collection, images, videos, audios)\"]\n    User --\u003e|Generate Image| ImageGenTable[agents.image_generation_tasks Table]\n\n    %% Main Agent Workflow (Computed Columns on agents.tools)\n    ToolsTable --\u003e|Prompt| ChatHistorySearch[Search Chat History]\n    ToolsTable --\u003e|Prompt| MemorySearch[Search Memory Bank]\n    ToolsTable --\u003e|Prompt| DocSearch[Search Documents]\n    ToolsTable --\u003e|Prompt| ImageSearch[\"Search Images\"]\n    ToolsTable --\u003e|Prompt| VideoFrameSearch[\"Search Video Frames\"]\n    ToolsTable --\u003e|Prompt| AudioTranscriptSearch[Search Audio Transcripts]\n    ToolsTable --\u003e|Prompt, Params| InitialLLM[\"Claude 3.5 (Tool Choice)\"]\n\n    InitialLLM --\u003e|Tool Choice| ToolExecution[\"Execute Tools (UDFs)\"]\n\n    %% Step 1: Assemble TEXT context ONLY\n    ChatHistorySearch --\u003e|Context| AssembleTextContext[Assemble Text Context]\n    MemorySearch ------\u003e|Context| AssembleTextContext\n    DocSearch ---------\u003e|Context| AssembleTextContext\n    AudioTranscriptSearch --\u003e|Context| AssembleTextContext\n    ToolExecution ------\u003e|Tool Output| AssembleTextContext\n\n    %% Step 2: Assemble FINAL messages (including images/frames/recent history)\n    AssembleTextContext --\u003e|Text Summary| FinalMessages[Assemble Final Messages]\n    ImageSearch -------\u003e|Image Context| FinalMessages\n    VideoFrameSearch --\u003e|Frame Context| FinalMessages\n    ToolsTable --\u003e|Recent History| 
FinalMessages\n\n    FinalMessages ---\u003e|Messages| FinalLLM[\"Claude 3.5 (Answer)\"]\n    FinalLLM --\u003e|Answer| ExtractAnswer[Extract Answer]\n    ExtractAnswer --\u003e|Answer| User\n\n    %% Follow-up Generation\n    FinalLLM --\u003e|Answer| FollowUpLLM[\"mistral-small-latest (Follow-up)\"]\n    FollowUpLLM --\u003e|Suggestions| User\n\n    %% Chat History Logging\n    ExtractAnswer --\u003e|Answer| LogChat[Log to agents.chat_history]\n    ToolsTable --\u003e|User Prompt| LogChat\n\n    %% Image Generation Workflow\n    ImageGenTable --\u003e|Prompt| OpenAI_Dalle[\"OpenAI DALL-E 3 (Computed Col)\"]\n    OpenAI_Dalle --\u003e|Image Data| ImageGenTable\n    ImageGenTable --\u003e|Retrieve Image| User\n\n    %% Implicit Dependencies (Views, Indexing)\n    subgraph \"Supporting Data \u0026 Indexes\"\n        direction LR\n        SourceTables --\u003e Views[\"Views (Chunks, Frames, Sentences)\"]\n        Views --\u003e Indexes[\"Embedding Indexes (E5, CLIP)\"]\n        MemoryBankTable --\u003e MemIndex[E5 Index]\n        LogChat --\u003e ChatHistIndex[E5 Index]\n    end\n\n    %% Styling\n    classDef table fill:#ffc0cb,stroke:#333\n    classDef view fill:#add8e6,stroke:#333\n    classDef llm fill:#ffe4b5,stroke:#333\n    classDef workflow fill:#e6e6fa,stroke:#333\n    classDef search fill:#98fb98,stroke:#333\n    classDef udf fill:#f5f5dc,stroke:#666\n    classDef io fill:#fff,stroke:#000,stroke-width:2px\n\n    class User,SourceTables,ImageGenTable,LogChat,MemoryBankTable io\n    class ToolsTable table\n    class Views view\n    class Indexes,MemIndex,ChatHistIndex search\n    class InitialLLM,FinalLLM,FollowUpLLM,OpenAI_Dalle llm\n    class ChatHistorySearch,MemorySearch,DocSearch,ImageSearch,VideoFrameSearch,AudioTranscriptSearch,AssembleTextContext,FinalMessages,ExtractAnswer,ToolExecution workflow\n```\n\n## Project Structure\n\n```\n.\n├── .env                  # Environment variables (API keys)\n├── .venv/                # Virtual environment files 
(if created here)\n├── data/                 # Default directory for uploaded/source media files\n├── logs/                 # Application logs\n│   └── app.log\n├── static/               # Static assets for Flask frontend (CSS, JS)\n│   ├── css/\n│   │   └── style.css\n│   ├── image/\n│   │   └── *.png\n│   ├── js/\n│   │   ├── api.js\n│   │   └── ui.js\n│   └── manifest.json\n├── templates/            # HTML templates for Flask frontend\n│   └── index.html\n├── endpoint.py           # Flask backend: API endpoints and UI rendering\n├── functions.py          # Python UDFs registered as Pixeltable functions/tools\n├── config.py             # Central configuration (model IDs, defaults)\n├── requirements.txt      # Python dependencies\n├── setup_pixeltable.py   # Pixeltable schema setup script\n└── README.md             # This file\n```\n\n## Pixeltable Schema Overview\n\nPixeltable organizes data in directories, tables, and views. This application uses the following structure within the `agents` directory:\n\n```\nagents/\n├── collection              # Table: Source documents (PDF, TXT, etc.)\n│   ├── document: pxt.Document\n│   ├── uuid: pxt.String\n│   └── timestamp: pxt.Timestamp\n├── images                  # Table: Source images\n│   ├── image: pxt.Image\n│   ├── uuid: pxt.String\n│   ├── timestamp: pxt.Timestamp\n│   └── thumbnail: pxt.String(computed) # Base64 sidebar thumbnail\n├── videos                  # Table: Source videos\n│   ├── video: pxt.Video\n│   ├── uuid: pxt.String\n│   ├── timestamp: pxt.Timestamp\n│   └── audio: pxt.Audio(computed)      # Extracted audio\n├── audios                  # Table: Source audio files (MP3, WAV)\n│   ├── audio: pxt.Audio\n│   ├── uuid: pxt.String\n│   └── timestamp: pxt.Timestamp\n├── chat_history            # Table: Stores conversation turns\n│   ├── role: pxt.String        # 'user' or 'assistant'\n│   ├── content: pxt.String\n│   └── timestamp: pxt.Timestamp\n├── memory_bank             # Table: Saved text/code 
snippets\n│   ├── content: pxt.String\n│   ├── type: pxt.String         # 'code' or 'text'\n│   ├── language: pxt.String    # e.g., 'python'\n│   ├── context_query: pxt.String # Original query or note\n│   └── timestamp: pxt.Timestamp\n├── image_generation_tasks  # Table: Image generation requests \u0026 results\n│   ├── prompt: pxt.String\n│   ├── timestamp: pxt.Timestamp\n│   └── generated_image: pxt.Image(computed) # DALL-E 3 output\n├── tools                   # Table: Main agent workflow orchestration\n│   ├── prompt: pxt.String\n│   ├── timestamp: pxt.Timestamp\n│   ├── initial_system_prompt: pxt.String\n│   ├── final_system_prompt: pxt.String\n│   ├── max_tokens, stop_sequences, temperature, top_k, top_p # LLM Params\n│   ├── initial_response: pxt.Json(computed)  # Claude tool choice\n│   ├── tool_output: pxt.Json(computed)       # Output from executed UDFs\n│   ├── doc_context: pxt.Json(computed)       # Results from document search\n│   ├── image_context: pxt.Json(computed)     # Results from image search\n│   ├── video_transcript_context: pxt.Json(computed) # Results from video audio search\n│   ├── audio_transcript_context: pxt.Json(computed) # Results from direct audio search\n│   ├── video_frame_context: pxt.Json(computed) # Results from video frame search\n│   ├── memory_context: pxt.Json(computed)    # Results from memory bank search\n│   ├── chat_memory_context: pxt.Json(computed) # Results from chat history search\n│   ├── history_context: pxt.Json(computed)   # Recent chat turns\n│   ├── multimodal_context_summary: pxt.String(computed) # Assembled non-image/frame context string\n│   ├── final_prompt_messages: pxt.Json(computed) # Fully assembled messages for final LLM\n│   ├── final_response: pxt.Json(computed)    # Claude final answer generation\n│   ├── answer: pxt.String(computed)          # Extracted text answer\n│   ├── follow_up_input_message: pxt.String(computed) # Prompt for Mistral\n│   ├── follow_up_raw_response: pxt.Json(computed) # Raw 
Mistral response\n│   └── follow_up_text: pxt.String(computed) # Extracted follow-up suggestions\n├── chunks                  # View: Document chunks\n│   └── (Implicit: EmbeddingIndex: E5-large-instruct on text)\n├── video_frames            # View: Video frames (1 FPS)\n│   └── (Implicit: EmbeddingIndex: CLIP on frame)\n├── video_audio_chunks      # View: Audio chunks from videos\n│   └── transcription: pxt.Json(computed)   # Whisper transcription\n├── video_transcript_sentences # View: Sentences from video transcripts\n│   └── (Implicit: EmbeddingIndex: E5-large-instruct on text)\n├── audio_chunks            # View: Audio chunks from direct audio files\n│   └── transcription: pxt.Json(computed)   # Whisper transcription\n└── audio_transcript_sentences # View: Sentences from direct audio transcripts\n    └── (Implicit: EmbeddingIndex: E5-large-instruct on text)\n\n# Embedding Indexes Enabled On:\n# - agents.chunks.text\n# - agents.images.image\n# - agents.video_frames.frame\n# - agents.video_transcript_sentences.text\n# - agents.audio_transcript_sentences.text\n# - agents.memory_bank.content\n# - agents.chat_history.content\n```\n\n## Getting Started\n\n### Prerequisites\n\n-   Python 3.9+\n-   API keys (available from the linked consoles):\n    -   [Anthropic](https://console.anthropic.com/) (Claude)\n    -   [OpenAI](https://platform.openai.com/api-keys) (Whisper \u0026 DALL-E)\n    -   [Mistral AI](https://console.mistral.ai/api-keys/) (Mistral Small)\n    -   [NewsAPI](https://newsapi.org/) (optional, for the news tool)\n-   [Pixeltable](https://github.com/pixeltable/pixeltable) (installed via `requirements.txt`)\n\n### Installation\n\n```bash\n# 1. Create and activate a virtual environment (recommended)\npython -m venv .venv\n# Windows: .venv\\Scripts\\activate\n# macOS/Linux: source .venv/bin/activate\n\n# 2. 
Install dependencies\npip install -r requirements.txt\n```\n\n### Environment Setup\n\nCreate a `.env` file in the project root and add your API keys. The three keys in the first group are required for core functionality; the others are optional.\n\n```dotenv\n# Required for Core LLM Functionality\nANTHROPIC_API_KEY=sk-ant-api03-...  # For main reasoning/tool use (Claude 3.5 Sonnet)\nOPENAI_API_KEY=sk-...             # For audio transcription (Whisper) \u0026 image generation (DALL-E 3)\nMISTRAL_API_KEY=...               # For follow-up question suggestions (Mistral Small)\n\n# Optional (Enable specific tools by providing keys)\nNEWS_API_KEY=...                  # Enables the NewsAPI tool\n# Note: the yfinance and DuckDuckGo Search tools do not require API keys.\n\n# Optional: Flask Environment Setting\n# Uncomment for development mode (debugging, auto-reload)\n# FLASK_ENV=development\n```\n\n### Running the Application\n\n1.  **Initialize Pixeltable Schema:**\n    This script creates the necessary Pixeltable directories, tables, views, and computed columns defined in `setup_pixeltable.py`. It also inserts some sample data. Run it *once*, before starting the server for the first time.\n\n    *Why run this?* It defines the data structures and processing logic within Pixeltable, telling Pixeltable how to store, transform, and index your data.\n\n    ```bash\n    python setup_pixeltable.py\n    ```\n\n2.  
**Start the Web Server:**\n    This runs the Flask application using the Waitress production server by default.\n\n    ```bash\n    python endpoint.py\n    ```\n\n    To run in Flask's development mode (with auto-reload and debugging):\n\n    ```bash\n    # Set the environment variable first\n    # Windows: set FLASK_ENV=development\n    # macOS/Linux: export FLASK_ENV=development\n    python endpoint.py\n    ```\n\n    The application will be available at `http://localhost:5000`.\n\n**Data Persistence Note:** Pixeltable stores all its data (file references, tables, views, indexes) locally, by default in a `.pixeltable` directory in your home directory (`~/.pixeltable`, configurable via the `PIXELTABLE_HOME` environment variable). 
This means your uploaded files, generated images, chat history, and memory bank are persistent across application restarts.\n\n## Usage Overview\n\nThe web interface provides several tabs:\n\n-   **Chat Interface**: Main interaction area. Ask questions, switch between chat and image generation modes. View results, including context retrieved (images, video frames) and follow-up suggestions. Save responses to the Memory Bank.\n-   **Agent Settings**: Configure the system prompts (initial for tool use, final for answer generation) and LLM parameters (temperature, max tokens, etc.) used by Claude.\n-   **Chat History**: View past queries and responses. Search history and view detailed execution metadata for each query. Download history as JSON.\n-   **Generated Images**: View images created using the image generation mode. Search by prompt, view details, download, or delete images.\n-   **Memory Bank**: View, search, manually add, and delete saved text/code snippets. Download memory as JSON.\n-   **How it Works**: Provides a technical overview of how Pixeltable powers the application's features.\n\n## Contributing\n\nContributions are welcome! Please feel free to submit pull requests or open issues.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpierrebrunelle%2Fpixelbot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpierrebrunelle%2Fpixelbot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpierrebrunelle%2Fpixelbot/lists"}