{"id":29180506,"url":"https://github.com/gnuheike/ai-conversationinsights-python3-chromadb-llvm","last_synced_at":"2026-04-28T14:34:58.166Z","repository":{"id":301749650,"uuid":"1010198572","full_name":"gnuheike/AI-ConversationInsights-Python3-ChromaDB-LLVM","owner":"gnuheike","description":"A Python application for analyzing Telegram chat messages using ChromaDB and LLMs. This tool analyzes Telegram chat exports to answer questions about conversation patterns using embeddings and locally-run LLMs.","archived":false,"fork":false,"pushed_at":"2025-06-28T15:52:36.000Z","size":125,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-28T16:24:50.688Z","etag":null,"topics":["analytics","chromadb","llvm","privacy","python","telegram"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gnuheike.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-28T15:05:10.000Z","updated_at":"2025-06-28T15:52:39.000Z","dependencies_parsed_at":"2025-06-28T16:24:52.949Z","dependency_job_id":"412335cf-77cb-43bd-b74c-6b474ef43633","html_url":"https://github.com/gnuheike/AI-ConversationInsights-Python3-ChromaDB-LLVM","commit_stats":null,"previous_names":["gnuheike/llvm_telegram_analyzer"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/gnuheike/AI-ConversationInsights-Python3-ChromaDB-LLVM","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gnuheike%2FAI-C
onversationInsights-Python3-ChromaDB-LLVM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gnuheike%2FAI-ConversationInsights-Python3-ChromaDB-LLVM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gnuheike%2FAI-ConversationInsights-Python3-ChromaDB-LLVM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gnuheike%2FAI-ConversationInsights-Python3-ChromaDB-LLVM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gnuheike","download_url":"https://codeload.github.com/gnuheike/AI-ConversationInsights-Python3-ChromaDB-LLVM/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gnuheike%2FAI-ConversationInsights-Python3-ChromaDB-LLVM/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263029211,"owners_count":23402354,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics","chromadb","llvm","privacy","python","telegram"],"created_at":"2025-07-01T20:00:32.203Z","updated_at":"2026-04-28T14:34:58.123Z","avatar_url":"https://github.com/gnuheike.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![Telegram Analyze](https://raw.githubusercontent.com/gnuheike/llvm_telegram_analyzer/refs/heads/main/logo.jpg \"Telegram Analyzer\")\n\n# Telegram Analyzer\n\nA Python application for analyzing Telegram chat messages using ChromaDB and LLMs. 
This tool analyzes Telegram chat exports to answer questions about\nconversation patterns using embeddings and locally run LLMs.\n\n## Project Purpose\n\nTelegram Analyzer is designed to help users extract insights from their Telegram conversations without compromising privacy. It processes JSON exports of\nTelegram chats, stores them in a vector database (ChromaDB), and allows users to ask natural language questions about the content. The tool runs entirely\nlocally, ensuring that sensitive conversation data never leaves your machine.\n\n## Overview\n\nTelegram Analyzer is a tool that allows you to:\n\n1. Parse JSON Telegram message exports\n2. Load messages into ChromaDB for semantic search\n3. Query the database to ask questions about the conversation\n4. Generate answers using Ollama models\n\n## Features\n\n- **Semantic Search**: Uses embeddings to find relevant messages based on meaning, not just keywords\n- **Local Execution**: Runs entirely on your machine with no data sent to external services\n- **Batch Processing**: Processes multiple questions at once for efficient analysis\n- **Customizable Models**: Works with various Ollama models to balance speed and accuracy\n- **Privacy-Focused**: Designed with data privacy as a core principle\n- **Detailed Output**: Provides answers with metadata about processing time and relevant message count\n\n## How to Use\n\n1. Export your Telegram messages to a JSON file (open the chat, click the three-dot menu, and export the messages as JSON without any attachments)\n2. Process the messages to store their embedding vectors in ChromaDB (see the \"Loading Data\" section below)\n3. Use LLMs to query the messages using ChromaDB as context (see the \"Querying\" section below)\n\n## Installation\n\n### Prerequisites\n\n- Python 3.9+\n- [Ollama](https://ollama.ai/) installed and running locally\n\n### Setup\n\n1. Clone the repository:\n   ```bash\n   git clone \u003crepository-url\u003e\n   cd telegram-analyzer\n   ```\n\n2. 
Set up your configuration (see [Configuration](#configuration) section)\n\n3. Create and activate a virtual environment:\n   ```bash\n   python -m venv .venv\n   source .venv/bin/activate  # On Windows: .venv\\Scripts\\activate\n   ```\n\n4. Install dependencies:\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n5. Pull the required Ollama model (you can refer to https://www.reddit.com/r/LocalLLaMA/comments/16y95hk/a_starter_guide_for_playing_with_your_own_local_ai/ to\n   select the model)\n   ```bash\n   ollama pull deepseek-r1:32b-qwen-distill-q8_0\n   ```\n   You can configure a different model in `telegram_analyzer/config.py`.\n\n## Usage\n\nThe application provides a command-line interface with several commands:\n\n### Loading Data\n\nTo load Telegram messages from a JSON export file into ChromaDB:\n\n```bash\npython main.py load result.json --collection my_chat --batch-size 1000\n```\n\nThis will process the messages in batches of 1000 and store them in a ChromaDB collection named \"my_chat\". The process includes:\n\n1. Parsing the JSON file to extract messages\n2. Converting messages to embeddings using a sentence transformer model\n3. 
Storing the embeddings and original messages in ChromaDB for semantic search\n\nYou'll see progress logs as the data is processed, and a final confirmation when loading is complete:\n\n```\nINFO: Loading messages from result.json\nINFO: Loaded 15423 messages from JSON file\nINFO: Processing batch 1/16 (1000 messages)\n...\nINFO: Successfully loaded 15423 messages into collection my_chat\n```\n\nOptions:\n\n- `--collection`: Name of the ChromaDB collection (default: \"telegram_messages\")\n- `--batch-size`: Number of messages to process in each batch (default: 5000)\n- `--no-reset`: Don't reset the collection before loading (useful for adding new messages to an existing collection)\n\n### Querying\n\nTo ask a question about the messages:\n\n```bash\npython main.py query \"What topics were most frequently discussed?\" --collection my_chat --output answer.md\n```\n\nThis will:\n\n1. Find the most relevant messages related to your question\n2. Use the Ollama LLM to generate a comprehensive answer based on those messages\n3. Save the answer to answer.md\n\nExample output in the terminal:\n\n```\n================================================================================\nQuestion: What topics were most frequently discussed?\n================================================================================\nAnswer: Based on the conversation history, the most frequently discussed topics include:\n\n1. Project planning and development - There are numerous discussions about timelines, \n   feature implementation, and development progress.\n\n2. Technical issues - The conversation contains many exchanges about debugging problems, \n   code reviews, and technical solutions.\n\n3. 
Meeting coordination - Team members frequently discuss scheduling meetings, \n   sharing agendas, and following up on action items.\n...\n================================================================================\nProcessing time: 5.23 seconds\nRelevant messages: 1000\n================================================================================\n```\n\nOptions:\n\n- `--collection`: Name of the ChromaDB collection (default: \"telegram_messages\")\n- `--model`: Name of the Ollama model (default: \"deepseek-r1:32b-qwen-distill-q8_0\")\n- `--top-k`: Number of relevant messages to include in the context (default: 1000)\n- `--output`: Path to save the answer (default: print to stdout)\n\n### Batch Processing\n\nProcess multiple questions from a file:\n\n```bash\npython main.py batch questions.txt --output results.md\n```\n\nOptions:\n\n- `--collection`: Name of the ChromaDB collection (default: \"telegram_messages\")\n- `--model`: Name of the Ollama model (default: \"deepseek-r1:32b-qwen-distill-q8_0\")\n- `--top-k`: Number of relevant messages to include in the context (default: 1000)\n- `--output`: Path to save the answers in markdown format (default: \"telegram_analysis_results.md\")\n\n### Checking Database\n\nCheck the status of the ChromaDB collection:\n\n```bash\npython main.py check\n```\n\nOptions:\n\n- `--collection`: Name of the ChromaDB collection (default: \"telegram_messages\")\n\n### Processing Query Sets\n\nProcess predefined sets of questions from Python files in the 'queries' folder:\n\n```bash\npython main.py process-queries\n```\n\nThis command will:\n\n1. Scan the 'queries' folder for Python files containing question sets\n2. Allow you to select a specific query set or process a specified file\n3. Process each question in the selected set using the QueryProcessor\n4. Save the results to a Markdown file with the query set name and date\n\nExample output in the terminal:\n\n```\nFound 10 query file(s) to process\n\nAvailable query sets:\n1. 
Couple Queries (couple)\n   Queries for analyzing relationships between two people.\n\n2. Team Queries (team)\n   Queries for analyzing team communication and dynamics.\n\n...\n\nSelect a query set (number) or 'q' to quit: 1\nCreated output file: couple_results_2023-11-15.md\nProcessing question 1/55: 'What did the husband and wife discuss regarding dinner plans?'\nProcessed 1/55: 'What did the husband and wife discuss regarding dinner...' (3.45s)\n...\nQuery set processed. Processed 55/55 questions. Results saved to couple_results_2023-11-15.md\n```\n\nThe output file will contain:\n\n- The title and description of the query set\n- The date of analysis\n- Each question followed by its answer\n\nOptions:\n\n- `--file`: Specific query file to process (default: interactive selection)\n- `--collection`: Name of the ChromaDB collection (default: \"telegram_messages\")\n- `--model`: Name of the Ollama model (default: \"deepseek-r1:32b-qwen-distill-q8_0\")\n- `--top-k`: Number of relevant messages to include in the context (default: 1000)\n\n## Configuration\n\nThe application uses a configuration file located at `telegram_analyzer/config.py`. To set up your configuration:\n\n1. Copy the example configuration file:\n   ```bash\n   cp telegram_analyzer/config.example.py telegram_analyzer/config.py\n   ```\n\n2. 
Edit `config.py` to customize the settings according to your needs:\n    - **ChromaDB settings**: Change persistence directory or collection name\n    - **Sentence Transformer model**: Select a different embedding model or change the device (cpu/cuda/mps)\n    - **Query parameters**: Adjust the number of messages to include in context\n    - **Ollama model settings**: Change the model or adjust generation parameters (temperature, context size)\n    - **Output and logging settings**: Modify output file paths and log levels\n\nThe example configuration file includes detailed comments explaining each setting and its possible values.\n\n## Project Structure\n\n```\ntelegram_analyzer/\n├── __init__.py          # Package initialization\n├── config.py            # Configuration settings\n├── logging.py           # Logging setup\n├── data_processing.py   # Telegram data processing\n├── database.py          # ChromaDB interaction\n├── query.py             # Query processing and answer generation\n└── cli.py               # Command-line interface\n```\n\n## Data Privacy\n\nThis tool is designed to run entirely locally and does not send data to any cloud providers. All processing happens on your machine, ensuring that your\nsensitive conversation data remains private. Key privacy features include:\n\n- **Local Execution**: All data processing and LLM inference runs on your local machine\n- **No Data Transmission**: No data is sent to external servers or APIs\n- **Persistent Storage Control**: You control where the data is stored on your system\n- **No Account Required**: No need to create accounts or authenticate with external services\n\nUsers are responsible for ensuring that sensitive data is not uploaded to public repositories or shared inappropriately. 
Always be cautious when sharing\nanalysis results that might contain private information.\n\n## Limitations\n\nCurrent limitations of the tool include:\n\n- **JSON Format Only**: Currently supports only JSON exports from Telegram Desktop\n- **Text-Only Analysis**: Media files (images, videos, audio) are not processed\n- **Local LLM Dependency**: Requires Ollama to be installed and running locally\n- **Resource Intensive**: Processing large chat histories may require significant memory and storage\n- **English-Centric**: Works best with English language content, though other languages are supported\n\n## Contributing\n\nContributions are welcome! If you'd like to improve Telegram Analyzer, please:\n\n1. Fork the repository\n2. Create a feature branch\n3. Make your changes\n4. Submit a pull request\n\nFor bug reports, feature requests, or questions, please open an issue in the repository.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgnuheike%2Fai-conversationinsights-python3-chromadb-llvm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgnuheike%2Fai-conversationinsights-python3-chromadb-llvm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgnuheike%2Fai-conversationinsights-python3-chromadb-llvm/lists"}