https://github.com/angrysky56/coder_db

An intelligent code memory system that leverages vector embeddings, structured databases, and knowledge graphs to store, retrieve, and analyze code patterns with semantic search capabilities, quality metrics, and relationship modeling. Designed to enhance programming workflows through contextual recall of best practices, algorithms, and solutions.

# Coder DB - AI Memory Enhancement System

A structured memory system for AI assistants that enhances coding capabilities through database integration, built around Claude Desktop and MCP servers.

## Overview

This system leverages multiple database types to create a comprehensive memory system for coding assistance:

1. **Qdrant Vector Database**: For semantic search and retrieval of code patterns
2. **SQLite Database**: For structured algorithm storage and versioning
3. **Knowledge Graph**: For representing relationships between coding concepts

## Database Usage Guide

### Qdrant Memory Storage

For storing and retrieving code snippets, patterns, and solutions by semantic meaning.

**What to store:**
- Reusable code patterns with explanations
- Solutions to complex problems
- Best practices and design patterns
- Documentation fragments and explanations

**Enhanced Metadata:**
- Language and framework details
- Complexity level (simple, intermediate, advanced)
- Dependencies and requirements
- Quality metrics (cyclomatic complexity, documentation coverage)
- User feedback and ratings

**Example Usage:**
```python
# Storing a code pattern
information = {
    "type": "code_pattern",
    "language": "python",
    "name": "Context Manager Pattern",
    "code": (
        "class MyContextManager:\n"
        "    def __enter__(self):\n"
        "        # Setup code\n"
        "        return self\n"
        "\n"
        "    def __exit__(self, exc_type, exc_val, exc_tb):\n"
        "        # Cleanup code\n"
        "        pass\n"
    ),
    "explanation": "Context managers provide a clean way to manage resources like file handles.",
    "tags": ["python", "resource management", "context manager"],
    "complexity": "intermediate",
    "quality_metrics": {
        "cyclomatic_complexity": 2,
        "documentation_coverage": 0.85
    },
    "user_rating": 4.5
}
# Store in Qdrant (e.g., via the qdrant-store-memory tool)
```

### SQLite Algorithm Database

For maintaining a structured catalog of algorithms with proper versioning.

**Database Schema:**
- `algorithms`: Basic algorithm information (name, description)
- `algorithm_versions`: Different versions of algorithm implementations
- `algorithm_categories`: Categories like Sorting, Searching, Graph, etc.
- `performance_metrics`: Performance data for different implementations
- `improvements`: Tracked improvements between versions
- `change_logs`: Detailed logs of changes with rationale and context

**Version Diffing:**
- Store diffs between algorithm versions
- Track performance improvements across versions
- Document rationale behind changes

**Example Query:**
```sql
-- Find all sorting algorithms with performance metrics
SELECT a.name, a.description, v.version_number, p.time_complexity, p.space_complexity
FROM algorithms a
JOIN algorithm_versions v ON a.id = v.algorithm_id
JOIN performance_metrics p ON v.id = p.version_id
JOIN algorithm_category_mapping m ON a.id = m.algorithm_id
JOIN algorithm_categories c ON m.category_id = c.id
WHERE c.name = 'Sorting'
ORDER BY a.name, v.version_number DESC;

-- Get change logs for a specific algorithm
SELECT v.version_number, c.change_description, c.rationale, c.created_at
FROM algorithm_versions v
JOIN change_logs c ON v.id = c.version_id
WHERE v.algorithm_id = 5
ORDER BY v.version_number;
```
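
For programmatic access, the same query can be run from Python with the standard-library `sqlite3` module. This is a minimal sketch: the database path is illustrative, and it assumes the tables described above already exist.

```python
import sqlite3

# Hypothetical database path; adjust to wherever the algorithm DB lives.
DB_PATH = "coder_db.sqlite"

def find_sorting_algorithms(db_path: str = DB_PATH):
    """Run the 'sorting algorithms with performance metrics' query from above."""
    query = """
        SELECT a.name, a.description, v.version_number,
               p.time_complexity, p.space_complexity
        FROM algorithms a
        JOIN algorithm_versions v ON a.id = v.algorithm_id
        JOIN performance_metrics p ON v.id = p.version_id
        JOIN algorithm_category_mapping m ON a.id = m.algorithm_id
        JOIN algorithm_categories c ON m.category_id = c.id
        WHERE c.name = ?
        ORDER BY a.name, v.version_number DESC;
    """
    with sqlite3.connect(db_path) as conn:
        conn.row_factory = sqlite3.Row
        return [dict(row) for row in conn.execute(query, ("Sorting",))]

if __name__ == "__main__":
    for row in find_sorting_algorithms():
        print(row["name"], row["version_number"], row["time_complexity"])
```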

### Knowledge Graph Integration

For representing complex relationships between coding concepts, patterns, and solutions.

**Advanced Ontology:**
- Algorithm
- DesignPattern
- CodeConcept
- ProblemType
- Solution
- Framework
- Library
- Language

**Rich Relation Types:**
- IMPLEMENTS (Algorithm → CodeConcept)
- SOLVES (DesignPattern → ProblemType)
- OPTIMIZES (Algorithm → Performance)
- RELATED_TO (Any → Any)
- IMPROVES_UPON (Solution → Solution)
- ALTERNATIVELY_SOLVES (Solution → ProblemType)
- EXTENDS (Pattern → Pattern)
- DEPENDS_ON (Solution → Library)
- COMPATIBLE_WITH (Framework → Language)

**Graph Analytics:**
- Identify frequently co-occurring patterns
- Discover emerging trends in coding practices
- Map problem domains to solution approaches
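
As a sketch of what such analytics could look like, the query below counts concepts attached to the same memory items, using the `:KGMemory`/`:KGConcept` nodes and `KG_RELATED_TO` relationships described in the Neo4j section later in this README; the connection details are the documented defaults and may need adjusting.

```python
from neo4j import GraphDatabase

# Connection details follow the defaults in the Neo4j section below.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Count how often two concepts are attached to the same memory item,
# a rough proxy for "frequently co-occurring patterns".
CO_OCCURRENCE_QUERY = """
MATCH (m:KGMemory)-[:KG_RELATED_TO]->(a:KGConcept),
      (m)-[:KG_RELATED_TO]->(b:KGConcept)
WHERE a.name < b.name
RETURN a.name AS concept_a, b.name AS concept_b, count(m) AS shared_memories
ORDER BY shared_memories DESC
LIMIT 10
"""

with driver.session(database="neo4j") as session:
    for record in session.run(CO_OCCURRENCE_QUERY):
        print(record["concept_a"], record["concept_b"], record["shared_memories"])

driver.close()
```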

## Usage Workflows

### 1. Enhanced Problem-Solving Workflow

When facing a new coding problem:

1. **Context Gathering**:
- Clearly define the problem and constraints
- Identify performance requirements and environment details
- Document project-specific considerations

2. **Memory Querying**:
- Break down the problem using sequential thinking
- Query Qdrant for similar solutions: `qdrant-find-memories("efficient way to traverse binary tree")`
- Filter results by language, complexity, and quality metrics
- Check algorithm database for relevant algorithms: `SELECT * FROM algorithms WHERE name LIKE '%tree%'`
- Explore knowledge graph for related concepts and alternative approaches

3. **Solution Application**:
- Test and verify solution in REPL
- Document performance characteristics
- Compare against alternatives

4. **Feedback Loop**:
- Store successful solution back in Qdrant with detailed metadata
- Log performance metrics and usage context
- Update knowledge graph connections
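
A rough sketch of the memory-querying step against the MCP server's HTTP API (described later in this README) might look like the following. The exact request and response field names are assumptions based on the `FindMemoryRequest`/`FindMemoryResponse` descriptions, so check `/docs` on a running server for the authoritative schema.

```python
import requests

MCP_SERVER = "http://127.0.0.1:8000"  # assumes the MCP server below is running locally

# Query stored memories for similar solutions. Field names are assumptions
# based on the FindMemoryRequest description in the API Endpoints section.
payload = {
    "query": "efficient way to traverse binary tree",
    "limit": 5,
    "language": "python",
    "complexity": "intermediate",
}

response = requests.post(f"{MCP_SERVER}/memory/find", json=payload, timeout=10)
response.raise_for_status()

# The "items" key is likewise an assumption about FindMemoryResponse.
for item in response.json().get("items", []):
    print(item.get("name"), "-", item.get("explanation"))
```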

### 2. Pattern Learning & Storage

When discovering a useful pattern:

1. **Automated Documentation**:
- Generate initial documentation using AI tools
- Include detailed usage examples
- Document edge cases and limitations

2. **Quality Assessment**:
- Run linters and static analyzers to ensure code quality
- Calculate and store quality metrics
- Validate against best practices

3. **Metadata Enrichment**:
- Document the pattern with clear examples
- Add comprehensive metadata (language, complexity, dependencies)
- Apply consistent tagging from controlled vocabulary

4. **Knowledge Integration**:
- Store in Qdrant with appropriate tags and explanation
- Create knowledge graph connections to related concepts
- Add to SQL database if it's an algorithm implementation
- Suggest automatic connections based on content similarity
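
As one possible implementation of the quality-assessment step, the sketch below computes cyclomatic complexity with the third-party `radon` package (not part of this project; any static-analysis tool could supply the numbers) and assembles the `quality_metrics` block used in the Qdrant metadata.

```python
# A minimal sketch of the Quality Assessment step, assuming the "radon" package.
from radon.complexity import cc_visit

pattern_code = '''
def my_decorator(func):
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        return result
    return wrapper
'''

# Cyclomatic complexity per function/class block in the snippet.
blocks = cc_visit(pattern_code)
max_complexity = max((block.complexity for block in blocks), default=1)

quality_metrics = {
    "cyclomatic_complexity": max_complexity,
    # Documentation coverage would come from a docstring/comment analysis;
    # hard-coded here as a placeholder.
    "documentation_coverage": 0.0,
}
print(quality_metrics)
```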

### 3. Project Setup & Boilerplate

When starting a new project:

1. **Template Selection**:
- Choose from library of project templates
- Customize based on project requirements
- Select language, framework, and testing tools

2. **Automated Setup**:
- Generate project structure with proper directory layout
- Set up version control with appropriate .gitignore
- Configure linting and code quality tools
- Initialize testing framework

3. **Best Practices Integration**:
- Query memory system for relevant boilerplate code
- Retrieve best practices for the specific project type
- Use stored documentation templates for initial setup
- Configure CI/CD based on project requirements
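
A bare-bones sketch of the automated-setup step is shown below; the directory layout and file contents are purely illustrative placeholders for whatever the chosen project template provides.

```python
from pathlib import Path

# Illustrative layout only; a real template library would drive this.
PROJECT_LAYOUT = {
    "src/__init__.py": "",
    "tests/__init__.py": "",
    ".gitignore": "__pycache__/\n.venv/\n*.pyc\n",
    "README.md": "# New Project\n",
}

def scaffold_project(root: str) -> None:
    """Create a minimal project structure with version-control basics."""
    root_path = Path(root)
    for relative_path, content in PROJECT_LAYOUT.items():
        target = root_path / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)

if __name__ == "__main__":
    scaffold_project("my_new_project")
```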

## Security & Data Integrity

1. **Access Controls**:
- Role-based access for sensitive code repositories
- Permissions for viewing vs. modifying memories

2. **Backup & Recovery**:
- Regular backups of Qdrant and SQLite databases
- Version control for knowledge graph
- Recovery procedures for data corruption

3. **Sensitive Information**:
- Sanitize code examples to remove sensitive data
- Validate code snippets before storage
- Flag and restrict access to sensitive patterns
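
A minimal sanitization sketch is shown below; the redaction patterns are illustrative only, and a real deployment would use a more thorough secret scanner.

```python
import re

# Very rough redaction patterns; real sanitization would be more thorough.
SENSITIVE_PATTERNS = [
    (re.compile(r"(?i)(api[_-]?key|token|password|secret)\s*[:=]\s*['\"][^'\"]+['\"]"),
     r"\1 = '<REDACTED>'"),
    (re.compile(r"(?i)bearer\s+[a-z0-9\._\-]+"), "Bearer <REDACTED>"),
]

def sanitize_snippet(code: str) -> str:
    """Strip obvious credentials from a code example before it is stored."""
    for pattern, replacement in SENSITIVE_PATTERNS:
        code = pattern.sub(replacement, code)
    return code

print(sanitize_snippet('API_KEY = "sk-123456"'))
```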

## Monitoring & Analytics

1. **Usage Tracking**:
- Monitor which patterns are most frequently retrieved
- Track search query patterns to identify knowledge gaps
- Log user ratings and feedback

2. **Performance Metrics**:
- Monitor database response times
- Track memory usage and scaling requirements
- Optimize queries based on usage patterns

## Maintenance Guidelines

1. **Quality over Quantity**: Only store high-quality, well-documented code
2. **Regular Review**: Periodically review and update stored patterns
3. **Contextual Storage**: Include usage context with each stored pattern
4. **Versioning**: Track improvements and versions in SQLite
5. **Tagging Consistency**: Use controlled vocabulary for better retrieval
6. **Performance Optimization**: Regularly optimize database queries
7. **Feedback Integration**: Update patterns based on usage feedback

## Getting Started

1. Store your first code memory:
```
qdrant-store-memory(json.dumps({
    "type": "code_pattern",
    "name": "Python decorator pattern",
    "code": "def my_decorator(func):\n    def wrapper(*args, **kwargs):\n        # Do something before\n        result = func(*args, **kwargs)\n        # Do something after\n        return result\n    return wrapper",
    "explanation": "Decorators provide a way to modify functions without changing their code.",
    "tags": ["python", "decorator", "metaprogramming"],
    "complexity": "intermediate"
}))
```

2. Retrieve it later:
```
qdrant-find-memories("python decorator pattern")
```

## Future Enhancements

- Advanced code quality assessment before storage
- Integration with version control systems
- Learning from usage patterns to improve retrieval
- Automated documentation generation
- Custom IDE plugins for seamless access
- Multi-modal storage (code, diagrams, explanations)
- Natural language interface for querying
- Performance benchmark database
- Install script for MCP Servers and DB

## MCP Server

The MCP (Model Context Protocol) server provides a standardized interface for AI models to interact with the Coder DB memory system. It is built using FastAPI and Uvicorn.

### Project Structure (`mcp_server/`)

The `mcp_server` directory contains the FastAPI application:
```
mcp_server/
├── core/                     # Core logic, Pydantic models, configuration
│   ├── __init__.py
│   ├── config.py             # Application settings
│   └── models.py             # Pydantic data models
├── database/                 # Database connectors and SQLAlchemy models
│   ├── __init__.py
│   ├── qdrant_connector.py   # Logic for Qdrant vector database
│   ├── sqlite_connector.py   # Logic for SQLite relational database
│   └── sql_models.py         # SQLAlchemy ORM models for SQLite
├── routers/                  # FastAPI routers for different API endpoints
│   ├── __init__.py
│   ├── algorithm.py          # Endpoints for /algorithm
│   ├── health.py             # Endpoint for /health
│   └── memory.py             # Endpoints for /memory
├── tests/                    # Unit and integration tests
│   ├── __init__.py
│   ├── test_algorithm_api.py
│   ├── test_health.py
│   └── test_memory_api.py
├── main.py                   # Main FastAPI application setup and startup
└── pyproject.toml            # Project dependencies and metadata (using Poetry)
```
*(Note: `requirements.txt` is part of the old setup and can be removed if `pyproject.toml` and Poetry are used exclusively.)*

### Setup and Running

This project uses [Poetry](https://python-poetry.org/) for dependency management and packaging.

1. **Install Poetry** (if you haven't already):
Follow the instructions on the [Poetry website](https://python-poetry.org/docs/#installation).

2. **Install Dependencies:**
Navigate to the directory containing `pyproject.toml` (the `mcp_server` directory if it is a self-contained Poetry project, or the repository root if `mcp_server` is a sub-package of a larger Poetry project) and run:
```bash
poetry install --with dev # --with dev includes testing dependencies
```
This will create a virtual environment (if one doesn't exist for this project) and install all dependencies.

3. **Environment Configuration (Optional but Recommended):**
The application uses settings defined in `mcp_server/core/config.py`. You can override these by creating a `.env` file in the same directory where you run `uvicorn` (typically the root of the repository or `mcp_server/` if running from there).
Example `.env` file content:
```env
# mcp_server/.env or project_root/.env
# QDRANT_HOST="your_qdrant_host_if_not_localhost" # Overrides default 'localhost'
# QDRANT_PORT=6334 # Overrides default 6333
# QDRANT_API_KEY="your_qdrant_api_key_if_any"
# SQLITE_DATABASE_URL="sqlite+aiosqlite:///./custom_mcp_data.db" # Overrides default ./mcp_server.db
# SQLITE_ECHO_LOG=True # To see SQLAlchemy logs
```
Refer to `mcp_server/core/config.py` for all available settings.

4. **Run the Development Server:**
Ensure your Poetry environment is active (e.g., by running `poetry shell` in the directory with `pyproject.toml`) or prepend commands with `poetry run`.
From the root of the repository (the directory containing the `mcp_server` folder):
```bash
poetry run uvicorn mcp_server.main:app --reload --host 0.0.0.0 --port 8000
```
The server will be available at `http://127.0.0.1:8000`.
* Interactive API documentation (Swagger UI): `http://127.0.0.1:8000/docs`
* Alternative API documentation (ReDoc): `http://127.0.0.1:8000/redoc`
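
To confirm the server is up, a quick smoke test against the `/health` endpoint can be run (assuming the `requests` package is available):

```python
import requests

# Quick smoke test against the running development server.
response = requests.get("http://127.0.0.1:8000/health", timeout=5)
print(response.status_code, response.json())  # expected: 200 {"status": "OK"}
```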

### Running Tests

With development dependencies installed (`poetry install --with dev`):
From the root of the repository:
```bash
poetry run pytest
```
This will discover and run tests located in the `mcp_server/tests/` directory.

### API Endpoints

The server currently exposes the following main API endpoints. For detailed request/response models, and to try them out interactively, visit the `/docs` URL while the server is running; a minimal Python client sketch follows the endpoint list.

* **System Endpoints (Tag: `System`)**
* `GET /`: Provides basic information about the server.
* `GET /health`: A simple health check endpoint. Returns `{"status": "OK"}`.

* **Memory Management Endpoints (Tag: `Memory Management`, Prefix: `/memory`)**
* `POST /memory/store`: Stores a new memory item (e.g., code pattern, solution, documentation snippet) into the Qdrant vector database.
* **Request Body**: `StoreMemoryRequest` (JSON object containing a `MemoryItem`).
* **Response**: `StoreMemoryResponse` (JSON object with the ID of the stored item and status).
* `POST /memory/find`: Searches for memory items in Qdrant based on a natural language query and/or filters.
* **Request Body**: `FindMemoryRequest` (JSON object with query string, limit, and optional filters like language, tags, complexity).
* **Response**: `FindMemoryResponse` (JSON object containing a list of matching `MemoryItem`s and a count).

* **Algorithm Management Endpoints (Tag: `Algorithm Management`, Prefix: `/algorithm`)**
* `POST /algorithm/store`: Stores a new algorithm or a new version of an existing algorithm in the SQLite database.
* **Request Body**: `StoreAlgorithmRequest` (JSON object containing `Algorithm` data, including its versions).
* **Response**: `StoreAlgorithmResponse` (JSON object with the ID of the stored/updated algorithm and status).
* `POST /algorithm/find`: Searches for algorithms in the SQLite database by name or category.
* **Request Body**: `FindAlgorithmRequest` (JSON object with optional `name` and `category` fields).
* **Response**: `FindAlgorithmResponse` (JSON object containing a list of matching `Algorithm`s and a count).

* **Collection Management Endpoints (Tag: `Collection Management`, Prefix: `/coder/collections`)**
* `POST /coder/collections/create`: Creates a new Qdrant collection.
* **Request Body**: `CreateCollectionRequest` (JSON object with `collection_name` and optional `model_name`). If `model_name` is omitted, the default embedding model specified in server configuration is used. The model's name and vector size are stored in the collection's metadata.
* **Response**: Confirmation message including the collection name and the embedding model information that was applied.
* `GET /coder/collections/{collection_name}/info`: Retrieves information about a specific Qdrant collection, including its configuration and associated embedding model metadata.
* **Path Parameter**: `collection_name`.
* **Response**: JSON object with collection details (status, point/vector counts, configuration) and embedding model metadata.
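
The following is a minimal client sketch for the memory endpoints. The request-body field names (for example the `memory_item` wrapper and the `MemoryItem` fields) are assumptions based on the model names above; consult `/docs` for the authoritative schema.

```python
import requests

MCP_SERVER = "http://127.0.0.1:8000"

# Field names below are assumptions based on StoreMemoryRequest / MemoryItem;
# check /docs on a running server for the exact schema.
memory_item = {
    "type": "code_pattern",
    "language": "python",
    "name": "Context Manager Pattern",
    "code": "class MyContextManager:\n    def __enter__(self):\n        return self\n    def __exit__(self, exc_type, exc_val, exc_tb):\n        pass",
    "explanation": "Context managers provide a clean way to manage resources.",
    "tags": ["python", "context manager"],
    "complexity": "intermediate",
}

store = requests.post(f"{MCP_SERVER}/memory/store", json={"memory_item": memory_item}, timeout=10)
print(store.json())

find = requests.post(
    f"{MCP_SERVER}/memory/find",
    json={"query": "manage resources cleanly in python", "limit": 3},
    timeout=10,
)
print(find.json())
```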

### Embedding Management

The server now uses actual text embeddings (via the FastEmbed library) for storing and searching memories in Qdrant.
* **Default Model**: The server is configured with a default embedding model (e.g., `sentence-transformers/all-MiniLM-L6-v2`). This model's dimension is used as the default vector size for new collections if no specific model is indicated.
* **Collection-Specific Models**: When creating a collection via the `/coder/collections/create` endpoint, you can specify a different (supported) FastEmbed model. The chosen model's name and its specific vector dimension will be associated with the collection and stored in its metadata. This dimension will be used when configuring the vector parameters for that collection in Qdrant.
* **Dynamic Embedding**: When storing or searching memories:
* The system determines the correct embedding model (and its vector dimension) by checking the metadata of the target Qdrant collection.
* If a memory is stored or searched in a collection that does not have this specific model metadata (e.g., a collection created externally or before this feature was implemented), the server's configured default embedding model and its dimension are used.
* The `store_memory` operation, if targeting a non-existent collection, will attempt to create it using `create_coder_collection`. If no model is specified for this implicit creation, the default model and its settings will be applied to the new collection.
* The actual text (e.g., from `memory_item.explanation` and/or `memory_item.code`) is then embedded using the determined provider before being sent to Qdrant.
* **Supported Models**: The `EnhancedEmbeddingModelManager` has a predefined list of supported FastEmbed models and their properties. This list can be expanded.
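
As a standalone illustration of what the server does internally, the snippet below embeds a piece of explanation text with FastEmbed's `TextEmbedding` class using the default model named above; treat it as a sketch rather than the server's exact code path.

```python
from fastembed import TextEmbedding

# Embed a snippet's explanation text, roughly as the server does before
# upserting into Qdrant. The model name matches the default mentioned above.
model = TextEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")

texts = ["Context managers provide a clean way to manage resources like file handles."]
vectors = list(model.embed(texts))

print(len(vectors), "vector(s) of dimension", len(vectors[0]))  # e.g. 1 vector of dimension 384
```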

### Knowledge Graph (Neo4j) Integration (Initial)

The server now includes an initial integration with Neo4j as a knowledge graph database. This is the third core database type, enabling the representation and querying of complex relationships between coding concepts, stored memory items, and algorithms.

* **Purpose**:
* To model relationships like "MemoryItem X uses Language Y," "Algorithm Z is a type of Concept A," "Concept B is related to Concept C."
* To enable more advanced contextual understanding, discovery, and navigation of stored knowledge.
* To serve as a foundation for future analytical capabilities (e.g., identifying highly connected concepts, suggesting related memories based on graph paths).

* **Setup Requirements**:
* A running Neo4j instance (version 4.4 or 5.x recommended for `IF NOT EXISTS` clause support in constraints).
* Connection parameters (URL, user, password, database name) must be configured in `mcp_server/core/config.py` or via environment variables (e.g., `NEO4J_URL`, `NEO4J_USER`, `NEO4J_PASSWORD`, `NEO4J_DATABASE_NAME`). The defaults are `bolt://localhost:7687`, user `neo4j`, password `password`, database `neo4j`.
* The `neo4j` Python driver (e.g., `neo4j>=5.17.0`) is now a project dependency (see `pyproject.toml`).

* **Initialization**:
* On application startup, the server initializes an asynchronous Neo4j driver.
* It also attempts to create basic schema constraints to ensure uniqueness for key node properties (e.g., `KGConcept.name`, `KGMemory.memory_id`, `KGAlgorithm.algorithm_id`). These operations are idempotent.

* **Current Knowledge Graph Schema Nodes (Initial Definition)**:
* **`:KGConcept`**: Represents abstract concepts.
* Properties: `name` (string, unique), `type` (string, e.g., 'language', 'framework', 'tag', 'methodology', 'algorithm_category', 'memory_type'), `description` (optional string), `created_at` (datetime).
* **`:KGMemory`**: Represents a `MemoryItem` that has been stored in Qdrant, linking it into the graph.
* Properties: `memory_id` (string, unique, typically the Qdrant point ID or internal UUID), `name` (string, name of the memory item), `type` (string, type of the memory item like 'code_pattern'), `created_at` (datetime).
* **`:KGAlgorithm`**: Represents an `Algorithm` stored in SQLite, linking it into the graph.
* Properties: `algorithm_id` (integer, unique, from SQLite primary key), `name` (string), `category` (string), `created_at` (datetime).

* **Relationships (Initial Definition)**:
* Memory items are linked to concepts using a generic relationship `[:KG_RELATED_TO {type: "SPECIFIC_REL_TYPE"}]`. The `type` property on the relationship stores the semantic meaning. Examples:
* `(:KGMemory)-[:KG_RELATED_TO {type: "USES_LANGUAGE"}]->(:KGConcept {name: "Python"})`
* `(:KGMemory)-[:KG_RELATED_TO {type: "TAGGED_WITH"}]->(:KGConcept {name: "asyncio"})`
* `(:KGMemory)-[:KG_RELATED_TO {type: "IS_OF_TYPE"}]->(:KGConcept {name: "code_pattern"})`
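
The linking pattern above can be expressed with a single Cypher `MERGE` statement. The sketch below uses the synchronous Neo4j driver for brevity (the server itself uses the asynchronous driver) and the default connection settings listed under Setup Requirements; the memory ID is illustrative.

```python
from neo4j import GraphDatabase

# Synchronous driver shown for brevity; connection defaults match the
# Setup Requirements above.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

LINK_QUERY = """
MERGE (m:KGMemory {memory_id: $memory_id})
  ON CREATE SET m.name = $memory_name, m.type = $memory_type, m.created_at = datetime()
MERGE (c:KGConcept {name: $concept_name})
  ON CREATE SET c.type = $concept_type, c.created_at = datetime()
MERGE (m)-[r:KG_RELATED_TO {type: $relationship_type}]->(c)
RETURN m.memory_id AS memory_id, type(r) AS rel, c.name AS concept
"""

with driver.session(database="neo4j") as session:
    record = session.run(
        LINK_QUERY,
        memory_id="qdrant-point-123",  # illustrative ID
        memory_name="Context Manager Pattern",
        memory_type="code_pattern",
        concept_name="Python",
        concept_type="language",
        relationship_type="USES_LANGUAGE",
    ).single()
    print(record["memory_id"], record["rel"], record["concept"])

driver.close()
```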

* **API Endpoints for Knowledge Graph (Prefix: `/kg`)**:
* `POST /kg/concept`: Creates or merges a `:KGConcept` node.
* Request Body: `KGConcept` model (`name`, `type`, `description`).
* Response: Details of the created/merged concept.
* `POST /kg/link_memory_to_concept`: Links a memory item to a concept. Creates the `:KGMemory` and `:KGConcept` nodes if they don't exist, then creates/merges the relationship.
* Request Body: `LinkMemoryToConceptRequest` (`memory_id`, `memory_name`, `memory_type`, `concept_name`, `concept_type`, `relationship_type`).
* Response: Confirmation of the link.
* `GET /kg/memory_item/{memory_id}/related_concepts`: Retrieves all `:KGConcept` nodes directly linked to a given `:KGMemory` node (identified by `memory_id`), along with the type of relationship.
* Response: List of concepts with their names, types, descriptions, and the relationship type.

* **Automatic Knowledge Graph Updates**:
* When a new `MemoryItem` is stored via the `POST /memory/store` endpoint:
1. A corresponding `:KGMemory` node is created/merged in Neo4j using the `memory_id` from Qdrant.
2. `:KGConcept` nodes are created/merged for the memory item's `language` (if present), each of its `tags`, and its own `type` (e.g., 'code_pattern').
3. The `:KGMemory` node is then linked to these `:KGConcept` nodes using appropriate `KG_RELATED_TO` relationships (e.g., with `type: "USES_LANGUAGE"`, `type: "TAGGED_WITH"`, `type: "IS_OF_TYPE"`).
* This ensures that new memories are automatically contextualized within the knowledge graph as they are added.

This initial Neo4j integration provides the foundational structure for building more sophisticated knowledge representation and graph-based query capabilities as outlined in the Coder DB vision. Future work will involve expanding the schema, adding more relationship types, integrating algorithms into the graph, and developing more complex analytical queries.