https://github.com/kenken64/misoto-indexer
An AI-powered terminal application for intelligent code search and indexing using Spring AI and Qdrant vector databases.
- Host: GitHub
- URL: https://github.com/kenken64/misoto-indexer
- Owner: kenken64
- Created: 2025-06-23T17:01:28.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-06-29T10:27:55.000Z (12 months ago)
- Last Synced: 2025-06-29T11:29:27.277Z (12 months ago)
- Topics: embedded, ollama, qdrant-vector-database, spring, spring-ai
- Language: Java
- Homepage:
- Size: 374 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Misoto Codebase Indexer
An AI-powered terminal application for intelligent code search and indexing using Spring AI and vector databases.
## Features
- **Natural Language Search**: Search code using plain English queries
- **Semantic Search**: Find conceptually similar code using AI embeddings
- **Text Search**: Traditional keyword-based search
- **Advanced Search**: Filter by file type, language, repository
- **Intelligent Indexing**: AI-powered code analysis and indexing
- **Detailed Status Tracking**: Real-time indexing progress and file type statistics
- **Persistent Caching**: Avoids re-indexing unchanged files
- **Background Processing**: Non-blocking indexing with immediate search availability
## Application Logic Flow
### **Hybrid Indexing Pipeline**
```mermaid
graph TD
A[Application Start] --> B[Initialize Qdrant Collection]
B --> C[Set Indexing Directory]
C --> D[Load File Cache]
D --> E[Scan Directory Structure]
E --> F{File Validation}
F -->|Supported Extension| G[Check Cache Status]
F -->|Unsupported Extension| H[Track Skipped Extensions]
G -->|New/Modified| I[Queue for Indexing]
G -->|Unchanged| J[Skip Processing]
I --> K[Phase 1: Priority Files]
K --> L[Virtual Thread Processing]
L --> M[Raw Text Extraction]
M --> N[nomic-embed-text Embedding]
N --> O[768D Vector Generation]
O --> P[Qdrant Vector Storage]
P --> Q[Update Cache]
Q --> R[Phase 2: Remaining Files]
R --> S[Background Batch Processing]
S --> T[Indexing Complete]
H --> U[Status Reporting]
J --> U
T --> U
```
### **Embedding Flow Architecture**
```
Raw Text (from source files)
↓
nomic-embed-text (Ollama embedding model - 768 dimensions)
↓
Vector Representation (768-dimensional float array)
↓
Qdrant Cloud (vector database storage with metadata)
```
### **File Processing Strategy**
#### **Priority-Based Indexing**
1. **Phase 1 - Critical Files (Priority 1-5):**
- Controllers (`*Controller.java`) - Priority 1
- Services (`*Service.java`) - Priority 2
- Repositories (`*Repository.java`) - Priority 3
- Configuration (`*Config.java`) - Priority 4
- Applications (`*Application.java`) - Priority 5
2. **Phase 2 - Background Processing:**
- All remaining supported files
- Processed in batches using virtual threads
- Non-blocking execution
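The Phase 1 ordering above can be expressed as a lookup from file-name suffix to priority. This is a minimal sketch, not the project's actual code: the class `FilePriority` and the method `priorityOf` are hypothetical names chosen for illustration, and only the suffix-to-priority pairs come from the list above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalInt;

/** Hypothetical helper: maps a file name to the Phase 1 priority listed above. */
final class FilePriority {
    // Suffix -> priority, in the order given in the list above.
    private static final Map<String, Integer> SUFFIX_PRIORITY = new LinkedHashMap<>();
    static {
        SUFFIX_PRIORITY.put("Controller.java", 1);
        SUFFIX_PRIORITY.put("Service.java", 2);
        SUFFIX_PRIORITY.put("Repository.java", 3);
        SUFFIX_PRIORITY.put("Config.java", 4);
        SUFFIX_PRIORITY.put("Application.java", 5);
    }

    /** Returns the Phase 1 priority (1-5), or empty for files left to Phase 2. */
    static OptionalInt priorityOf(String fileName) {
        for (Map.Entry<String, Integer> entry : SUFFIX_PRIORITY.entrySet()) {
            if (fileName.endsWith(entry.getKey())) {
                return OptionalInt.of(entry.getValue());
            }
        }
        return OptionalInt.empty();
    }
}
```

Files for which `priorityOf` returns an empty value would fall through to the Phase 2 background batches.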
#### **Supported File Extensions**
| Category | Extensions | Purpose |
|----------|------------|---------|
| **Java Ecosystem** | `.java`, `.xml`, `.properties`, `.yml`, `.yaml`, `.json` | Core application files |
| **Documentation** | `.md`, `.txt`, `.st`, `.adoc` | Project documentation |
| **JVM Languages** | `.kt`, `.scala` | Kotlin and Scala source |
| **Database** | `.sql`, `.cql` | Database schemas and queries |
| **Web Technologies** | `.html`, `.css`, `.js`, `.ts`, `.jsp`, `.asp`, `.aspx`, `.php` | Frontend and web components |
| **System Scripts** | `.conf`, `.cmd`, `.sh`, `.ps1` | Configuration and automation |
| **Programming Languages** | `.py`, `.c`, `.cpp`, `.cs`, `.rb`, `.vb`, `.go`, `.swift`, `.lua`, `.pl`, `.r` | Multi-language support |
| **Documents** | `.pdf` | Documentation and specs |
### **Search Execution Flow**
```mermaid
graph LR
A[Search Query] --> B{Search Type}
B -->|Natural Language| C[Process with LLM]
B -->|Semantic| D[Direct Vector Search]
B -->|Text| E[Keyword Search]
C --> F[Generate Search Context]
F --> G[Vector Similarity Search]
D --> G
E --> H[File Content Search]
G --> I[Rank Results by Relevance]
H --> I
I --> J[Format and Display Results]
```
### **Performance Optimizations**
- **Virtual Threads**: Concurrent processing for I/O-intensive operations
- **Persistent Cache**: Tracks file modification times to avoid re-indexing
- **Batch Processing**: Groups files for efficient processing
- **Priority Queuing**: Critical files indexed first for immediate search availability
- **Smart Chunking**: Large files split into manageable 3KB chunks with 500-character overlap
- **Background Execution**: Indexing runs asynchronously without blocking the CLI
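The chunking rule above (3KB chunks with a 500-character overlap) can be sketched as a sliding window. This is an illustration under stated assumptions, not the project's implementation: it treats 3KB as 3,072 characters, measures the overlap in characters, and uses the hypothetical names `TextChunker` and `chunk`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sliding-window chunker; sizes are passed in by the caller. */
final class TextChunker {
    /**
     * Splits text into chunks of at most chunkSize characters, where each chunk
     * starts (chunkSize - overlap) characters after the previous one.
     */
    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last chunk reached the end of the text
            }
        }
        return chunks;
    }
}
```

Calling `TextChunker.chunk(content, 3072, 500)` makes the final 500 characters of one chunk reappear at the start of the next, so text that straddles a boundary is still embedded in one piece.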
### **Status Tracking & Metrics**
The application provides comprehensive real-time metrics:
- **Progress**: Indexed vs. total files percentage
- **Timing**: Current duration, estimated completion time
- **Performance**: Files per second processing speed
- **Threading**: Active and peak virtual thread usage
- **File Types**: Breakdown by extension and count
- **Issues**: Failed and skipped file counts
- **Skipped Extensions**: Non-supported file types encountered
## Prerequisites
- Java 21+
- Maven 3.8+
- Ollama (for local AI models)
- Qdrant Cloud cluster (for vector search)
## Ollama Model Setup
This application uses specialized AI models for embeddings and chat:
### Required Models:
- **nomic-embed-text**: High-quality embedding model (768 dimensions)
- **codellama:7b**: Code-aware chat model for intelligent analysis
### Quick Setup:
```bash
# Run the setup script (Windows)
setup-models.bat
# Or run manually:
ollama pull nomic-embed-text
ollama pull codellama:7b
```
### Linux/Mac Setup:
```bash
# Make script executable and run
chmod +x setup-models.sh
./setup-models.sh
```
### Why nomic-embed-text?
- **Optimized for text**: Better semantic understanding than code-specific models for embeddings
- **Efficient**: 768-dimensional vectors (vs 4096 for CodeLlama)
- **Fast**: Quicker indexing and search operations
- **Quality**: High-quality embeddings for code and documentation
## Qdrant Cloud Setup
1. **Create Qdrant Cloud Account:**
- Go to [https://cloud.qdrant.io/](https://cloud.qdrant.io/)
- Sign up for a free account (includes 1GB storage)
2. **Create a Cluster:**
- Click "Create Cluster"
- Choose your preferred region
- Select the free tier
- Wait for cluster deployment
3. **Get Connection Details:**
- Copy your cluster URL (e.g., `https://xyz-123.qdrant.tech`)
- Generate an API key from the dashboard
4. **Update Configuration:**
```bash
# Copy the environment template
cp .env.example .env
# Edit .env file with your Qdrant details:
QDRANT_HOST=xyz-123.qdrant.tech
QDRANT_API_KEY=your-generated-api-key
```
5. **Dynamic Collection Naming:**
- **Codebase directory**: Creates collection `codebase-index-ollama`
- **Other directories**: Creates collection `codebase-index`
- Collection names are set automatically based on the directory being indexed
## Quick Start Summary
1. **Install Ollama**
```bash
# Download and install Ollama from https://ollama.ai
# Or use curl (Linux/macOS):
curl -fsSL https://ollama.ai/install.sh | sh
```
2. **Pull the Ollama Models**
```bash
ollama pull nomic-embed-text
ollama pull codellama:7b
```
3. **Clone and Build**
```bash
git clone https://github.com/kenken64/misoto-indexer
cd misoto-indexer
mvn clean compile
```
4. **Configure Environment Variables**
```bash
# Copy the environment template
cp .env.example .env
# Edit .env with your configuration:
# - Qdrant Cloud details (QDRANT_HOST, QDRANT_API_KEY)
# - Ollama configuration (OLLAMA_BASE_URL, models)
```
5. **Run the Application**
```bash
mvn spring-boot:run
# OR use the clean run script (recommended)
run-clean.bat
```
6. **Access Interactive CLI Menu**
The application starts directly in the interactive menu:
```
╔══════════════════════════════════════════════════════════════╗
║                   MISOTO CODEBASE INDEXER                    ║
║                   Intelligent Code Search                    ║
╚══════════════════════════════════════════════════════════════╝
┌─────────────────── SEARCH MENU ───────────────────┐
│ 1. [>] Search with Natural Language Prompt        │
│ 2. [i] Indexing Status                            │
│ 3. [S] Semantic Code Search                       │
│ 4. [T] Text Search                                │
│ 5. [A] Advanced Search                            │
│ 6. [I] Index Codebase                             │
│ 7. [?] Help                                       │
│ 0. [X] Exit                                       │
└───────────────────────────────────────────────────┘
```
### Detailed Menu Options
#### **1. Natural Language Search**
Use conversational queries to find code with AI assistance:
**Example Queries:**
```
Search Query: Find authentication logic
Search Query: Show me REST API endpoints for user management
Search Query: Classes that implement caching
Search Query: Database connection configuration
Search Query: JWT token validation
```
**How it works:**
- AI processes your natural language intent
- Converts to optimized search terms
- Returns ranked results with relevance scores
- Shows code snippets with context
#### **2. Indexing Status**
Monitor real-time indexing progress and system performance:
```
┌───────────────── INDEXING STATUS ─────────────────┐
│ Progress: 1,247 / 2,150 files (58.0%)             │
│ Duration: 45s | Estimated: 78s remaining          │
│ Speed: 27.7 files/second                          │
│ Threads: 8 active, 12 peak                        │
│                                                   │
│ File Types Indexed:                               │
│   • .java: 423 files                              │
│   • .xml: 156 files                               │
│   • .properties: 89 files                         │
│   • .md: 67 files                                 │
│   • .kt: 45 files                                 │
│                                                   │
│ Skipped Extensions: .class (234), .jar (12)       │
│ Failed: 3 files | Skipped: 456 files              │
└───────────────────────────────────────────────────┘
```
**Status Information:**
- **Progress**: Percentage of files processed
- **Performance**: Files per second processing speed
- **Threading**: Virtual thread usage for optimal performance
- **File Breakdown**: Count by file type/extension
- **Issues**: Failed and skipped file tracking
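The figures in the sample status panel follow from simple ratios. The sketch below assumes (the source does not confirm this) that speed is indexed files divided by elapsed seconds and that the time estimate is total files divided by that speed. `IndexingMetrics` and its method names are hypothetical.

```java
/** Hypothetical metric helpers; the formulas are assumptions, not taken from the project. */
final class IndexingMetrics {
    /** Percentage of files indexed so far (0 when there is nothing to index). */
    static double progressPercent(long indexed, long total) {
        return total == 0 ? 0.0 : 100.0 * indexed / total;
    }

    /** Average throughput since indexing started. */
    static double filesPerSecond(long indexed, double elapsedSeconds) {
        return elapsedSeconds <= 0 ? 0.0 : indexed / elapsedSeconds;
    }

    /** Projected duration of the whole run at the current throughput. */
    static double estimatedTotalSeconds(long total, double filesPerSecond) {
        return filesPerSecond <= 0 ? Double.POSITIVE_INFINITY : total / filesPerSecond;
    }
}
```

With the sample values (1,247 of 2,150 files after 45 s) these formulas give 58.0%, about 27.7 files/second, and a projected total of about 78 s.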
#### **3. Semantic Code Search**
Find conceptually similar code using vector embeddings:
**Example Usage:**
```
Enter search query: database repository pattern
Similarity threshold (0.0-1.0) [0.7]: 0.8
Max results [10]: 5
Found 5 results (similarity > 0.8):
1. UserRepository.java (0.92) - Line 23
   @Repository
   public interface UserRepository extends JpaRepository {
       Optional findByUsername(String username);
   }
2. ProductService.java (0.89) - Line 45
   private final ProductRepository productRepository;
3. OrderRepository.java (0.85) - Line 12
   public interface OrderRepository extends CrudRepository {
```
**Features:**
- Adjustable similarity threshold (0.0 to 1.0)
- Vector-based semantic matching
- Ranked results by relevance score
- Context-aware code snippets
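The threshold and result limit can be illustrated with a local cosine-similarity filter. In the application the vector comparison is performed by Qdrant, so this is only a sketch of the idea. `SimilaritySearch`, `Scored`, and `topMatches` are hypothetical names, and cosine similarity is an assumed metric.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory illustration of threshold filtering and ranking. */
final class SimilaritySearch {
    record Scored(String id, double score) {}

    /** Cosine similarity of two equal-length vectors; 0 if either has zero length. */
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Keeps candidates scoring above the threshold, best first, capped at maxResults. */
    static List<Scored> topMatches(float[] query, Map<String, float[]> candidates,
                                   double threshold, int maxResults) {
        return candidates.entrySet().stream()
                .map(e -> new Scored(e.getKey(), cosine(query, e.getValue())))
                .filter(s -> s.score() > threshold)
                .sorted(Comparator.comparingDouble(Scored::score).reversed())
                .limit(maxResults)
                .toList();
    }
}
```

Raising the threshold shrinks the result set toward near-identical vectors, while lowering it admits more loosely related matches.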
#### **4. Text Search**
Fast keyword-based search across all indexed files:
**Example Usage:**
```
Enter search term: @RestController
Case sensitive? [y/N]: n
Max results [20]: 10
Found 8 matches in 6 files:
1. UserController.java - Line 15
@RestController
@RequestMapping("/api/users")
public class UserController {
2. AuthController.java - Line 12
@RestController
@RequestMapping("/api/auth")
public class AuthController {
```
**Search Options:**
- Case-sensitive or insensitive matching
- Regular expression support
- File path filtering
- Configurable result limits
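The case-sensitivity, regex, and result-limit options map naturally onto `java.util.regex`. The sketch below is an assumption about how such a search could work, not the project's `FileSearchService`. `LineSearch`, `Match`, and `search` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical line-by-line search with the options listed above. */
final class LineSearch {
    record Match(int lineNumber, String line) {}

    static List<Match> search(List<String> lines, String term,
                              boolean caseSensitive, boolean regex, int maxResults) {
        int flags = caseSensitive ? 0 : Pattern.CASE_INSENSITIVE;
        // Plain terms are quoted so characters such as '.' or '*' match literally.
        Pattern pattern = Pattern.compile(regex ? term : Pattern.quote(term), flags);
        List<Match> matches = new ArrayList<>();
        for (int i = 0; i < lines.size() && matches.size() < maxResults; i++) {
            if (pattern.matcher(lines.get(i)).find()) {
                matches.add(new Match(i + 1, lines.get(i))); // 1-based line numbers
            }
        }
        return matches;
    }
}
```

For example, `LineSearch.search(lines, "@RestController", false, false, 10)` returns the first ten lines containing the annotation, ignoring case.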
#### **5. Advanced Search**
Combine multiple search criteria for precise results:
**Filter Options:**
```
Advanced Search Configuration:
File extensions: .java,.kt,.scala
File name pattern: *Service*
Directory filter: src/main/java
Content contains: @Transactional
File size: 1KB - 100KB
Modified after: 2024-01-01
```
**Example Results:**
```
Advanced Search Results (12 matches):
Filters Applied:
✓ Extensions: .java, .kt
✓ Pattern: *Service*
✓ Content: @Transactional
✓ Directory: src/main/java
1. UserService.java (src/main/java/service/)
@Transactional
public void updateUser(User user) { ... }
2. OrderService.kt (src/main/java/service/)
@Transactional
fun processOrder(order: Order) { ... }
```
#### **6. Index Codebase**
Start or restart the indexing process:
**Options:**
```
Codebase Indexing Options:
1. Restart indexing (current directory)
2. Change indexing directory
3. Clear cache and reindex all files
4. Pause/Resume indexing
5. View indexing statistics
Current directory: /path/to/project/src
Indexed files: 1,247 | Cache entries: 1,189
```
**Directory Selection:**
```
Select indexing directory:
Current: /project/src
1. /project/src (current)
2. /project/src/main/java
3. /project/codebase
4. Enter custom path
5. Back to main menu
Enter choice [1-5]:
```
#### **7. Help**
Comprehensive help and documentation:
```
┌─────────────────── HELP & TIPS ───────────────────┐
│                                                   │
│ SEARCH TIPS:                                      │
│   • Use specific terms: "JWT authentication"      │
│   • Try different phrasings if no results         │
│   • Combine keywords: "user repository database"  │
│                                                   │
│ SIMILARITY THRESHOLDS:                            │
│   • 0.9-1.0: Very similar (exact matches)         │
│   • 0.7-0.9: Similar (related concepts)           │
│   • 0.5-0.7: Somewhat related                     │
│   • 0.0-0.5: Loose associations                   │
│                                                   │
│ SUPPORTED FILE TYPES:                             │
│   • Code: .java, .kt, .scala, .py, .js, .ts       │
│   • Config: .xml, .yml, .properties, .json        │
│   • Web: .html, .css, .jsp, .php                  │
│   • Docs: .md, .txt, .adoc                        │
│   • Scripts: .sh, .cmd, .ps1, .sql                │
│                                                   │
└───────────────────────────────────────────────────┘
```
### Search Examples & Best Practices
#### **Natural Language Search Examples**
| Query Type | Example | What it finds |
|------------|---------|---------------|
| **Functionality** | "user authentication" | Login methods, auth filters, JWT handling |
| **Architecture** | "repository pattern" | Data access objects, JPA repositories |
| **Error Handling** | "exception handling" | Try-catch blocks, error controllers |
| **Configuration** | "database configuration" | DataSource beans, connection properties |
| **API Endpoints** | "REST endpoints for users" | UserController methods, API routes |
| **Security** | "authorization logic" | Security configs, role-based access |
#### **Semantic Search Best Practices**
- **High Similarity (0.8-1.0)**: Find exact patterns and implementations
- **Medium Similarity (0.6-0.8)**: Find related concepts and similar logic
- **Low Similarity (0.4-0.6)**: Explore loosely related code
- **Use specific technical terms**: "repository", "controller", "service"
- **Combine concepts**: "user authentication JWT token"
#### **Text Search Tips**
- **Class names**: `UserService`, `@RestController`
- **Method names**: `findByUsername`, `authenticate`
- **Annotations**: `@Transactional`, `@Autowired`
- **Patterns**: Use wildcards like `find*` or `*Controller`
- **Regular expressions**: Enable regex for complex patterns
### Workflow Examples
#### **Example 1: Finding Authentication Code**
```
1. Start with Natural Language: "user authentication"
2. Review results, note relevant classes
3. Use Semantic Search: "JWT token validation" (similarity: 0.7)
4. Drill down with Text Search: "@PreAuthorize"
5. Use Advanced Search: Files containing "auth" in src/main/java
```
#### **Example 2: Understanding Data Access Layer**
```
1. Natural Language: "database repository pattern"
2. Semantic Search: "JPA repository" (similarity: 0.8)
3. Text Search: "extends JpaRepository"
4. Advanced Search: Filter by *.java files containing "@Repository"
```
#### **Example 3: API Endpoint Discovery**
```
1. Natural Language: "REST API endpoints"
2. Text Search: "@RestController"
3. Semantic Search: "HTTP GET POST endpoints" (similarity: 0.7)
4. Advanced Search: Files matching "*Controller.java"
```
### Performance & Monitoring
- **Real-time Status**: Check option 2 for live indexing progress
- **Search During Indexing**: Search works immediately, even while indexing
- **Cache Management**: System automatically manages file change detection
- **Background Processing**: Indexing doesn't block the interactive menu
- **Memory Efficient**: Virtual threads optimize resource usage
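The change detection behind cache management can be sketched as a map from file path to the modification time recorded at the last successful index. This is a minimal in-memory illustration: `ModificationCache` and its methods are hypothetical names, and persisting the map to disk is left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory cache keyed by file path; persistence is omitted. */
final class ModificationCache {
    private final Map<String, Long> lastIndexed = new HashMap<>();

    /** True when the file was never indexed or its modification time has changed. */
    boolean needsIndexing(String path, long lastModifiedMillis) {
        Long cached = lastIndexed.get(path);
        return cached == null || cached != lastModifiedMillis;
    }

    /** Records the modification time seen when the file was indexed. */
    void markIndexed(String path, long lastModifiedMillis) {
        lastIndexed.put(path, lastModifiedMillis);
    }
}
```

A caller would typically obtain the timestamp with `Files.getLastModifiedTime(path).toMillis()` and skip any file for which `needsIndexing` returns `false`.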
## Development
### Project Structure
```
src/main/java/sg/edu/nus/iss/codebase/indexer/
├── IndexerApplication.java                  # Main Spring Boot application
├── cli/
│   ├── SearchCLI.java                       # Interactive command-line interface
│   └── command/                             # Command Pattern implementation
│       ├── Command.java                     # Command interface
│       └── IndexingStatusCommand.java       # Status display command
├── config/
│   ├── EnvironmentConfig.java               # Environment variable configuration
│   ├── IndexingConfiguration.java           # Centralized indexing configuration
│   ├── QdrantCollectionInitializer.java     # Vector database setup
│   └── VirtualThreadConfig.java             # Async processing configuration
├── controller/
│   └── SearchController.java                # REST API endpoints (optional)
├── dto/
│   └── SearchRequest.java                   # Data transfer objects
├── model/
│   └── IndexingStatus.java                  # Status and metrics model
├── service/
│   ├── FileSearchService.java               # File-based search implementation
│   ├── HybridSearchService.java             # Main search orchestration
│   ├── impl/                                # Service implementations
│   │   ├── DocumentFactoryManager.java      # Factory manager
│   │   ├── FileCacheRepositoryImpl.java     # Cache repository implementation
│   │   ├── FileIndexingServiceImpl.java     # Core indexing service
│   │   ├── TextDocumentFactory.java         # Text document factory
│   │   └── search/                          # Search strategy implementations
│   │       └── SemanticSearchStrategy.java  # Semantic search strategy
│   └── interfaces/                          # Service interfaces
│       ├── DocumentFactory.java             # Factory pattern interface
│       ├── FileCacheRepository.java         # Repository pattern interface
│       ├── FileIndexingService.java         # Service interface
│       ├── IndexingStatusObserver.java      # Observer pattern interface
│       └── SearchStrategy.java              # Strategy pattern interface
```
### Architecture & Design Patterns
## Sequence Diagrams
### **Main Application Flow - Indexing and Search Operations**
```mermaid
sequenceDiagram
participant User
participant CLI as SearchCLI
participant IndexSvc as FileIndexingService
participant Config as IndexingConfiguration
participant Cache as FileCacheRepository
participant QdrantSvc as QdrantDocumentService
participant SearchSvc as HybridSearchService
participant SearchStrat as SearchStrategy
participant Observer as StatusObserver
Note over User,Observer: Application Startup & Initialization
User->>CLI: Start Application
CLI->>Config: Load Configuration
Config-->>CLI: Return config settings
CLI->>IndexSvc: Initialize indexing service
IndexSvc->>QdrantSvc: Initialize Qdrant collection
QdrantSvc-->>IndexSvc: Collection ready
CLI->>IndexSvc: Add CLI as status observer
Note over User,Observer: Directory Indexing Flow
User->>CLI: Select "Index Codebase"
CLI->>User: Prompt for directory path
User->>CLI: Provide directory path
CLI->>IndexSvc: indexDirectory(path)
IndexSvc->>Cache: loadCache()
Cache-->>IndexSvc: Return cached file info
IndexSvc->>IndexSvc: scanDirectory(path)
IndexSvc->>Config: getSupportedExtensions()
Config-->>IndexSvc: Return extensions list
loop For each file in directory
IndexSvc->>Cache: isFileModified(file)
alt File is new or modified
Cache-->>IndexSvc: true
IndexSvc->>IndexSvc: queueForIndexing(file)
else File unchanged
Cache-->>IndexSvc: false
Note over IndexSvc: Skip processing
end
end
Note over IndexSvc,Observer: Background Processing with Status Updates
IndexSvc->>Observer: onStatusUpdate(status)
Observer->>CLI: displayStatusUpdate(status)
CLI->>User: Show progress information
loop Batch processing files
IndexSvc->>QdrantSvc: processFileChunks(file)
QdrantSvc->>QdrantSvc: extractText(file)
QdrantSvc->>QdrantSvc: generateEmbeddings(text)
QdrantSvc->>QdrantSvc: storeVectors(embeddings)
QdrantSvc-->>IndexSvc: Processing complete
IndexSvc->>Cache: updateFileCache(file)
IndexSvc->>Observer: onStatusUpdate(updatedStatus)
Observer->>CLI: displayStatusUpdate(updatedStatus)
end
IndexSvc->>Observer: onIndexingComplete(finalStatus)
Observer->>CLI: displayCompletionMessage()
CLI->>User: Show indexing completed
Note over User,Observer: Search Operation Flow
User->>CLI: Select search option
CLI->>User: Prompt for search query
User->>CLI: Provide search query and type
CLI->>SearchSvc: search(searchRequest)
SearchSvc->>SearchStrat: findStrategy(searchType)
SearchStrat-->>SearchSvc: Return appropriate strategy
alt Semantic Search
SearchSvc->>SearchStrat: search(semanticQuery)
SearchStrat->>QdrantSvc: vectorSearch(queryEmbedding)
QdrantSvc-->>SearchStrat: Return vector results
else Text Search
SearchSvc->>SearchStrat: search(textQuery)
SearchStrat->>SearchStrat: performTextSearch(query)
else Natural Language Search
SearchSvc->>SearchStrat: search(nlQuery)
SearchStrat->>SearchStrat: processNaturalLanguageQuery()
SearchStrat->>SearchStrat: delegateToSemanticSearch()
end
SearchStrat-->>SearchSvc: Return search results
SearchSvc->>SearchSvc: rankAndMergeResults()
SearchSvc-->>CLI: Return formatted results
CLI->>User: Display search results
Note over User,Observer: Status Monitoring Flow
User->>CLI: Select "Indexing Status"
CLI->>IndexSvc: getIndexingStatus()
IndexSvc->>IndexSvc: calculateCurrentMetrics()
IndexSvc->>QdrantSvc: getCollectionInfo()
QdrantSvc-->>IndexSvc: Return collection stats
IndexSvc-->>CLI: Return status information
CLI->>CLI: formatStatusDisplay()
CLI->>User: Display detailed status
```
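The strategy lookup in the search flow above can be sketched as follows. The project lists a `SearchStrategy` interface, but its methods are not shown in this README, so `supports`, `search`, `SearchType`, `FixedStrategy`, and `findStrategy` here are all hypothetical.

```java
import java.util.List;

/** Hypothetical sketch of Strategy-pattern dispatch by search type. */
final class StrategyLookup {
    enum SearchType { SEMANTIC, TEXT, NATURAL_LANGUAGE }

    interface SearchStrategy {
        boolean supports(SearchType type);
        List<String> search(String query);
    }

    /** Stand-in strategy that returns canned results for one search type. */
    record FixedStrategy(SearchType type, List<String> results) implements SearchStrategy {
        public boolean supports(SearchType candidate) {
            return candidate == type;
        }
        public List<String> search(String query) {
            return results;
        }
    }

    /** Returns the first registered strategy that supports the requested type. */
    static SearchStrategy findStrategy(List<SearchStrategy> strategies, SearchType type) {
        return strategies.stream()
                .filter(s -> s.supports(type))
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("No strategy for " + type));
    }
}
```

Keeping the lookup separate from the strategies means a new search type can be added by registering another implementation, without changing the orchestration code.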
### **Error Handling and Recovery Flow**
```mermaid
sequenceDiagram
participant CLI as SearchCLI
participant IndexSvc as FileIndexingService
participant QdrantSvc as QdrantDocumentService
participant Cache as FileCacheRepository
participant Observer as StatusObserver
Note over CLI,Observer: Error Scenarios and Recovery
CLI->>IndexSvc: indexDirectory(invalidPath)
IndexSvc->>IndexSvc: validateDirectory(path)
alt Directory doesn't exist
IndexSvc-->>CLI: DirectoryNotFoundException
CLI->>CLI: handleDirectoryError()
CLI->>CLI: showErrorMessage()
CLI->>CLI: promptForValidPath()
end
IndexSvc->>QdrantSvc: processFile(corruptedFile)
QdrantSvc->>QdrantSvc: extractText(file)
alt File processing fails
QdrantSvc-->>IndexSvc: FileProcessingException
IndexSvc->>IndexSvc: incrementFailedCount()
IndexSvc->>Cache: markFileAsFailed(file)
IndexSvc->>Observer: onStatusUpdate(statusWithError)
Observer->>CLI: displayErrorInStatus()
end
CLI->>QdrantSvc: search(query)
QdrantSvc->>QdrantSvc: performVectorSearch()
alt Qdrant collection not found
QdrantSvc-->>CLI: QdrantException("Collection not found")
CLI->>CLI: handleQdrantError()
CLI->>CLI: showNoIndexMessage()
CLI->>CLI: promptForIndexing()
end
IndexSvc->>Cache: loadCache()
Cache->>Cache: readCacheFile()
alt Cache file corrupted
Cache-->>IndexSvc: CacheCorruptedException
IndexSvc->>Cache: rebuildCache()
IndexSvc->>Observer: onStatusUpdate(rebuildingStatus)
Observer->>CLI: showCacheRebuildMessage()
end
```
## Deployment Diagrams
### **System Architecture and Component Deployment**
```mermaid
graph TB
subgraph "User Environment"
USER[Developer]
TERMINAL[Terminal/CLI]
IDE[IDE/VS Code]
WORKSPACE[Code Workspace]
end
subgraph "Local Machine"
subgraph "Spring Boot Application"
CLI_APP["SearchCLI<br/>Interactive Menu"]
SPRING_BOOT["Spring Boot Container<br/>Port: 8080"]
subgraph "Service Layer"
INDEX_SVC["FileIndexingService<br/>Virtual Thread Pool"]
SEARCH_SVC["HybridSearchService<br/>Multi-Strategy Search"]
CACHE_SVC["FileCacheRepository<br/>Local File Cache"]
end
subgraph "Configuration"
CONFIG["IndexingConfiguration<br/>application.properties"]
ENV_CONFIG["Environment Variables<br/>.env file"]
end
end
subgraph "Ollama AI Platform"
OLLAMA_SERVER["Ollama Server<br/>Port: 11434"]
EMBEDDING_MODEL["nomic-embed-text<br/>768D Embeddings"]
CHAT_MODEL["codellama:7b<br/>Natural Language"]
end
subgraph "Local Storage"
FILE_CACHE["File Cache<br/>.indexer-cache.json"]
LOG_FILES["Application Logs<br/>logs/"]
TEMP_FILES["Temporary Files<br/>temp/"]
end
end
subgraph "Cloud Infrastructure"
subgraph "Qdrant Cloud"
QDRANT_CLUSTER["Qdrant Vector DB<br/>Cloud Cluster"]
VECTOR_STORE["Vector Collections<br/>768D Embeddings"]
METADATA_STORE["Document Metadata<br/>File Paths & Content"]
end
end
subgraph "External Resources"
GITHUB["GitHub Repositories<br/>Source Code"]
DOCS["Documentation<br/>Markdown/Text Files"]
CONFIG_FILES["Configuration Files<br/>YAML/Properties/JSON"]
end
%% User Interactions
USER --> TERMINAL
USER --> IDE
TERMINAL --> CLI_APP
IDE --> WORKSPACE
%% Application Flow
CLI_APP --> INDEX_SVC
CLI_APP --> SEARCH_SVC
INDEX_SVC --> CACHE_SVC
SEARCH_SVC --> INDEX_SVC
%% Configuration
SPRING_BOOT --> CONFIG
CONFIG --> ENV_CONFIG
%% AI Model Integration
INDEX_SVC -.->|HTTP/REST| OLLAMA_SERVER
SEARCH_SVC -.->|HTTP/REST| OLLAMA_SERVER
OLLAMA_SERVER --> EMBEDDING_MODEL
OLLAMA_SERVER --> CHAT_MODEL
%% Vector Database
INDEX_SVC -.->|HTTPS/gRPC| QDRANT_CLUSTER
SEARCH_SVC -.->|HTTPS/gRPC| QDRANT_CLUSTER
QDRANT_CLUSTER --> VECTOR_STORE
QDRANT_CLUSTER --> METADATA_STORE
%% Local Storage
CACHE_SVC --> FILE_CACHE
SPRING_BOOT --> LOG_FILES
INDEX_SVC --> TEMP_FILES
%% Data Sources
WORKSPACE --> GITHUB
WORKSPACE --> DOCS
WORKSPACE --> CONFIG_FILES
INDEX_SVC --> WORKSPACE
%% Styling
style USER fill:#e1f5fe
style CLI_APP fill:#f3e5f5
style SPRING_BOOT fill:#e8f5e8
style OLLAMA_SERVER fill:#fff3e0
style QDRANT_CLUSTER fill:#fce4ec
style WORKSPACE fill:#f1f8e9
```
### **Network Communication and Data Flow**
```mermaid
graph LR
subgraph "Local Development Environment"
subgraph "Spring Boot Application:8080"
CLI[CLI Interface]
APP[Spring Boot App]
CACHE[Local Cache]
end
subgraph "Ollama AI:11434"
OLLAMA[Ollama API]
MODELS[AI Models]
end
end
subgraph "Cloud Services"
QDRANT["Qdrant Cloud<br/>443/HTTPS"]
CDN["Model CDN<br/>Ollama Registry"]
end
subgraph "File System"
WORKSPACE[Code Workspace]
CACHE_FILE[".indexer-cache.json"]
LOGS[Application Logs]
end
%% API Communications
CLI -.->|REST API| APP
APP -.->|"HTTP POST<br/>Embeddings"| OLLAMA
APP -.->|"HTTPS<br/>Vector Ops"| QDRANT
%% Data Persistence
APP --> CACHE_FILE
APP --> LOGS
APP --> WORKSPACE
CACHE --> CACHE_FILE
%% Model Management
OLLAMA -.->|"Model Download<br/>HTTPS"| CDN
%% Data Flow Labels
APP -.->|"Store Vectors<br/>Query Metadata"| QDRANT
OLLAMA -.->|"768D Embeddings<br/>Chat Responses"| APP
WORKSPACE -.->|"File Content<br/>Directory Scan"| APP
```
### **Deployment Architecture by Environment**
```mermaid
graph TB
subgraph "Development Environment"
subgraph "Developer Workstation"
DEV_IDE[IDE/Terminal]
DEV_SPRING[Spring Boot Dev]
DEV_OLLAMA[Ollama Local]
DEV_CACHE[Local Cache]
end
DEV_IDE --> DEV_SPRING
DEV_SPRING --> DEV_OLLAMA
DEV_SPRING --> DEV_CACHE
DEV_SPRING -.->|HTTPS| QDRANT_DEV[Qdrant Dev Cluster]
end
subgraph "Production Environment"
subgraph "Production Server"
PROD_CLI[Production CLI]
PROD_SPRING[Spring Boot Prod]
PROD_OLLAMA[Ollama Server]
PROD_CACHE[Persistent Cache]
PROD_LOGS[Centralized Logs]
end
PROD_CLI --> PROD_SPRING
PROD_SPRING --> PROD_OLLAMA
PROD_SPRING --> PROD_CACHE
PROD_SPRING --> PROD_LOGS
PROD_SPRING -.->|HTTPS| QDRANT_PROD[Qdrant Prod Cluster]
end
subgraph "CI/CD Environment"
subgraph "Build Pipeline"
CI_BUILD[Maven Build]
CI_TEST[Unit Tests]
CI_PACKAGE[JAR Package]
CI_DEPLOY[Deployment]
end
CI_BUILD --> CI_TEST
CI_TEST --> CI_PACKAGE
CI_PACKAGE --> CI_DEPLOY
CI_DEPLOY -.-> PROD_SPRING
end
subgraph "Monitoring & Observability"
METRICS[Application Metrics]
HEALTH[Health Checks]
ALERTS[Alert System]
PROD_SPRING --> METRICS
PROD_SPRING --> HEALTH
HEALTH --> ALERTS
end
%% Environment Connections
DEV_SPRING -.->|"Promote to Prod"| CI_BUILD
METRICS -.->|"Feedback"| DEV_SPRING
```
### **Security and Access Control**
```mermaid
graph TB
subgraph "Security Layers"
subgraph "Authentication & Authorization"
ENV_VARS["Environment Variables<br/>API Keys & Secrets"]
API_KEYS["Qdrant API Key<br/>Encrypted Storage"]
SSL_CERTS["SSL Certificates<br/>HTTPS/TLS 1.3"]
end
subgraph "Network Security"
FIREWALL["Local Firewall<br/>Port Restrictions"]
VPN["VPN Connection<br/>Secure Tunneling"]
RATE_LIMIT["Rate Limiting<br/>API Call Throttling"]
end
subgraph "Data Protection"
ENCRYPTION["Data Encryption<br/>At Rest & In Transit"]
BACKUP["Encrypted Backups<br/>Cache & Logs"]
AUDIT["Audit Logging<br/>Access Tracking"]
end
end
subgraph "Application Security"
INPUT_VALID["Input Validation<br/>Search Queries"]
ERROR_HANDLE["Error Handling<br/>No Data Leakage"]
SECURE_CONFIG["Secure Configuration<br/>Default Deny"]
end
%% Security Flow
ENV_VARS --> API_KEYS
API_KEYS --> SSL_CERTS
SSL_CERTS --> ENCRYPTION
FIREWALL --> VPN
VPN --> RATE_LIMIT
ENCRYPTION --> BACKUP
BACKUP --> AUDIT
INPUT_VALID --> ERROR_HANDLE
ERROR_HANDLE --> SECURE_CONFIG
%% Cross-cutting Security
ENV_VARS -.-> INPUT_VALID
RATE_LIMIT -.-> ERROR_HANDLE
AUDIT -.-> SECURE_CONFIG
```
### **Scalability and Performance Architecture**
```mermaid
graph TB
subgraph "Performance Optimization"
subgraph "Concurrent Processing"
VIRTUAL_THREADS["Virtual Threads<br/>JDK 21 Fibers"]
THREAD_POOL["Thread Pool<br/>Configurable Size"]
ASYNC_PROC["Async Processing<br/>Non-blocking I/O"]
end
subgraph "Caching Strategy"
L1_CACHE["L1: Memory Cache<br/>Hot Data"]
L2_CACHE["L2: File Cache<br/>Persistent Storage"]
L3_CACHE["L3: Vector Cache<br/>Qdrant Optimization"]
end
subgraph "Resource Management"
MEMORY_OPT["Memory Optimization<br/>JVM Tuning"]
DISK_OPT["Disk Optimization<br/>Sequential I/O"]
NETWORK_OPT["Network Optimization<br/>Connection Pooling"]
end
end
subgraph "Scaling Capabilities"
subgraph "Horizontal Scaling"
LOAD_BALANCE["Load Balancing<br/>Multiple Instances"]
DISTRIBUTED["Distributed Processing<br/>Cluster Mode"]
QUEUE["Job Queuing<br/>Background Tasks"]
end
subgraph "Vertical Scaling"
CPU_SCALE["CPU Scaling<br/>Multi-core Usage"]
RAM_SCALE["Memory Scaling<br/>Heap Optimization"]
STORAGE_SCALE["Storage Scaling<br/>SSD Performance"]
end
end
%% Performance Connections
VIRTUAL_THREADS --> ASYNC_PROC
ASYNC_PROC --> THREAD_POOL
L1_CACHE --> L2_CACHE
L2_CACHE --> L3_CACHE
MEMORY_OPT --> DISK_OPT
DISK_OPT --> NETWORK_OPT
%% Scaling Connections
LOAD_BALANCE --> DISTRIBUTED
DISTRIBUTED --> QUEUE
CPU_SCALE --> RAM_SCALE
RAM_SCALE --> STORAGE_SCALE
%% Cross-cutting Optimizations
VIRTUAL_THREADS -.-> CPU_SCALE
L1_CACHE -.-> RAM_SCALE
NETWORK_OPT -.-> DISTRIBUTED
```
These deployment diagrams provide a comprehensive view of:
1. **System Architecture**: Complete component deployment across user environment, local machine, and cloud infrastructure
2. **Network Communication**: Data flow and API communications between services
3. **Multi-Environment Support**: Development, production, and CI/CD pipeline architectures
4. **Security Architecture**: Comprehensive security layers and access controls
5. **Performance & Scalability**: Optimization strategies and scaling capabilities
The diagrams show how the Misoto Codebase Indexer integrates with:
- **Local Development Tools**: IDEs, terminals, and file systems
- **AI Platforms**: Ollama for embeddings and natural language processing
- **Cloud Services**: Qdrant Cloud for vector storage and search
- **Infrastructure**: Security, monitoring, and deployment pipelines
## Use Case Diagrams
### **Primary Use Cases and Actor Interactions**
```mermaid
graph TB
subgraph "Misoto Codebase Indexer System"
subgraph "Search Use Cases"
UC1[Search Code with Natural Language]
UC2[Perform Semantic Code Search]
UC3[Execute Text-based Search]
UC4[Advanced Multi-filter Search]
UC5[Browse Search Results]
UC6[Export Search Results]
end
subgraph "Indexing Use Cases"
UC7[Index Codebase Directory]
UC8[Monitor Indexing Progress]
UC9[Configure Indexing Settings]
UC10[Manage File Cache]
UC11[Handle Indexing Errors]
UC12[Validate File Types]
end
subgraph "Configuration Use Cases"
UC13[Setup AI Models - Ollama]
UC14[Configure Vector Database - Qdrant]
UC15[Manage Environment Variables]
UC16[Customize File Priorities]
UC17[Set Performance Parameters]
end
subgraph "Monitoring Use Cases"
UC18[View System Status]
UC19[Track Performance Metrics]
UC20[Monitor Resource Usage]
UC21[Handle System Errors]
UC22[Generate Status Reports]
end
subgraph "Management Use Cases 🔧"
UC23[Clear System Cache]
UC24[Restart Indexing Process]
UC25[Change Target Directory]
UC26[Backup/Restore Index Data]
UC27[Update System Configuration]
end
end
subgraph "External Systems"
EXT1[Ollama AI Platform]
EXT2[Qdrant Cloud Service]
EXT3[File System]
EXT4[Git Repositories]
EXT5[IDE Integration]
end
subgraph "Actors 👥"
DEV[👨‍💻 Software Developer]
ADMIN[👨‍🔧 System Administrator]
ANALYST[👨‍💼 Code Analyst]
RESEARCHER[👩‍🔬 Researcher]
TEAM_LEAD[👨‍💼 Team Lead]
end
%% Developer Use Cases
DEV --> UC1
DEV --> UC2
DEV --> UC3
DEV --> UC4
DEV --> UC5
DEV --> UC7
DEV --> UC8
DEV --> UC25
%% System Administrator Use Cases
ADMIN --> UC9
ADMIN --> UC10
ADMIN --> UC13
ADMIN --> UC14
ADMIN --> UC15
ADMIN --> UC16
ADMIN --> UC17
ADMIN --> UC23
ADMIN --> UC24
ADMIN --> UC26
ADMIN --> UC27
%% Code Analyst Use Cases
ANALYST --> UC1
ANALYST --> UC2
ANALYST --> UC4
ANALYST --> UC6
ANALYST --> UC18
ANALYST --> UC22
%% Researcher Use Cases
RESEARCHER --> UC2
RESEARCHER --> UC4
RESEARCHER --> UC6
RESEARCHER --> UC19
RESEARCHER --> UC22
%% Team Lead Use Cases
TEAM_LEAD --> UC18
TEAM_LEAD --> UC19
TEAM_LEAD --> UC20
TEAM_LEAD --> UC22
TEAM_LEAD --> UC27
%% System Dependencies
UC1 -.-> EXT1
UC2 -.-> EXT1
UC2 -.-> EXT2
UC7 -.-> EXT3
UC7 -.-> EXT4
UC13 -.-> EXT1
UC14 -.-> EXT2
UC25 -.-> EXT3
UC5 -.-> EXT5
%% Use Case Relationships
UC7 --> UC8
UC8 --> UC11
UC9 --> UC16
UC9 --> UC17
UC13 --> UC14
UC18 --> UC19
UC19 --> UC20
UC23 --> UC24
%% Styling
style DEV fill:#e1f5fe
style ADMIN fill:#f3e5f5
style ANALYST fill:#e8f5e8
style RESEARCHER fill:#fff3e0
style TEAM_LEAD fill:#fce4ec
```
### **Detailed Use Case Scenarios**
```mermaid
graph LR
subgraph "Search Workflow"
subgraph "Natural Language Search"
NL1[Enter Query: Find authentication logic]
NL2[AI Processing: Query Understanding]
NL3[Context Generation: Search Terms]
NL4[Vector Search: Semantic Matching]
NL5[Results Ranking: Relevance Scoring]
NL6[Display Results: Code Snippets]
NL1 --> NL2 --> NL3 --> NL4 --> NL5 --> NL6
end
subgraph "Semantic Search"
SEM1[Enter Technical Query: repository pattern]
SEM2[Set Similarity: Threshold 0.7]
SEM3[Generate Embeddings: 768D Vectors]
SEM4[Vector Similarity: Search in Qdrant]
SEM5[Filter Results: By Similarity Score]
SEM6[Present Matches: With Context]
SEM1 --> SEM2 --> SEM3 --> SEM4 --> SEM5 --> SEM6
end
subgraph "Text Search"
TXT1[Enter Keywords: @RestController]
TXT2[Configure Options: Case Sensitivity]
TXT3[Scan Files: Pattern Matching]
TXT4[Collect Matches: Line Numbers]
TXT5[Format Output: File Locations]
TXT1 --> TXT2 --> TXT3 --> TXT4 --> TXT5
end
end
subgraph "Indexing Workflow"
subgraph "Initial Indexing"
IDX1[Select Directory: Choose Codebase]
IDX2[Scan Structure: File Discovery]
IDX3[Validate Files: Extension Check]
IDX4[Priority Sorting: Critical Files First]
IDX5[Batch Processing: Virtual Threads]
IDX6[Vector Generation: Embedding Creation]
IDX7[Store Vectors: Qdrant Upload]
IDX8[Update Cache: File Tracking]
IDX1 --> IDX2 --> IDX3 --> IDX4 --> IDX5 --> IDX6 --> IDX7 --> IDX8
end
subgraph "Incremental Indexing"
INC1[Monitor Changes: File Modification]
INC2[Check Cache: Comparison]
INC3[Queue Updates: Modified Files]
INC4[Background Process: Non-blocking]
INC5[Merge Vectors: Update Collection]
INC1 --> INC2 --> INC3 --> INC4 --> INC5
end
end
subgraph "Configuration Workflow ⚙️"
subgraph "System Setup"
CFG1[Install Ollama: AI Platform]
CFG2[Download Models: nomic-embed-text]
CFG3[Setup Qdrant: Cloud Account]
CFG4[Configure API: Keys & URLs]
CFG5[Test Connection: Verify Setup]
CFG1 --> CFG2 --> CFG3 --> CFG4 --> CFG5
end
subgraph "Performance Tuning"
PERF1[Set Thread Pool: Virtual Threads]
PERF2[Configure Cache: Size & Location]
PERF3[Adjust Batch Size: Processing Groups]
PERF4[Set Timeouts: Network Calls]
PERF5[Monitor Metrics: Performance Check]
PERF1 --> PERF2 --> PERF3 --> PERF4 --> PERF5
end
end
```
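The "Semantic Search" steps in the diagram above (set a threshold, compare vectors, filter by score) can be sketched in plain Java. This is a minimal illustration, not the project's implementation: an in-memory map stands in for Qdrant, the vectors are 3-dimensional rather than 768-dimensional, and `SimilarityFilterSketch` and `ScoredMatch` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class SimilarityFilterSketch {

    // Hypothetical result holder: a file path plus its similarity score.
    record ScoredMatch(String path, double score) {}

    // Cosine similarity between two vectors of equal length.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Keep candidates scoring at or above the threshold, best match first.
    static List<ScoredMatch> filter(double[] query, Map<String, double[]> candidates, double threshold) {
        List<ScoredMatch> matches = new ArrayList<>();
        for (Map.Entry<String, double[]> candidate : candidates.entrySet()) {
            double score = cosine(query, candidate.getValue());
            if (score >= threshold) {
                matches.add(new ScoredMatch(candidate.getKey(), score));
            }
        }
        matches.sort(Comparator.comparingDouble(ScoredMatch::score).reversed());
        return matches;
    }

    public static void main(String[] args) {
        double[] query = {1.0, 0.0, 0.0};
        Map<String, double[]> candidates = Map.of(
                "UserRepository.java", new double[] {0.9, 0.1, 0.0},
                "README.md", new double[] {0.0, 1.0, 0.0});
        // Only the first candidate clears the 0.7 threshold used in the diagram.
        filter(query, candidates, 0.7)
                .forEach(match -> System.out.println(match.path() + " " + match.score()));
    }
}
```

In a real deployment the vector store would normally do the comparison and thresholding itself; the sketch only makes the scoring rule visible.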
### **Actor Responsibilities and Permissions**
```mermaid
graph TB
subgraph "Role-Based Access Control"
subgraph "Software Developer 👨‍💻"
DEV_PERM[Permissions:]
DEV_P1[• Search all indexed code]
DEV_P2[• View search results]
DEV_P3[• Index personal projects]
DEV_P4[• Monitor indexing status]
DEV_P5[• Change target directories]
DEV_P6[• Export search results]
DEV_PERM --> DEV_P1
DEV_PERM --> DEV_P2
DEV_PERM --> DEV_P3
DEV_PERM --> DEV_P4
DEV_PERM --> DEV_P5
DEV_PERM --> DEV_P6
end
subgraph "System Administrator 👨‍🔧"
ADMIN_PERM[Permissions:]
ADMIN_P1[• Full system configuration]
ADMIN_P2[• Manage AI model setup]
ADMIN_P3[• Configure Qdrant connection]
ADMIN_P4[• Set performance parameters]
ADMIN_P5[• Clear system cache]
ADMIN_P6[• Backup/restore data]
ADMIN_P7[• Monitor system health]
ADMIN_P8[• Manage user access]
ADMIN_PERM --> ADMIN_P1
ADMIN_PERM --> ADMIN_P2
ADMIN_PERM --> ADMIN_P3
ADMIN_PERM --> ADMIN_P4
ADMIN_PERM --> ADMIN_P5
ADMIN_PERM --> ADMIN_P6
ADMIN_PERM --> ADMIN_P7
ADMIN_PERM --> ADMIN_P8
end
subgraph "Code Analyst 👨‍💼"
ANALYST_PERM[Permissions:]
ANALYST_P1[• Advanced search features]
ANALYST_P2[• Generate analysis reports]
ANALYST_P3[• Export detailed results]
ANALYST_P4[• Access metrics dashboard]
ANALYST_P5[• Configure search filters]
ANALYST_P6[• View system statistics]
ANALYST_PERM --> ANALYST_P1
ANALYST_PERM --> ANALYST_P2
ANALYST_PERM --> ANALYST_P3
ANALYST_PERM --> ANALYST_P4
ANALYST_PERM --> ANALYST_P5
ANALYST_PERM --> ANALYST_P6
end
subgraph "Researcher 👩‍🔬"
RESEARCHER_PERM[Permissions:]
RESEARCHER_P1[• Semantic search access]
RESEARCHER_P2[• Pattern analysis tools]
RESEARCHER_P3[• Research data export]
RESEARCHER_P4[• Custom query building]
RESEARCHER_P5[• Similarity threshold tuning]
RESEARCHER_PERM --> RESEARCHER_P1
RESEARCHER_PERM --> RESEARCHER_P2
RESEARCHER_PERM --> RESEARCHER_P3
RESEARCHER_PERM --> RESEARCHER_P4
RESEARCHER_PERM --> RESEARCHER_P5
end
subgraph "Team Lead 👨‍💼"
LEAD_PERM[Permissions:]
LEAD_P1[• Team usage monitoring]
LEAD_P2[• Performance oversight]
LEAD_P3[• Resource planning]
LEAD_P4[• Usage reports generation]
LEAD_P5[• Configuration approval]
LEAD_PERM --> LEAD_P1
LEAD_PERM --> LEAD_P2
LEAD_PERM --> LEAD_P3
LEAD_PERM --> LEAD_P4
LEAD_PERM --> LEAD_P5
end
end
subgraph "Common Use Cases"
COMMON[All Users Can:]
COMMON_P1[• View help documentation]
COMMON_P2[• Access basic search]
COMMON_P3[• See indexing status]
COMMON_P4[• Use interactive CLI]
COMMON --> COMMON_P1
COMMON --> COMMON_P2
COMMON --> COMMON_P3
COMMON --> COMMON_P4
end
```
### **System Integration Use Cases**
```mermaid
graph TB
subgraph "External System Integrations"
subgraph "Ollama AI Integration"
OLL1[Install Ollama Platform]
OLL2[Download AI Models]
OLL3[Start Ollama Service]
OLL4[Generate Embeddings]
OLL5[Process Natural Language]
OLL6[Monitor AI Performance]
OLL1 --> OLL2 --> OLL3
OLL3 --> OLL4
OLL3 --> OLL5
OLL4 --> OLL6
OLL5 --> OLL6
end
subgraph "Qdrant Cloud Integration"
QDR1[Create Qdrant Account]
QDR2[Setup Cloud Cluster]
QDR3[Configure API Access]
QDR4[Initialize Collections]
QDR5[Store Vector Data]
QDR6[Perform Vector Search]
QDR7[Manage Collection Metadata]
QDR1 --> QDR2 --> QDR3 --> QDR4
QDR4 --> QDR5
QDR4 --> QDR6
QDR5 --> QDR7
QDR6 --> QDR7
end
subgraph "File System Integration"
FS1[Access Local Directories]
FS2[Read Source Code Files]
FS3[Monitor File Changes]
FS4[Cache File Metadata]
FS5[Handle File Permissions]
FS6[Manage Temporary Files]
FS1 --> FS2 --> FS3
FS2 --> FS4
FS3 --> FS4
FS1 --> FS5
FS2 --> FS6
end
subgraph "IDE Integration"
IDE1[VS Code Extension]
IDE2[IntelliJ Plugin]
IDE3[Search Result Display]
IDE4[Code Navigation]
IDE5[Context Menu Integration]
IDE1 --> IDE3 --> IDE4
IDE2 --> IDE3 --> IDE4
IDE3 --> IDE5
end
end
subgraph "Actor Interactions with External Systems 👥"
DEV_EXT[Developer] --> OLL4
DEV_EXT --> QDR6
DEV_EXT --> FS2
DEV_EXT --> IDE3
ADMIN_EXT[Administrator] --> OLL1
ADMIN_EXT --> QDR2
ADMIN_EXT --> FS5
ANALYST_EXT[Analyst] --> QDR7
ANALYST_EXT --> FS4
ANALYST_EXT --> IDE4
RESEARCHER_EXT[Researcher] --> OLL5
RESEARCHER_EXT --> QDR6
LEAD_EXT[Team Lead] --> OLL6
LEAD_EXT --> QDR7
end
```
### **Error Handling and Recovery Use Cases**
```mermaid
graph LR
subgraph "Error Scenarios and Recovery 🚨"
subgraph "Indexing Errors"
ERR1[File Access Denied]
ERR2[Corrupted File Content]
ERR3[Network Connection Lost]
ERR4[Qdrant Service Unavailable]
ERR5[Ollama Model Not Found]
ERR6[Insufficient Disk Space]
REC1[Retry with Permissions]
REC2[Skip and Log Error]
REC3[Queue for Retry]
REC4[Switch to Offline Mode]
REC5[Download Missing Model]
REC6[Clean Temporary Files]
ERR1 --> REC1
ERR2 --> REC2
ERR3 --> REC3
ERR4 --> REC4
ERR5 --> REC5
ERR6 --> REC6
end
subgraph "Search Errors"
SERR1[No Results Found]
SERR2[Search Timeout]
SERR3[Invalid Query Syntax]
SERR4[Collection Not Initialized]
SERR5[AI Model Overloaded]
SREC1[Suggest Alternative Queries]
SREC2[Extend Timeout Period]
SREC3[Provide Query Examples]
SREC4[Initialize Collection]
SREC5[Queue Request for Retry]
SERR1 --> SREC1
SERR2 --> SREC2
SERR3 --> SREC3
SERR4 --> SREC4
SERR5 --> SREC5
end
subgraph "System Errors"
SYSERR1[Configuration Missing]
SYSERR2[Cache Corruption]
SYSERR3[Memory Overflow]
SYSERR4[Thread Pool Exhaustion]
SYSREC1[Load Default Config]
SYSREC2[Rebuild Cache]
SYSREC3[Restart with More Memory]
SYSREC4[Scale Thread Pool]
SYSERR1 --> SYSREC1
SYSERR2 --> SYSREC2
SYSERR3 --> SYSREC3
SYSERR4 --> SYSREC4
end
end
```
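Several recovery paths in the diagram above ("Queue for Retry", "Queue Request for Retry") amount to re-running a failed operation a bounded number of times. The sketch below shows one way this could look; `RetrySketch`, `withRetry`, and the linear backoff are assumptions made for illustration and are not taken from the project.

```java
import java.util.function.Supplier;

final class RetrySketch {

    // Runs the task up to maxAttempts times, pausing longer after each failure.
    static <T> T withRetry(Supplier<T> task, int maxAttempts, long baseDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    sleep(baseDelayMillis * attempt); // linear backoff between attempts
                }
            }
        }
        // Every attempt failed: surface the last error to the caller.
        throw lastFailure;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated upload that fails twice before succeeding.
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("connection lost");
            }
            return "stored";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Errors such as "Corrupted File Content" map to "Skip and Log Error" in the diagram, so a retry helper like this would only apply to transient failures.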
The codebase has been refactored around several design patterns to improve maintainability and extensibility:
- **Command Pattern**: Encapsulates indexing status commands
- **Strategy Pattern**: Defines interchangeable search algorithms and switches between them
- **Observer Pattern**: Notifies the CLI of status updates
- **Factory Pattern**: Creates document instances for indexing
- **Repository Pattern**: Abstracts file cache operations
- **Dependency Injection**: Uses Spring's `@Autowired` to wire service dependencies
### **Command Pattern Example**
```java
// Command.java - Command interface
public interface Command {
    void execute();
}

// IndexingStatusCommand.java - Concrete command
public class IndexingStatusCommand implements Command {
    private final IndexingService indexingService;

    public IndexingStatusCommand(IndexingService indexingService) {
        this.indexingService = indexingService;
    }

    @Override
    public void execute() {
        indexingService.displayStatus();
    }
}
```
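A command object becomes useful once something dispatches it. The sketch below shows one possible invoker that maps menu keys to commands. `CommandMenu` and its methods are hypothetical names, not part of the project, and the `Command` interface is repeated only so the snippet compiles on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Repeated from the example above so this sketch is self-contained.
interface Command {
    void execute();
}

// Hypothetical invoker: looks up a command by menu key and runs it.
final class CommandMenu {
    private final Map<String, Command> commands = new LinkedHashMap<>();

    void register(String key, Command command) {
        commands.put(key, command);
    }

    // Returns false when no command is registered under the key.
    boolean run(String key) {
        Command command = commands.get(key);
        if (command == null) {
            return false;
        }
        command.execute();
        return true;
    }
}

final class CommandMenuDemo {
    public static void main(String[] args) {
        CommandMenu menu = new CommandMenu();
        // A lambda stands in for IndexingStatusCommand here.
        menu.register("status", () -> System.out.println("Indexing status requested"));
        menu.run("status");
    }
}
```

Because the menu only knows about `Command`, new actions can be registered without changing the dispatch code.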
### **Strategy Pattern Example**
```java
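import java.util.List;

// SearchStrategy.java - Strategy interface
// SearchResult is an assumed element type; the original listing shows a raw List
public interface SearchStrategy {
    List<SearchResult> search(String query, double threshold);
}

// SemanticSearchStrategy.java - Concrete strategy
public class SemanticSearchStrategy implements SearchStrategy {
    @Override
    public List<SearchResult> search(String query, double threshold) {
        // Placeholder: run a semantic search using embeddings and
        // return matches scoring at or above the threshold
        return List.of();
    }
}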
```
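The point of the strategy interface is that callers can swap algorithms at runtime. The following sketch illustrates that switch under simplifying assumptions: results are plain strings rather than the project's result type, the strategies are stubs, and `SearchContext` is a hypothetical name.

```java
import java.util.List;

// Simplified strategy interface for this sketch: results are plain strings.
interface SearchStrategy {
    List<String> search(String query, double threshold);
}

// Hypothetical context that delegates to whichever strategy is currently selected.
final class SearchContext {
    private SearchStrategy strategy;

    SearchContext(SearchStrategy initial) {
        this.strategy = initial;
    }

    void setStrategy(SearchStrategy strategy) {
        this.strategy = strategy;
    }

    List<String> search(String query, double threshold) {
        return strategy.search(query, threshold);
    }
}

final class SearchContextDemo {
    public static void main(String[] args) {
        // Stub strategies; real ones would scan files or query the vector store.
        SearchStrategy text = (query, threshold) -> List.of("text match for " + query);
        SearchStrategy semantic = (query, threshold) -> List.of("semantic match for " + query);

        SearchContext context = new SearchContext(text);
        System.out.println(context.search("repository pattern", 0.7));

        // Switch algorithms without touching the calling code.
        context.setStrategy(semantic);
        System.out.println(context.search("repository pattern", 0.7));
    }
}
```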
### **Observer Pattern Example**
```java
// IndexingStatusObserver.java - Observer interface
public interface IndexingStatusObserver {
    void onStatusUpdate(IndexingStatus status);
}

// CLI.java - Concrete observer
public class CLI implements IndexingStatusObserver {
    @Override
    public void onStatusUpdate(IndexingStatus status) {
        displayStatus(status);
    }

    // Placeholder rendering; the actual CLI output format is not shown here
    private void displayStatus(IndexingStatus status) {
        System.out.println(status);
    }
}
```
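The example above shows only the observer side. A minimal sketch of the publishing side might look like the following, assuming a hypothetical `IndexingStatusPublisher` and a simplified `IndexingStatus` record whose fields are invented for illustration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical status snapshot; the project's IndexingStatus may carry different fields.
record IndexingStatus(int indexedFiles, int totalFiles) {}

// Repeated from the example above so this sketch is self-contained.
interface IndexingStatusObserver {
    void onStatusUpdate(IndexingStatus status);
}

// Hypothetical subject: keeps the observer list and pushes updates to it.
final class IndexingStatusPublisher {
    // CopyOnWriteArrayList allows safe iteration while observers are added or removed.
    private final List<IndexingStatusObserver> observers = new CopyOnWriteArrayList<>();

    void addObserver(IndexingStatusObserver observer) {
        observers.add(observer);
    }

    void removeObserver(IndexingStatusObserver observer) {
        observers.remove(observer);
    }

    void publish(IndexingStatus status) {
        for (IndexingStatusObserver observer : observers) {
            observer.onStatusUpdate(status);
        }
    }
}

final class PublisherDemo {
    public static void main(String[] args) {
        IndexingStatusPublisher publisher = new IndexingStatusPublisher();
        publisher.addObserver(status ->
                System.out.println(status.indexedFiles() + "/" + status.totalFiles() + " files indexed"));
        publisher.publish(new IndexingStatus(25, 100));
    }
}
```

A thread-safe list is used here because the features list describes indexing as a background process, so updates may be published from a different thread than the one registering observers.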
### **Factory Pattern Example**
```java
import java.io.File;

// DocumentFactory.java - Factory interface
public interface DocumentFactory {
    Document createDocument(File file);
}

// TextDocumentFactory.java - Concrete factory
public class TextDocumentFactory implements DocumentFactory {
    @Override
    public Document createDocument(File file) {
        return new TextDocument(file);
    }
}
```
### **Repository Pattern Example**
```java
// FileCacheRepository.java - Repository interface
public interface FileCacheRepository {
    void save(FileCacheEntry entry);

    FileCacheEntry find(String filePath);
}

// FileCacheRepositoryImpl.java - Repository implementation
public class FileCacheRepositoryImpl implements FileCacheRepository {
    @Override
    public void save(FileCacheEntry entry) {
        // Placeholder: write the entry to the cache
    }

    @Override
    public FileCacheEntry find(String filePath) {
        // Placeholder: look up the entry for this path in the cache
        return null;
    }
}
```
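To make the repository's role in skipping unchanged files concrete, here is a minimal in-memory sketch. The `FileCacheEntry` fields, the `InMemoryFileCache` class, and the `needsIndexing` check are all assumptions for illustration; the project's cache is persistent and may compare files differently.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache entry: a path plus the last-modified timestamp seen at indexing time.
record FileCacheEntry(String filePath, long lastModifiedMillis) {}

// Hypothetical in-memory repository keyed by file path.
final class InMemoryFileCache {
    private final Map<String, FileCacheEntry> entries = new ConcurrentHashMap<>();

    void save(FileCacheEntry entry) {
        entries.put(entry.filePath(), entry);
    }

    Optional<FileCacheEntry> find(String filePath) {
        return Optional.ofNullable(entries.get(filePath));
    }

    // A file needs indexing if it was never cached or its timestamp has changed.
    boolean needsIndexing(String filePath, long lastModifiedMillis) {
        return find(filePath)
                .map(entry -> entry.lastModifiedMillis() != lastModifiedMillis)
                .orElse(true);
    }
}

final class FileCacheDemo {
    public static void main(String[] args) {
        InMemoryFileCache cache = new InMemoryFileCache();
        System.out.println(cache.needsIndexing("src/App.java", 1000L)); // not cached yet
        cache.save(new FileCacheEntry("src/App.java", 1000L));
        System.out.println(cache.needsIndexing("src/App.java", 1000L)); // unchanged
        System.out.println(cache.needsIndexing("src/App.java", 2000L)); // modified
    }
}
```

Returning `Optional` instead of `null` is a design choice of this sketch, not something the interface above prescribes.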
### **Dependency Injection Example**
```java
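import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

// SearchService.java - Service with dependencies
@Service
public class SearchService {
    private final FileSearchService fileSearchService;
    private final HybridSearchService hybridSearchService;

    @Autowired // optional on a single constructor since Spring 4.3
    public SearchService(FileSearchService fileSearchService, HybridSearchService hybridSearchService) {
        this.fileSearchService = fileSearchService;
        this.hybridSearchService = hybridSearchService;
    }

    // SearchResult is an assumed element type; the original listing shows a raw List
    public List<SearchResult> search(String query) {
        // Placeholder: use the injected services to perform the search
        return List.of();
    }
}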
```