{"id":28280355,"url":"https://github.com/mocksi/json-rag","last_synced_at":"2025-06-16T21:32:58.833Z","repository":{"id":268457663,"uuid":"904420420","full_name":"Mocksi/json-rag","owner":"Mocksi","description":"Reference implementation for chunking nested JSON into RAG-friendly document structures","archived":false,"fork":false,"pushed_at":"2025-04-04T00:03:16.000Z","size":1026,"stargazers_count":3,"open_issues_count":1,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-13T23:23:47.361Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Mocksi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-16T21:19:51.000Z","updated_at":"2025-04-28T17:12:34.000Z","dependencies_parsed_at":null,"dependency_job_id":"fba89cb6-8358-4d4e-940a-5f0675278cc7","html_url":"https://github.com/Mocksi/json-rag","commit_stats":null,"previous_names":["mocksi/json-rag"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Mocksi/json-rag","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mocksi%2Fjson-rag","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mocksi%2Fjson-rag/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mocksi%2Fjson-rag/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mocksi%2Fjson-rag/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/owners/Mocksi","download_url":"https://codeload.github.com/Mocksi/json-rag/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mocksi%2Fjson-rag/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260244938,"owners_count":22980104,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-05-21T10:16:43.530Z","updated_at":"2025-06-16T21:32:58.822Z","avatar_url":"https://github.com/Mocksi.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# JSON RAG Integration\n\nA tool for efficiently loading and integrating nested JSON data structures into RAG (Retrieval-Augmented Generation) systems, with enhanced entity tracking, relationship detection, and context preservation.\n\n## Key Features\n\n* **Advanced Query Understanding**:\n  - Temporal patterns (exact dates, relative ranges, named periods)\n  - Metric aggregations (average, maximum, minimum, sum, count)\n  - Entity relationships (direct, semantic, and cross-file connections)\n  - State transitions and system conditions\n  - Hybrid search combining vector similarity, relationships, and filters\n\n* **Smart Data Processing**:\n  - Automatic entity detection and relationship mapping\n  - Cross-file relationship detection and validation\n  - Key-value pair extraction for filtered searches\n  - Embedded metadata tracking\n  - Batch processing with change detection\n\n* **Archetype-Aware Processing**:\n  - Pattern detection (entities, events, metrics, collections)\n  - Archetype-based scoring 
and ranking\n  - Relationship validation by archetype\n  - Context-aware embedding generation\n  - Archetype-specific traversal strategies\n\n* **Hierarchical Data Management**:\n  - Full JSON structure preservation\n  - Parent-child relationship tracking\n  - Cross-file relationship mapping\n  - Contextual embedding with ancestry\n  - Path-based chunk identification\n\n* **Enhanced Retrieval**:\n  - Vector similarity search using PGVector\n  - Relationship-aware context assembly\n  - Entity-aware result filtering\n  - Cross-file context expansion\n  - Confidence-based scoring and ranking\n\n## Quick Start\n\n1. Clone and install:\n```bash\ngit clone https://github.com/Mocksi/json-rag.git\ncd json-rag\nuv venv rag_env\nsource rag_env/bin/activate  # Windows: .\\rag_env\\Scripts\\activate\nuv pip install -r requirements.txt\n```\n\n2. Set up environment:\n```bash\n# Create .env file with:\nOPENAI_API_KEY=your-key-here\nPOSTGRES_DB=crowllector\nPOSTGRES_USER=crowllector\nPOSTGRES_PASSWORD=yourpassword\nPOSTGRES_HOST=localhost\nPOSTGRES_DB_PORT=5432\n```\n\n3. 
Initialize and run:\n```bash\npython -m app.main --new  # Truncates all tables and starts fresh\npython -m app.main        # Normal operation\n```\n\n## Architecture\n```\napp/\n├── analysis/           # Analysis and pattern detection\n│   ├── archetype.py   # Pattern and archetype detection\n│   └── relationships.py # Cross-file relationship analysis\n├── core/              # Core system components\n│   ├── config.py      # Configuration settings\n│   └── models.py      # Data models\n├── processing/        # Data processing modules\n│   ├── json_parser.py # JSON structure parsing\n│   ├── parsing.py     # Document parsing and chunking\n│   └── processor.py   # Data processing pipeline\n├── retrieval/         # Query processing and retrieval\n│   ├── embedding.py   # Vector embedding generation\n│   └── retrieval.py   # Query pipeline and execution\n├── storage/           # Data persistence\n│   └── database.py    # PostgreSQL and vector storage\n├── utils/             # Utility modules\n│   └── logging_config.py # Logging configuration\n├── __init__.py        # Package initialization\n├── chat.py            # Chat interface and interactions\n└── main.py            # Application entry point\n```\n\nThe codebase is organized into logical modules:\n\n- **analysis/**: Modules for analyzing data patterns, cross-file relationships, and user intent\n- **core/**: Core system configuration and shared components\n- **processing/**: Data processing and relationship detection modules\n- **retrieval/**: Relationship-aware search and context assembly\n- **storage/**: Database interaction and relationship persistence\n- **utils/**: Shared utility functions and helpers\n\nEach module is designed to be independent with clear responsibilities, while working together through well-defined interfaces.\n\n## Installation Requirements\n\n- Python 3.8 or higher\n- PostgreSQL 12 or higher with PGVector extension\n- OpenAI API key\n- Required Python packages (see requirements.txt)\n\n## 
Documentation\n\nThe codebase features comprehensive inline documentation:\n- Detailed module-level docstrings explaining key concepts\n- Function and class documentation with examples\n- Type hints and parameter descriptions\n- Usage examples and implementation notes\n\n## Contributing\n\nWe welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details on:\n- Setting up your development environment\n- Code style guidelines\n- Pull request process\n- Development workflow\n\n## Code of Conduct\n\nThis project follows the [Contributor Covenant Code of Conduct](CODE_OF_CONDUCT.md). By participating, you are expected to uphold this code. Please report unacceptable behavior to the project maintainers.\n\n## License\n\nMIT License. See the LICENSE file for details.\n\n## Roadmap\n\n- [x] Cross-file relationship detection\n- [x] Archetype-aware retrieval\n- [x] Relationship-based context expansion\n- [x] Confidence scoring algorithm refinement\n- [ ] State transition handling improvements\n- [ ] Batch processing optimization\n- [ ] Metric aggregation capabilities\n- [ ] Entity filtering rules improvement\n- [ ] Context assembly performance optimization\n- [ ] Advanced archetype pattern detection\n\n## Query Pipeline\n\nThe system implements a structured reasoning pipeline:\n\n1. **Query Analysis**:\n   - Determines required data types\n   - Identifies needed operations (filtering, aggregation)\n   - Detects relationships and constraints\n\n2. **Plan Creation**:\n   - Builds retrieval strategy\n   - Plans processing operations\n   - Determines result formatting\n\n3. 
**Execution**:\n   - Retrieves relevant chunks\n   - Processes them according to the plan\n   - Assembles a coherent response\n\nSeparating analysis, planning, and execution into explicit stages is intended to keep query handling consistent while preserving the context and relationships attached to retrieved chunks.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmocksi%2Fjson-rag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmocksi%2Fjson-rag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmocksi%2Fjson-rag/lists"}