{"id":31532630,"url":"https://github.com/markaronov/superfind","last_synced_at":"2025-10-04T03:56:58.850Z","repository":{"id":309563674,"uuid":"1035418481","full_name":"MarkAronov/SuperFind","owner":"MarkAronov","description":"Intelligent Semantic Search API with Vector Database \u0026 RAG","archived":false,"fork":false,"pushed_at":"2025-10-02T00:04:42.000Z","size":226,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-02T02:29:03.155Z","etag":null,"topics":["bun","hono","langchain","qdrant","typescript"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MarkAronov.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-10T11:04:09.000Z","updated_at":"2025-10-02T00:08:54.000Z","dependencies_parsed_at":"2025-08-12T16:29:41.375Z","dependency_job_id":"3a150e01-45f9-4555-b56e-0efbc3c524ce","html_url":"https://github.com/MarkAronov/SuperFind","commit_stats":null,"previous_names":["markaronov/superfind"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/MarkAronov/SuperFind","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MarkAronov%2FSuperFind","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MarkAronov%2FSuperFind/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MarkAronov%2FSuperFind/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MarkAronov%2FSuperFind/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MarkAronov","download_url":"https://codeload.github.com/MarkAronov/SuperFind/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MarkAronov%2FSuperFind/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278262444,"owners_count":25957938,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-04T02:00:05.491Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bun","hono","langchain","qdrant","typescript"],"created_at":"2025-10-04T03:56:46.182Z","updated_at":"2025-10-04T03:56:58.839Z","avatar_url":"https://github.com/MarkAronov.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n# SuperFind\n\n### Intelligent Semantic Search API with Vector Database \u0026 RAG\n\n[![Bun](https://img.shields.io/badge/Bun-1.0+-000000?style=for-the-badge\u0026logo=bun\u0026logoColor=white)](https://bun.sh)\n[![TypeScript](https://img.shields.io/badge/TypeScript-5.0+-3178C6?style=for-the-badge\u0026logo=typescript\u0026logoColor=white)](https://www.typescriptlang.org/)\n[![Hono](https://img.shields.io/badge/Hono-4.9+-E36002?style=for-the-badge\u0026logo=hono\u0026logoColor=white)](https://hono.dev/)\n[![Qdrant](https://img.shields.io/badge/Qdrant-Vector_DB-DC244C?style=for-the-badge\u0026logo=qdrant\u0026logoColor=white)](https://qdrant.tech/)\n[![LangChain](https://img.shields.io/badge/LangChain-0.3-1C3C3C?style=for-the-badge\u0026logo=chainlink\u0026logoColor=white)](https://js.langchain.com/)\n[![OpenAI](https://img.shields.io/badge/OpenAI-GPT--4-412991?style=for-the-badge\u0026logo=openai\u0026logoColor=white)](https://openai.com/)\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge)](https://opensource.org/licenses/MIT)\n[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=for-the-badge)](http://makeapullrequest.com)\n\n**A production-ready semantic search engine powered by vector embeddings, multiple AI providers, and RAG (Retrieval Augmented Generation) pattern.**\n\n[Features](#-features) • [Quick Start](#-quick-start) • [API Usage](#-api-usage) • [Architecture](#-architecture) • [Roadmap](#-roadmap-future-enhancements)\n\n\u003c/div\u003e\n\n---\n\n## Features\n\n### Core Capabilities\n- **Semantic Search**: Find relevant results based on meaning, not just keywords\n- **Complex Query Support**: Handles multi-criteria queries (experience + location, skills + role)\n- **Multi-AI Provider Support**: OpenAI, Anthropic, Google Gemini, Ollama, HuggingFace\n- **Vector Database Integration**: Qdrant for high-performance similarity search\n- **RAG Pattern**: Retrieval Augmented Generation for context-aware AI responses\n- **Multi-Format Data Parsing**: CSV, JSON, and TXT file support\n- **Pagination**: Efficient result pagination with metadata\n- **Lightning Fast**: Built on Bun runtime for maximum performance\n\n### Technical Features\n- **Type-Safe**: Full TypeScript implementation\n- **RESTful API**: Clean and intuitive Hono-based endpoints\n- **Flexible Embeddings**: Support for multiple embedding models\n- **Vector Similarity**: Cosine similarity search with 3072-dimensional vectors\n- **Health Monitoring**: Built-in health check endpoints\n- **Environment Config**: Secure credential management\n\n---\n\n## Quick Start\n\n### Prerequisites\n- [Bun](https://bun.sh) 1.0+\n- [Qdrant](https://qdrant.tech/) running locally or remote instance\n- API keys for AI providers (OpenAI, Anthropic, etc.)\n\n### Installation\n\n1. **Clone the repository**\n   ```bash\n   git clone https://github.com/MarkAronov/SuperFind.git\n   cd SuperFind\n   ```\n\n2. **Install dependencies**\n   ```bash\n   bun install\n   ```\n\n3. **Configure environment variables**\n   ```bash\n   # Create .env file with your API keys\n   OPENAI_API_KEY=your_key_here\n   ANTHROPIC_API_KEY=your_key_here\n   GEMINI_API_KEY=your_key_here\n   # ... add other providers as needed\n   ```\n\n4. **Start Qdrant** (if running locally)\n   ```bash\n   docker run -p 6333:6333 qdrant/qdrant\n   ```\n\n5. **Start the development server**\n   ```bash\n   bun run dev\n   ```\n\n6. **Test the API**\n   ```bash\n   curl \"http://localhost:3000/api/search?query=devops+engineer\u0026limit=5\"\n   ```\n\n---\n\n## API Usage\n\n### Search Endpoint\n```http\nGET /api/search?query=\u003csearch_term\u003e\u0026limit=\u003cnum\u003e\u0026offset=\u003cnum\u003e\u0026provider=\u003cai_provider\u003e\n```\n\n#### Parameters\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `query` | string | *required* | Search query (semantic search) |\n| `limit` | number | `5` | Results per page (1-50) |\n| `offset` | number | `0` | Skip N results (pagination) |\n| `provider` | string | `openai` | AI provider for answer generation |\n\n#### Example Request\n```bash\ncurl \"http://localhost:3000/api/search?query=senior+backend+developer\u0026limit=10\u0026offset=0\u0026provider=openai\"\n```\n\n#### Example Response\n```json\n{\n  \"answer\": \"Based on the search results, here are the senior backend developers...\",\n  \"sources\": [\n    {\n      \"pageContent\": \"John Doe - Senior Backend Engineer with 8 years of experience...\",\n      \"metadata\": {\n        \"name\": \"John Doe\",\n        \"skills\": \"Node.js, TypeScript, PostgreSQL\",\n        \"location\": \"USA\"\n      }\n    }\n  ],\n  \"pagination\": {\n    \"total\": 25,\n    \"returned\": 10,\n    \"limit\": 10,\n    \"offset\": 0\n  }\n}\n```\n\n### Health Check\n```http\nGET /health\n```\n\n---\n\n## Architecture\n\n### Tech Stack\n- **Runtime**: Bun (faster Node.js alternative)\n- **Framework**: Hono (lightweight web framework)\n- **Vector DB**: Qdrant (similarity search)\n- **AI Orchestration**: LangChain.js\n- **Embeddings**: OpenAI text-embedding-3-large (3072 dimensions)\n- **LLM Providers**: OpenAI, Anthropic, Gemini, Ollama, HuggingFace\n\n### How It Works\n1. **User Query** → Converted to vector embedding (3072 dimensions)\n2. **Vector Search** → Qdrant finds similar documents via cosine similarity\n3. **Context Retrieval** → Top K relevant documents retrieved\n4. **AI Generation** → LLM generates answer using retrieved context (RAG)\n5. **Response** → JSON with answer, sources, and pagination metadata\n\n### Project Structure\n```\nSuperFind/\n├── src/\n│   ├── ai/                    # AI routes and services\n│   │   ├── ai.routes.ts       # Search endpoints\n│   │   ├── ai.services.ts     # RAG logic\n│   │   └── providers/         # AI provider implementations\n│   ├── vector/                # Vector database services\n│   │   ├── qdrant.services.ts # Qdrant integration\n│   │   └── embedding-factory.ts\n│   ├── parser/                # Data parsing services\n│   └── config/                # Environment configuration\n├── static-data/               # Sample datasets\n│   ├── csv/                   # CSV files\n│   ├── json/                  # JSON files\n│   └── text/                  # Text files\n├── frontend/                  # React frontend (optional)\n└── docs/                      # Documentation\n```\n\n---\n\n## Documentation\n\n- [API Routes](./API_ROUTES.md) - Complete API reference\n- [Architecture](./ARCHITECTURE.md) - System design and patterns\n- [AI Providers](./AI_PROVIDERS.md) - Supported AI providers\n- [Vector Database](./VECTOR_DATABASE.md) - Qdrant integration details\n- [Pagination](./PAGINATION.md) - Pagination implementation\n- [Implementation Guide](./IMPLEMENTATION.md) - Learning path for developers\n- [Search Tests](./SEARCH_TESTS.md) - Comprehensive test scenarios and examples\n- [Test Results](./TEST_RESULTS.md) - Validation results with 100% accuracy\n\n---\n\n## Roadmap \u0026 Future Enhancements\n\n### High Priority\n- [ ] **Authentication \u0026 Authorization**\n  - JWT-based authentication\n  - API key management\n  - Rate limiting per user\n  - Role-based access control (RBAC)\n\n- [ ] **Advanced Search Features**\n  - Hybrid search (vector + keyword/BM25)\n  - Metadata filtering (location, skills, experience level)\n  - Faceted search with filters\n  - Search suggestions/autocomplete\n  - Search history tracking\n\n- [ ] **Performance Optimizations**\n  - Response caching (Redis)\n  - Query result caching\n  - Embedding caching for common queries\n  - Database connection pooling\n  - CDN integration for static assets\n\n### Medium Priority\n- [ ] **Frontend Enhancements**\n  - Advanced search UI with filters\n  - Real-time search (debounced)\n  - Infinite scroll pagination\n  - Search result highlighting\n  - Export results (CSV, JSON, PDF)\n  - Dark mode support\n\n- [ ] **Data Management**\n  - Admin dashboard for data management\n  - Bulk upload/import (CSV, JSON, Excel)\n  - Data deduplication\n  - Automated data refresh/sync\n  - Data versioning and rollback\n\n- [ ] **Analytics \u0026 Monitoring**\n  - Search analytics dashboard\n  - Query performance metrics\n  - User behavior tracking\n  - Error logging and alerting (Sentry)\n  - Prometheus + Grafana integration\n\n### Advanced Features\n- [ ] **Multi-Modal Search**\n  - Image-based search\n  - Voice search integration\n  - PDF/document search\n  - Code search capabilities\n\n- [ ] **Machine Learning Enhancements**\n  - Custom fine-tuned embeddings\n  - Personalized search ranking\n  - A/B testing for search algorithms\n  - Semantic clustering/categorization\n  - Query intent classification\n\n- [ ] **Scalability**\n  - Kubernetes deployment configs\n  - Load balancing\n  - Multi-region Qdrant clusters\n  - Horizontal scaling support\n  - Microservices architecture\n\n### Experimental\n- [ ] **Conversational Search**\n  - Multi-turn conversations\n  - Context-aware follow-up queries\n  - Chat history persistence\n  - Streaming responses (SSE)\n\n- [ ] **Integration Ecosystem**\n  - Slack/Discord bot integration\n  - Chrome extension\n  - VS Code extension\n  - REST API SDK libraries (Python, JS, Go)\n  - Webhooks for events\n\n- [ ] **Enterprise Features**\n  - Multi-tenancy support\n  - Custom embedding models per tenant\n  - SLA monitoring\n  - Audit logs\n  - Data residency compliance\n\n### Developer Experience\n- [ ] **Testing \u0026 Quality**\n  - Unit test coverage (Jest/Vitest)\n  - Integration tests\n  - E2E tests (Playwright)\n  - Performance benchmarks\n  - Load testing scripts\n\n- [ ] **Documentation**\n  - Interactive API documentation (Swagger/OpenAPI)\n  - Video tutorials\n  - Blog post series\n  - Code examples repository\n  - Contribution guidelines\n\n- [ ] **DevOps**\n  - CI/CD pipeline (GitHub Actions)\n  - Automated deployments\n  - Database migration scripts\n  - Docker Compose for local dev\n  - Infrastructure as Code (Terraform)\n\n---\n\n## Contributing\n\nContributions are welcome! Here's how you can help:\n\n1. Fork the repository\n2. Create a feature branch (`git checkout -b feature/amazing-feature`)\n3. Commit your changes (`git commit -m 'Add amazing feature'`)\n4. Push to the branch (`git push origin feature/amazing-feature`)\n5. Open a Pull Request\n\nPlease read our [Contributing Guidelines](CONTRIBUTING.md) before submitting PRs.\n\n---\n\n## Scripts\n\n| Command | Description |\n|---------|-------------|\n| `bun run dev` | Start development server with hot reload |\n| `bun start` | Start production server |\n| `bun run lint` | Lint code with Biome |\n| `bun run format` | Format code with Biome |\n| `bun run check` | Check code quality |\n| `bun run check:fix` | Auto-fix code quality issues |\n\n---\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n---\n\n## Acknowledgments\n\n- [Qdrant](https://qdrant.tech/) - High-performance vector database\n- [LangChain](https://js.langchain.com/) - LLM orchestration framework\n- [OpenAI](https://openai.com/) - Embeddings and language models\n- [Hono](https://hono.dev/) - Ultrafast web framework\n- [Bun](https://bun.sh/) - JavaScript runtime\n\n---\n\n\u003cdiv align=\"center\"\u003e\n\n**Built by [MarkAronov](https://github.com/MarkAronov)**\n\nStar this repo if you find it helpful!\n\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarkaronov%2Fsuperfind","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmarkaronov%2Fsuperfind","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarkaronov%2Fsuperfind/lists"}