<div align="center">

# Vectorless

**Document retrieval for the reasoning era.**

[![CI](https://github.com/hallelx2/vectorless/actions/workflows/ci.yml/badge.svg)](https://github.com/hallelx2/vectorless/actions/workflows/ci.yml)
[![Deploy](https://deploy-badge.vercel.app/vercel/vectorless-web?style=flat&name=deploy)](https://vectorless-web.vercel.app)
[![npm](https://img.shields.io/npm/v/vectorless?color=blue&label=npm)](https://www.npmjs.com/package/vectorless)
[![PyPI](https://img.shields.io/pypi/v/vectorless-sdk?color=blue&label=pypi)](https://pypi.org/project/vectorless-sdk/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![TypeScript](https://img.shields.io/badge/TypeScript-5.9-blue?logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
[![Python](https://img.shields.io/badge/Python-3.9+-blue?logo=python&logoColor=white)](https://python.org)
[![Next.js](https://img.shields.io/badge/Next.js-15-black?logo=next.js&logoColor=white)](https://nextjs.org)
[![Fastify](https://img.shields.io/badge/Fastify-5-black?logo=fastify&logoColor=white)](https://fastify.dev)

[Website](https://vectorless.store) | [npm](https://www.npmjs.com/package/vectorless) | [PyPI](https://pypi.org/project/vectorless-sdk/)

</div>

---

## What is Vectorless?

Vectorless is a document retrieval platform that replaces traditional RAG chunking with **structure-preserving retrieval**. Instead of splitting documents into arbitrary chunks and searching them by vector similarity, Vectorless:

1. **Preserves document structure** -- sections, headings, and chapters stay intact
2. **Generates navigable document maps** -- a Table of Contents with a summary for each section
3. **Lets LLMs reason** over the map to select exactly the sections they need

The result: more accurate retrieval, complete context, and a traceable record of why each section was chosen.

## How It Works

<div align="center">

![How Vectorless Works](docs/how-it-works.svg)

</div>

Traditional RAG splits documents into arbitrary chunks and ranks them by vector similarity -- an opaque score that discards document structure and cuts passages off from their surrounding context. Vectorless takes a fundamentally different approach:

| | Traditional RAG | Vectorless |
|---|----------------|------------|
| **Splitting** | Arbitrary chunks | Natural sections |
| **Retrieval** | Vector similarity | LLM reasoning |
| **Structure** | Destroyed | Preserved |
| **Traceability** | Black-box ranking | Every choice explainable |
| **Context** | Fragments | Complete sections |

## Architecture

<div align="center">

![System Architecture](docs/architecture.svg)

</div>

Vectorless is designed as a modular stack. Your application talks to the API through one of the official SDKs (TypeScript or Python), and the API orchestrates document processing across four infrastructure services:

- **Neon (PostgreSQL)** -- stores documents, sections, ToC maps, and metadata
- **Cloudflare R2** -- holds uploaded document files (S3-compatible)
- **Upstash QStash** -- manages background jobs for asynchronous document processing
- **Gemini / Claude** -- the LLMs used to generate section summaries and ToC maps

## The Retrieval Flow

<div align="center">

![Query Retrieval Flow](docs/retrieval-flow.svg)

</div>

Retrieval happens in three steps:

1. **Get the Document Map** -- call `getToC()` to receive a structured Table of Contents with section titles, summaries, and IDs. This is lightweight metadata, not the full content.

2. **LLM Reasons Over the Map** -- pass the ToC to your LLM along with the user's query. The LLM reads the summaries and selects the sections relevant to the question, so every choice is visible and explainable.

3. **Fetch Complete Sections** -- call `fetchSections()` with the IDs your LLM selected. You get back full, unbroken section content -- no fragments, no missing context.

## Quick Start

### TypeScript

```bash
npm install vectorless
```

```typescript
import { VectorlessClient } from "vectorless";

const client = new VectorlessClient({ apiKey: "vl_sk_live_..." });

// Upload a document (`file` is the document source you want to ingest).
// addDocument also returns an initial `toc`; here we re-fetch it with getToC().
const { doc_id } = await client.addDocument(file, {
  tocStrategy: "hybrid",
});

// Get the document map
const toc = await client.getToC(doc_id);
for (const section of toc.sections) {
  console.log(`${section.title}: ${section.summary}`);
}

// Fetch specific sections (placeholder IDs; use the ones your LLM picks)
const sections = await client.fetchSections(doc_id, ["section-1", "section-2"]);
```

### Python

```bash
pip install vectorless-sdk
```

```python
from vectorless import AddDocumentOptions, VectorlessClient

client = VectorlessClient(api_key="vl_sk_live_...")

# Upload a document
result = client.add_document("report.pdf", options=AddDocumentOptions(
    toc_strategy="hybrid",
))

# Get the document map
toc = client.get_toc(result.doc_id)
for section in toc.sections:
    print(f"{section.title}: {section.summary}")

# Fetch specific sections (placeholder IDs; use the ones your LLM picks)
sections = client.fetch_sections(result.doc_id, ["section-1", "section-2"])
```

## ToC Strategies

| Strategy | When to Use | LLM Required |
|----------|------------|-------------|
| **extract** | Documents with clear headings (PDFs with bookmarks, Markdown with `#` headings) | No |
| **hybrid** | Headings exist, but section summaries must be precise enough to drive retrieval | Yes |
| **generate** | Unstructured documents with no headings | Yes |

## Supported Formats

- **PDF** -- text extraction plus heading detection from document structure and font patterns
- **DOCX** -- heading hierarchy from Word styles
- **Markdown / TXT** -- `#` headings, setext headings, and ALL CAPS detection
- **URL** -- fetches the HTML, strips navigation, and extracts the heading structure

## SDK Reference

### TypeScript SDK (`vectorless`)

| Method | Description |
|--------|-------------|
| `addDocument(source, options?)` | Upload and ingest a document |
| `getToC(docId)` | Get the Table of Contents manifest |
| `fetchSection(docId, sectionId)` | Fetch a single section |
| `fetchSections(docId, sectionIds)` | Batch fetch multiple sections |
| `getDocument(docId)` | Get document status and metadata |
| `listDocuments(options?)` | List all documents |
| `deleteDocument(docId)` | Delete a document and all its sections |

### Python SDK (`vectorless-sdk`)

| Method | Description |
|--------|-------------|
| `add_document(source, options?)` | Upload and ingest a document |
| `get_toc(doc_id)` | Get the Table of Contents manifest |
| `fetch_section(doc_id, section_id)` | Fetch a single section |
| `fetch_sections(doc_id, section_ids)` | Batch fetch multiple sections |
| `get_document(doc_id)` | Get document status and metadata |
| `list_documents(options?)` | List all documents |
| `delete_document(doc_id)` | Delete a document and all its sections |

Both SDKs support asynchronous use: the TypeScript client is promise-based, and the Python SDK provides `AsyncVectorlessClient`.

## Project Structure

```
vectorless/
  apps/
    web/         # Next.js dashboard + marketing site
    api/         # Fastify REST API server
  packages/
    shared/      # Shared TypeScript types + Zod schemas
    ts-sdk/      # TypeScript SDK (npm: vectorless)
    openapi/     # OpenAPI 3.1 specification
  sdks/
    python/      # Python SDK (PyPI: vectorless-sdk)
  docs/          # SVG diagrams and documentation assets
```

## Self-Hosting

Vectorless can be self-hosted. You need:

- **PostgreSQL** with pgvector (we recommend [Neon](https://neon.tech))
- **Cloudflare R2** or any S3-compatible storage for document files
- **Upstash QStash** for background job processing
- **Gemini** (Vertex AI) or **Anthropic** (Claude) for ToC generation

See the [Deployment Guide](./DEPLOYMENT.md) for step-by-step instructions.

## BYOK (Bring Your Own Key)

Users can configure their own LLM API keys in the dashboard. Keys are encrypted with AES-256-GCM before storage and are never exposed in logs or responses. If no BYOK key is configured, the platform's default LLM is used.

## Contributing

Contributions are welcome. Please open an issue first to discuss what you'd like to change.

## License

MIT