{"id":49362486,"url":"https://github.com/legionio/lex-knowledge","last_synced_at":"2026-05-07T06:12:56.132Z","repository":{"id":346628531,"uuid":"1190942688","full_name":"LegionIO/lex-knowledge","owner":"LegionIO","description":null,"archived":false,"fork":false,"pushed_at":"2026-04-15T15:04:56.000Z","size":138,"stargazers_count":0,"open_issues_count":2,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-15T17:09:01.387Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LegionIO.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-24T19:13:13.000Z","updated_at":"2026-04-15T15:04:25.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/LegionIO/lex-knowledge","commit_stats":null,"previous_names":["legionio/lex-knowledge"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/LegionIO/lex-knowledge","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LegionIO%2Flex-knowledge","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LegionIO%2Flex-knowledge/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LegionIO%2Flex-knowledge/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LegionIO%2Flex-knowledge/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LegionIO","download_url":"https://codeload.github.com/LegionIO/lex-knowledge/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LegionIO%2Flex-knowledge/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32345816,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-26T23:26:28.701Z","status":"online","status_checked_at":"2026-04-27T02:00:06.769Z","response_time":128,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-04-27T17:01:43.613Z","updated_at":"2026-04-27T17:01:44.525Z","avatar_url":"https://github.com/LegionIO.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# lex-knowledge\n\nDocument corpus ingestion and knowledge query pipeline for LegionIO.\n\n`lex-knowledge` walks a directory of documents, parses them into sections, splits sections into token-aware chunks, and writes each chunk to Apollo as a searchable knowledge entry. A query runner retrieves relevant chunks via semantic search and optionally synthesizes an answer through the LLM pipeline.\n\n## Phase A: Corpus Ingestion\n\nThis gem implements Phase A of the knowledge pipeline:\n\n- **Manifest**: file walker with SHA256 fingerprinting and incremental diff support\n- **Parser**: section-aware extraction for Markdown and plain text\n- **Chunker**: paragraph-respecting splits with configurable token budget and overlap\n- **Ingest runners**: full corpus or single-file ingestion, writing chunks to Apollo\n- **Query runners**: retrieval-only or retrieval + LLM synthesis\n\n`.docx` and `.pdf` parsing are deferred to a later phase.\n\n## Usage\n\n```ruby\nrequire 'legion/extensions/knowledge'\n\n# Ingest an entire directory\nLegion::Extensions::Knowledge::Runners::Ingest.ingest_corpus(\n  path:    '/path/to/docs',\n  dry_run: false,\n  force:   false\n)\n# =\u003e { success: true, files_scanned: 12, chunks_created: 84, chunks_skipped: 0, chunks_updated: 0 }\n\n# Ingest a single file\nLegion::Extensions::Knowledge::Runners::Ingest.ingest_file(\n  file_path: '/path/to/docs/guide.md'\n)\n# =\u003e { success: true, file: '...', chunks_created: 7, chunks_skipped: 0, chunks_updated: 0 }\n\n# Query with LLM synthesis\nLegion::Extensions::Knowledge::Runners::Query.query(\n  question:   'How does Legion route tasks?',\n  top_k:      5,\n  synthesize: true\n)\n# =\u003e { success: true, answer: '...', sources: [...], metadata: { retrieval_score: 0.87, chunk_count: 5, latency_ms: 312 } }\n\n# Retrieval only (no LLM)\nLegion::Extensions::Knowledge::Runners::Query.retrieve(\n  question: 'What is a LEX extension?',\n  top_k:    3\n)\n# =\u003e { success: true, sources: [...], metadata: { chunk_count: 3 } }\n```\n\n## Configuration\n\nSettings are read from `Legion::Settings` under the `:knowledge` key:\n\n```yaml\nknowledge:\n  chunker:\n    max_tokens: 512      # default 512\n    overlap_tokens: 128  # default 128\n  query:\n    top_k: 5             # default 5\n```\n\n## Dependencies\n\n- `legion-cache`, `legion-crypt`, `legion-data`, `legion-json`, `legion-logging`, `legion-settings`, `legion-transport`\n- `lex-apollo` (optional): chunk storage and vector retrieval\n- `legion-llm` (optional): answer synthesis\n\nBoth optional dependencies are guarded with `defined?()` — the gem degrades gracefully when they are absent.\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flegionio%2Flex-knowledge","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flegionio%2Flex-knowledge","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flegionio%2Flex-knowledge/lists"}