{"id":18623634,"url":"https://github.com/devan-kerman/orchestrator","last_synced_at":"2025-07-10T07:04:59.452Z","repository":{"id":214395243,"uuid":"736419959","full_name":"Devan-Kerman/Orchestrator","owner":"Devan-Kerman","description":null,"archived":false,"fork":false,"pushed_at":"2023-12-30T17:34:17.000Z","size":152,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-17T06:41:03.993Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Devan-Kerman.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-27T21:24:51.000Z","updated_at":"2023-12-27T21:25:10.000Z","dependencies_parsed_at":"2024-11-07T04:28:45.042Z","dependency_job_id":null,"html_url":"https://github.com/Devan-Kerman/Orchestrator","commit_stats":null,"previous_names":["devan-kerman/orchestrator"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Devan-Kerman/Orchestrator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Devan-Kerman%2FOrchestrator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Devan-Kerman%2FOrchestrator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Devan-Kerman%2FOrchestrator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Devan-Kerman%2FOrchestrator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Devan-Kerman","down
load_url":"https://codeload.github.com/Devan-Kerman/Orchestrator/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Devan-Kerman%2FOrchestrator/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264545017,"owners_count":23625387,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T04:25:28.849Z","updated_at":"2025-07-10T07:04:59.430Z","avatar_url":"https://github.com/Devan-Kerman.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Orchestrator\n\nOptimal Distributed Computing Schedules using Integer Programming.\n\n## What\n\nThe goal of orchestrator is to take a set of functions with\ndependencies and outputs, and to distribute them across a cluster\nin the most optimal way.\n\n## How\n\nWe can use linear programming and existing solvers to find the best\nsolution given a set of constraints (memory, threadcount, latency, execution time, etc.).\nSee the basic formulas [here](docs/Linear%20Constraints%20Formulation.pdf).\nUnfortunately linear programs have a really high time\ncomplexity: ![img.png](docs/linear_programming_solver_complexity.png)\nSo through variable reduction and incremental solving we can cache prior solutions\nand use them to solve real-time problems efficiently.\n\n## Roadmap\n - [X] Reduce to Linear Programming problem\n - [ ] Basic Solver\n - [ ] Advanced API\n   - [ ] Memory Constraints\n   - [ ] Device Constraints (whether job J can execute on device D)\n   - [ ] Real-Time Systems\n   - [ ] 
Pause-Resume\n   - [ ] Exception Handling\n - [ ] Incremental Solver\n - [ ] Clustering Variable Reduction Optimization\n - [ ] JAX-Conductor\n   - [ ] Auto recurrence to convolution\n   - [ ] Auto chunking and orchestrated distribution\n   - [ ] Auto kernel fusion\n\nJAX-Conductor\n---\nI noticed in papers such as\nMamba ([Gu, A., \u0026 Dao, T. (2023)](https://arxiv.org/abs/2312.00752)),\nFlash Attention 2 ([Dao, T. (2023)](https://arxiv.org/abs/2307.08691)),\nand Ring Attention ([Liu, H., Zaharia, M., \u0026 Abbeel, P. (2023)](https://arxiv.org/abs/2310.01889)),\nGEMM, etc. that the optimizations and methods for distributing computation often boiled down to\nthe same kinds of transformations of the original code.\n\n- Recurrence to convolution for parallelism (Mamba)\n- Chunking for IO-awareness and distribution (Ring Attn \u0026 Flash Attn)\n- Kernel fusion for reduced intermediate state (Mamba \u0026 Flash Attn)\n\nKnowing what technique to apply when seems like an easy algorithm:\n\n- In the recurrence case, is it faster throughput-wise to perform the recurrence sequentially\n  or recompute the recurrence for every token separately?\n    - In cases where the GPU memory can't simply hold more batches, it's better to use a convolution even if there are\n      many redundant computations\n- For kernel fusion, does the next step reduce dimensionality, and is the reduction in size worth any redundant\n  operations?\n- And chunking is a matter of what dimensions should I chunk along and by how much?\n\nJAX-Conductor will work in conjunction with orchestrator to determine which devices should do what, and in what order.\nThis matters because each TPU or GPU-core-sram-pair can be thought of as its own device, and moving memory between them involves\ncomplex coordination.\n\nIt will be a submodule of orchestrator that introduces a jaxpr interpreter\nthat automatically distributes any function across a cluster and across devices\nand blocks using pallas. 
The goal will be to make it as simple\nand modular as possible with integrated escape hatches in cases where the automated\nprocess fails (e.g. precision issues with flash-attn).\n\nA milestone would be to write a traditional transformer or vanilla Mamba model and have JAX-Conductor\nautomatically implement optimizations such as Ring Attention, while also recognizing which part\nof the sequence should go to which device to minimize latency (e.g. storing the sequence in ring order so that latency is\nminimized).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevan-kerman%2Forchestrator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdevan-kerman%2Forchestrator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevan-kerman%2Forchestrator/lists"}