{"id":43657195,"url":"https://github.com/rapidfireai/rapidfireai","last_synced_at":"2026-05-30T02:02:06.286Z","repository":{"id":301455384,"uuid":"1009195479","full_name":"RapidFireAI/rapidfireai","owner":"RapidFireAI","description":"RapidFire AI: Rapid AI Customization from RAG to Fine-Tuning","archived":false,"fork":false,"pushed_at":"2026-05-29T19:13:05.000Z","size":106081,"stargazers_count":164,"open_issues_count":37,"forks_count":23,"subscribers_count":4,"default_branch":"main","last_synced_at":"2026-05-29T19:15:53.367Z","etag":null,"topics":["ai","artifical-intelligense","context-engineering","deep-learning","experiment-tracking","experimentation","fine-tuning","gpu","llm","mlflow","post-training","rag","rapidfire"],"latest_commit_sha":null,"homepage":"https://rapidfire.ai","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RapidFireAI.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-06-26T18:22:40.000Z","updated_at":"2026-05-27T19:48:30.000Z","dependencies_parsed_at":null,"dependency_job_id":"2efdbb49-7112-40c4-a616-22c92cfd778a","html_url":"https://github.com/RapidFireAI/rapidfireai","commit_stats":null,"previous_names":["rapidfireai/rapidfireai"],"tags_count":75,"template":false,"template_full_name":null,"purl":"pkg:github/RapidFireAI/rapidfireai","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RapidFireAI%2Frapidfireai","tags_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RapidFireAI%2Frapidfireai/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RapidFireAI%2Frapidfireai/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RapidFireAI%2Frapidfireai/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RapidFireAI","download_url":"https://codeload.github.com/RapidFireAI/rapidfireai/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RapidFireAI%2Frapidfireai/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33677261,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-30T02:00:06.278Z","response_time":92,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","artifical-intelligense","context-engineering","deep-learning","experiment-tracking","experimentation","fine-tuning","gpu","llm","mlflow","post-training","rag","rapidfire"],"created_at":"2026-02-04T21:03:20.130Z","updated_at":"2026-05-30T02:02:06.280Z","avatar_url":"https://github.com/RapidFireAI.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003cdiv align=\"center\"\u003e\n\n\u003ca href=\"https://rapidfire.ai\"\u003e\n    \u003cpicture\u003e\n        \u003csource media=\"(prefers-color-scheme: dark)\" 
srcset=\"https://raw.githubusercontent.com/RapidFireAI/rapidfireai/main/docs/images/RapidFire-logo-for-dark-theme.svg\"\u003e\n        \u003csource media=\"(prefers-color-scheme: light)\" srcset=\"https://raw.githubusercontent.com/RapidFireAI/rapidfireai/main/docs/images/RapidFire-logo-for-light-theme.svg\"\u003e\n        \u003cimg alt=\"RapidFire AI\" src=\"https://raw.githubusercontent.com/RapidFireAI/rapidfireai/main/docs/images/RapidFire-logo-for-light-theme.svg\"\u003e\n    \u003c/picture\u003e\n\u003c/a\u003e\n\n\u003ch3\u003eRapid AI Customization from RAG to Fine-Tuning\u003c/h3\u003e\n\u003cp\u003e20x experimentation throughput of LLM pipelines faster, more systematic.\u003c/p\u003e\n\n\u003ca href=\"https://colab.research.google.com/github/RapidFireAI/rapidfireai/blob/main/tutorial_notebooks/rag-contexteng/rf-colab-rag-fiqa-tutorial.ipynb\"\u003e\u003cimg src=\"https://raw.githubusercontent.com/RapidFireAI/rapidfireai/main/docs/images/colab-rag-button.svg\" alt=\"Try RAG on Colab\"\u003e\u003c/a\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u003ca href=\"https://colab.research.google.com/github/RapidFireAI/rapidfireai/blob/main/tutorial_notebooks/fine-tuning/rf-colab-tensorboard-tutorial.ipynb\"\u003e\u003cimg src=\"https://raw.githubusercontent.com/RapidFireAI/rapidfireai/main/docs/images/colab-finetuning-button.svg\" alt=\"Try Fine-Tuning on Colab\"\u003e\u003c/a\u003e\n\n\u003c/div\u003e\n\n[![PyPI version](https://img.shields.io/pypi/v/rapidfireai)](https://pypi.org/project/rapidfireai/)\n\n# RapidFire AI\n\nRapid experimentation for easier, faster, and more impactful AI customization. \nBuilt for agentic RAG, context engineering, fine-tuning, and post-training of LLMs and other DL models. 
\nDelivers 16-24x higher throughput without extra resources.\n\n## Overview\n\nRapidFire AI is a new experiment execution framework that transforms your AI customization experimentation from slow, sequential processes into rapid, intelligent workflows with hyperparallelized execution, dynamic real-time experiment control, and automatic system optimization.\n\n![Usage workflow of RapidFire AI](https://raw.githubusercontent.com/RapidFireAI/rapidfireai/main/docs/images/rf-usage-both.png)\n\nRapidFire AI's adaptive execution engine allows interruptible, shard-based scheduling so you can compare many configurations concurrently, even on a single GPU (for self-hosted models) or a CPU-only machine (for closed model APIs) with dynamic real-time control over runs.\n\n- **Hyperparallelized Execution**: Higher throughput, simultaneous, data shard-at-a-time execution to show side-by-side differences.\n- **Interactive Control (IC Ops)**: Stop, Resume, Clone-Modify, and optionally warm start runs in real-time from the dashboard.\n- **Automatic Optimization**: Intelligent single and multi-GPU orchestration to optimize utilization with minimal overhead for self-hosted models; intelligent token spend and rate limit apportioning for closed model APIs.\n\n![Shard-based concurrent execution (1 GPU)](https://oss-docs.rapidfire.ai/en/latest/_images/gantt-1gpu.png)\n\nFor additional context, see the overview: [RapidFire AI Overview](https://oss-docs.rapidfire.ai/en/latest/overview.html)\n\n## Getting Started\n\n### Prerequisites\n\n- [NVIDIA GPU using the 7.x or 8.x Compute Capability](https://developer.nvidia.com/cuda-gpus)\n- [NVIDIA CUDA Toolkit 11.8+](https://developer.nvidia.com/cuda-toolkit-archive)\n- [Python 3.12.x](https://www.python.org/downloads/)\n- [PyTorch 2.8.0+](https://pytorch.org/get-started/previous-versions/) with corresponding forward compatible prebuilt CUDA binaries\n\n### Install and Get Started\n\n\n```bash\n# Ensure that python3 resolves to python3.12 if 
needed\npython3 --version  # must be 3.12.x\n\npython3 -m venv .venv\nsource .venv/bin/activate\n\npip install rapidfireai\n\nrapidfireai --version\n# Verify it prints the following:\n# RapidFire AI 0.16.0\n\n# Replace YOUR_TOKEN with your actual Hugging Face token\n# https://huggingface.co/docs/hub/en/security-tokens\nhf auth login --token YOUR_TOKEN\n\n# Due to current issue: https://github.com/huggingface/xet-core/issues/527\npip uninstall -y hf-xet\n\n# Depending on whether you want Fine-tuning/Post-Training or RAG/context eng., pick one of the remaining series of commands\n\n\n# For Fine-tuning/Post-Training: Install specific dependencies and initialize rapidfireai\n\nrapidfireai init\nrapidfireai start\n\n# It should print about 50 lines, including the following:\n# ...\n# RapidFire Frontend is ready\n# Open your browser and navigate to: http://0.0.0.0:8853\n# ...\n# Press Ctrl+C to stop all services\n\n# Forward this port if you installed rapidfireai on a remote machine\nssh -L 8853:localhost:8853 username@remote-machine\n\n# Open an example notebook from ./tutorial_notebooks/[fine-tuning | post-training] and start experiment\n\n\n# [OR]\n\n\n# For RAG/Context Engineering Evals: Install specific dependencies and initialize rapidfireai\nrapidfireai init --evals\nrapidfireai start\n\n# It should print about 50 lines, including the following:\n# ...\n# RapidFire Frontend is ready\n# Open your browser and navigate to: http://0.0.0.0:8853\n# ...\n# Press Ctrl+C to stop all services\n\n# For the RAG/context eng. 
notebooks, only jupyter is supported for now and must be started as follows\nrapidfireai jupyter\n\n# Forward these ports if you installed rapidfireai on a remote machine\nssh -L 8850:localhost:8850 -L 8851:localhost:8851 -L 8853:localhost:8853 -L 8852:localhost:8852 username@remote-machine\n\n# Open the URL provided by the jupyter notebook command above via your browser\n# Open an example notebook from ./tutorial_notebooks/rag-contexteng/ and start experiment\n\n```\n\n\n\n### Troubleshooting\n\nFor a quick system diagnostics report (Python env, relevant packages, GPU/CUDA, and key environment variables), run:\n\n```bash\nrapidfireai doctor\n```\n\nIf you encounter port conflicts, you can kill existing processes:\n\n```bash\nlsof -t -i:8850 | xargs kill -9  # jupyter server\nlsof -t -i:8851 | xargs kill -9  # dispatcher\nlsof -t -i:8852 | xargs kill -9  # mlflow\nlsof -t -i:8853 | xargs kill -9  # frontend server\nlsof -t -i:8855 | xargs kill -9  # ray dashboard\n```\n\n## Documentation\n\nBrowse or reference the full documentation, example use case tutorials, all API details, dashboard details, and more in the [RapidFire AI Documentation](https://oss-docs.rapidfire.ai).\n\n## Key Features\n\n### MLflow Integration\n\nFull MLflow support for experiment tracking and metrics visualization. A named RapidFire AI experiment corresponds to an MLflow experiment for comprehensive governance\n\n### Interactive Control Operations (IC Ops)\n\nFirst-of-its-kind dynamic real-time control over runs in flight. 
Can be invoked through the dashboard:\n\n- Stop active runs; puts them in a dormant state\n- Resume stopped runs; makes them active again\n- Clone and modify existing runs, with or without warm starting from parent's weights\n- Delete unwanted or failed runs\n\n### Multi-GPU Support\n\nThe Scheduler automatically handles multiple GPUs on the machine and divides resources across all running configs for optimal resource utilization.\n\n### Search and AutoML Support\n\nBuilt-in procedures for searching over configuration knob combinations, including Grid Search and Random Search. Easy to integrate with AutoML procedures. Native support for some popular AutoML procedures and customized automation of IC Ops coming soon.\n\n## Directory Structure\n\n```text\nrapidfireai/\n├── automl/              # Search and AutoML algorithms for knob tuning\n├── cli.py               # CLI script\n├── evals\n    ├── actors/          # Ray-based workers for doc and query processing  \n    ├── data/            # Data sharding and handling\n    ├── db/              # Database interface and SQLite operations\n    ├── dispatcher/      # Flask-based web API for UI communication\n    ├── metrics/         # Online aggregation logic and metrics handling\n    ├── rag/             # Stages of RAG pipeline\n    ├── scheduling/      # Fair scheduler for multi-config resource sharing\n    └── utils/           # Utility functions and helper modules\n├── experiment.py        # Main experiment lifecycle management\n├── fit\n    ├── backend/         # Core backend components (controller, scheduler, worker)\n    ├── db/              # Database interface and SQLite operations\n    ├── dispatcher/      # Flask-based web API for UI communication\n    ├── frontend/        # Frontend components (dashboard, IC Ops implementation)\n    ├── ml/              # ML training utilities and trainer classes\n    └── utils/           # Utility functions and helper modules\n└── utils.py             # Utility functions and 
helper modules\n```\n\n## Architecture\n\nRapidFire AI adopts a microservices-inspired loosely coupled distributed architecture with:\n\n- **Dispatcher**: Web API layer for UI communication\n- **Database**: SQLite for state persistence\n- **Controller**: Central orchestrator running in user process\n- **Workers**: GPU-based training processes (for SFT/RFT) or Ray-based Actors for doc and query processing (for RAG/context engineering)\n- **Dashboard**: Experiment tracking and visualization dashboard\n\nThis design enables efficient resource utilization while providing a seamless user experience for AI experimentation.\n\n## Components\n\n### Dispatcher\n\nThe dispatcher provides a REST API interface for the web UI. \nIt can be run via Flask as a single app or via Gunicorn to have it load balanced. \nHandles interactive control features and displays the current state of the runs in the experiment.\n\n### Database\n\nUses SQLite for persistent storage of metadata of experiments, runs, and artifacts. \nThe Controller also uses it to talk with Workers on scheduling state. \nA clean asynchronous interface for all DB operations, including experiment lifecycle management and run tracking.\n\n### Controller\n\nRuns as part of the user’s console or Notebook process. \nOrchestrates the entire training lifecycle including model creation, worker management, and scheduling, \nas well as the entire RAG/context engineering pipeline for evals. \nThe `run_fit` logic handles sample preprocessing, model creation for given knob configurations, \nworker initialization, and continuous monitoring of training progress across distributed workers. \nThe `run_evals` logic handles data chunking, embedding, retrieval, reranking, context construction, and \ngeneration for inference evals.\n\n### Worker\n\nHandles the actual model training and inference on the GPUs for `run_fit` and the data preprocessing and \nRAG inference evals for `run_evals`. 
\nWorkers poll the Database for tasks, load dataset shards, and execute config-specific tasks: \ntraining runs with checkpointing (for SFT/RFT) and doc processing followed by query processing with \nonline aggregation (for RAG/context eng. evals). Both also handle progress reporting.\nCurrently, each model is expected to fit on a single GPU at the given batch size (for self-hosted models).\nLikewise, the provided OpenAI API key is currently expected to have sufficient balance for the given evals workload.\n\n### Experiment\n\nManages the complete experiment lifecycle, including creation, naming conventions, and cleanup. \nExperiments are automatically named with unique suffixes if conflicts exist, \nand all experiment metadata is tracked in the Database. \nAn experiment's running tasks are automatically cancelled when the process ends abruptly.\n\n### Dashboard\n\nA fork of MLflow that enables full tracking and visualization of all experiments and runs for `run_fit`. \nIt features a new panel for Interactive Control Ops that can be performed on any active runs.\nFor `run_evals`, the metrics are displayed in an auto-updated table in the notebook itself, \nand the IC Ops panel also appears there.\n\n## Developing with RapidFire AI\n\n### Development prerequisites\n#### TODO: This section needs updating\n\n- Python 3.12.x\n- Git\n- Ubuntu/Debian system (for apt package manager)\n\n```bash\n# Run these commands one after the other on a fresh Ubuntu machine\n\n# install dependencies\nsudo apt update -y\n\n# clone the repository\ngit clone https://github.com/RapidFireAI/rapidfireai.git\n\n# navigate to the repository\ncd ./rapidfireai\n\n# install basic dependencies\nsudo apt install -y python3.12-venv\npython3 -m venv .venv\nsource .venv/bin/activate\npip3 install ipykernel\npip3 install jupyter\npip3 install \"huggingface-hub[cli]\"\nexport PATH=\"$HOME/.local/bin:$PATH\"\nhf auth login --token \u003cyour_token\u003e\n\n# Due to current issue: 
https://github.com/huggingface/xet-core/issues/527\npip uninstall -y hf-xet\n\n# checkout the main branch\ngit checkout main\n\n# install the repository's Python dependencies\npip3 install -r requirements.txt\n\n# install node\ncurl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash - \u0026\u0026 sudo apt-get install -y nodejs\n\n# Install correct version of vllm and flash-attn\n# uv pip install vllm==0.10.1.1 --torch-backend=cu126 or cu118\n# uv pip install flash-attn==1.0.9 --no-build-isolation or 2.8.3\n\n# if running into node versioning errors, remove the previous version of node then run the lines above again\nsudo apt-get remove --purge nodejs libnode-dev libnode72 npm\nsudo apt autoremove --purge\n\n# check installations\nnode -v # 22.x\n\n# still inside venv, run the start script to begin all 3 servers\nchmod +x ./rapidfireai/start_dev.sh\n./rapidfireai/start_dev.sh start\n\n# run the notebook from within your IDE\n# make sure the notebook is running in the .venv virtual environment\n# head to settings in Cursor/VSCode and search for venv and add the path - $HOME/rapidfireai/.venv\n# we cannot run a Jupyter notebook directly since there are restrictions on Jupyter being able to create child processes\n\n# VSCode can port-forward localhost:8853 where the rf-frontend server will be running\n\n# for port clash issues -\nlsof -t -i:8850 | xargs kill -9 # jupyter server\nlsof -t -i:8851 | xargs kill -9 # dispatcher\nlsof -t -i:8852 | xargs kill -9 # mlflow\nlsof -t -i:8853 | xargs kill -9 # frontend\nlsof -t -i:8855 | xargs kill -9 # ray console\n```\n\n## RapidFire AI Environment Variables\n\nRapidFire AI has sane defaults for most installations. If customization is needed, the following environment variables can be\nused to override the defaults.\n\n- `RF_HOME` - Base RapidFire AI home directory (default: ${HOME}/rapidfireai on Non-Google Colab and /content/rapidfireai on Google Colab)\n- `RF_LOG_PATH` - Base directory to store log files (default: 
${RF_HOME}/logs)\n- `RF_EXPERIMENT_PATH` - Base directory to store experiment work files (default: ${RF_HOME}/rapidfire_experiments)\n- `RF_TENSORBOARD_LOG_DIR` - Base directory for TensorBoard logs (default: ${RF_EXPERIMENT_PATH}/tensorboard_logs)\n- `RF_LOG_FILENAME` - Default log file name (default: rapidfire.log)\n- `RF_TRAINING_LOG_FILENAME` - Default training log file name (default: training.log)\n- `RF_DB_PATH` - Base directory for database files (default: ${RF_HOME}/db)\n- `RF_MLFLOW_ENABLED` - Enable MLflow tracking backend\n- `RF_TENSORBOARD_ENABLED` - Enable TensorBoard tracking backend\n- `RF_TRACKIO_ENABLED` - Enable Trackio tracking backend\n- `RF_COLAB_MODE` - Whether running on Google Colab (default: false on Non-Google Colab and true on Google Colab)\n- `RF_TUTORIAL_PATH` - Location that `rapidfireai init` copies `tutorial_notebooks` to (default: ./tutorial_notebooks)\n- `RF_TEST_PATH` - Location that `rapidfireai --test-notebooks` copies test notebooks to (default: ./tutorial_notebooks/tests)\n- `RF_JUPYTER_HOST` - Host that `rapidfireai jupyter` creates a Jupyter listener for (default: 0.0.0.0)\n- `RF_JUPYTER_PORT` - Port that `rapidfireai jupyter` creates a Jupyter listener for (default: 8850)\n- `RF_API_HOST` - Host that `rapidfireai start` or Experiment creates an API listener for (default: 0.0.0.0)\n- `RF_API_PORT` - Port that `rapidfireai start` or Experiment creates an API listener for (default: 8851)\n- `RF_MLFLOW_HOST` - Host that `rapidfireai start` creates an MLflow listener for (default: 0.0.0.0)\n- `RF_MLFLOW_PORT` - Port that `rapidfireai start` creates an MLflow listener for (default: 8852)\n- `RF_FRONTEND_HOST` - Host that `rapidfireai start` creates a Frontend listener for (default: 0.0.0.0)\n- `RF_FRONTEND_PORT` - Port that `rapidfireai start` creates a Frontend listener for (default: 8853)\n- `RF_RAY_HOST` - Host that Experiment creates a Ray dashboard listener for (default: 0.0.0.0)\n- `RF_RAY_PORT` - Port that Experiment creates a Ray 
dashboard listener for (default: 8855)\n- `RF_TIMEOUT_TIME` - Time in seconds that services wait to start (default: 30)\n- `RF_PID_FILE` - File to store process IDs of started services (default: ${RF_HOME}/rapidfire_pids.txt)\n- `RF_PYTHON_EXECUTABLE` - Python executable (default: python3, falling back to python if not found)\n- `RF_PIP_EXECUTABLE` - pip executable (default: pip3, falling back to pip if not found)\n- `RF_CONVERGE_MODE` - Whether to use RapidFire AI Converge frontend and backend if available (default: all)\n- `RF_NO_FRONTEND` - Option to disable starting the frontend\n\n## Community \u0026 Governance\n\n- Docs: [oss-docs.rapidfire.ai](https://oss-docs.rapidfire.ai)\n- Discord: [Join our Discord](https://discord.gg/6vSTtncKNN)\n- Contributing: [`CONTRIBUTING.md`](CONTRIBUTING.md)\n- License: [`LICENSE`](LICENSE)\n- Issues: use GitHub Issues for bug reports and feature requests\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frapidfireai%2Frapidfireai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frapidfireai%2Frapidfireai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frapidfireai%2Frapidfireai/lists"}