{"id":50901205,"url":"https://github.com/roblox/fai-rl","last_synced_at":"2026-06-16T03:00:48.973Z","repository":{"id":322901295,"uuid":"1066696086","full_name":"Roblox/FAI-RL","owner":"Roblox","description":"Foundation AI - Reinforcement Learning Library","archived":false,"fork":false,"pushed_at":"2026-05-26T18:12:52.000Z","size":673,"stargazers_count":8,"open_issues_count":2,"forks_count":4,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-26T20:10:01.222Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Roblox.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-29T20:51:58.000Z","updated_at":"2026-05-26T18:11:40.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/Roblox/FAI-RL","commit_stats":null,"previous_names":["roblox/fai-rl"],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/Roblox/FAI-RL","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Roblox%2FFAI-RL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Roblox%2FFAI-RL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Roblox%2FFAI-RL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Roblox%2FFAI-RL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Roblox"
,"download_url":"https://codeload.github.com/Roblox/FAI-RL/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Roblox%2FFAI-RL/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34388669,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-16T02:00:06.860Z","response_time":126,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-16T03:00:36.780Z","updated_at":"2026-06-16T03:00:48.953Z","avatar_url":"https://github.com/Roblox.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# FAI-RL: Foundation AI - Reinforcement Learning Library\n\n\u003cdiv align=\"center\" style=\"line-height: 1;\"\u003e\n  \u003ca href=\"https://www.apache.org/licenses/LICENSE-2.0\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-Apache_2.0-green\" alt=\"License\"\u003e\u003c/a\u003e\n\u003c/div\u003e\n\nA production-ready framework for training, inference, evaluation using advanced reinforcement learning techniques. 
Built for researchers and practitioners who need a flexible, scalable solution for LLM fine-tuning.\n\n## Overview\n\nFAI-RL provides a unified, extensible framework for fine-tuning language models with state-of-the-art algorithms:\n\n- 🎯 **Supports Multiple RL Algorithms**: DPO, PPO, GRPO, GSPO implementations as well as support for Supervised Fine-Tuning and Continuous Pre-Training.\n- 🚀 **Production Ready**: Validated on AWS p4d instances with 8x A100 GPUs\n- 📦 **Simple Configuration**: YAML-based configs with CLI override support\n- ⚡ **Memory Efficient**: Full support for LoRA, QLoRA, and DeepSpeed ZeRO-3\n- 🔧 **Highly Extensible**: Custom reward functions, dataset templates, and API integrations\n\n## Table of Contents\n\n- [Installation](#-installation)\n- [Authentication \u0026 Setup](#-authentication--setup)\n- [Quick Start](#-quick-start)\n  - [Training](#training)\n  - [Inference](#inference)\n  - [Evaluation](#evaluation)\n- [Supported Algorithms](#supported-algorithms)\n- [Key Features](#key-features)\n- [Project Structure](#-project-structure)\n- [S3 Checkpoint Upload](#-s3-checkpoint-upload)\n- [Memory Optimization](#memory-optimization)\n- [System Requirements](#-system-requirements)\n- [License](#-license)\n\n## 📦 Installation\n\n### Install the Package\n\nWe assume the user environment already has the necessary ML libraries\ninstalled (notably `torch`, with a CUDA build matching the host).\n\n```bash\nuv pip install FAI-RL\n```\n\n### Clone the Repository for Configuration Recipes\n\n```bash\ngit clone https://github.com/Roblox/FAI-RL.git\ncd FAI-RL\n```\n\n\u003e **Package**: [https://pypi.org/project/FAI-RL/](https://pypi.org/project/FAI-RL/)\n\n## 🔑 Authentication \u0026 Setup\n\nBefore training or using models, you'll need to authenticate with HuggingFace and optionally set up experiment tracking with Weights \u0026 Biases.\n\n### HuggingFace Authentication\n\nLogin to HuggingFace to access models and datasets:\n\n```bash\nhuggingface-cli 
login\n```\n\nYou'll be prompted to enter your HuggingFace access token. You can create a token at [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens).\n\n**What this enables:**\n- Access gated models (if you have permission)\n\n\n### Weights \u0026 Biases (Optional)\n\nLogin to Weights \u0026 Biases for experiment tracking and visualization:\n\n```bash\nwandb login\n```\n\nYou'll be prompted to enter your W\u0026B API key. Get your API key at [https://wandb.ai/authorize](https://wandb.ai/authorize).\n\nFor self-hosted or private W\u0026B deployments, set `WANDB_BASE_URL` before training:\n\n```bash\nexport WANDB_BASE_URL=\"https://your-wandb-instance.com\"\n```\n\nThe default value (`https://api.wandb.ai`) points to the public W\u0026B cloud. This can also be set directly in the recipe under `wandb.base_url`.\n\n\u003e **Note**: W\u0026B integration is optional. If not logged in, training will proceed without experiment tracking.\n\n## 🚀 Quick Start\n\n### Training\n\nTrain a model using any of the supported algorithms (CPT, SFT, DPO, PPO, GRPO, GSPO):\n\n```bash\n# Single GPU training with LoRA\nfai-rl-train --recipe recipes/training/sft/llama3_3B_lora.yaml --num-gpus 1\n\n# Multi-GPU training with DeepSpeed\nfai-rl-train --recipe recipes/training/dpo/llama3_3B_lora.yaml --num-gpus 8\n\n# Override parameters from CLI\nfai-rl-train --recipe recipes/training/sft/llama3_3B_lora.yaml --num-gpus 4 \\\n  training.learning_rate=5e-5 \\\n  training.num_train_epochs=3\n```\n\n📖 **[Complete Training Guide →](./trainers/README.md)**\n\n### Inference\n\nGenerate text completions from trained or base models:\n\n```bash\n# Run inference on a trained model\nfai-rl-inference --recipe recipes/inference/llama3_3B.yaml\n\n# Use debug mode for detailed logging\nfai-rl-inference --recipe recipes/inference/llama3_3B.yaml --debug\n```\n\n📖 **[Complete Inference Guide →](./inference/README.md)**\n\n### Evaluation\n\nEvaluate model performance on academic 
benchmarks (MMLU, GSM8K):\n\n```bash\n# Evaluate on MMLU benchmark\nfai-rl-eval --recipe recipes/evaluation/mmlu/llama3_3B.yaml --debug\n```\n\n📖 **[Complete Evaluation Guide →](./evaluations/README.md)**\n\n## Supported Algorithms\n\nFAI-RL supports six training algorithms for language model fine-tuning:\n\n| Algorithm | Full Name | Description | Best For |\n|-----------|-----------|-------------|----------|\n| **CPT** | Continuous Pre-Training | Next-token prediction on raw text; no chat template | Domain adaptation, corpus ingestion |\n| **SFT** | Supervised Fine-Tuning | Direct supervised learning from labeled examples | Instruction fine-tuning and foundational model fine-tuning |\n| **DPO** | Direct Preference Optimization | Alignment via preference learning without explicit reward models | Human preference alignment, chat model training |\n| **PPO** | Proximal Policy Optimization | Policy gradient method with value function and reward model | Complex reward functions, multi-objective optimization |\n| **GRPO** | Group Relative Policy Optimization | Efficient preference learning with group-based comparison | Reasoning tasks, competitive response generation |\n| **GSPO** | Group Sequence Policy Optimization | Advanced sequence-level policy optimization | Complex multi-step reasoning, mathematical problem-solving |\n\n### Training Configurations\n\nAll algorithms support three efficiency modes:\n\n| Mode | Memory Usage | Training Speed | Best For |\n|------|-------------|---------------|----------|\n| **Full Fine-tuning** | High (baseline) | Fastest | Small models (\u003c3B params), maximum performance |\n| **LoRA** | Low (~10% of full) | Fast | Most use cases, balanced efficiency |\n| **QLoRA** | Very Low (~3-4GB for 7B model) | Moderate | Large models on consumer GPUs |\n\nAdditional features supported across all algorithms:\n- ✅ Multi-GPU training with DeepSpeed ZeRO-3\n- ✅ Gradient checkpointing for memory efficiency\n- ✅ Custom reward functions and dataset 
templates\n- ✅ Weights \u0026 Biases integration for experiment tracking\n- ✅ Automatic S3 checkpoint upload (supports S3-compatible stores)\n\n## Key Features\n\n### 🎯 Flexible Configuration System\n- **YAML-based recipes** with comprehensive inline documentation for all parameters\n- **CLI overrides** for runtime parameter changes without editing files\n- **Pre-configured templates** for popular models (Llama 3, Qwen 3, etc.)\n- **Easy experimentation** with hyperparameter tuning\n\n### 🔧 Extensible Architecture\n\n**Custom Reward Functions:**\n- `exact_match_reward_func` - Accuracy-based rewards for verifiable tasks\n- `structured_xml_reward_func` - Format-based rewards for structured outputs\n- Easy to add your custom reward function\n\n**Dataset Templates:**\n- `GSM8KTemplate` - Math problem formatting with chain-of-thought\n- `OpenMathInstructTemplate` - Mathematical instruction formatting\n\n**Pluggable Components:**\n- Extensible trainer base classes for new algorithms\n- HuggingFace Transformers and TRL integration\n- Custom dataset processing pipelines\n\n### 🌐 Multi-Provider API Support\n\nNative support for commercial LLM APIs with automatic provider detection for inference and evaluation:\n\n**Supported Providers:**\n- 🤖 **OpenAI** (GPT-5, GPT-4.5, GPT-4.1, etc.)\n- 🧠 **Google** (Gemini Pro, Gemini Flash)\n- 💬 **Anthropic** (Claude 4.5 Sonnet, Opus, etc.)\n- 🏠 **Hosted LLM** (self-hosted or custom endpoints)\n\n**Configuration Example:**\n\n```yaml\n# OpenAI ChatGPT - provider detected from endpoint URL\ninference:\n  api_endpoint: \"https://api.openai.com/v1/chat/completions\"\n  api_key: \"sk-...\"\n  model: \"gpt-4.1\"  # Just the model name, no prefix needed!\n\n# Google Gemini - provider detected from endpoint URL\ninference:\n  api_endpoint: \"https://generativelanguage.googleapis.com/v1/models/gemini-pro:generateContent\"\n  api_key: \"AIza...\"\n  model: \"gemini-2.5-pro\"\n\n# Anthropic Claude - provider detected from endpoint 
URL\ninference:\n  api_endpoint: \"https://api.anthropic.com/v1/messages\"\n  api_key: \"sk-ant-...\"\n  model: \"claude-sonnet-4-5-20250929\"\n\n# Hosted LLM - any custom or self-hosted model endpoint\ninference:\n  api_endpoint: \"https://your-hosted-endpoint.com/v1/chat\"\n  api_key: \"your-api-key\"\n  model: \"your-model-name\"\n```\n\n**Customization for Custom APIs:**\n\nIf your hosted LLM uses a non-OpenAI format, customize `utils/hosted_llm_config.py`:\n- `build_hosted_llm_request()` - Modify request payload format\n- `parse_hosted_llm_response()` - Customize response parsing\n- `build_hosted_llm_headers()` - Adjust authentication headers\n\nEach function includes detailed examples and inline documentation.\n\n\n## 📁 Project Structure\n\n```\nFAI-RL/\n├── core/                      # Core framework components\n├── trainers/                  # Algorithm implementations\n│   ├── rewards/               # Custom reward functions\n│   │   ├── accuracy_rewards.py\n│   │   └── format_rewards.py\n│   └── templates/             # Dataset formatting templates\n│       ├── gsm8k_template.py\n│       └── openmathinstruct_template.py\n├── inference/                 # Inference system\n├── evaluations/               # Evaluation system\n│   └── eval_datasets/         # Dataset-specific evaluation logic\n│       ├── mmlu.py\n│       └── gsm8k.py\n├── recipes/                   # YAML configuration files\n│   ├── training/              # Training recipes (cpt/, sft/, dpo/, ppo/, grpo/, gspo/)\n│   ├── inference/             # Inference recipes\n│   └── evaluation/            # Evaluation recipes (mmlu/, gsm8k/)\n├── configs/                   # DeepSpeed configurations\n│   └── deepspeed/             # ZeRO-3 configs for 1/2/4/8 GPUs\n├── utils/                     # Shared utilities\n│   ├── s3_utils.py            # S3 checkpoint upload callback\n│   └── hosted_llm_config.py   # Custom API endpoint configuration\n└── [auto-generated]\n    ├── models/                # 
Trained model checkpoints\n    ├── outputs/               # Inference and evaluation results\n    └── logs/                  # Training logs\n```\n\n## ☁️ S3 Checkpoint Upload\n\nFAI-RL can automatically upload checkpoints and the final fine-tuned model to Amazon S3 (or any S3-compatible store such as MinIO). Uploads run in background threads so they never block training.\n\n### Prerequisites\n\nConfigure AWS credentials using any standard method (environment variables, `~/.aws/credentials`, IAM role, etc.):\n\n```bash\n# Option 1: Environment variables\nexport AWS_ACCESS_KEY_ID=\"...\"\nexport AWS_SECRET_ACCESS_KEY=\"...\"\nexport AWS_DEFAULT_REGION=\"us-east-1\"\n\n# Option 2: AWS CLI\naws configure\n```\n\n### Configuration\n\nAdd an `s3` section to your training recipe YAML:\n\n```yaml\ns3:\n  enabled: true                                          # Enable S3 upload\n  bucket: \"your-s3-bucket\"                               # S3 bucket name\n  prefix: \"your-s3-prefix\"                               # Key prefix (folder path inside bucket)\n  region: null                                           # AWS region (null = use default)\n  endpoint_url: null                                     # Custom S3-compatible endpoint (e.g. 
MinIO)\n  upload_checkpoints: true                               # Upload intermediate checkpoints (at every save_steps)\n  upload_final_model: true                               # Upload the final model at end of training\n  delete_local_after_upload: false                       # Delete local files after successful upload\n```\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `enabled` | bool | `false` | Master switch for the S3 upload feature |\n| `bucket` | string | `\"\"` | Target S3 bucket name (required when enabled) |\n| `prefix` | string | `\"\"` | Key prefix under which all uploads are stored |\n| `region` | string | `null` | AWS region; falls back to `AWS_DEFAULT_REGION` or boto3 default |\n| `endpoint_url` | string | `null` | Custom endpoint for S3-compatible stores (e.g. `http://minio:9000`) |\n| `upload_checkpoints` | bool | `true` | Upload each intermediate checkpoint saved at `save_steps` intervals |\n| `upload_final_model` | bool | `true` | Upload the final model directory at the end of training |\n| `delete_local_after_upload` | bool | `false` | Remove local checkpoint directory after a successful upload |\n\n### How It Works\n\n1. **Intermediate checkpoints** -- When the trainer saves a checkpoint (every `training.save_steps` steps), the S3 callback uploads the entire checkpoint directory to `s3://\u003cbucket\u003e/\u003cprefix\u003e/checkpoint-\u003cstep\u003e/` in a background thread.\n2. **Final model** -- At the end of training, the output directory is uploaded to `s3://\u003cbucket\u003e/\u003cprefix\u003e/final/`.\n3. **Non-blocking** -- All uploads happen on daemon threads. Training continues while files are being transferred. 
At the end of training, the callback waits for any remaining uploads to finish before the process exits.\n\n### S3 Upload Structure\n\nFor example, with `bucket` set to `your-s3-bucket`, `prefix` set to `checkpoints/qwen3-4B-inst-dpo-lora-150k`, and checkpoints saved every 100 steps, the resulting S3 layout would be:\n\n```\ns3://your-s3-bucket/\n└── checkpoints/qwen3-4B-inst-dpo-lora-150k/\n    ├── checkpoint-100/\n    │   ├── adapter_config.json\n    │   ├── adapter_model.safetensors\n    │   └── ...\n    ├── checkpoint-200/\n    │   └── ...\n    └── final/\n        ├── adapter_config.json\n        ├── adapter_model.safetensors\n        └── ...\n```\n\n## Memory Optimization\n\nFAI-RL provides multiple techniques for efficient training of large models on limited hardware:\n\n### Optimization Techniques\n\n| Technique | Memory Savings | Speed Impact | Configuration |\n|-----------|---------------|--------------|---------------|\n| **LoRA** | ~90% reduction | Minimal | `use_lora: true` + LoRA params |\n| **QLoRA** | ~95% reduction | Moderate | `load_in_4bit: true` + LoRA params |\n| **8-bit Quantization** | ~50% reduction | Minimal | `load_in_8bit: true` |\n| **Gradient Checkpointing** | ~30-50% reduction | 20% slower | `gradient_checkpointing: true` |\n| **DeepSpeed ZeRO-3** | Distributed across GPUs | Varies | Auto-enabled for multi-GPU |\n\n\n### Optimization Strategy\n\n1. **Start with QLoRA** if GPU memory is limited (\u003c16GB)\n2. **Use LoRA** for balanced efficiency on mid-range GPUs (16-40GB)\n3. **Full fine-tuning** only for small models or high-end GPUs (80GB+)\n4. **Enable gradient checkpointing** if still encountering OOM errors\n5. 
**Use DeepSpeed ZeRO-3** for multi-GPU setups to distribute memory load\n\n## 🧪 System Requirements\n\n### Validated on Hardware\n\nThis framework has been validated on:\n\n* **Instance:** AWS EC2 p4d.24xlarge\n* **GPUs:** 8 x NVIDIA A100-SXM4-80GB (80GB VRAM each)\n* **CPU:** 96 vCPUs\n* **Memory:** 1152 GiB\n* **Storage:** 8TB NVMe SSD\n* **Network:** 400 Gbps\n\n## 📄 License\n\nThis project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n    http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n\n## For Maintainers\n\n\u003cdetails\u003e\n\u003csummary\u003ePublishing a New Release\u003c/summary\u003e\n\n1. **Update version** in `pyproject.toml`:\n```toml\n[project]\nname = \"FAI-RL\"\nversion = \"X.Y.Z\"  # Increment version\n```\n\n2. **Build and publish**:\n```bash\n# Install build tools\npip install --upgrade pip build twine\n\n# Clean previous builds\nrm -rf dist/ build/ *.egg-info\n\n# Build the package\npython -m build\n\n# Upload to PyPI (requires credentials)\npython -m twine upload dist/*\n\n# Or upload to TestPyPI (requires credentials)\npython -m twine upload --repository testpypi dist/*\n```\n\n\u003c/details\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froblox%2Ffai-rl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Froblox%2Ffai-rl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froblox%2Ffai-rl/lists"}