{"id":16197051,"url":"https://github.com/microsoft/RD-Agent","last_synced_at":"2025-10-24T16:31:16.078Z","repository":{"id":252383252,"uuid":"781261349","full_name":"microsoft/RD-Agent","owner":"microsoft","description":"Research and development (R\u0026D) is crucial for the enhancement of industrial productivity, especially in the AI era, where the core aspects of R\u0026D are mainly focused on data and models. We are committed to automating these high-value generic R\u0026D processes through our open source R\u0026D automation tool RD-Agent, which lets AI drive data-driven AI.","archived":false,"fork":false,"pushed_at":"2025-02-07T14:00:47.000Z","size":30327,"stargazers_count":1476,"open_issues_count":29,"forks_count":134,"subscribers_count":16,"default_branch":"main","last_synced_at":"2025-02-07T17:17:04.813Z","etag":null,"topics":["agent","ai","automation","data-mining","data-science","development","llm","research"],"latest_commit_sha":null,"homepage":"https://rdagent.azurewebsites.net/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/microsoft.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":"SUPPORT.md","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["MIIC-finance"]}},"created_at":"2024-04-03T03:39:33.000Z","updated_at":"2025-02-07T14:00:50.000Z","dependencies_parsed_at":"2024-08-26T03:21:07.506Z","dependency_job_id":"962df2ee-7ac9-442c-8ea3-cb049429e2ba","html_url":"https://github.com/microsoft/RD-Agent","commit_stats":{"total_commits":401,"total_committers":20,"mean_commits":20.05,"dds":0.81047381546
13467,"last_synced_commit":"af6af116edd69a6e3cff15f771173b76be8395ff"},"previous_names":["microsoft/rd-agent"],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FRD-Agent","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FRD-Agent/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FRD-Agent/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FRD-Agent/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/microsoft","download_url":"https://codeload.github.com/microsoft/RD-Agent/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":237999676,"owners_count":19399920,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","ai","automation","data-mining","data-science","development","llm","research"],"created_at":"2024-10-10T09:02:10.030Z","updated_at":"2025-10-24T16:31:16.065Z","avatar_url":"https://github.com/microsoft.png","language":"Python","readme":"\u003ch4 align=\"center\"\u003e\n  \u003cimg src=\"docs/_static/logo.png\" alt=\"RA-Agent logo\" style=\"width:70%; \"\u003e\n  \n  \u003ca href=\"https://rdagent.azurewebsites.net\" target=\"_blank\"\u003e🖥️ Live Demo\u003c/a\u003e |\n  \u003ca href=\"https://rdagent.azurewebsites.net/factor_loop\" target=\"_blank\"\u003e🎥 Demo Video\u003c/a\u003e \u003ca 
href=\"https://www.youtube.com/watch?v=JJ4JYO3HscM\u0026list=PLALmKB0_N3_i52fhUmPQiL4jsO354uopR\" target=\"_blank\"\u003e▶️YouTube\u003c/a\u003e   |\n  \u003ca href=\"https://rdagent.readthedocs.io/en/latest/index.html\" target=\"_blank\"\u003e📖 Documentation\u003c/a\u003e |\n  \u003ca href=\"https://aka.ms/RD-Agent-Tech-Report\" target=\"_blank\"\u003e📄 Tech Report\u003c/a\u003e |\n  \u003ca href=\"#-paperwork-list\"\u003e 📃 Papers \u003c/a\u003e\n\u003c/h3\u003e\n\n\n[![CI](https://github.com/microsoft/RD-Agent/actions/workflows/ci.yml/badge.svg)](https://github.com/microsoft/RD-Agent/actions/workflows/ci.yml)\n[![CodeQL](https://github.com/microsoft/RD-Agent/actions/workflows/github-code-scanning/codeql/badge.svg)](https://github.com/microsoft/RD-Agent/actions/workflows/github-code-scanning/codeql)\n[![Dependabot Updates](https://github.com/microsoft/RD-Agent/actions/workflows/dependabot/dependabot-updates/badge.svg)](https://github.com/microsoft/RD-Agent/actions/workflows/dependabot/dependabot-updates)\n[![Lint PR Title](https://github.com/microsoft/RD-Agent/actions/workflows/pr.yml/badge.svg)](https://github.com/microsoft/RD-Agent/actions/workflows/pr.yml)\n[![Release.yml](https://github.com/microsoft/RD-Agent/actions/workflows/release.yml/badge.svg)](https://github.com/microsoft/RD-Agent/actions/workflows/release.yml)\n[![Platform](https://img.shields.io/badge/platform-Linux-blue)](https://pypi.org/project/rdagent/#files)\n[![PyPI](https://img.shields.io/pypi/v/rdagent)](https://pypi.org/project/rdagent/)\n[![PyPI - Python 
Version](https://img.shields.io/pypi/pyversions/rdagent)](https://pypi.org/project/rdagent/)\n[![Release](https://img.shields.io/github/v/release/microsoft/RD-Agent)](https://github.com/microsoft/RD-Agent/releases)\n[![GitHub](https://img.shields.io/github/license/microsoft/RD-Agent)](https://github.com/microsoft/RD-Agent/blob/main/LICENSE)\n[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit)\n[![Checked with mypy](https://www.mypy-lang.org/static/mypy_badge.svg)](http://mypy-lang.org/)\n[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)\n[![Chat](https://img.shields.io/badge/chat-discord-blue)](https://discord.gg/ybQ97B6Jjy)\n[![Documentation Status](https://readthedocs.org/projects/rdagent/badge/?version=latest)](https://rdagent.readthedocs.io/en/latest/?badge=latest)\n[![Readthedocs Preview](https://github.com/microsoft/RD-Agent/actions/workflows/readthedocs-preview.yml/badge.svg)](https://github.com/microsoft/RD-Agent/actions/workflows/readthedocs-preview.yml) \u003c!-- this badge is too long, please place it in the last one to make it pretty --\u003e \n[![arXiv](https://img.shields.io/badge/arXiv-2505.14738-00ff00.svg)](https://arxiv.org/abs/2505.14738)\n\n\n# 📰 News\n| 🗞️ News        | 📝 Description                 |\n| --            | ------      |\n| NeurIPS 2025 Acceptance | We are thrilled to announce that our paper [R\u0026D-Agent-Quant](https://arxiv.org/abs/2505.15155) has been accepted to NeurIPS 2025 | \n| [Technical Report Release](#overall-technical-report) | Overall framework description and results on MLE-bench | \n| [R\u0026D-Agent-Quant Release](#deep-application-in-diverse-scenarios) | Apply R\u0026D-Agent to quant trading | \n| MLE-Bench Results Released | R\u0026D-Agent currently leads as the [top-performing machine learning engineering 
agent](#-the-best-machine-learning-engineering-agent) on MLE-bench |\n| Support LiteLLM Backend | We now fully support **[LiteLLM](https://github.com/BerriAI/litellm)** as our default backend for integration with multiple LLM providers. |\n| General Data Science Agent | [Data Science Agent](https://rdagent.readthedocs.io/en/latest/scens/data_science.html) |\n| Kaggle Scenario release | We release **[Kaggle Agent](https://rdagent.readthedocs.io/en/latest/scens/data_science.html)**, try the new features!                  |\n| Official WeChat group release  | We created a WeChat group, welcome to join! (🗪[QR Code](https://github.com/microsoft/RD-Agent/issues/880)) |\n| Official Discord release  | We launch our first chatting channel in Discord (🗪[![Chat](https://img.shields.io/badge/chat-discord-blue)](https://discord.gg/ybQ97B6Jjy)) |\n| First release | **R\u0026D-Agent** is released on GitHub |\n\n\n\n# 🏆 The Best Machine Learning Engineering Agent!\n\n[MLE-bench](https://github.com/openai/mle-bench) is a comprehensive benchmark evaluating the performance of AI agents on machine learning engineering tasks. 
Utilizing datasets from 75 Kaggle competitions, MLE-bench provides robust assessments of AI systems' capabilities in real-world ML engineering scenarios.\n\nR\u0026D-Agent currently leads as the top-performing machine learning engineering agent on MLE-bench:\n\n| Agent | Low == Lite (%) | Medium (%) | High (%) | All (%) |\n|---------|--------|-----------|---------|----------|\n| R\u0026D-Agent o3(R)+GPT-4.1(D) | 51.52 ± 6.9 | 19.3 ± 5.5 | 26.67 ± 0 | 30.22 ± 1.5 |\n| R\u0026D-Agent o1-preview | 48.18 ± 2.49 | 8.95 ± 2.36 | 18.67 ± 2.98 | 22.4 ± 1.1 |\n| AIDE o1-preview | 34.3 ± 2.4 | 8.8 ± 1.1 | 10.0 ± 1.9 | 16.9 ± 1.1 |\n\n**Notes:**\n- **o3(R)+GPT-4.1(D)**: This version is designed to both reduce the average time per loop and leverage a cost-effective combination of backend LLMs by integrating the Research Agent (o3) with the Development Agent (GPT-4.1).\n- **AIDE o1-preview**: Represents the previous best public result on MLE-bench, as reported in the original MLE-bench paper.\n- The average and standard deviation for R\u0026D-Agent o1-preview are based on 5 independent seeds, and those for R\u0026D-Agent o3(R)+GPT-4.1(D) are based on 6 seeds.\n- According to MLE-bench, the 75 competitions are categorized into three levels of complexity: **Low==Lite** if we estimate that an experienced ML engineer can produce a sensible solution in under 2 hours, excluding the time taken to train any models; **Medium** if it takes between 2 and 10 hours; and **High** if it takes more than 10 hours.\n\nYou can inspect the detailed runs behind the above results online.\n- [R\u0026D-Agent o1-preview detailed runs](https://aka.ms/RD-Agent_MLE-Bench_O1-preview)\n- [R\u0026D-Agent o3(R)+GPT-4.1(D) detailed runs](https://aka.ms/RD-Agent_MLE-Bench_O3_GPT41)\n\nTo run R\u0026D-Agent on MLE-bench, refer to **[MLE-bench Guide: Running ML Engineering via MLE-bench](https://rdagent.readthedocs.io/en/latest/scens/data_science.html)**.\n\n# 🥇 The First Data-Centric Quant Multi-Agent 
Framework!\n\nR\u0026D-Agent for Quantitative Finance, in short **RD-Agent(Q)**, is the first data-centric, multi-agent framework designed to automate the full-stack research and development of quantitative strategies via coordinated factor-model co-optimization.\n\n![image](https://github.com/user-attachments/assets/3198bc10-47ba-4ee0-8a8e-46d5ce44f45d)\n\nExtensive experiments in real stock markets show that, at a cost under $10, RD-Agent(Q) achieves approximately 2× higher ARR than benchmark factor libraries while using over 70% fewer factors. It also surpasses state-of-the-art deep time-series models under smaller resource budgets. Its alternating factor–model optimization further delivers excellent trade-off between predictive accuracy and strategy robustness.\n\nYou can learn more details about **RD-Agent(Q)** through the [paper](https://arxiv.org/abs/2505.15155) and reproduce it through the [documentation](https://rdagent.readthedocs.io/en/latest/scens/quant_agent_fin.html).\n\n# Data Science Agent Preview\nCheck out our demo video showcasing the current progress of our Data Science Agent under development:\n\nhttps://github.com/user-attachments/assets/3eccbecb-34a4-4c81-bce4-d3f8862f7305\n\n# 🌟 Introduction\n\u003cdiv align=\"center\"\u003e\n      \u003cimg src=\"docs/_static/scen.png\" alt=\"Our focused scenario\" style=\"width:80%; \"\u003e\n\u003c/div\u003e\n\nR\u0026D-Agent aims to automate the most critical and valuable aspects of the industrial R\u0026D process, and we begin with focusing on the data-driven scenarios to streamline the development of models and data. \nMethodologically, we have identified a framework with two key components: 'R' for proposing new ideas and 'D' for implementing them.\nWe believe that the automatic evolution of R\u0026D will lead to solutions of significant industrial value.\n\n\n\u003c!-- Tag Cloud --\u003e\nR\u0026D is a very general scenario. 
The advent of R\u0026D-Agent can be your\n- 💰 **Automatic Quant Factory** ([🎥Demo Video](https://rdagent.azurewebsites.net/factor_loop)|[▶️YouTube](https://www.youtube.com/watch?v=X4DK2QZKaKY\u0026t=6s))\n- 🤖 **Data Mining Agent:** Iteratively proposing data \u0026 models ([🎥Demo Video 1](https://rdagent.azurewebsites.net/model_loop)|[▶️YouTube](https://www.youtube.com/watch?v=dm0dWL49Bc0\u0026t=104s)) ([🎥Demo Video 2](https://rdagent.azurewebsites.net/dmm)|[▶️YouTube](https://www.youtube.com/watch?v=VIaSTZuoZg4))  and implementing them by gaining knowledge from data.\n- 🦾 **Research Copilot:** Auto read research papers ([🎥Demo Video](https://rdagent.azurewebsites.net/report_model)|[▶️YouTube](https://www.youtube.com/watch?v=BiA2SfdKQ7o)) / financial reports ([🎥Demo Video](https://rdagent.azurewebsites.net/report_factor)|[▶️YouTube](https://www.youtube.com/watch?v=ECLTXVcSx-c)) and implement model structures or building datasets.\n- 🤖 **Kaggle Agent:** Auto Model Tuning and Feature Engineering([🎥Demo Video Coming Soon...]()) and implementing them to achieve more in competitions.\n- ...\n\nYou can click the links above to view the demo. We're continuously adding more methods and scenarios to the project to enhance your R\u0026D processes and boost productivity. \n\nAdditionally, you can take a closer look at the examples in our **[🖥️ Live Demo](https://rdagent.azurewebsites.net/)**.\n\n\u003cdiv align=\"center\"\u003e\n    \u003ca href=\"https://rdagent.azurewebsites.net/\" target=\"_blank\"\u003e\n        \u003cimg src=\"docs/_static/demo.png\" alt=\"Watch the demo\" width=\"80%\"\u003e\n    \u003c/a\u003e\n\u003c/div\u003e\n\n\n# ⚡ Quick start\n\n### RD-Agent currently only supports Linux.\n\nYou can try above demos by running the following command:\n\n### 🐳 Docker installation.\nUsers must ensure Docker is installed before attempting most scenarios. 
Please refer to the [official 🐳Docker page](https://docs.docker.com/engine/install/) for installation instructions.\nEnsure the current user can run Docker commands **without using sudo**. You can verify this by executing `docker run hello-world`.\n\n### 🐍 Create a Conda Environment\n- Create a new conda environment with Python (3.10 and 3.11 are well-tested in our CI):\n  ```sh\n  conda create -n rdagent python=3.10\n  ```\n- Activate the environment:\n  ```sh\n  conda activate rdagent\n  ```\n\n### 🛠️ Install the R\u0026D-Agent\n\n#### For Users\n- You can directly install the R\u0026D-Agent package from PyPI:\n  ```sh\n  pip install rdagent\n  ```\n\n#### For Developers\n- If you want to try the latest version or contribute to RD-Agent, you can install it from source and follow the development setup:\n  ```sh\n  git clone https://github.com/microsoft/RD-Agent\n  cd RD-Agent\n  make dev\n  ```\n\nMore details can be found in the [development setup](https://rdagent.readthedocs.io/en/latest/development.html).\n\n### 💊 Health check\n- rdagent provides a health check that currently checks two things:\n  - Whether the Docker installation was successful.\n  - Whether the default port used by the [rdagent ui](https://github.com/microsoft/RD-Agent?tab=readme-ov-file#%EF%B8%8F-monitor-the-application-results) is occupied.\n  ```sh\n  rdagent health_check --no-check-env\n  ```\n\n\n### ⚙️ Configuration\n- The demos require the following capabilities:\n  - ChatCompletion\n  - json_mode\n  - embedding query\n\n  You can set your Chat Model and Embedding Model in the following ways:\n\n  \u003e **🔥 Attention**: We now provide experimental support for **DeepSeek** models! You can use DeepSeek's official API for cost-effective and high-performance inference. See the configuration example below for DeepSeek setup.\n\n- **Using LiteLLM (Default)**: We now support LiteLLM as a backend for integration with multiple LLM providers. 
You can configure it in multiple ways:\n\n  **Option 1: Unified API base for both models**\n\n  *Configuration Example: `OpenAI` Setup:*\n\n  ```bash\n  cat \u003c\u003c EOF  \u003e .env\n  # Set to any model supported by LiteLLM.\n  CHAT_MODEL=gpt-4o\n  EMBEDDING_MODEL=text-embedding-3-small\n  # Configure unified API base\n  OPENAI_API_BASE=\u003cyour_unified_api_base\u003e\n  OPENAI_API_KEY=\u003creplace_with_your_openai_api_key\u003e\n  EOF\n  ```\n\n  *Configuration Example: `Azure OpenAI` Setup:*\n\n  \u003e Before using this configuration, please confirm in advance that your `Azure OpenAI API key` supports `embedding models`.\n\n  ```bash\n  cat \u003c\u003c EOF  \u003e .env\n  EMBEDDING_MODEL=azure/\u003cModel deployment supporting embedding\u003e\n  CHAT_MODEL=azure/\u003cyour deployment name\u003e\n  AZURE_API_KEY=\u003creplace_with_your_openai_api_key\u003e\n  AZURE_API_BASE=\u003cyour_unified_api_base\u003e\n  AZURE_API_VERSION=\u003cazure api version\u003e\n  EOF\n  ```\n\n  **Option 2: Separate API bases for Chat and Embedding models**\n  ```bash\n  cat \u003c\u003c EOF  \u003e .env\n  # Set to any model supported by LiteLLM.\n  # Configure separate API bases for chat and embedding\n  \n  # CHAT MODEL:\n  CHAT_MODEL=gpt-4o\n  OPENAI_API_BASE=\u003cyour_chat_api_base\u003e\n  OPENAI_API_KEY=\u003creplace_with_your_openai_api_key\u003e\n\n  # EMBEDDING MODEL:\n  # Using SiliconFlow as an example; you can use other providers.\n  # Note: embedding requires the litellm_proxy prefix\n  EMBEDDING_MODEL=litellm_proxy/BAAI/bge-large-en-v1.5\n  LITELLM_PROXY_API_KEY=\u003creplace_with_your_siliconflow_api_key\u003e\n  LITELLM_PROXY_API_BASE=https://api.siliconflow.cn/v1\n  EOF\n  ```\n\n  *Configuration Example: `DeepSeek` Setup:*\n\n  \u003e Many users encounter configuration errors when setting up DeepSeek. 
Here's a complete working example for DeepSeek setup:\n  ```bash\n  cat \u003c\u003c EOF  \u003e .env\n  # CHAT MODEL: Using DeepSeek Official API\n  CHAT_MODEL=deepseek/deepseek-chat\n  DEEPSEEK_API_KEY=\u003creplace_with_your_deepseek_api_key\u003e\n\n  # EMBEDDING MODEL: Using SiliconFlow for embedding since DeepSeek has no embedding model.\n  # Note: embedding requires the litellm_proxy prefix\n  EMBEDDING_MODEL=litellm_proxy/BAAI/bge-m3\n  LITELLM_PROXY_API_KEY=\u003creplace_with_your_siliconflow_api_key\u003e\n  LITELLM_PROXY_API_BASE=https://api.siliconflow.cn/v1\n  EOF\n  ```\n\n  Note: If you are using reasoning models that include thought processes in their responses (such as \\\u003cthink\u003e tags), you need to set the following environment variable:\n  ```bash\n  REASONING_THINK_RM=True\n  ```\n\n  You can also use a deprecated backend if you only use `OpenAI API` or `Azure OpenAI` directly. For this deprecated setting and more configuration information, please refer to the [documentation](https://rdagent.readthedocs.io/en/latest/installation_and_configuration.html). \n\n\n\n- Once your environment configuration is complete, run the following command to check that your configuration is valid. 
This step is necessary.\n\n  ```bash\n  rdagent health_check\n  ```\n\n### 🚀 Run the Application\n\nThe **[🖥️ Live Demo](https://rdagent.azurewebsites.net/)** is implemented by the following commands(each item represents one demo, you can select the one you prefer):\n\n- Run the **Automated Quantitative Trading \u0026 Iterative Factors Model Joint Evolution**:  [Qlib](http://github.com/microsoft/qlib) self-loop factor \u0026 model proposal and implementation application\n  ```sh\n  rdagent fin_quant\n  ```\n\n- Run the **Automated Quantitative Trading \u0026 Iterative Factors Evolution**:  [Qlib](http://github.com/microsoft/qlib) self-loop factor proposal and implementation application\n  ```sh\n  rdagent fin_factor\n  ```\n\n- Run the **Automated Quantitative Trading \u0026 Iterative Model Evolution**: [Qlib](http://github.com/microsoft/qlib) self-loop model proposal and implementation application\n  ```sh\n  rdagent fin_model\n  ```\n\n- Run the **Automated Quantitative Trading \u0026 Factors Extraction from Financial Reports**:  Run the [Qlib](http://github.com/microsoft/qlib) factor extraction and implementation application based on financial reports\n  ```sh\n  # 1. Generally, you can run this scenario using the following command:\n  rdagent fin_factor_report --report-folder=\u003cYour financial reports folder path\u003e\n\n  # 2. Specifically, you need to prepare some financial reports first. You can follow this concrete example:\n  wget https://github.com/SunsetWolf/rdagent_resource/releases/download/reports/all_reports.zip\n  unzip all_reports.zip -d git_ignore_folder/reports\n  rdagent fin_factor_report --report-folder=git_ignore_folder/reports\n  ```\n\n- Run the **Automated Model Research \u0026 Development Copilot**: model extraction and implementation application\n  ```sh\n  # 1. Generally, you can run your own papers/reports with the following command:\n  rdagent general_model \u003cYour paper URL\u003e\n\n  # 2. Specifically, you can do it like this. 
For more details and additional paper examples, use `rdagent general_model -h`:\n  rdagent general_model  \"https://arxiv.org/pdf/2210.09789\"\n  ```\n\n- Run the **Automated Medical Prediction Model Evolution**: Medical self-loop model proposal and implementation application\n\n  ```bash\n  # Generally, you can run the data science program with the following command:\n  rdagent data_science --competition \u003cyour competition name\u003e\n\n  # Specifically, you need to create a folder for storing competition files (e.g., competition description file, competition datasets, etc.), and configure the path to the folder in your environment. In addition, you need to use chromedriver when you download the competition descriptors, which you can follow for this specific example:\n\n  # 1. Download the dataset, extract it to the target folder.\n  wget https://github.com/SunsetWolf/rdagent_resource/releases/download/ds_data/arf-12-hours-prediction-task.zip\n  unzip arf-12-hours-prediction-task.zip -d ./git_ignore_folder/ds_data/\n\n  # 2. Configure environment variables in the `.env` file\n  dotenv set DS_LOCAL_DATA_PATH \"$(pwd)/git_ignore_folder/ds_data\"\n  dotenv set DS_CODER_ON_WHOLE_PIPELINE True\n  dotenv set DS_IF_USING_MLE_DATA False\n  dotenv set DS_SAMPLE_DATA_BY_LLM False\n  dotenv set DS_SCEN rdagent.scenarios.data_science.scen.DataScienceScen\n\n  # 3. run the application\n  rdagent data_science --competition arf-12-hours-prediction-task\n  ```\n\n  **NOTE:** For more information about the dataset, please refer to the [documentation](https://rdagent.readthedocs.io/en/latest/scens/data_science.html).\n\n- Run the **Automated Kaggle Model Tuning \u0026 Feature Engineering**:  self-loop model proposal and feature engineering implementation application \u003cbr /\u003e\n  \u003e Using **tabular-playground-series-dec-2021** as an example. \u003cbr /\u003e\n  \u003e 1. Register and login on the [Kaggle](https://www.kaggle.com/) website. \u003cbr /\u003e\n  \u003e 2. 
Configure the Kaggle API. \u003cbr /\u003e\n  \u003e (1) Click on the avatar (usually in the top right corner of the page) -\u003e `Settings` -\u003e `Create New Token`. A file called `kaggle.json` will be downloaded. \u003cbr /\u003e\n  \u003e (2) Move `kaggle.json` to `~/.config/kaggle/` \u003cbr /\u003e\n  \u003e (3) Modify the permissions of the `kaggle.json` file. Reference command: `chmod 600 ~/.config/kaggle/kaggle.json` \u003cbr /\u003e\n  \u003e 3. Join the competition: Click `Join the competition` -\u003e `I Understand and Accept` at the bottom of the [competition details page](https://www.kaggle.com/competitions/tabular-playground-series-dec-2021/data).\n  ```bash\n  # Generally, you can run the Kaggle competition program with the following command:\n  rdagent data_science --competition \u003cyour competition name\u003e\n\n  # 1. Configure environment variables in the `.env` file\n  mkdir -p ./git_ignore_folder/ds_data\n  dotenv set DS_LOCAL_DATA_PATH \"$(pwd)/git_ignore_folder/ds_data\"\n  dotenv set DS_CODER_ON_WHOLE_PIPELINE True\n  dotenv set DS_IF_USING_MLE_DATA True\n  dotenv set DS_SAMPLE_DATA_BY_LLM True\n  dotenv set DS_SCEN rdagent.scenarios.data_science.scen.KaggleScen\n\n  # 2. Run the application\n  rdagent data_science --competition tabular-playground-series-dec-2021\n  ```\n\n### 🖥️ Monitor the Application Results\n- Run the following command to view the run logs of our demo program.\n\n  ```sh\n  rdagent ui --port 19899 --log-dir \u003cyour log folder like \"log/\"\u003e --data-science\n  ```\n\n- About the `data_science` parameter: If you want to see the logs of the data science scenario, set the `data_science` parameter to `True`; otherwise set it to `False`.\n \n- Port 19899 is not commonly used, but before you run this demo, check whether it is already occupied. 
If it is, please change it to another port that is not occupied.\n\n  You can check if a port is occupied by running the following command.\n\n  ```sh\n  rdagent health_check --no-check-env --no-check-docker\n  ```\n\n# 🏭 Scenarios\n\nWe have applied R\u0026D-Agent to multiple valuable data-driven industrial scenarios.\n\n\n## 🎯 Goal: Agent for Data-driven R\u0026D\n\nIn this project, we aim to build an Agent that automates data-driven R\u0026D and can:\n+ 📄 Read real-world material (reports, papers, etc.) and **extract** key formulas and descriptions of **features** and **models** of interest, which are the key components of data-driven R\u0026D.\n+ 🛠️ **Implement** the extracted formulas (e.g., features, factors, and models) as runnable code.\n   + Because LLMs rarely produce a correct implementation in a single attempt, build an evolving process in which the agent improves its performance by learning from feedback and knowledge.\n+ 💡 Propose **new ideas** based on current knowledge and observations.\n\n\u003c!-- ![Data-Centric R\u0026D Overview](docs/_static/overview.png) --\u003e\n\n## 📈 Scenarios/Demos\n\nIn the two key areas of data-driven scenarios, model implementation and data building, our system aims to serve two main roles: 🦾Copilot and 🤖Agent. \n- The 🦾Copilot follows human instructions to automate repetitive tasks. 
\n- The 🤖Agent, being more autonomous, actively proposes ideas for better results in the future.\n\nThe supported scenarios are listed below:\n\n| Scenario/Target | Model Implementation                   | Data Building                                                                      |\n| --              | --                                     | --                                                                                 |\n| **💹 Finance**      | 🤖 [Iteratively Proposing Ideas \u0026 Evolving](https://rdagent.azurewebsites.net/model_loop)[▶️YouTube](https://www.youtube.com/watch?v=dm0dWL49Bc0\u0026t=104s) |  🤖 [Iteratively Proposing Ideas \u0026 Evolving](https://rdagent.azurewebsites.net/factor_loop) [▶️YouTube](https://www.youtube.com/watch?v=X4DK2QZKaKY\u0026t=6s) \u003cbr/\u003e   🦾 [Auto reports reading \u0026 implementation](https://rdagent.azurewebsites.net/report_factor)[▶️YouTube](https://www.youtube.com/watch?v=ECLTXVcSx-c)  |\n| **🩺 Medical**      | 🤖 [Iteratively Proposing Ideas \u0026 Evolving](https://rdagent.azurewebsites.net/dmm)[▶️YouTube](https://www.youtube.com/watch?v=VIaSTZuoZg4) | -                                                                                  |\n| **🏭 General**      | 🦾 [Auto paper reading \u0026 implementation](https://rdagent.azurewebsites.net/report_model)[▶️YouTube](https://www.youtube.com/watch?v=BiA2SfdKQ7o) \u003cbr/\u003e 🤖 Auto Kaggle Model Tuning   | 🤖Auto Kaggle feature Engineering |\n\n- **[RoadMap](https://rdagent.readthedocs.io/en/latest/scens/data_science.html#roadmap)**: Currently, we are working hard to add new features to the Kaggle scenario.\n\nDifferent scenarios vary in entrance and configuration. Please check the detailed setup tutorial in the scenarios documents.\n\nHere is a gallery of [successful explorations](https://github.com/SunsetWolf/rdagent_resource/releases/download/demo_traces/demo_traces.zip) (5 traces showed in **[🖥️ Live Demo](https://rdagent.azurewebsites.net/)**). 
You can download and view the execution trace using [this command](https://github.com/microsoft/RD-Agent?tab=readme-ov-file#%EF%B8%8F-monitor-the-application-results) from the documentation.\n\nPlease refer to **[📖readthedocs_scen](https://rdagent.readthedocs.io/en/latest/scens/catalog.html)** for more details of the scenarios.\n\n# ⚙️ Framework\n\n\u003cdiv align=\"center\"\u003e\n    \u003cimg src=\"docs/_static/Framework-RDAgent.png\" alt=\"Framework-RDAgent\" width=\"85%\"\u003e\n\u003c/div\u003e\n\n\nAutomating the R\u0026D process in data science is a highly valuable yet underexplored area in industry. We propose a framework to push the boundaries of this important research field.\n\nThe research questions within this framework can be divided into three main categories:\n| Research Area | Paper/Work List |\n|--------------------|-----------------|\n| **Benchmark the R\u0026D abilities** | [Benchmark](#benchmark) |\n| **Idea proposal:** Explore new ideas or refine existing ones | [Research](#research) |\n| **Ability to realize ideas:** Implement and execute ideas | [Development](#development) |\n\nWe believe that the key to delivering high-quality solutions lies in the ability to evolve R\u0026D capabilities. 
Agents should learn like human experts, continuously improving their R\u0026D skills.\n\nMore documents can be found in the **[📖 readthedocs](https://rdagent.readthedocs.io/)**.\n\n# 📃 Paper/Work list\n\n## Overall Technical Report\n- [R\u0026D-Agent: An LLM-Agent Framework Towards Autonomous Data Science](https://arxiv.org/abs/2505.14738)\n```BibTeX\n@misc{yang2025rdagentllmagentframeworkautonomous,\n      title={R\u0026D-Agent: An LLM-Agent Framework Towards Autonomous Data Science}, \n      author={Xu Yang and Xiao Yang and Shikai Fang and Yifei Zhang and Jian Wang and Bowen Xian and Qizheng Li and Jingyuan Li and Minrui Xu and Yuante Li and Haoran Pan and Yuge Zhang and Weiqing Liu and Yelong Shen and Weizhu Chen and Jiang Bian},\n      year={2025},\n      eprint={2505.14738},\n      archivePrefix={arXiv},\n      primaryClass={cs.AI},\n      url={https://arxiv.org/abs/2505.14738}, \n}\n```\n![image](https://github.com/user-attachments/assets/28b0488d-a546-4fef-8dc5-563ed64a9b4d)\n\n## 📊 Benchmark\n- [Towards Data-Centric Automatic R\u0026D](https://arxiv.org/abs/2404.11276)\n```BibTeX\n@misc{chen2024datacentric,\n    title={Towards Data-Centric Automatic R\u0026D},\n    author={Haotian Chen and Xinjie Shen and Zeqi Ye and Wenjun Feng and Haoxue Wang and Xiao Yang and Xu Yang and Weiqing Liu and Jiang Bian},\n    year={2024},\n    eprint={2404.11276},\n    archivePrefix={arXiv},\n    primaryClass={cs.AI}\n}\n```\n![image](https://github.com/user-attachments/assets/494f55d3-de9e-4e73-ba3d-a787e8f9e841)\n\n## 🔍 Research\n\nIn a data mining expert's daily research and development process, they propose a hypothesis (e.g., a model structure like RNN can capture patterns in time-series data), design experiments (e.g., finance data contains time-series and we can verify the hypothesis in this scenario), implement the experiment as code (e.g., Pytorch model structure), and then execute the code to get feedback (e.g., metrics, loss curve, etc.). 
The experts learn from the feedback and improve in the next iteration.\n\nBased on the principles above, we have established a basic methodological framework that continuously proposes hypotheses, verifies them, and gets feedback from real-world practice. This is the first scientific research automation framework that supports linking with real-world verification.\n\nFor more details, please refer to our **[🖥️ Live Demo page](https://rdagent.azurewebsites.net)**.\n\n## 🛠️ Development\n\n- [Collaborative Evolving Strategy for Automatic Data-Centric Development](https://arxiv.org/abs/2407.18690)\n```BibTeX\n@misc{yang2024collaborative,\n    title={Collaborative Evolving Strategy for Automatic Data-Centric Development},\n    author={Xu Yang and Haotian Chen and Wenjun Feng and Haoxue Wang and Zeqi Ye and Xinjie Shen and Xiao Yang and Shizhao Sun and Weiqing Liu and Jiang Bian},\n    year={2024},\n    eprint={2407.18690},\n    archivePrefix={arXiv},\n    primaryClass={cs.AI}\n}\n```\n![image](https://github.com/user-attachments/assets/75d9769b-0edd-4caf-9d45-57d1e577054b)\n\n## Deep Application in Diverse Scenarios\n\n- [R\u0026D-Agent-Quant: A Multi-Agent Framework for Data-Centric Factors and Model Joint Optimization](https://arxiv.org/abs/2505.15155)\n```BibTeX\n@misc{li2025rdagentquantmultiagentframeworkdatacentric,\n      title={R\u0026D-Agent-Quant: A Multi-Agent Framework for Data-Centric Factors and Model Joint Optimization}, \n      author={Yuante Li and Xu Yang and Xiao Yang and Minrui Xu and Xisen Wang and Weiqing Liu and Jiang Bian},\n      year={2025},\n      eprint={2505.15155},\n      archivePrefix={arXiv},\n      primaryClass={q-fin.CP},\n      url={https://arxiv.org/abs/2505.15155}, \n}\n```\n![image](https://github.com/user-attachments/assets/3186f67a-c2f8-4b6b-8bb9-a9b959c13866)\n\n\n# 🤝 Contributing\n\nWe welcome contributions and suggestions to improve R\u0026D-Agent. 
Please refer to the [Contributing Guide](CONTRIBUTING.md) for more details on how to contribute.\n\nBefore submitting a pull request, ensure that your code passes the automatic CI checks.\n\n## 📝 Guidelines\nThis project welcomes contributions and suggestions.\nContributing to this project is straightforward and rewarding. Whether it's solving an issue, addressing a bug, enhancing documentation, or even correcting a typo, every contribution is valuable and helps improve R\u0026D-Agent.\n\nTo get started, you can explore the issues list, or search for `TODO:` comments in the codebase by running the command `grep -r \"TODO:\"`.\n\n\u003cimg src=\"https://img.shields.io/github/contributors-anon/microsoft/RD-Agent\"/\u003e\n\n\u003ca href=\"https://github.com/microsoft/RD-Agent/graphs/contributors\"\u003e\n  \u003cimg src=\"https://contrib.rocks/image?repo=microsoft/RD-Agent\u0026max=100\u0026columns=15\" /\u003e\n\u003c/a\u003e\n\nBefore we released R\u0026D-Agent as an open-source project on GitHub, it was an internal project within our group. Unfortunately, the internal commit history was not preserved when we removed some confidential code. As a result, some contributions from our group members, including Haotian Chen, Wenjun Feng, Haoxue Wang, Zeqi Ye, Xinjie Shen, and Jinhui Li, were not included in the public commits.\n\n# ⚖️ Legal disclaimer\n\u003cp style=\"line-height: 1; font-style: italic;\"\u003eThe RD-agent is provided “as is”, without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. The RD-agent is aimed to facilitate research and development process in the financial industry and not ready-to-use for any financial investment or advice. 
Users shall independently assess and test the risks of the RD-agent in a specific use scenario, ensure the responsible use of AI technology, including but not limited to developing and integrating risk mitigation measures, and comply with all applicable laws and regulations in all applicable jurisdictions. The RD-agent does not provide financial opinions or reflect the opinions of Microsoft, nor is it designed to replace the role of qualified financial professionals in formulating, assessing, and approving finance products. The inputs and outputs of the RD-agent belong to the users and users shall assume all liability under any theory of liability, whether in contract, torts, regulatory, negligence, products liability, or otherwise, associated with use of the RD-agent and any inputs and outputs thereof.\u003c/p\u003e\n","funding_links":["https://github.com/sponsors/MIIC-finance"],"categories":["5.4 代码编程方向","App","🤖 AI \u0026 Machine Learning","H. Quantitative Open Sourced Framework","Tools","Python","🔬 Research Agents","Agent Frameworks","A01_文本生成_文本对话","Repos","AutoML Agents","Autonomous Research \u0026 Content Generation","1. Local Agents"],"sub_categories":["5.4.4 微软 RD-Agent","Application","Research","🟩 Development Tools 🛠️","大语言对话模型及数据","Prompt Libraries","Research \u0026 Knowledge Agents"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2FRD-Agent","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmicrosoft%2FRD-Agent","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2FRD-Agent/lists"}