{"id":19329781,"url":"https://github.com/outerbounds/dolly-metaflow","last_synced_at":"2026-03-17T21:04:22.313Z","repository":{"id":160223548,"uuid":"630606013","full_name":"outerbounds/dolly-metaflow","owner":"outerbounds","description":null,"archived":false,"fork":false,"pushed_at":"2023-05-02T04:52:58.000Z","size":57,"stargazers_count":6,"open_issues_count":1,"forks_count":0,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-22T21:49:15.677Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/outerbounds.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-20T18:39:35.000Z","updated_at":"2024-06-13T17:58:49.000Z","dependencies_parsed_at":null,"dependency_job_id":"b38db24d-6b30-40dc-a8b6-f471ea357ca8","html_url":"https://github.com/outerbounds/dolly-metaflow","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/outerbounds/dolly-metaflow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outerbounds%2Fdolly-metaflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outerbounds%2Fdolly-metaflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outerbounds%2Fdolly-metaflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outerbounds%2Fdolly-metaflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/outerbounds","download_url":"https://codeload.github.com/outerbounds/dolly-metaflow/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outerbounds%2Fdolly-metaflow/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275232726,"owners_count":25428227,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-15T02:00:09.272Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-10T02:29:55.789Z","updated_at":"2026-03-17T21:04:17.289Z","avatar_url":"https://github.com/outerbounds.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n### For context, read this blog article: [Training a Large Language Model With Metaflow, Featuring Dolly](https://outerbounds.com/blog/train-dolly-metaflow/)\n\n## Background\nThis repository trains [Dolly](https://github.com/databrickslabs/dolly), a large language model [recently announced](https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html) by Databricks Labs.\n\nPlease visit the original [repository](https://github.com/databrickslabs/dolly) to learn more about Dolly's origins and fair use.\n\u003cbr\u003e\n\nThe main contributions of this repository are:\n- A reproduction of the Dolly training process on the [Outerbounds 
platform](https://outerbounds.com/blog/announcing-outerbounds-platform/).\n- A [Metaflow](https://metaflow.org/) flow that runs Dolly training on multiple GPUs using the Outerbounds platform or your own Metaflow deployment.\n- A reusable `@gpu_profile` decorator that monitors GPU utilization in any Metaflow task.\n- A [Streamlit](https://streamlit.io/) app that lets you interact with different versions of Dolly when testing.\n\n**Dolly, and therefore this repository, is intended exclusively for research purposes and is not licensed for commercial use due to its dependency on the Alpaca Dataset. You will need to create your own instruction tuning dataset to use this repository for commercial applications.**\n\n## Infrastructure \u0026 Environment ⚙️\n\n### GPU Environment\nThis code should be run on GPUs. We tested it in two environments:\n- AWS `p3dn.24xlarge` EC2 instance with 8 NVIDIA [V100 GPUs](https://www.nvidia.com/en-us/data-center/v100/)\n    - Deep Learning AMI (Ubuntu 20.04) with ID `ami-0a39ed2b865d65970` (release notes [here](https://docs.aws.amazon.com/dlami/latest/devguide/appendix-ami-release-notes.html))\n    - Smaller `p3` instances also worked, but less reliably and efficiently\n- [CoreWeave](https://www.coreweave.com/) node with 3 NVIDIA [A100 GPUs](https://www.nvidia.com/en-us/data-center/a100/)\n    - Ubuntu 22.04\n    - NVIDIA driver version 515.105.01\n    - CUDA version 11.7\n\nIn general, we found it best to have at least 3 A100 GPUs.\nYou will also need a large amount of CPU memory (RAM), because the training process uses significant RAM while [DeepSpeed](https://github.com/microsoft/DeepSpeed) shares the model state across GPUs.\n\n### Python Environment 📦\n```\npython -m venv env\nsource env/bin/activate\npip install -r requirements.txt\n```\n\n### Option 1: Outerbounds platform users\nIf you have access to the Outerbounds platform, install the `outerbounds` package to connect to your organization's deployment.\n\n```\npip install -U outerbounds\n```\nAfter installing `outerbounds`, find the `outerbounds configure \u003cYOUR KEY\u003e` command in your platform onboarding documentation and run it.\n\u003cbr\u003e\n\u003cimg src='static/ob-config.png' width=400\u003e\u003c/img\u003e\n\n### Option 2: Open-source Metaflow users\nIf you do not have access to the Outerbounds platform and want to run on a Metaflow deployment you manage, install open-source Metaflow with pip (or add it to the `requirements.txt` file):\n```\npip install metaflow\n```\n\u003e To get started with your own deployment, follow our [guides for engineers](https://outerbounds.com/engineering/welcome/) or reach out in our [community Slack](http://slack.outerbounds.co/) for help.\n\n## Run the `TrainDolly` flow ▶️\n```\npython train_dolly.py run\n```\n\n### View the GPU profiling results\n```\npython train_dolly.py card view train\n```\nTo see where this information comes from, look at the `my_decorators.py` file, which defines the `@gpu_profile` decorator.\nThis decorator currently assumes that [`nvidia-smi`](https://developer.nvidia.com/nvidia-system-management-interface) is installed on the machine where the `train` step runs, and therefore that you are running on NVIDIA GPUs.\n\n## Generate responses with Dolly 🤖\n\nNow you can generate predictions with the trained model by running the `app.py` Streamlit app.\nThis launches a web app that lets you interact with Dolly.\n\n```\nstreamlit run app.py\n```\n\n### Interacting with the model on a remote instance\nYou can run the Streamlit app above locally if you have GPUs that keep inference times reasonable, but you may prefer to run it on a remote instance.\n\nTo set up the Streamlit server on a remote instance and interact with it from your laptop:\n1. Set up a remote instance, such as on AWS EC2. 
As with model training, select an instance with GPUs, such as a `p3.16xlarge` instance on AWS; we use the same `ami-0a39ed2b865d65970` Deep Learning AMI as during training. Add an inbound rule to the instance's security group that allows TCP traffic on port 8501, where Streamlit runs, from your IP address.\n2. Open an SSH connection to your EC2 instance.\n3. Install the GitHub CLI by copying and pasting this into the terminal:\n   ```bash\n   type -p curl \u003e/dev/null || (sudo apt update \u0026\u0026 sudo apt install curl -y)\n   curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | sudo dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg \\\n   \u0026\u0026 sudo chmod go+r /usr/share/keyrings/githubcli-archive-keyring.gpg \\\n   \u0026\u0026 echo \"deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main\" | sudo tee /etc/apt/sources.list.d/github-cli.list \u003e /dev/null \\\n   \u0026\u0026 sudo apt update \\\n   \u0026\u0026 sudo apt install gh -y\n   ```\n4. Log in with `gh auth login`.\n5. Clone this repository with `gh repo clone outerbounds/dolly-metaflow \u0026\u0026 cd dolly-metaflow`.\n6. Create and activate an environment with `mamba create -n dolly python=3.9 -y \u0026\u0026 mamba init \u0026\u0026 source ~/.bashrc \u0026\u0026 mamba activate dolly`.\n7. Install the dependencies with `pip install -r requirements.txt`.\n8. If you are not on the Outerbounds platform, install Metaflow with `pip install metaflow`. If you are on the Outerbounds platform, install it with `pip install -U outerbounds`, then run your `outerbounds configure \u003cYOUR KEY\u003e` command to connect to your organization's deployment. In either case, make sure your Metaflow config matches the one used during training.\n9. Run the Streamlit app with `streamlit run app.py`.\n10. From a web browser on your laptop, open the `External URL` printed in the terminal to interact with the models. The first time you load each model, expect a few minutes of download time, since the models are tens of GBs.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fouterbounds%2Fdolly-metaflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fouterbounds%2Fdolly-metaflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fouterbounds%2Fdolly-metaflow/lists"}