{"id":15291444,"url":"https://github.com/azure/doazureparallel","last_synced_at":"2026-01-14T21:03:12.386Z","repository":{"id":51233005,"uuid":"82002495","full_name":"Azure/doAzureParallel","owner":"Azure","description":"A R package that allows users to submit parallel workloads in Azure","archived":true,"fork":false,"pushed_at":"2021-05-19T16:23:13.000Z","size":645,"stargazers_count":107,"open_issues_count":38,"forks_count":49,"subscribers_count":41,"default_branch":"master","last_synced_at":"2024-12-22T18:04:55.557Z","etag":null,"topics":["azure-batch","cluster","dsvm","foreach","mran","parallel","r"],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Azure.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"Contributing.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-02-15T00:17:52.000Z","updated_at":"2023-10-12T06:25:46.000Z","dependencies_parsed_at":"2022-09-13T08:31:59.640Z","dependency_job_id":null,"html_url":"https://github.com/Azure/doAzureParallel","commit_stats":null,"previous_names":[],"tags_count":17,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Azure%2FdoAzureParallel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Azure%2FdoAzureParallel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Azure%2FdoAzureParallel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Azure%2FdoAzureParallel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Azure","download_url":"https://codeload.github.com/Azure/doAzureParallel/
tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":235586525,"owners_count":19014035,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["azure-batch","cluster","dsvm","foreach","mran","parallel","r"],"created_at":"2024-09-30T16:12:31.067Z","updated_at":"2025-10-07T04:31:08.813Z","avatar_url":"https://github.com/Azure.png","language":"R","readme":"[![Build Status](https://travis-ci.org/Azure/doAzureParallel.svg?branch=master)](https://travis-ci.org/Azure/doAzureParallel)\r\n\r\n# This repo is no longer maintained and no new features will be added.\r\n\r\n## doAzureParallel\r\n\r\n## Introduction\r\n\r\nThe *doAzureParallel* package is a parallel backend for the widely popular *foreach* package. With *doAzureParallel*, each iteration of the *foreach* loop runs in parallel on an Azure Virtual Machine (VM), allowing users to scale up their R jobs to tens or hundreds of machines.\r\n\r\n*doAzureParallel* is built to support the *foreach* parallel computing package. The *foreach* package supports parallel execution - it can execute multiple processes across some parallel backend. 
With just a few lines of code, the *doAzureParallel* package helps create a cluster in Azure, register it as a parallel backend, and seamlessly connect it to the *foreach* package.\r\n\r\nNOTE: The terms *pool* and *cluster* are used interchangeably throughout this document.\r\n\r\n## Notable Features\r\n- Ability to use low-priority VMs for an 80% discount [(link)](./docs/31-vm-sizes.md#low-priority-vms)\r\n- Users can bring their own Docker Image\r\n- AAD and VNets Support\r\n- Built-in support for Azure Blob Storage\r\n\r\n## Dependencies\r\n\r\n- R (\u003e= 3.3.1)\r\n- httr (\u003e= 1.2.1)\r\n- rjson (\u003e= 0.2.15)\r\n- RCurl (\u003e= 1.95-4.8)\r\n- digest (\u003e= 0.6.9)\r\n- foreach (\u003e= 1.4.3)\r\n- iterators (\u003e= 1.0.8)\r\n- bitops (\u003e= 1.0.5)\r\n\r\n## Setup \r\n\r\n1) Install doAzureParallel directly from GitHub.\r\n\r\n```R\r\n# install the package devtools\r\ninstall.packages(\"devtools\")\r\n\r\n# install the doAzureParallel and rAzureBatch packages\r\ndevtools::install_github(\"Azure/rAzureBatch\")\r\ndevtools::install_github(\"Azure/doAzureParallel\")\r\n```\r\n\r\n2) Create a doAzureParallel credentials file\r\n``` R\r\nlibrary(doAzureParallel)\r\ngenerateCredentialsConfig(\"credentials.json\")\r\n```\r\n\r\n3) Log in to or register for an Azure account, then navigate to [Azure Cloud Shell](https://shell.azure.com)\r\n\r\n``` sh \r\nwget -q https://raw.githubusercontent.com/Azure/doAzureParallel/master/account_setup.sh \u0026\u0026\r\nchmod 755 account_setup.sh \u0026\u0026\r\n/bin/bash account_setup.sh\r\n```\r\n4) Follow the on-screen prompts to create the necessary Azure resources and copy the output into your credentials file. 
For more information, see [Getting Started Scripts](./docs/02-getting-started-script.md).\r\n\r\nTo Learn More:\r\n- [Azure Account Requirements for doAzureParallel](./docs/04-azure-requirements.md)\r\n\r\n## Getting Started\r\n\r\nImport the package\r\n```R\r\nlibrary(doAzureParallel)\r\n```\r\n\r\nSet up your parallel backend with Azure. This is your set of Azure VMs.\r\n```R\r\n# 1. Generate your credential and cluster configuration files.  \r\ngenerateClusterConfig(\"cluster.json\")\r\ngenerateCredentialsConfig(\"credentials.json\")\r\n\r\n# 2. Fill out your credential config and cluster config files.\r\n# Enter your Azure Batch Account \u0026 Azure Storage keys/account-info into your credential config (\"credentials.json\") and configure your cluster in your cluster config (\"cluster.json\")\r\n\r\n# 3. Set your credentials - you need to give the R session your credentials to interact with Azure\r\nsetCredentials(\"credentials.json\")\r\n\r\n# 4. Register the pool. This will create a new pool if your pool hasn't already been provisioned.\r\ncluster \u003c- makeCluster(\"cluster.json\")\r\n\r\n# 5. Register the pool as your parallel backend\r\nregisterDoAzureParallel(cluster)\r\n\r\n# 6. Check that your parallel backend has been registered\r\ngetDoParWorkers()\r\n```\r\n\r\nRun your parallel *foreach* loop with the *%dopar%* keyword. 
The *foreach* function will return the results of your parallel code.\r\n\r\n```R\r\nnumber_of_iterations \u003c- 10\r\nresults \u003c- foreach(i = 1:number_of_iterations) %dopar% {\r\n  # This code is executed, in parallel, across your cluster.\r\n  myAlgorithm()\r\n}\r\n```\r\n\r\nAfter you finish running your R code in Azure, you may want to shut down your cluster of VMs to make sure that you are not being charged anymore.\r\n\r\n```R\r\n# shut down your pool\r\nstopCluster(cluster)\r\n```\r\n\r\n## Table of Contents \r\nThis section will provide information about how Azure works, how best to take advantage of Azure, and best practices when using the doAzureParallel package.\r\n\r\n1. **Azure Introduction** [(link)](./docs/00-azure-introduction.md)\r\n\r\n   Using *Azure Batch*\r\n\r\n2. **Getting Started** [(link)](./docs/01-getting-started.md)\r\n\r\n    Using the *Getting Started* to create credentials\r\n    \r\n    i. **Generate Credentials Script** [(link)](./docs/02-getting-started-script.md)\r\n\r\n    - Pre-built bash script for getting Azure credentials without Azure Portal\r\n\r\n    ii. **National Cloud Support** [(link)](./docs/03-national-clouds.md)\r\n\r\n    - How to run workload in Azure national clouds\r\n\r\n3. **Customize Cluster** [(link)](./docs/30-customize-cluster.md)\r\n\r\n    Setting up your cluster to user's specific needs\r\n\r\n    i. **Virtual Machine Sizes** [(link)](./docs/31-vm-sizes.md)\r\n    \r\n    - How do you choose the best VM type/size for your workload?\r\n\r\n    ii. **Autoscale** [(link)](./docs/32-autoscale.md)\r\n  \r\n    - Automatically scale up/down your cluster to save time and/or money.\r\n  \r\n    iii. **Building Containers** [(link)](./docs/33-building-containers.md)\r\n    \r\n      - Creating your own Docker containers for reproducibility\r\n\r\n4. **Managing Cluster** [(link)](./docs/40-clusters.md)\r\n\r\n    Managing your cluster's lifespan\r\n\r\n5. 
**Customize Job**\r\n\r\n    Setting up your job to user's specific needs\r\n    \r\n    i. **Asynchronous Jobs** [(link)](./docs/51-long-running-job.md)\r\n    \r\n    - Best practices for managing long-running jobs\r\n  \r\n    ii. **Foreach Azure Options** [(link)](./docs/52-azure-foreach-options.md)\r\n        \r\n    - Use Azure package-defined foreach options to improve performance and user experience\r\n  \r\n    iii. **Error Handling** [(link)](./docs/53-error-handling.md)\r\n    \r\n    - How does Azure handle errors in your foreach loop? \r\n    \r\n6. **Package Management** [(link)](./docs/20-package-management.md)\r\n\r\n    Best practices for managing your R packages in code. This includes installation at the cluster or job level as well as how to use different package providers.\r\n\r\n7. **Storage Management**\r\n    \r\n    i. **Distributing your Data** [(link)](./docs/71-distributing-data.md)\r\n    \r\n    - Best practices and limitations for working with distributed data.\r\n\r\n    ii. **Persistent Storage** [(link)](./docs/72-persistent-storage.md)\r\n\r\n    - Taking advantage of persistent storage for long-running jobs\r\n   \r\n    iii. **Accessing Azure Storage through R** [(link)](./docs/73-managing-storage.md)\r\n    \r\n    - Manage your Azure Storage files via R \r\n\r\n8. **Performance Tuning** [(link)](./docs/80-performance-tuning.md)\r\n\r\n    Best practices for optimizing your foreach loop\r\n\r\n9. **Debugging and Troubleshooting** [(link)](./docs/90-troubleshooting.md)\r\n    \r\n    Best practices for diagnosing common issues\r\n\r\n10. 
**Azure Limitations** [(link)](./docs/91-quota-limitations.md)\r\n\r\n    Learn about the limitations around the size of your cluster and the number of foreach jobs you can run in Azure.\r\n   \r\n## Additional Documentation\r\nRead our [**FAQ**](./docs/92-faq.md) for known issues and common questions.\r\n\r\n## Next Steps\r\n\r\nFor more information, please visit [our documentation](./docs/README.md).\r\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fazure%2Fdoazureparallel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fazure%2Fdoazureparallel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fazure%2Fdoazureparallel/lists"}