{"id":26395309,"url":"https://github.com/banyan-team/banyan-julia","last_synced_at":"2025-03-17T11:17:04.221Z","repository":{"id":37076956,"uuid":"323387785","full_name":"banyan-team/banyan-julia","owner":"banyan-team","description":"A suite of familiar Julia APIs for bigger datasets with less cloud and lower costs.","archived":false,"fork":false,"pushed_at":"2022-09-17T11:42:55.000Z","size":18330,"stargazers_count":17,"open_issues_count":22,"forks_count":1,"subscribers_count":2,"default_branch":"v22.03.06","last_synced_at":"2024-04-25T05:02:09.959Z","etag":null,"topics":["banyan","distributed-computing","julia","parallel","parallel-computing"],"latest_commit_sha":null,"homepage":"https://banyancomputing.com","language":"Julia","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/banyan-team.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-12-21T16:17:06.000Z","updated_at":"2023-05-03T00:59:37.000Z","dependencies_parsed_at":"2022-08-01T07:49:14.143Z","dependency_job_id":null,"html_url":"https://github.com/banyan-team/banyan-julia","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/banyan-team%2Fbanyan-julia","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/banyan-team%2Fbanyan-julia/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/banyan-team%2Fbanyan-julia/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/banyan-team%2Fbanyan-julia/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owner
s/banyan-team","download_url":"https://codeload.github.com/banyan-team/banyan-julia/tar.gz/refs/heads/v22.03.06","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244022725,"owners_count":20385134,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["banyan","distributed-computing","julia","parallel","parallel-computing"],"created_at":"2025-03-17T11:17:02.767Z","updated_at":"2025-03-17T11:17:04.212Z","avatar_url":"https://github.com/banyan-team.png","language":"Julia","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Banyan Julia\n\nBanyan Julia is a suite of libraries for processing big data with massive parallelism in the cloud. 
The difference between Banyan Julia and alternatives (such as Dagger.jl, Distributed, MPI.jl) is that anything you can [`compute`](https://www.banyancomputing.com/computing-futures) can be instantly [`sample`](https://www.banyancomputing.com/sampling-data)d.\n\n- [BanyanArrays.jl](https://www.banyancomputing.com/banyan-arrays-jl-docs) for reading/writing large HDF5 datasets and distributed map-reduce computation\n- [BanyanImages.jl](https://www.banyancomputing.com/banyan-images-jl-docs) for massively parallel image processing\n- [BanyanDataFrames.jl](https://www.banyancomputing.com/banyan-data-frames-jl-docs) for distributed reading/writing Parquet/CSV/Arrow datasets and selecting, aggregating, and transforming data\n- [BanyanONNXRunTime.jl](https://www.banyancomputing.com/banyan-onnx-run-time-jl-docs) for high-performance ML inference (bring your own PyTorch and TensorFlow models!)\n- BanyanDBInterface.jl (WIP - please contact support@banyancomputing.com) for extracting from and loading to your database\n- [Banyan.jl's Custom Scripting](https://www.banyancomputing.com/custom-scripting) for running single-worker or many-worker Julia scripts with easy access to MPI, parallel HDF5, and Amazon S3 (with the mounted `s3/` directory)\n\nYou can use these libraries as drop-in replacements for Julia's standard arrays and the DataFrames.jl library. By changing an import statement, you can run your code as is, with Banyan scaling to arbitrary data or compute needs, and read in\narray/image/table data from S3 or the Internet (e.g., GitHub or public APIs).\n\nVisit [Banyan Computing](https://www.banyancomputing.com/resources/) for full documentation.\n\n## Getting Started\n\nBanyan is the best way to unleash Julia on big data in the cloud! To get started:\n\n1. Follow the [getting started steps](banyancomputing.com/getting-started) (15 minutes)\n2. Create a cluster on the [dashboard](banyancomputing.com/dashboard)\n3. 
Start a cluster session wherever you are running Julia with `start_session` (between 15s and 30min)\n4. Use functions in [BanyanArrays.jl](https://www.banyancomputing.com/banyan-arrays-jl-docs) or [BanyanDataFrames.jl](https://www.banyancomputing.com/banyan-data-frames-jl-docs) for big data processing!\n5. End the cluster session with `end_session`\n6. Destroy the cluster on the [dashboard](banyancomputing.com/dashboard)\n\n## Contributing\n\nPlease create branches named according to the author name and the feature name\nlike `{author-name}/{feature-name}`. For example: `caleb/add-tests-for-hdf5`.\nThen, submit a pull request on GitHub to merge your branch into the branch with\nthe latest version number.\n\nWhen pulling/pushing code, you may need to add the appropriate SSH key. Look\nup GitHub documentation for how to generate an SSH key, then make sure to add\nit. You may need to do this repeatedly if you have multiple SSH keys for\ndifferent GitHub accounts. For example, on Windows you may need to:\n\n```\neval `ssh-agent`\nssh-add -D\nssh-add /c/Users/Claris/.ssh/id_rsa_clarisw\ngit remote set-url origin git@github.com:banyan-team/banyan-website.git\n```\n\n## Testing\n\nTo see an example of how to add tests, see `BanyanArrays/test/runtests.jl` and `BanyanArrays/test/hdf5.jl`.\n\nTo run tests, ensure that you have a Banyan account connected to an AWS account.\nThen, `cd` into the directory with the Banyan Julia project you want to run\ntests for (e.g., `Banyan` for Banyan.jl or `BanyanDataFrames` for\nBanyanDataFrames.jl) and run `julia --project=. -e \"using Pkg; Pkg.test()\"`.\nTo filter and run a subset of test sets (where each test set is defined with\n`@testset`) with names matching a given pattern, run\n`julia --project=. 
-e \"using Pkg; Pkg.test(test_args=[\\\"{pattern 1}\\\", \\\"{pattern 2}\\\"])\"` where\nthe pattern could be, for example, `Sampl(.*)parquet` (a regular expression)\nor `Sample collection`.\n\nYou must then specify the cluster name with the `BANYAN_CLUSTER_NAME`\nenvironment variable. You must also specify the relevant `BANYAN_*`\nand `AWS_*` environment variables to provide credentials. AWS\ncredentials are specified in the same way as they would be if using\nthe AWS CLI (either use environment variables or use the relevant\nAWS configuration files) and the Banyan environment variables\nare saved in `banyanconfig.toml` so you don't need to specify them\nevery time.\n\nYou must also specify the branch you would like to test with the `BANYAN_JULIA_BRANCH`\nenvironment variable.\n\nFor example, if you have previously specified your Banyan API key, user ID, and AWS credentials, you could run:\n\n```\ncd BanyanDataFrames\nBANYAN_CLUSTER_NAME=pumpkincluster0 BANYAN_JULIA_BRANCH=v0.1.3 julia --project=. -e \"using Pkg; Pkg.test(test_args=[\\\"ample\\\"])\"\n```\n\nIf your AWS credentials are saved under a profile named `banyan-testing`, you could use `AWS_DEFAULT_PROFILE=banyan-testing`.\n\n## Development\n\nMake sure to use the `] dev ...` command or `Pkg.dev(...)` to ensure that when you\nare using BanyanArrays.jl or BanyanDataFrames.jl you are using the local version.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbanyan-team%2Fbanyan-julia","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbanyan-team%2Fbanyan-julia","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbanyan-team%2Fbanyan-julia/lists"}