{"id":34091644,"url":"https://github.com/asgersvenning/pyremotedata","last_synced_at":"2026-04-08T19:32:10.267Z","repository":{"id":206372605,"uuid":"716101050","full_name":"asgersvenning/pyremotedata","owner":"asgersvenning","description":"This repository contains the python module \"pyRemoteData\" which handles high-bandwidth data transfer with LFTP.","archived":false,"fork":false,"pushed_at":"2026-02-19T12:10:02.000Z","size":4886,"stargazers_count":3,"open_issues_count":5,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-02-19T16:34:42.214Z","etag":null,"topics":["data-transfer","lftp","python","pytorch","sftp"],"latest_commit_sha":null,"homepage":"https://asgersvenning.com/pyremotedata/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/asgersvenning.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-11-08T12:55:31.000Z","updated_at":"2026-02-19T12:04:12.000Z","dependencies_parsed_at":"2025-07-08T11:27:01.587Z","dependency_job_id":"34370c3d-c5e0-4ff1-9546-e3398db15eea","html_url":"https://github.com/asgersvenning/pyremotedata","commit_stats":null,"previous_names":["asgersvenning/pyremotedata"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/asgersvenning/pyremotedata","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asgersvenning%2Fpyremotedata","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/as
gersvenning%2Fpyremotedata/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asgersvenning%2Fpyremotedata/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asgersvenning%2Fpyremotedata/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/asgersvenning","download_url":"https://codeload.github.com/asgersvenning/pyremotedata/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asgersvenning%2Fpyremotedata/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30864022,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-23T14:38:03.667Z","status":"ssl_error","status_checked_at":"2026-03-23T14:38:01.683Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-transfer","lftp","python","pytorch","sftp"],"created_at":"2025-12-14T14:47:31.305Z","updated_at":"2026-04-08T19:32:10.215Z","avatar_url":"https://github.com/asgersvenning.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# `pyRemoteData`\n`pyRemoteData` is a module developed for scientific computation using the remote storage platform [ERDA](https://erda.au.dk/) (Electronic Research Data Archive) provided by Aarhus University IT, as part of my PhD 
at the Department of Ecoscience at Aarhus University.\n\nIt can be used with **any** storage facility that supports SFTP and LFTP, but has only been tested on a minimal SFTP server found at [atmoz/sftp](https://hub.docker.com/r/atmoz/sftp) and on the live AU ERDA service which runs on MiG (Minimum intrusion Grid - [SourceForge](https://sourceforge.net/projects/migrid/)/[GitHub](https://github.com/ucphhpc/migrid-sync)) developed by [SCIENCE HPC Centre at Copenhagen University](https://science.ku.dk/english/research/research-e-infrastructure/science-hpc-centre/).\n\n## Capabilities\nIn order to facilitate high-throughput computation in a cross-platform setting, `pyRemoteData` handles data transfer with multithreading and asynchronous data streaming using thread-safe buffers.\n\n## Use-cases\nIf your storage facility supports SFTP and LFTP, and you need high-bandwidth data streaming for analysis, data migration or other purposes such as model-training, then this module may be of use to you.\nExperience with SFTP or LFTP is not necessary, but you must be able to set up the required SSH configurations.\n\nSee **Automated** for details on how to avoid having to set up SSH configuration.\n\n## Setup\nA more user-friendly setup process, which facilitates both automated and interactive setup, is currently in development. (**TODO**: Finish and describe the setup process)\n\n### Installation\nThe package is available on PyPI, and can be installed using pip:\n```bash\npip install pyremotedata\n```\n\n### Interactive\nSimply follow the popup instructions that appear once you load the package for the first time.\n\n### Automated\nThe automatic configuration setup relies on setting the correct environment variables **BEFORE LOADING THE PACKAGE**:\n\n* `PYREMOTEDATA_REMOTE_USERNAME` : Should be set to your username on your remote service.\n* `PYREMOTEDATA_REMOTE_URI` : Should be set to the URI of the endpoint for your remote service (e.g. 
for ERDA it is \"io.erda.au.dk\").\n* `PYREMOTEDATA_REMOTE_DIRECTORY` : If you would like to set a default working directory, that is not the root of your remote storage, then set this to that (e.g. \"/MY_PROJECT/DATASETS\") otherwise simply set this to \"/\".\n* `PYREMOTEDATA_AUTO` : Should be **set to \"yes\"** to disable interactive mode. If this is not set, or set to anything other than \"yes\" (not case-sensitive), while any of the prior environment variables are unset an error will be thrown.\n\nThe recommended way to avoid any SSH or environment variables setup is to use:\n```py\nfrom pyremotedata.implicit_mount import IOHandler\nwith IOHandler(lftp_settings = {'sftp:connect-program' : 'ssh -a -x -i \u003ckeyfile\u003e'}, user = \u003cUSER\u003e, remote = \u003cREMOTE\u003e) as io:\n    ...\n```\nHere `keyfile` is probably something like `~/.ssh/id_rsa`. \n\n### Example\nIf you want to test against a mock server simply follow the instructions in tests/README.\n\nIf you have a remote storage facility that supports SFTP and LFTP, then you can use the following example to test the functionality of the module:\n```python\n# Set the environment variables (only necessary in a non-interactive setting)\n# If you are simply running this as a Python script, \n# you can omit these lines and you will be prompted to set them interactively\nimport os\nos.environ[\"PYREMOTEDATA_REMOTE_USERNAME\"] = \"username\"\nos.environ[\"PYREMOTEDATA_REMOTE_URI\"] = \"storage.example.com\"\nos.environ[\"PYREMOTEDATA_REMOTE_DIRECTORY\"] = \"/MY_PROJECT/DATASETS\"\nos.environ[\"PYREMOTEDATA_AUTO\"] = \"yes\"\n\nfrom pyremotedata.implicit_mount import IOHandler\n\nhandler = IOHandler()\n\nwith handler as io:\n    print(io.ls())\n    local_file = io.download(\"/remote/file/or/directory\")\n\n# The configuration is persistent, but can be removed using the following:\nfrom pyremotedata.config import remove_config\nremove_config()\n```\n\n## Issues\nThis module is certainly not maximally 
efficient, and you may run into network- or OS-specific issues. Any and all feedback and contributions are highly appreciated.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fasgersvenning%2Fpyremotedata","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fasgersvenning%2Fpyremotedata","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fasgersvenning%2Fpyremotedata/lists"}