{"id":13680905,"url":"https://github.com/wfondrie/ppx","last_synced_at":"2025-04-09T13:04:27.650Z","repository":{"id":33003253,"uuid":"146941886","full_name":"wfondrie/ppx","owner":"wfondrie","description":"A Python interface to proteomics data repositories","archived":false,"fork":false,"pushed_at":"2025-03-07T18:00:44.000Z","size":229,"stargazers_count":34,"open_issues_count":3,"forks_count":5,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-02T12:11:19.882Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://ppx.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wfondrie.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-08-31T21:01:24.000Z","updated_at":"2025-03-23T00:57:49.000Z","dependencies_parsed_at":"2025-03-07T19:29:20.975Z","dependency_job_id":null,"html_url":"https://github.com/wfondrie/ppx","commit_stats":null,"previous_names":[],"tags_count":19,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wfondrie%2Fppx","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wfondrie%2Fppx/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wfondrie%2Fppx/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wfondrie%2Fppx/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wfondrie","download_url":"https://codeload.github.com/wfondrie/ppx/tar.gz
/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248045230,"owners_count":21038553,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T13:01:23.609Z","updated_at":"2025-04-09T13:04:27.630Z","avatar_url":"https://github.com/wfondrie.png","language":"Python","funding_links":[],"categories":["2. Databases"],"sub_categories":["Table of Contents"],"readme":"\u003cimg src=\"static/ppx_light.svg\" width=300\u003e\n\n# A Python interface to proteomics data repositories\n\n[![conda](https://img.shields.io/conda/vn/bioconda/ppx?color=green)](http://bioconda.github.io/recipes/ppx/README.html)\n[![PyPI](https://img.shields.io/pypi/v/ppx?color=green)](https://pypi.org/project/ppx/)\n[![tests](https://github.com/wfondrie/ppx/actions/workflows/tests.yml/badge.svg?branch=master)](https://github.com/wfondrie/ppx/actions/workflows/tests.yml)\n[![Documentation Status](https://readthedocs.org/projects/ppx/badge/?version=latest)](https://ppx.readthedocs.io/en/latest/?badge=latest)\n\n## Overview\nppx provides a simple, programmatic means to access proteomics data that are\npublicly available in [ProteomeXchange](http://www.proteomexchange.org) partner\nrepositories. ppx allows users to easily find and download files associated\nwith projects in [PRIDE](https://www.ebi.ac.uk/pride/archive/) and\n[MassIVE](https://massive.ucsd.edu/ProteoSAFe/static/massive.jsp). 
In doing so,\nppx promotes the reproducible analysis of proteomics data.\n\nFor full documentation and examples, visit: https://ppx.readthedocs.io\n\n## Installation\nppx requires Python 3.6+ and depends upon the\n[requests](https://docs.python-requests.org/en/master/) and\n[tqdm](https://tqdm.github.io/) Python packages. ppx and any missing\ndependencies are easily installed with `pip` or with `conda` through the\n[bioconda](https://bioconda.github.io/index.html) channel.\n\nInstall with `conda`:\n\n``` shell\nconda install -c bioconda ppx\n```\n\nOr install with `pip`:\n\n```shell\npip3 install ppx\n```\n\n## Configuration\n\nBy default, ppx will download project files to the `.ppx` directory under the\ncurrent user's home directory (`~/.ppx` on Linux and macOS). There are several\nways to specify different data directories:\n\n1. Change the ppx data directory for all future Python sessions by setting the\n`PPX_DATA_DIR` environment variable to your preferred directory.\n\n2. Change the ppx data directory for a Python session using the\n`ppx.set_data_dir()` function.\n\n3. Specify a data directory for a project using the `local` argument:\n\n``` Python\n\u003e\u003e\u003e import ppx\n\n\u003e\u003e\u003e proj = ppx.find_project(\"PXD000001\", local=\"my/data/dir\")\n```\n\nWhy does ppx set a default data directory? We found that this makes it easier\nto reuse the same proteomics data files in multiple tasks that we're working\non.\n\nAs of ppx v1.3.0, cloud paths can also be used as the data directory. This\nallows you to stream downloaded files to AWS S3, Google Cloud Storage, or Azure\nBlob Storage. To use a cloud storage provider, simply set the data directory to\na cloud URI, such as `s3://my-data-bucket/ppx`, using any of the methods\nabove. 
Please note that you'll also need to set up credentials for your cloud\nprovider; see the\n[CloudPathLib documentation](https://cloudpathlib.drivendata.org/v0.6/authentication/)\nfor details.\n\n## Examples\nFirst, find a project using its ProteomeXchange or MassIVE identifier:\n\n``` Python\n\u003e\u003e\u003e import ppx\n\n\u003e\u003e\u003e proj = ppx.find_project(\"PXD000001\")\n```\n\nWe can then view the files associated with the project in the repository\n(PRIDE in this case):\n\n``` Python\n\u003e\u003e\u003e proj.remote_files()\n#['F063721.dat',\n# 'F063721.dat-mztab.txt',\n# 'PRIDE_Exp_Complete_Ac_22134.xml.gz',\n# 'PRIDE_Exp_mzData_Ac_22134.xml.gz',\n# 'PXD000001_mztab.txt',\n# 'README.txt',\n# 'TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML',\n# 'TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzXML',\n# 'TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzXML',\n# 'TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.raw',\n# 'erwinia_carotovora.fasta',\n# 'generated/PRIDE_Exp_Complete_Ac_22134.pride.mgf.gz',\n# 'generated/PRIDE_Exp_Complete_Ac_22134.pride.mztab.gz']\n```\n\nWe can also [glob](https://en.wikipedia.org/wiki/Glob_(programming)) for\nspecific types of files:\n\n``` Python\n\u003e\u003e\u003e proj.remote_files(\"*.mzML\")\n# ['TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML']\n```\n\nThen we can download one or more files to the project's local data directory:\n\n``` Python\n\u003e\u003e\u003e proj.download(\"README.txt\")\n# [PosixPath('/Users/wfondrie/.ppx/PXD000001/README.txt')]\n```\n\nOnce we've downloaded files, ppx no longer needs an internet connection to\nretrieve a project's local data. However, you will need to specify the\nrepository in which the project data resides. 
If we start a new Python\nsession, we can find our previous file easily:\n\n``` Python\n\u003e\u003e\u003e import ppx\n\n\u003e\u003e\u003e proj = ppx.find_project(\"PXD000001\", repo=\"PRIDE\")\n\u003e\u003e\u003e proj.local_files()\n# [PosixPath('/Users/wfondrie/.ppx/PXD000001/README.txt')]\n```\n\n### Downloading to a cloud storage backend\n\nWe use [CloudPathlib](https://cloudpathlib.drivendata.org/stable/) to power\nsupport for AWS S3, Google Cloud Storage, and Azure Blob Storage. To use a\ncloud storage provider, create the bucket for ppx to use and set it as the ppx data\ndirectory.\n\n\nFor example, using AWS S3, we can save the files of a project to an S3 bucket:\n``` python\n\u003e\u003e\u003e proj = ppx.find_project(\"PXD000001\", local=\"s3://my-bucket/PXD000001\")\n\u003e\u003e\u003e proj.download(\"README.txt\")\n# [S3Path('s3://my-bucket/PXD000001/README.txt')]\n```\n\nCloudPathLib then provides methods to download files from S3 when you need them:\n\n``` Python\n\u003e\u003e\u003e readme_on_s3 = proj.local_files(\"README.txt\")[0]\n\u003e\u003e\u003e readme_on_s3.download_to(\"README.txt\")\n# PosixPath('README.txt')\n```\n\n## If you are an R user...\n\nppx was inspired by the rpx R package by Laurent Gatto. Check it out on\n[Bioconductor](http://bioconductor.org/packages/release/bioc/html/rpx.html) and\n[GitHub](https://github.com/lgatto/rpx).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwfondrie%2Fppx","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwfondrie%2Fppx","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwfondrie%2Fppx/lists"}