# pysoda

## Overview

Pysoda is a tool for Python workflows that helps you create datasets that comply with FAIR (Findable, Accessible, Interoperable, Reusable) data standards. At the moment, pysoda focuses on neuromodulation, neurophysiology, and related data organized according to the SPARC guidelines for making data FAIR. We plan to extend the tool to support other standards, such as BIDS and FHIR, in the future.

Pysoda stems from SODA, a desktop application that simplifies organizing and sharing data that must comply with a FAIR data standard. The SODA app is convenient for most investigators, but those with coding experience may prefer to build automated workflows. The SODA backend already contains many of the functions needed to prepare and submit a dataset that complies with the SPARC Data Structure (SDS), such as:

- Creating standard metadata files
- Generating manifest files
- Automatically complying with the file and folder naming conventions
- Validating against the official SDS validator
- Uploading a dataset to Pennsieve with SDS compliance (ignoring empty folders and non-allowed files, avoiding duplicate files and folders, etc.)
- And many more

Pysoda makes these thoroughly tested and validated functions easy to integrate into automated workflows, so investigators do not have to re-write them. It will be very similar to the [pyfairdatatools](https://github.com/AI-READI/pyfairdatatools) Python package we are developing for our [AI-READI](https://aireadi.org/) project as part of the NIH Bridge2AI program.

## Workflow

### Import pysoda and initialize the soda object with a supported standard

```python
from pysoda import soda_create

# Initialize the soda object for the SPARC Data Structure (SDS).
# Internal note: soda_create returns the usual sodaJSONObj with additional methods
# for adding data and metadata (not in version 1). It is passed into the module
# functions just as sodaJSONObj is passed to the backend of the SODA API.
soda = soda_create(standard='sds')

# Add a dataset name to the soda object
soda.set_dataset_name('my_dataset')
```

### Structure your data

```python
# Get the base files-and-folders structure of your dataset
dataset_structure = soda.get_dataset_structure()

# Fill out your dataset structure.
# NOTE: Refer to the dataset_structure key in the soda_schema.json file
# to understand the structure and what is required.
dataset_structure['folders'] = {
    'data': {
        'files': {
            'file1': {
                'path': '/home/user/file1.txt', 'relativePath': '/data/file1.txt', 'action': 'new'
            },
            'file2': {
                'path': '/home/user/file2.txt', 'relativePath': '/data/file2.txt', 'action': 'new'
            }
        },
        'folders': {
            'primary': {
                'files': {
                    'file3': {
                        'path': '/home/user/file3.txt', 'relativePath': '/data/primary/file3.txt', 'action': 'new'
                    }
                },
                'folders': {},
                'relativePath': '/data/primary'
            }
        },
        'relativePath': '/data'
    }
}
# Root-level files and the root relativePath sit alongside 'folders', not inside it
dataset_structure['files'] = {}
dataset_structure['relativePath'] = '/'

# Map your imported data files to the entity structure defined in the soda schema (soda_schema.py)
entity_structure = soda.get_entity_structure()

# Fill out your entity structure using the schema as a reference.
# NOTE: data model not finalized
entity = {'subjectId': 'sub-1', 'metadata': {'age': '1 year', 'sex': 'female'}, 'data-file': '/data/file1.txt'}
entity_structure['subjects'].append(entity)
```

### Create your dataset metadata

```python
# Import the metadata module from the pysoda package
from pysoda import metadata

# Define your submission metadata
submission = soda.get_submission_metadata()

submission['consortium-data-standard'] = 'standard'
submission['funding-consortium'] = 'SPARC'
submission['award-number'] = '12345'
submission['milestone-achieved'] = ['one', 'two', 'three']
submission['filepath'] = 'path/to/destination'

# Create the Excel file for the submission metadata
metadata.submission.create(soda, file_output_location='path/to/output')

# Repeat for the remaining metadata files
metadata.subjects.create(soda, file_output_location='path/to/output')
metadata.samples.create(soda, file_output_location='path/to/output')
metadata.performances.create(soda, file_output_location='path/to/output')
metadata.sites.create(soda, file_output_location='path/to/output')
metadata.code.create(soda, file_output_location='path/to/output')
metadata.manifest.create(soda, file_output_location='path/to/output')
```

### Generate your dataset

#### Generate locally

```python
from pysoda import generate

# Set the generation options
soda.set_generate_dataset_options(destination='local', path='path/to/destination', dataset_name='my_dataset')

# Generate the dataset
generate(soda)
```

#### Generate on Pennsieve

```python
from pysoda import generate, update_existing

# Provide your Pennsieve API key and secret
soda.upload.auth(api_key='api_key', api_secret='api_secret')

# Upload a new dataset
# NOTE: Download and start the Pennsieve Agent (https://app.pennsieve.io)
# before uploading data to Pennsieve
dataset_id = generate(soda)  # returns the new dataset_id

# OR upload to an existing Pennsieve dataset:
# set the generate options in the soda object, then update
soda.set_generate_dataset_options(destination='existing-ps', if_existing="merge", if_existing_files="replace", dataset_id=dataset_id)
update_existing(soda)
```

## Utilities

### Compare a Pennsieve dataset with a local dataset

```python
from pysoda import compare

# Provide your Pennsieve API key and secret
soda.upload.auth(api_key='api_key', api_secret='api_secret')

# Import the dataset from Pennsieve
soda.import_dataset(dataset_id='dataset_id')

# Compare the Pennsieve dataset with the local dataset
results = compare(soda, local_dataset_location='path/to/local/dataset')
```
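Writing the nested `dataset_structure` dict from the *Structure your data* step by hand gets tedious for larger datasets. The sketch below builds a dict of the same shape by walking a local folder. `build_dataset_structure` is a hypothetical helper, not part of pysoda. It assumes file entries may be keyed by their file name (the README example uses short keys such as `'file1'`), and it skips empty folders, mirroring the upload behavior described in the overview.

```python
from pathlib import Path


def _relative_path(path: Path, root: Path) -> str:
    """Return a '/'-prefixed POSIX path relative to the dataset root."""
    rel = path.relative_to(root).as_posix()
    return "/" if rel == "." else "/" + rel


def build_dataset_structure(root: str) -> dict:
    """Hypothetical helper (not part of pysoda): mirror a local folder as a
    nested {'files', 'folders', 'relativePath'} dict like the README example.
    Folders with no files anywhere below them are skipped."""
    root_path = Path(root).resolve()

    def walk(folder: Path) -> dict:
        node = {"files": {}, "folders": {}, "relativePath": _relative_path(folder, root_path)}
        for entry in sorted(folder.iterdir()):
            if entry.is_dir():
                child = walk(entry)
                # Keep the subfolder only if it contains at least one file somewhere below
                if child["files"] or child["folders"]:
                    node["folders"][entry.name] = child
            elif entry.is_file():
                node["files"][entry.name] = {
                    "path": str(entry),
                    "relativePath": _relative_path(entry, root_path),
                    "action": "new",
                }
        return node

    return walk(root_path)
```

Assuming pysoda accepts this shape, `dataset_structure['folders'] = build_dataset_structure('path/to/local/dataset')['folders']` could replace the hand-written dict. Generating the structure from disk also keeps `path` and `relativePath` consistent by construction.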
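Manifest generation is one of the SODA functions that pysoda exposes (`metadata.manifest.create`). To illustrate the kind of input it works from, the sketch below flattens a `dataset_structure`-shaped dict into one row per file and renders the rows as CSV. `iter_files`, `manifest_rows`, and `manifest_csv` are hypothetical helpers. The column names are only modeled on the SDS manifest template and should be checked against the official SDS templates. CSV is used here to stay dependency-free, whereas the README's metadata functions produce Excel files.

```python
import csv
import io
from pathlib import PurePosixPath

# Column names modeled on the SDS manifest template (assumption, not pysoda's schema)
MANIFEST_COLUMNS = ["filename", "timestamp", "description", "file type", "additional metadata"]


def iter_files(node: dict):
    """Yield every file entry in a nested files/folders dict, depth first, sorted by key."""
    for _, entry in sorted(node.get("files", {}).items()):
        yield entry
    for _, child in sorted(node.get("folders", {}).items()):
        yield from iter_files(child)


def manifest_rows(structure: dict) -> list:
    """Hypothetical helper: one row per file, keyed by its dataset-relative path."""
    rows = []
    for entry in iter_files(structure):
        rel = entry["relativePath"].lstrip("/")
        row = dict.fromkeys(MANIFEST_COLUMNS, "")  # unknown fields stay empty
        row["filename"] = rel
        row["file type"] = PurePosixPath(rel).suffix.lstrip(".")
        rows.append(row)
    return rows


def manifest_csv(structure: dict) -> str:
    """Render the manifest rows as CSV text (a stand-in for an Excel file)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=MANIFEST_COLUMNS)
    writer.writeheader()
    writer.writerows(manifest_rows(structure))
    return buffer.getvalue()
```

Applied to the dict from the *Structure your data* example, this yields rows for `data/file1.txt`, `data/file2.txt`, and `data/primary/file3.txt`. The description and timestamp columns are left blank for the user to fill in.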
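The `compare` utility reports differences between a Pennsieve dataset and a local one, but the README does not say what `results` contains. The following is only a sketch of the underlying idea under one assumption: both sides have already been reduced to sets of dataset-relative file paths. `local_relative_paths` and `compare_paths` are hypothetical and do not talk to Pennsieve.

```python
import os


def local_relative_paths(root: str) -> set:
    """Collect '/'-separated paths of all files under root, relative to root."""
    paths = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            paths.add(rel.replace(os.sep, "/"))
    return paths


def compare_paths(local: set, remote: set) -> dict:
    """Hypothetical stand-in for a dataset comparison: split paths by where they exist."""
    return {
        "only_local": sorted(local - remote),
        "only_remote": sorted(remote - local),
        "in_both": sorted(local & remote),
    }
```

A path-level comparison like this only detects added or missing files. Detecting files that exist on both sides but differ would additionally require comparing sizes or checksums.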