{"id":47246692,"url":"https://github.com/ywatanabe1989/scitex-dataset","last_synced_at":"2026-04-30T23:00:25.951Z","repository":{"id":335908198,"uuid":"1145110262","full_name":"ywatanabe1989/scitex-dataset","owner":"ywatanabe1989","description":"Multi-domain scientific dataset fetcher — neuroscience, biology, pharmacology, medical. Part of SciTeX.","archived":false,"fork":false,"pushed_at":"2026-04-23T18:46:33.000Z","size":511,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-23T20:23:12.226Z","etag":null,"topics":["ai-research","bids","dandi","data-discovery","datasets","eeg","mcp","mcp-server","metadata","mri","neuroimaging","neuroscience","nwb","openneuro","physionet","python","research-automation","scientific-data","scitex","zenodo"],"latest_commit_sha":null,"homepage":"https://scitex.ai","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ywatanabe1989.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":"CLA.md"}},"created_at":"2026-01-29T12:44:29.000Z","updated_at":"2026-03-28T04:41:16.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ywatanabe1989/scitex-dataset","commit_stats":null,"previous_names":["ywatanabe1989/scitex-dataset"],"tags_count":10,"template":false,"template_full_name":null,"purl":"pkg:github/ywatanabe1989/scitex-dataset","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ywatanabe1989%2Fscitex-dataset","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ywatanabe1989%2Fscitex-dataset/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ywatanabe1989%2Fscitex-dataset/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ywatanabe1989%2Fscitex-dataset/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ywatanabe1989","download_url":"https://codeload.github.com/ywatanabe1989/scitex-dataset/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ywatanabe1989%2Fscitex-dataset/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32479448,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-30T13:12:12.517Z","status":"ssl_error","status_checked_at":"2026-04-30T13:12:06.837Z","response_time":57,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-research","bids","dandi","data-discovery","datasets","eeg","mcp","mcp-server","metadata","mri","neuroimaging","neuroscience","nwb","openneuro","physionet","python","research-automation","scientific-data","scitex","zenodo"],"created_at":"2026-03-14T07:19:36.194Z","updated_at":"2026-04-30T23:00:25.922Z","avatar_url":"https://github.com/ywatanabe1989.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SciTeX Dataset (\u003ccode\u003escitex-dataset\u003c/code\u003e)\n\n\u003c!-- scitex-badges:start --\u003e\n[![PyPI](https://img.shields.io/pypi/v/scitex-dataset.svg)](https://pypi.org/project/scitex-dataset/)\n[![Python](https://img.shields.io/pypi/pyversions/scitex-dataset.svg)](https://pypi.org/project/scitex-dataset/)\n[![Tests](https://github.com/ywatanabe1989/scitex-dataset/actions/workflows/test.yml/badge.svg)](https://github.com/ywatanabe1989/scitex-dataset/actions/workflows/test.yml)\n[![Install Test](https://github.com/ywatanabe1989/scitex-dataset/actions/workflows/install-test.yml/badge.svg)](https://github.com/ywatanabe1989/scitex-dataset/actions/workflows/install-test.yml)\n[![Coverage](https://codecov.io/gh/ywatanabe1989/scitex-dataset/graph/badge.svg)](https://codecov.io/gh/ywatanabe1989/scitex-dataset)\n[![Docs](https://readthedocs.org/projects/scitex-dataset/badge/?version=latest)](https://scitex-dataset.readthedocs.io/en/latest/)\n[![License: AGPL v3](https://img.shields.io/badge/license-AGPL_v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)\n\u003c!-- scitex-badges:end --\u003e\n\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://scitex.ai\"\u003e\n    \u003cimg src=\"docs/scitex-logo-blue-cropped.png\" alt=\"SciTeX\" width=\"400\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\u003cb\u003eUnified access to neuroscience and scientific datasets\u003c/b\u003e\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://badge.fury.io/py/scitex-dataset\"\u003e\u003cimg src=\"https://badge.fury.io/py/scitex-dataset.svg\" alt=\"PyPI version\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://scitex-dataset.readthedocs.io/\"\u003e\u003cimg src=\"https://readthedocs.org/projects/scitex-dataset/badge/?version=latest\" alt=\"Documentation\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/ywatanabe1989/scitex-dataset/actions/workflows/test.yml\"\u003e\u003cimg src=\"https://github.com/ywatanabe1989/scitex-dataset/actions/workflows/test.yml/badge.svg\" alt=\"Tests\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://www.gnu.org/licenses/agpl-3.0\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-AGPL--3.0-blue.svg\" alt=\"License: AGPL-3.0\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://scitex-dataset.readthedocs.io/\"\u003eFull Documentation\u003c/a\u003e · \u003ccode\u003epip install scitex-dataset\u003c/code\u003e\n\u003c/p\u003e\n\n---\n\n\u003e **Interfaces:** Python ⭐⭐⭐ (primary) · CLI ⭐ · MCP ⭐⭐ · Skills ⭐⭐ · Hook — · HTTP —\n\n## Problem and Solution\n\n\n| # | Problem | Solution |\n|---|---------|----------|\n| 1 | **Public dataset repositories balkanized** -- OpenNeuro (BIDS) + DANDI (NWB) + PhysioNet (WFDB) + Zenodo (generic) + GEO / ChEMBL / ClinicalTrials — different APIs, auth, download tools | **Unified fetcher** -- `stx.dataset.neuroscience.openneuro.fetch_all_datasets()` same call shape across all; local FTS5 search across metadata |\n| 2 | **\"Download this BIDS dataset\" means reading DataLad docs first** -- the barrier is tooling, not knowledge | **One-line fetch** -- no DataLad setup; the module handles auth, resumption, checksums transparently |\n\n## Problem\n\nNeuroscience datasets are scattered across multiple repositories -- OpenNeuro, DANDI Archive, PhysioNet, Zenodo -- each with its own API, data format, and query interface. Researchers waste time navigating incompatible APIs to discover relevant data. AI agents lack a unified way to search and evaluate datasets programmatically.\n\n## Solution\n\nSciTeX Dataset provides a **single Python API, CLI, and MCP (Model Context Protocol) server** to discover and query metadata from major scientific data repositories. It focuses on fast metadata retrieval without downloading full datasets.\n\n| Repository | Description | Data Types |\n|------------|-------------|------------|\n| **OpenNeuro** | Open platform for sharing neuroimaging data | MRI, EEG, MEG, iEEG, PET |\n| **DANDI** | BRAIN Initiative data archive | Electrophysiology, Ophys |\n| **PhysioNet** | Physiological signal databases | ECG, EEG, clinical data |\n| **Zenodo** | General scientific data repository (CERN) | Any research data |\n\n\u003cp align=\"center\"\u003e\u003csub\u003e\u003cb\u003eTable 1.\u003c/b\u003e Supported data repositories. Each source is queried via its public API; no authentication required for metadata access.\u003c/sub\u003e\u003c/p\u003e\n\n## Installation\n\nRequires Python \u003e= 3.10.\n\n```bash\npip install scitex-dataset\n```\n\n\u003e **MCP support**: `pip install scitex-dataset[mcp]`\n\n## Quick Start\n\n```python\nfrom scitex_dataset import fetch_all_datasets, format_dataset\n\n# Fetch datasets from OpenNeuro\ndatasets = fetch_all_datasets(max_datasets=10)\n\n# Format for analysis\nfor ds in datasets:\n    formatted = format_dataset(ds)\n    print(f\"{formatted['id']}: {formatted['name']} ({formatted['n_subjects']} subjects)\")\n```\n\n## Four Interfaces\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003ePython API\u003c/strong\u003e\u003c/summary\u003e\n\n\u003cbr\u003e\n\n```python\nfrom scitex_dataset import fetch_all_datasets, format_dataset, search_datasets, sort_datasets\nfrom scitex_dataset import neuroscience, database\n\n# Fetch from specific sources\ndatasets = fetch_all_datasets(max_datasets=100)                    # OpenNeuro\ndandi_ds = neuroscience.dandi.fetch_all_datasets(max_datasets=50)  # DANDI\nphys_ds = neuroscience.physionet.fetch_all_datasets()              # PhysioNet\n\n# Search and filter\neeg_datasets = search_datasets(datasets, modality=\"eeg\", min_subjects=20)\npopular = sort_datasets(datasets, by=\"downloads\", descending=True)\n\n# Local database for fast full-text search\ndatabase.build()                                        # index all sources\nresults = database.search(\"alzheimer EEG\", min_subjects=20)\n```\n\n\u003e **[Full API reference](https://scitex-dataset.readthedocs.io/)**\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eCLI Commands\u003c/strong\u003e\u003c/summary\u003e\n\n\u003cbr\u003e\n\n```bash\nscitex-dataset --help-recursive             # Show all commands\n\n# Fetch from repositories\nscitex-dataset openneuro -n 100 -o datasets.json -v\nscitex-dataset dandi -n 50 -o dandi.json -v\nscitex-dataset physionet -n 50 -v\nscitex-dataset zenodo -q \"neuroscience\" -n 20\n\n# Local database\nscitex-dataset db build                     # index all sources\nscitex-dataset db search \"epilepsy EEG\"     # full-text search\nscitex-dataset db stats                     # show statistics\n\n# Introspection\nscitex-dataset list-python-apis -v          # list Python API tree\nscitex-dataset mcp list-tools -v            # list MCP tools\n```\n\n\u003e **[Full CLI reference](https://scitex-dataset.readthedocs.io/)**\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eMCP Server -- for AI Agents\u003c/strong\u003e\u003c/summary\u003e\n\n\u003cbr\u003e\n\nAI agents can discover and query neuroscience datasets autonomously.\n\n| Tool | Description |\n|------|-------------|\n| `dataset_openneuro_fetch` | Fetch datasets from OpenNeuro |\n| `dataset_dandi_fetch` | Fetch datasets from DANDI Archive |\n| `dataset_physionet_fetch` | Fetch datasets from PhysioNet |\n| `dataset_zenodo_fetch` | Fetch datasets from Zenodo |\n| `dataset_search` | Filter datasets by modality, subjects, etc. |\n| `dataset_list_sources` | List available data repositories |\n| `dataset_db_build` | Build local search database |\n| `dataset_db_search` | Full-text search across all sources |\n| `dataset_db_stats` | Database statistics |\n\n\u003csub\u003e\u003cb\u003eTable 2.\u003c/b\u003e Nine MCP tools available for AI-assisted dataset discovery. All tools accept JSON parameters and return JSON results.\u003c/sub\u003e\n\n```bash\nscitex-dataset mcp start\n```\n\n\u003e **[Full MCP specification](https://scitex-dataset.readthedocs.io/)**\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eSkills — for AI Agent Discovery\u003c/strong\u003e\u003c/summary\u003e\n\n\u003cbr\u003e\n\nSkills provide workflow-oriented guides that AI agents query to discover capabilities and usage patterns.\n\n```bash\nscitex-dataset skills list              # List available skill pages\nscitex-dataset skills get SKILL         # Show main skill page\nscitex-dev skills export --package scitex-dataset  # Export to Claude Code\n```\n\n| Skill | Content |\n|-------|---------|\n| `quick-start` | Basic usage |\n| `data-sources` | OpenNeuro, DANDI, PhysioNet |\n| `cli-reference` | CLI commands |\n| `mcp-tools` | MCP tools for AI agents |\n\n\u003c/details\u003e\n\n## Part of SciTeX\n\nSciTeX Dataset is part of [**SciTeX**](https://scitex.ai). When used inside the SciTeX framework, dataset discovery integrates with reproducible research sessions:\n\n```python\nimport scitex\nfrom scitex_dataset import fetch_all_datasets, format_dataset\n\n@scitex.session\ndef main(logger=scitex.INJECTED):\n    datasets = fetch_all_datasets(max_datasets=100, logger=logger)\n    formatted = [format_dataset(ds) for ds in datasets]\n    scitex.io.save(formatted, \"openneuro_datasets.json\")\n    return 0\n```\n\nThe SciTeX ecosystem follows the Four Freedoms for Research, inspired by [the Free Software Definition](https://www.gnu.org/philosophy/free-sw.en.html):\n\n\u003eFour Freedoms for Research\n\u003e\n\u003e0. The freedom to **run** your research anywhere -- your machine, your terms.\n\u003e1. The freedom to **study** how every step works -- from raw data to final manuscript.\n\u003e2. The freedom to **redistribute** your workflows, not just your papers.\n\u003e3. The freedom to **modify** any module and share improvements with the community.\n\u003e\n\u003eAGPL-3.0 -- because we believe research infrastructure deserves the same freedoms as the software it runs on.\n\n---\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://scitex.ai\" target=\"_blank\"\u003e\u003cimg src=\"docs/scitex-icon-navy-inverted.png\" alt=\"SciTeX\" width=\"40\"/\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003c!-- EOF --\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fywatanabe1989%2Fscitex-dataset","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fywatanabe1989%2Fscitex-dataset","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fywatanabe1989%2Fscitex-dataset/lists"}