{"id":24577870,"url":"https://github.com/alertadengue/pysus","last_synced_at":"2026-05-18T21:12:06.994Z","repository":{"id":39541531,"uuid":"63720586","full_name":"AlertaDengue/PySUS","owner":"AlertaDengue","description":"Library to download, clean and analyze openly available datasets from Brazilian Universal health system, SUS.","archived":false,"fork":false,"pushed_at":"2025-09-26T11:22:58.000Z","size":10622,"stargazers_count":196,"open_issues_count":25,"forks_count":71,"subscribers_count":21,"default_branch":"main","last_synced_at":"2025-09-26T13:23:16.989Z","etag":null,"topics":["data-science","geospatial","health"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AlertaDengue.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":["fccoelho"],"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"lfx_crowdfunding":null,"custom":null}},"created_at":"2016-07-19T19:03:21.000Z","updated_at":"2025-09-26T11:23:04.000Z","dependencies_parsed_at":"2023-12-11T13:15:15.431Z","dependency_job_id":"caeccda4-b9e6-44e5-8d0b-1ca4a4058a04","html_url":"https://github.com/AlertaDengue/PySUS","commit_stats":null,"previous_names":[],"tags_count":40,"template":false,"template_full_name":null,"purl":"pkg:github/AlertaDengue/PySUS","repository_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/AlertaDengue%2FPySUS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlertaDengue%2FPySUS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlertaDengue%2FPySUS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlertaDengue%2FPySUS/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AlertaDengue","download_url":"https://codeload.github.com/AlertaDengue/PySUS/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlertaDengue%2FPySUS/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29629647,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-19T18:02:07.722Z","status":"ssl_error","status_checked_at":"2026-02-19T18:01:46.144Z","response_time":117,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","geospatial","health"],"created_at":"2025-01-23T23:56:26.518Z","updated_at":"2026-05-18T21:12:06.987Z","avatar_url":"https://github.com/AlertaDengue.png","language":"Python","funding_links":["https://github.com/sponsors/fccoelho"],"categories":[],"sub_categories":[],"readme":"# PySUS 2.0 is now 
available!\n\n[![DOI](https://zenodo.org/badge/63720586.svg)](https://zenodo.org/badge/latestdoi/63720586)\n[![release](https://github.com/AlertaDengue/PySUS/actions/workflows/release.yaml/badge.svg)](https://github.com/AlertaDengue/PySUS/actions/workflows/release.yaml)\n[![Documentation Status](https://readthedocs.org/projects/pysus/badge/?version=latest)](https://pysus.readthedocs.io/en/latest/?badge=latest)\n[![PyPI version](https://badge.fury.io/py/pysus.svg)](https://pypi.org/project/PySUS/)\n\nPySUS is a Python package for accessing and analyzing Brazil's public health data (DATASUS). It provides tools to download, process, and work with health datasets including SINAN (disease notifications), SIM (mortality), SINASC (births), SIH (hospitalizations), SIA (ambulatory), CIHA, CNES, PNI, and more.\n\n## What's New in PySUS 2.0\n\n- **Simplified API**: New high-level functions for direct DataFrame access\n- **CLI \u0026 TUI**: Launch the text-based user interface from command line\n- **Flexible Schema Modes**: Read multiple parquet files with union, intersection, or strict modes\n- **SQL Query**: Filter catalog queries by dataset, group, state, year, and month\n\n## Installation\n\n```bash\npip install pysus\n```\n\nFor DBC file support (requires libffi):\n```bash\n# Ubuntu/Debian\nsudo apt install libffi-dev\npip install pysus[dbc]\n```\n\nFor the terminal user interface (TUI):\n```bash\npip install pysus[tui]\n```\n\n### Docker\n\nA pre-built JupyterLab image is available on Docker Hub:\n\n```bash\ndocker pull alertadengue/pysus\ndocker run -p 8888:8888 alertadengue/pysus\n```\n\nOr build locally and start the container:\n\n```bash\ndocker compose -f docker/docker-compose.yaml up --build\n```\n\nThen open [http://127.0.0.1:8888/lab](http://127.0.0.1:8888/lab) in your browser.\n\nStop the container:\n\n```bash\ndocker compose -f docker/docker-compose.yaml down\n```\n\n## Quick Start\n\n### Simplified Database Functions (New in 2.0)\n\nThe easiest way to get data 
as a pandas DataFrame:\n\n```python\nfrom pysus import sinan, sinasc, sim, sih, sia, pni, ibge, cnes, ciha\n\n# Download SINAN Dengue data for 2000\ndf = sinan(disease=\"deng\", year=2000)\n\n# Multiple years\ndf = sinan(disease=\"deng\", year=[2023, 2024])\n\n# SINASC births for São Paulo, 2020-2023\ndf = sinasc(state=\"SP\", year=[2020, 2021, 2022, 2023])\n\n# SIM mortality data\ndf = sim(state=\"SP\", year=2024)\n\n# SIH hospitalizations for specific months\ndf = sih(state=\"SP\", year=2024, month=[1, 2, 3])\n\n# CNES health facilities\ndf = cnes(state=\"SP\", year=2024, month=1)\n```\n\n### Listing the files\n\nYou can also list the files in a dataset to check which ones are available to download:\n\n```python\nfrom pysus import list_files\n\nlist_files(\"SINAN\")\n```\n\n### Using the PySUS Client\n\n```python\nimport asyncio\nimport glob\n\nfrom pysus import PySUS\n\nasync def main():\n    async with PySUS() as pysus:\n        # Query DuckLake catalog\n        files = await pysus.query(\n            dataset=\"sinan\",\n            group=\"DENG\",\n            state=\"SP\",\n            year=2024,\n        )\n\n        # Download files\n        for f in files:\n            local = await pysus.download(f)\n            print(local.path)\n\n        # Read multiple parquet files (recursive=True lets ** match nested directories)\n        paths = glob.glob(\"/cache/sinan/**/*.parquet\", recursive=True)\n        df = pysus.read_parquet(paths, mode=\"union\").df()\n\n# Run the coroutine from synchronous code\nasyncio.run(main())\n```\n\n### Using the TUI (unstable/under testing)\n\nLaunch the interactive text-based interface:\n\n```bash\npysus tui -l pt\n```\n\nOr from Python:\n\n```python\nfrom pysus.tui.app import PySUS\napp = PySUS(lang=\"pt\")\napp.run()\n```\n\n## Features\n\n- **Automatic Downloads**: Fetch data from FTP, DuckLake (S3), and the dados.gov.br API\n- **Parquet Output**: All downloaded data is converted to Apache Parquet format\n- **DuckLake Integration**: S3-compatible cloud storage for parquet catalogs\n- **Local Catalog**: SQLite-based tracking of download history to avoid re-downloads\n- 
**Type Inference**: Automatic data type conversion from legacy formats (DBF, DBC)\n- **CLI with TUI**: Command-line interface with interactive text-based UI\n\n## Architecture\n\nPySUS 2.0 has a modular architecture:\n\n```\nPySUS\n├── FTP Client         # Traditional FTP-based datasets\n├── DadosGov Client   # dados.gov.br API access\n├── DuckLake Client   # S3 object storage for Parquet catalogs\n└── Database Functions # High-level functions (sinan, sinasc, sim, etc.)\n```\n\n### Database Functions\n\nNew in PySUS 2.0, these functions provide a simplified interface:\n\n| Function | Dataset | Parameters |\n|----------|---------|------------|\n| `sinan(disease, year)` | Disease Notifications | disease (e.g., \"DENG\", \"ZIKA\"), year |\n| `sinasc(state, year, group)` | Births | state, year, group (optional) |\n| `sim(state, year, group)` | Mortality | state, year, group (optional) |\n| `sih(state, year, month, group)` | Hospitalizations | state, year, month, group (optional) |\n| `sia(state, year, month, group)` | Ambulatory | state, year, month, group (optional) |\n| `pni(state, year, group)` | Immunizations | state, year, group (optional) |\n| `ibge(year, group)` | IBGE | year, group (optional) |\n| `cnes(state, year, month, group)` | Health Facilities | state, year, month, group (optional) |\n| `ciha(state, year, month)` | Hospital Admissions | state, year, month |\n\n### DuckLake Query\n\n```python\nasync with PySUS() as pysus:\n    # Filter by any combination of parameters\n    files = await pysus.query(\n        dataset=\"sinan\",      # dataset name\n        group=\"DENG\",         # disease group\n        state=\"SP\",           # state code\n        year=2024,            # year\n        month=1,              # month (optional)\n    )\n```\n\n### read_parquet Modes\n\n```python\n# Union mode (default) - includes all columns from any file\ndf = pysus.read_parquet(paths, mode=\"union\").df()\n\n# Intersection mode - only common columns across all files\ndf = 
pysus.read_parquet(paths, mode=\"intersection\").df()\n\n# Strict mode - raises error if schemas don't match\ndf = pysus.read_parquet(paths, mode=\"strict\").df()\n\n# With custom SQL\ndf = pysus.read_parquet(paths, sql=\"SELECT * WHERE column \u003e 100\").df()\n```\n\n## Configuration\n\n### Cache Directory\n\n```python\nimport os\n\nfrom pysus import CACHEPATH, PySUS\n\nos.environ['PYSUS_CACHEPATH'] = '/my/custom/path'\n# or\npysus = PySUS(db_path='/my/config.db')\n```\n\n### Environment Variables\n\n- `PYSUS_CACHEPATH`: Directory for cached files\n\n## Data Sources\n\n| Dataset | Description | Source |\n|---------|-------------|--------|\n| SINAN | Disease Notifications | FTP / DuckLake |\n| SIM | Mortality | FTP / DuckLake |\n| SINASC | Births | FTP / DuckLake |\n| SIH | Hospitalizations | FTP / DuckLake |\n| SIA | Ambulatory | FTP / DuckLake |\n| CIHA | Hospital Admissions | FTP / DuckLake |\n| CNES | Health Facilities | FTP / DuckLake |\n| PNI | Immunizations | FTP / DuckLake |\n| IBGE | Geographic Data | FTP / DuckLake |\n\n\n## Development\n\n### Installation\n\n#### Using Conda\n```bash\nconda env create -f conda/dev.yaml\nconda activate pysus\n```\n\n#### Using Poetry\n```bash\npoetry install\n```\n\n### Running Tests\n\nRun code linters:\n```bash\npre-commit run --all-files\n```\n\nRun tests:\n```bash\npytest tests/\n```\n\nRun tests inside the Docker container:\n\n```bash\ndocker compose -f docker/docker-compose.yaml exec -T -w /usr/src jupyter python3 -m pytest pysus/tests/\n```\n\n## License\n\nGPL-3.0\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falertadengue%2Fpysus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falertadengue%2Fpysus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falertadengue%2Fpysus/lists"}