https://github.com/alertadengue/pysus
Library to download, clean and analyze openly available datasets from Brazilian Universal health system, SUS.
https://github.com/alertadengue/pysus
data-science geospatial health
Last synced: about 1 month ago
JSON representation
Library to download, clean and analyze openly available datasets from Brazilian Universal health system, SUS.
- Host: GitHub
- URL: https://github.com/alertadengue/pysus
- Owner: AlertaDengue
- License: gpl-3.0
- Created: 2016-07-19T19:03:21.000Z (almost 10 years ago)
- Default Branch: main
- Last Pushed: 2025-09-26T11:22:58.000Z (9 months ago)
- Last Synced: 2025-09-26T13:23:16.989Z (9 months ago)
- Topics: data-science, geospatial, health
- Language: Python
- Homepage:
- Size: 10.1 MB
- Stars: 196
- Watchers: 21
- Forks: 71
- Open Issues: 25
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Funding: .github/FUNDING.yml
- License: LICENSE
Awesome Lists containing this project
README
# PySUS 2.0 is now available!
[](https://zenodo.org/badge/latestdoi/63720586)
[](https://github.com/AlertaDengue/PySUS/actions/workflows/release.yaml)
[](https://pysus.readthedocs.io/en/latest/?badge=latest)
[](https://pypi.org/project/PySUS/)
PySUS is a Python package for accessing and analyzing Brazil's public health data (DATASUS). It provides tools to download, process, and work with health datasets including SINAN (disease notifications), SIM (mortality), SINASC (births), SIH (hospitalizations), SIA (ambulatory), CIHA, CNES, PNI, and more.
## What's New in PySUS 2.0
- **Simplified API**: New high-level functions for direct DataFrame access
- **CLI & TUI**: Launch the text-based user interface from command line
- **Flexible Schema Modes**: Read multiple parquet files with union, intersection, or strict modes
- **SQL Query**: Filter catalog queries by dataset, group, state, year, and month
## Installation
```bash
pip install pysus
```
For DBC file support (requires libffi):
```bash
# Ubuntu/Debian
sudo apt install libffi-dev
pip install pysus[dbc]
```
For the terminal user interface (TUI):
```bash
pip install pysus[tui]
```
### Docker
A pre-built JupyterLab image is available on Docker Hub:
```bash
docker pull alertadengue/pysus
docker run -p 8888:8888 alertadengue/pysus
```
Or build locally and start the container:
```bash
docker compose -f docker/docker-compose.yaml up --build
```
Then open [http://127.0.0.1:8888/lab](http://127.0.0.1:8888/lab) in your browser.
Stop the container:
```bash
docker compose -f docker/docker-compose.yaml down
```
## Quick Start
### Simplified Database Functions (New in 2.0)
The easiest way to get data as a pandas DataFrame:
```python
from pysus import sinan, sinasc, sim, sih, sia, pni, ibge, cnes, ciha
# Download SINAN Dengue data for 2000
df = sinan(disease="deng", year=2000)
# Multiple years
df = sinan(disease="deng", year=[2023, 2024])
# SINASC births for São Paulo, 2020-2023
df = sinasc(state="SP", year=[2020, 2021, 2022, 2023])
# SIM mortality data
df = sim(state="SP", year=2024)
# SIH hospitalizations with month
df = sih(state="SP", year=2024, month=[1, 2, 3])
# CNES health facilities
df = cnes(state="SP", year=2024, month=1)
```
### Listing the files
You can also list the files within the dataset to check which files are available to download
```python
from pysus import list_files
list_files("SINAN")
```
### Using the PySUS Client
```python
from pysus import PySUS
async def main():
async with PySUS() as pysus:
# Query DuckLake catalog
files = await pysus.query(
dataset="sinan",
group="DENG",
state="SP",
year=2024,
)
# Download files
for f in files:
local = await pysus.download(f)
print(local.path)
# Read multiple parquet files
import glob
paths = glob.glob("/cache/sinan/**/*.parquet")
df = pysus.read_parquet(paths, mode="union").df()
```
### Using the TUI (unstable/under testing)
Launch the interactive text-based interface:
```bash
pysus tui -l pt
```
Or from Python:
```python
from pysus.tui.app import PySUS
app = PySUS(lang="pt")
app.run()
```
## Features
- **Automatic Downloads**: Fetch data from FTP, DuckLake (S3), and dados.gov.br API
- **Parquet Output**: All downloaded data is converted to Apache Parquet format
- **DuckLake Integration**: S3-compatible cloud storage for parquet catalogs
- **Local Catalog**: SQLite-based tracking of download history to avoid re-downloads
- **Type Inference**: Automatic data type conversion from legacy formats (DBF, DBC)
- **CLI with TUI**: Command-line interface with interactive text-based UI
## Architecture
PySUS 2.0 has a modular architecture:
```
PySUS
├── FTP Client # Traditional FTP-based datasets
├── DadosGov Client # dados.gov.br API access
├── DuckLake Client # S3 object storage for Parquet catalogs
└── Database Functions # High-level functions (sinan, sinasc, sim, etc.)
```
### Database Functions
New in PySUS 2.0, these functions provide a simplified interface:
| Function | Dataset | Parameters |
|----------|---------|------------|
| `sinan(disease, year)` | Disease Notifications | disease (e.g., "DENG", "ZIKA"), year |
| `sinasc(state, year, group)` | Births | state, year, group (optional) |
| `sim(state, year, group)` | Mortality | state, year, group (optional) |
| `sih(state, year, month, group)` | Hospitalizations | state, year, month, group (optional) |
| `sia(state, year, month, group)` | Ambulatory | state, year, month, group (optional) |
| `pni(state, year, group)` | Immunizations | state, year, group (optional) |
| `ibge(year, group)` | IBGE | year, group (optional) |
| `cnes(state, year, month, group)` | Health Facilities | state, year, month, group (optional) |
| `ciha(state, year, month)` | Hospital Admissions | state, year, month |
### DuckLake Query
```python
async with PySUS() as pysus:
# Filter by any combination of parameters
files = await pysus.query(
dataset="sinan", # dataset name
group="DENG", # disease group
state="SP", # state code
year=2024, # year
month=1, # month (optional)
)
```
### read_parquet Modes
```python
# Union mode (default) - includes all columns from any file
df = pysus.read_parquet(paths, mode="union").df()
# Intersection mode - only common columns across all files
df = pysus.read_parquet(paths, mode="intersection").df()
# Strict mode - raises error if schemas don't match
df = pysus.read_parquet(paths, mode="strict").df()
# With custom SQL
df = pysus.read_parquet(paths, sql="SELECT * WHERE column > 100").df()
```
## Configuration
### Cache Directory
```python
from pysus import CACHEPATH
import os
os.environ['PYSUS_CACHEPATH'] = '/my/custom/path'
# or
pysus = PySUS(db_path='/my/config.db')
```
### Environment Variables
- `PYSUS_CACHEPATH`: Directory for cached files
## Data Sources
| Dataset | Description | Source |
|---------|-------------|--------|
| SINAN | Disease Notifications | FTP / DuckLake |
| SIM | Mortality | FTP / DuckLake |
| SINASC | Births | FTP / DuckLake |
| SIH | Hospitalizations | FTP / DuckLake |
| SIA | Ambulatory | FTP / DuckLake |
| CIHA | Hospital Admissions | FTP / DuckLake |
| CNES | Health Facilities | FTP / DuckLake |
| PNI | Immunizations | FTP / DuckLake |
| IBGE | Geographic Data | FTP / DuckLake |
## Development
### Installation
#### Using Conda
```bash
conda env create -f conda/dev.yaml
conda activate pysus
```
#### Using Poetry
```bash
poetry install
```
### Running Tests
Run code linters:
```bash
pre-commit run --all-files
```
Run tests:
```bash
pytest tests/
```
Run tests inside the Docker container:
```bash
docker compose -f docker/docker-compose.yaml exec -T -w /usr/src jupyter python3 -m pytest pysus/tests/
```
## License
GPL