https://github.com/efrod/bca-survival-analyzer
A tool that helps you perform survival analysis on body composition data.
python statistics survival-analysis
- Host: GitHub
- URL: https://github.com/efrod/bca-survival-analyzer
- Owner: eFroD
- License: mit
- Created: 2025-12-08T10:40:21.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-12-16T22:47:55.000Z (6 months ago)
- Last Synced: 2026-01-13T19:44:54.823Z (6 months ago)
- Topics: python, statistics, survival-analysis
- Language: Python
- Homepage:
- Size: 5.19 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Citation: CITATION.cff
[CI](https://github.com/eFroD/bca-survival-analyzer/actions/workflows/ci.yml)
[Coverage](https://codecov.io/github/eFroD/bca-survival-analyzer)
[Documentation](https://eFroD.github.io/bca-survival-analyzer/)
[PyPI](https://pypi.org/project/bca-survival/)
# Survival Analysis Package
A Python package for analyzing survival data with a focus on body composition assessment. It is designed to work with the results produced by the [BOA - Body and Organ Analysis](https://github.com/UMEssen/Body-and-Organ-Analysis) workflow. This repository provides tools to reorganize BOA output and merge it into a patient table, utilities for data cleaning, and a [lifelines](https://zenodo.org/records/10456828) wrapper for automated exploratory analysis of survival outcomes based on the body composition results.
## Features
- **Survival Analysis**: Cox proportional hazards regression and Kaplan-Meier survival curves
- **Body Composition Analysis**: Tools for processing and analyzing BCA data
- **BOA Extractor**: Command-line tool for extracting measurements from BOA data
- **Data Preprocessing**: Utilities for cleaning and preparing survival data
- **CLI Tools**: Command-line utilities for data merging, format conversion, and PDF encryption
## Installation
```bash
pip install bca-survival
```
## Usage
### Basic Survival Analysis
```python
import pandas as pd

from bca_survival.analyzer import BCASurvivalAnalyzer

# Load the clinical table and the measurement table; both must share patient identifiers
df_main = pd.read_csv('clinical_data.csv')
df_measurements = pd.read_csv('bca_measurements.csv')

# Initialize the analyzer
analyzer = BCASurvivalAnalyzer(
    df_main, df_measurements,
    main_id_col='patient_id', measurement_id_col='id',
    start_date_col='diagnosis_date', event_date_col='event_date', event_col='event_status'
)
# Perform univariate analysis
columns = ['l5::WL::imat::mean_ml', 'l5::WL::tat::mean_ml', 'age', 'gender']
results = analyzer.univariate_cox_regression(columns)
# Generate Kaplan-Meier plot
analyzer.kaplan_meier_plot('l5::WL::imat::mean_ml', split_strategy='median')
# Perform multivariate analysis
model = analyzer.multivariate_cox_regression(columns)
```
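To make the `split_strategy='median'` idea concrete, here is a minimal, dependency-free sketch of a Kaplan-Meier estimate computed separately for the patients below and above the median of a measurement. This is an illustration only, not the package's implementation (which wraps lifelines): the function name `kaplan_meier`, the toy data, and the choice to put values equal to the median into the lower group are all assumptions.

```python
from statistics import median


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up times; events: 1 if the event occurred, 0 if censored.
    """
    survival = 1.0
    curve = []
    # Only times at which at least one event occurred change the estimate.
    for t in sorted({d for d, e in zip(durations, events) if e}):
        at_risk = sum(1 for d in durations if d >= t)
        died = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1 - died / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data: (measurement, follow-up in days, event flag)
rows = [
    (10.0, 100, 1), (12.0, 250, 0), (15.0, 300, 1),
    (30.0, 50, 1), (35.0, 80, 1), (40.0, 200, 0),
]

# Split the cohort at the median of the measurement (assumed tie rule: <= goes low).
cutoff = median(r[0] for r in rows)
low = [r for r in rows if r[0] <= cutoff]
high = [r for r in rows if r[0] > cutoff]

km_low = kaplan_meier([r[1] for r in low], [r[2] for r in low])
km_high = kaplan_meier([r[1] for r in high], [r[2] for r in high])
```

Each curve is a step function: the survival probability stays constant between event times and drops by the fraction of at-risk patients who experienced the event. Censored patients leave the risk set without causing a drop.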
## Command-Line Tools
The package includes several command-line tools for common data processing tasks:
### BOA Extractor
Extract measurements from BOA (Body and Organ Analysis) data:
```bash
boa-extract /path/to/data /path/to/output
```
**Purpose**: Processes BOA data files and extracts relevant measurements for survival analysis.
**Arguments**:
- `data_path`: Path to the directory containing BOA data files
- `output_path`: Path where extracted measurements will be saved
---
### BCA Merger
Merge two Excel files based on ID columns:
```bash
bca-merge <first_file> <second_file> <id_column_name>
```
**Purpose**: Combines clinical data with body composition measurements by matching on ID columns.
**Arguments**:
- `first_file`: Path to the first Excel file (e.g., clinical data)
- `second_file`: Path to the second Excel file (e.g., BCA measurements)
- `id_column_name`: Name of the ID column in the first file to match with 'StudyID' in the second file
**Example**:
```bash
bca-merge clinical_data.xlsx bca_measurements.xlsx patient_id
```
**Output**: Creates a file named `{first_file}_merged.xlsx` with:
- All rows from both files (outer join)
- Matched records combined into single rows
- Date columns formatted as DD.MM.YYYY
- No duplicate StudyID columns
**Notes**:
- The second file must have a column named 'StudyID'
- Uses outer merge to preserve all data from both files
- Automatically removes duplicate ID columns
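The merge behavior described above can be approximated in a few lines of pandas. The sketch below is not the tool's source code; the helper name `merge_on_study_id`, the sample columns, and the step that fills the first file's ID column from `StudyID` for unmatched rows are assumptions made for illustration.

```python
import pandas as pd


def merge_on_study_id(first: pd.DataFrame, second: pd.DataFrame,
                      id_column_name: str) -> pd.DataFrame:
    """Outer-merge two tables, matching `id_column_name` against 'StudyID'."""
    merged = first.merge(second, how="outer",
                         left_on=id_column_name, right_on="StudyID")
    if id_column_name != "StudyID":
        # Rows found only in the second table have no ID in the first table's
        # column, so fill it from 'StudyID' before dropping the duplicate.
        merged[id_column_name] = merged[id_column_name].fillna(merged["StudyID"])
        merged = merged.drop(columns="StudyID")
    # Render datetime columns as DD.MM.YYYY strings.
    for col in merged.select_dtypes(include="datetime").columns:
        merged[col] = merged[col].dt.strftime("%d.%m.%Y")
    return merged


# Hypothetical sample tables
clinical = pd.DataFrame({
    "patient_id": ["P1", "P2"],
    "diagnosis_date": pd.to_datetime(["2020-01-15", "2021-03-02"]),
})
measurements = pd.DataFrame({"StudyID": ["P2", "P3"], "imat_ml": [12.5, 9.0]})

merged = merge_on_study_id(clinical, measurements, "patient_id")
```

With an outer join, `P1` (clinical data only) and `P3` (measurements only) both survive the merge, with missing values in the columns that come from the other table.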
---
### Survival Result Converter
Convert Excel files to multiple formats (PDF, CSV, TXT):
```bash
survival-result-converter [directory]
```
**Purpose**: Batch converts Excel files to multiple formats for reporting and data sharing.
**Arguments**:
- `directory`: Directory to scan for Excel files (default: current directory)
**Example**:
```bash
# Convert all Excel files in current directory
survival-result-converter
# Convert Excel files in specific directory
survival-result-converter /path/to/results
```
**Output Structure**:
```
directory/
├── PDF/
│   ├── file1.pdf
│   └── file2.pdf
├── CSV/
│   ├── file1.csv
│   ├── file2_sheet1.csv
│   └── file2_sheet2.csv
└── TXT/
    ├── file1.txt
    └── file2.txt
```
**Features**:
- Recursively processes all `.xlsx` files in the directory tree
- Creates separate output folders (PDF, CSV, TXT)
- For multi-sheet Excel files:
  - PDF: All sheets in single file
  - CSV: Separate file per sheet
  - TXT: All sheets in single file with separators
- PDF generation supports two methods:
  - Windows: Uses COM automation for high-quality output
  - Cross-platform: Uses fpdf library with automatic column sizing
**PDF Features**:
- Landscape orientation for better table visibility
- Automatic column width adjustment
- Fits tables to page width
- Handles large tables (up to 1000 rows per sheet)
- Text wrapping for long content
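The output layout for single-sheet and multi-sheet workbooks can be summarized with a small path-planning sketch. It only computes file names and does not convert anything. The helper names `plan_outputs` and `find_workbooks` are hypothetical, and the `{file}_{sheet}.csv` naming is inferred from the example tree above rather than taken from the tool's source. Whether outputs for workbooks in subdirectories are flattened into the top-level folders is likewise an assumption.

```python
from pathlib import Path


def find_workbooks(root: Path) -> list[Path]:
    """Recursively collect every .xlsx file below `root`."""
    return sorted(root.rglob("*.xlsx"))


def plan_outputs(root: Path, workbook: Path,
                 sheet_names: list[str]) -> dict[str, list[Path]]:
    """Map one workbook to its planned PDF, CSV and TXT output paths."""
    stem = workbook.stem
    if len(sheet_names) > 1:
        # Multi-sheet workbooks get one CSV per sheet.
        csv_files = [root / "CSV" / f"{stem}_{sheet}.csv" for sheet in sheet_names]
    else:
        csv_files = [root / "CSV" / f"{stem}.csv"]
    return {
        "PDF": [root / "PDF" / f"{stem}.pdf"],  # all sheets in one file
        "CSV": csv_files,
        "TXT": [root / "TXT" / f"{stem}.txt"],  # all sheets in one file
    }


root = Path("results")
single = plan_outputs(root, root / "file1.xlsx", ["Sheet1"])
multi = plan_outputs(root, root / "file2.xlsx", ["sheet1", "sheet2"])
```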
---
### PDF Report Extractor
Encrypt and organize PDF files from a directory tree:
```bash
pdf-report-extractor <input_path> <output_path> <password>
```
**Purpose**: Finds PDF files in a directory structure, copies them with standardized names, and encrypts them for secure distribution.
**Arguments**:
- `input_path`: Root directory to search for PDF files
- `output_path`: Destination directory for encrypted PDFs
- `password`: Password to encrypt the PDFs with
**Example**:
```bash
pdf-report-extractor /data/patient_reports /encrypted_reports MySecureP@ss123
```
**Behavior**:
- Recursively searches for all `.pdf` files
- Copies files to destination with naming pattern: `encrypted_{parent_folder_name}.pdf`
- Encrypts each file using user password protection
- Requires `pdftk` to be installed
**Check pdftk Installation**:
```bash
pdf-report-extractor --check-pdftk
```
**Installing pdftk**:
- Ubuntu/Debian: `sudo apt-get install pdftk`
- macOS: `brew install pdftk-java`
- Windows: Download from [PDFtk website](https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/)
**Output Summary**:
```
Processing: /data/patient_reports/folder1/report.pdf
-> /encrypted_reports/encrypted_folder1.pdf
-> Encrypted successfully
Processing complete:
- Files processed successfully: 15
- Errors: 0
```
**Notes**:
- Original files remain unchanged
- If encryption fails, the unencrypted copy is removed from destination
- Parent folder name is used for output filename (one level up from the PDF)
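The naming rule and the encryption step can be sketched as follows. This is a simplified illustration under stated assumptions, not the tool's code: the helper names are hypothetical, and the sketch encrypts straight from source to target instead of copying first. The `pdftk <in> output <out> user_pw <password>` form is standard pdftk syntax; how the tool itself invokes pdftk is not documented here.

```python
import shutil
from pathlib import Path


def encrypted_name(pdf_path: Path) -> str:
    """Name the output after the folder that directly contains the PDF."""
    return f"encrypted_{pdf_path.parent.name}.pdf"


def pdftk_encrypt_command(source: Path, target: Path, password: str) -> list[str]:
    """Build a pdftk call that writes `target` protected by a user password."""
    return ["pdftk", str(source), "output", str(target), "user_pw", password]


def pdftk_available() -> bool:
    """Report whether a pdftk executable is on PATH."""
    return shutil.which("pdftk") is not None


source = Path("/data/patient_reports/folder1/report.pdf")
target = Path("/encrypted_reports") / encrypted_name(source)
command = pdftk_encrypt_command(source, target, "MySecureP@ss123")
# Once pdftk is installed, run it with: subprocess.run(command, check=True)
```

Because the output name depends only on the parent folder, two PDFs in the same folder would map to the same target name in this sketch.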
---
## Documentation
Detailed documentation lives in the `docs/` directory and can be built locally:
1. Install the package with documentation dependencies:
```bash
pip install -e ".[docs]"
```
2. Build the documentation on Windows:
```bash
cd docs
make.bat html
```
Or on Linux/macOS:
```bash
cd docs
make html
```
3. Open `docs/build/html/index.html` in your browser
## Development
Clone the repository and install in development mode:
```bash
git clone https://github.com/eFroD/bca-survival-analyzer.git
cd bca-survival-analyzer
pip install -e ".[dev]"
```
## Requirements
### Core Dependencies
- pandas
- openpyxl (for Excel file handling)
- lifelines (for survival analysis)
### Optional Dependencies
- **For PDF conversion** (survival-result-converter):
  - Windows: pywin32
  - Cross-platform: fpdf, openpyxl
- **For PDF encryption** (pdf-report-extractor):
  - pdftk (external dependency)
## Common Workflows
### Workflow 1: Complete Data Processing Pipeline
```bash
# 1. Merge clinical and BCA data
bca-merge clinical.xlsx measurements.xlsx PatientID
# 2. Perform survival analysis (Python)
# ... (use BCASurvivalAnalyzer)
# 3. Convert results to multiple formats
survival-result-converter ./results
# 4. Encrypt PDF reports for distribution
pdf-report-extractor ./results/PDF ./encrypted_reports SecurePassword123
```
### Workflow 2: Quick Data Conversion
```bash
# Convert a directory of Excel results to PDF
survival-result-converter /path/to/results
# PDFs are created in /path/to/results/PDF/
```