https://github.com/cincibrainlab/autocleaneeg-icvision

Automated ICA artifact classification and removal for EEG data using OpenAI Vision API. Generates component visualizations, classifies artifacts, and produces cleaned datasets with detailed reports.

artifact-removal automation eeg eeglab ica machine-learning mne-python reproducibility vision-api


Host: GitHub
URL: https://github.com/cincibrainlab/autocleaneeg-icvision
Owner: cincibrainlab
License: mit
Created: 2025-05-23T12:43:58.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2026-01-14T22:14:24.000Z (6 months ago)
Last Synced: 2026-01-15T04:43:54.527Z (6 months ago)
Topics: artifact-removal, automation, eeg, eeglab, ica, machine-learning, mne-python, reproducibility, vision-api
Language: Python
Homepage: https://github.com/cincibrainlab/autocleaneeg-icvision
Size: 789 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 4
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: SECURITY_AUDIT_REPORT.md
- Agents: AGENTS.md


# Autoclean EEG ICVision (Standalone)

[![PyPI version](https://badge.fury.io/py/autoclean-icvision.svg)](https://badge.fury.io/py/autoclean-icvision)
[![Python versions](https://img.shields.io/pypi/pyversions/autoclean-icvision.svg)](https://pypi.org/project/autoclean-icvision/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

Automated ICA component classification for EEG data using OpenAI's Vision API.

## Overview

ICVision automates the tedious process of classifying ICA components from EEG data by generating component visualizations and sending them to OpenAI's Vision API for intelligent artifact identification.

**Workflow**: Raw EEG + ICA → Generate component plots → OpenAI Vision classification → Automated artifact removal → Clean EEG data

**Key Features**:
- Automated classification of 7 component types (brain, eye, muscle, heart, line noise, channel noise, other)
- **🔄 Drop-in replacement for MNE-ICALabel**: Same API, enhanced with OpenAI Vision
- **⚡ Strip Layout Mode**: Batch 9 components per API call for **88% cost reduction**
- **🌐 Custom Endpoint Support**: Use OpenAI-compatible APIs (e.g., CLIProxy, Azure OpenAI)
- Multi-panel component plots (topography, time series, PSD, ERP-image)
- MNE-Python integration with `.fif` and `.set` file support
- **EEGLAB .set file auto-detection**: Single file input with automatic ICA detection
- **Smart file organization**: Basename-prefixed output files prevent overwrites when processing multiple datasets
- **Continuous data only**: Graceful error handling for epoched data with helpful conversion instructions
- **Enhanced PDF reports**: Professional dual-header layout with color-coded classification results
- **OpenAI cost tracking**: Automatic cost estimation and logging for budget monitoring
- Parallel processing with configurable batch sizes
- Command-line and Python API interfaces
- Comprehensive PDF reports and CSV results

## Installation

```bash
pip install autocleaneeg-icvision
```

**Requirements**: Python 3.8+ and OpenAI API key with vision model access (e.g., `gpt-4.1`)

```bash
export OPENAI_API_KEY='your_api_key_here'
```

## Usage

### Command-Line Interface (CLI)

The primary way to use ICVision is through its command-line interface.

**Basic Usage:**

**Single EEGLAB .set file (Recommended):**
```bash
autoclean-icvision /path/to/your_data.set
# or legacy command: icvision /path/to/your_data.set
```

**Separate files:**
```bash
autoclean-icvision /path/to/your_raw_data.set /path/to/your_ica_decomposition.fif
# or legacy command: icvision /path/to/your_raw_data.set /path/to/your_ica_decomposition.fif
```

ICVision can automatically detect and read ICA data from EEGLAB `.set` files, making single-file usage possible when your `.set` file contains both raw data and ICA decomposition.

This command will:
1. Load the raw EEG data and ICA solution (auto-detected from `.set` file or from separate files).
2. Classify components using the default settings.
3. Create an `autoclean_icvision_results/` directory in your current working directory.
4. Save the following into the output directory (with input filename prefix for organization):
   * Cleaned raw data (artifacts removed): `{basename}_icvis_cleaned_raw.{format}`
   * Updated ICA object with component labels: `{basename}_icvis_classified_ica.fif`
   * `{basename}_icvis_results.csv` detailing classifications for each component.
   * `{basename}_icvis_summary.txt` with overall statistics.
   * `{basename}_icvis_report_all_comps.pdf` (comprehensive PDF report with visualizations).

**Note**: `{basename}` is extracted from your input filename (e.g., `sub-01_task-rest_eeg.set` → `sub-01_task-rest_eeg` prefix). This prevents file overwrites when processing multiple datasets.
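
The naming scheme above can be sketched in a few lines of Python. This is only an illustration of the pattern, not ICVision's own code: the helper `expected_outputs` is hypothetical, and the cleaned raw file is left out because its extension depends on the input format.

```python
from pathlib import Path

# Suffixes copied from the output list above; the helper itself is hypothetical.
OUTPUT_SUFFIXES = [
    "_icvis_classified_ica.fif",
    "_icvis_results.csv",
    "_icvis_summary.txt",
    "_icvis_report_all_comps.pdf",
]


def expected_outputs(input_path, output_dir="autoclean_icvision_results"):
    """Return the prefixed output paths expected for one input file."""
    basename = Path(input_path).stem  # drops the directory and the final extension
    return [Path(output_dir) / f"{basename}{suffix}" for suffix in OUTPUT_SUFFIXES]


paths = expected_outputs("data/sub-01_task-rest_eeg.set")
for path in paths:
    print(path)
```

Because the prefix comes from the input filename, two inputs with the same filename in different directories would still map to the same output names.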

### Recent Improvements

**Enhanced File Organization (v2024.12)**:
- **Shared workspace**: All results now saved to `autoclean_icvision_results/` directory by default
- **Smart naming**: Input filename prefixes (e.g., `sub-01_task-rest_eeg_icvis_results.csv`) prevent conflicts
- **Multi-file friendly**: Process multiple datasets without overwrites - perfect for batch processing subjects

**Improved User Experience**:
- **Epoched data handling**: Clear error messages with EEGLAB conversion instructions for unsupported epoched data
- **Enhanced PDF reports**: Professional layout with IC Component titles and color-coded Vision Classification results
- **Clean logging output**: Professional, user-focused logging with optional verbose mode for debugging
- **Better error messages**: Informative CLI output with suggested solutions

### Strip Layout Mode (New in v0.3.0)

Strip layout batches multiple ICA components into a single image, reducing API calls by ~88%:

```bash
# Single-image mode (default, 1 API call per component)
autoclean-icvision data.set

# Strip mode (9 components per API call, 88% fewer calls)
autoclean-icvision data.set --layout strip
```

**Performance comparison** (24 components):

| Mode | API Calls | Time | Cost |
|------|-----------|------|------|
| Single | 24 | ~85s | ~$0.29 |
| Strip | 3 | ~50s | ~$0.04 |

Strip mode is recommended for production pipelines. Classification accuracy is comparable to single-image mode.
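
The call counts in the table follow from simple arithmetic. The sketch below assumes one API call per strip image and a partially filled last strip, which matches the 24-component row; the function `api_calls` is illustrative and not part of ICVision.

```python
import math


def api_calls(n_components, layout="single", strip_size=9):
    """Estimate the number of API calls for a given layout (illustrative only)."""
    if layout == "strip":
        # One call per strip image; the last strip may hold fewer components.
        return math.ceil(n_components / strip_size)
    return n_components  # single layout: one call per component


single = api_calls(24)
strip = api_calls(24, layout="strip")
reduction = 1 - strip / single
print(single, strip, f"{reduction:.1%}")
```

With 24 components the reduction is 87.5%. The ~88% figure is approached when the component count is a multiple of the strip size, where the reduction is 1 - 1/9.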

### Custom Endpoint Support (New in v0.2.1)

Use OpenAI-compatible endpoints like CLIProxy or Azure OpenAI:

```bash
# Using environment variables (recommended)
export OPENAI_BASE_URL="https://your-proxy.example.com/v1"
export OPENAI_API_KEY="your-api-key"
autoclean-icvision data.set --layout strip

# Or via CLI flags
autoclean-icvision data.set \
--base-url https://your-proxy.example.com/v1 \
--api-key your-api-key \
--model gpt-5.2 \
--layout strip
```
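
One plausible way to combine the two configuration routes is sketched below. The precedence (CLI flag first, then environment variable) is an assumption, and `resolve_endpoint` is a hypothetical helper rather than an ICVision function.

```python
import os


def resolve_endpoint(cli_base_url=None, cli_api_key=None, env=None):
    """Pick the base URL and API key, preferring CLI values over the environment.

    The precedence shown here is an assumption, not documented ICVision behavior.
    """
    env = os.environ if env is None else env
    base_url = cli_base_url or env.get("OPENAI_BASE_URL")  # None means the default OpenAI endpoint
    api_key = cli_api_key or env.get("OPENAI_API_KEY")
    if not api_key:
        raise ValueError("No API key: pass --api-key or set OPENAI_API_KEY")
    return base_url, api_key


example_env = {
    "OPENAI_BASE_URL": "https://your-proxy.example.com/v1",
    "OPENAI_API_KEY": "your-api-key",
}
print(resolve_endpoint(env=example_env))
```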

**Common Options (with defaults):**

* `--api-key YOUR_API_KEY`: Specify OpenAI API key (default: `OPENAI_API_KEY` env variable)
* `--base-url URL`: Custom API endpoint (default: OpenAI, or `OPENAI_BASE_URL` env variable)
* `--output-dir /path/to/output/`: Output directory (default: `./autoclean_icvision_results`)
* `--model MODEL_NAME`: OpenAI model (default: `gpt-4.1`, supports `gpt-5.2`)
* `--layout single|strip`: Classification layout mode (default: `single`)
* `--strip-size 9`: Components per strip image when `--layout=strip` (default: `9`)
* `--reasoning-effort none|low|medium|high`: Reasoning effort for gpt-5.x models (default: unset, so the endpoint's own default applies)
* `--confidence-threshold 0.8`: Confidence threshold for auto-exclusion (default: `0.8`)
* `--psd-fmax 45`: Maximum frequency for PSD plots in Hz (default: `45`)
* `--labels-to-exclude eye muscle heart`: Artifact labels to exclude (default: all non-brain types)
* `--batch-size 10`: Components per API request (default: `10`)
* `--max-concurrency 4`: Max parallel requests (default: `4`)
* `--no-auto-exclude`: Disable auto-exclusion (default: auto-exclude enabled)
* `--prompt-file /path/to/prompt.txt`: Custom classification prompt (default: built-in prompt)
* `--no-report`: Disable PDF report (default: report generation enabled)
* `--verbose`: Enable detailed logging (default: standard logging)
* `--version`: Show ICVision version
* `--help`: Show full list of commands and options

**Examples with options:**

Single .set file usage:
```bash
autoclean-icvision data/subject01_eeg.set \
--api-key sk-xxxxxxxxxxxxxxxxxxxx \
--confidence-threshold 0.9 \
--verbose
```

Traditional separate files:
```bash
autoclean-icvision data/subject01_raw.fif data/subject01_ica.fif \
--api-key sk-xxxxxxxxxxxxxxxxxxxx \
--model gpt-4.1 \
--confidence-threshold 0.8 \
--labels-to-exclude eye muscle line_noise channel_noise \
--batch-size 8 \
--verbose
```

For ERP studies with low-pass filtered data:
```bash
autoclean-icvision data/erp_study.set \
--psd-fmax 40 \
--confidence-threshold 0.85 \
--verbose
```

Multi-file batch processing:
```bash
# Process multiple subjects - all results go to shared directory
autoclean-icvision data/sub-01_task-rest_eeg.set --verbose
autoclean-icvision data/sub-02_task-rest_eeg.set --verbose
autoclean-icvision data/sub-03_task-rest_eeg.set --verbose

# Results organized in autoclean_icvision_results/ with prefixed filenames
ls autoclean_icvision_results/
# sub-01_task-rest_eeg_icvis_results.csv
# sub-01_task-rest_eeg_icvis_classified_ica.fif
# sub-02_task-rest_eeg_icvis_results.csv
# sub-02_task-rest_eeg_icvis_classified_ica.fif
# ...
```
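
The same batch can be driven from a Python script. This is a minimal sketch that only wraps the CLI shown above: `build_command` and `run_batch` are hypothetical helpers, and the glob pattern is an assumption about how the files are named.

```python
import shutil
import subprocess
from pathlib import Path

CLI_NAME = "autoclean-icvision"  # command name taken from the examples above


def build_command(set_file, extra_args=("--verbose",)):
    """Build the argv list for one input file (hypothetical helper)."""
    return [CLI_NAME, str(set_file), *extra_args]


def run_batch(data_dir="data", pattern="sub-*_task-rest_eeg.set"):
    """Run the CLI once per matching file; do nothing if the CLI is not installed."""
    if shutil.which(CLI_NAME) is None:
        print(f"{CLI_NAME} not found on PATH; nothing was run")
        return []
    commands = [build_command(f) for f in sorted(Path(data_dir).glob(pattern))]
    for command in commands:
        subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return commands


command = build_command("data/sub-01_task-rest_eeg.set")
print(" ".join(command))
```

Calling `run_batch()` processes the files one after another, so a failure in one subject stops the loop before the next one starts.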

### Python API

You can also use ICVision programmatically within your Python scripts.

**Single .set file usage (NEW):**
```python
from pathlib import Path
from icvision.core import label_components

# --- Configuration ---
API_KEY = "your_openai_api_key" # Or set as environment variable OPENAI_API_KEY
DATA_PATH = "path/to/your_data.set" # EEGLAB .set file with ICA
OUTPUT_DIR = Path("icvision_output")

# --- Run ICVision (ICA auto-detected from .set file) ---
try:
    raw_cleaned, ica_updated, results_df = label_components(
        raw_data=DATA_PATH, # EEGLAB .set file path
        # ica_data parameter is optional - auto-detected from .set file
        api_key=API_KEY, # Optional if OPENAI_API_KEY env var is set
        output_dir=OUTPUT_DIR,
    )
except Exception as e:
    print(f"An error occurred: {e}")
```

**Traditional separate files:**
```python
from pathlib import Path
from icvision.core import label_components

# --- Configuration ---
API_KEY = "your_openai_api_key" # Or set as environment variable OPENAI_API_KEY
RAW_DATA_PATH = "path/to/your_raw_data.set"
ICA_DATA_PATH = "path/to/your_ica_data.fif"
OUTPUT_DIR = Path("icvision_output")

# --- Run ICVision with all parameters ---
try:
    raw_cleaned, ica_updated, results_df = label_components(
        raw_data=RAW_DATA_PATH, # Can be MNE object or path string/Path object
        ica_data=ICA_DATA_PATH, # Can be MNE object, path, or None for auto-detection
        api_key=API_KEY, # Optional if OPENAI_API_KEY env var is set
        output_dir=OUTPUT_DIR,
        model_name="gpt-4.1", # Default: "gpt-4.1"
        confidence_threshold=0.80, # Default: 0.8
        labels_to_exclude=["eye", "muscle", "heart", "line_noise", "channel_noise"], # Default: all non-brain
        generate_report=True, # Default: True
        batch_size=5, # Default: 10
        max_concurrency=3, # Default: 4
        auto_exclude=True, # Default: True
        custom_prompt=None, # Default: None (uses built-in prompt)
        psd_fmax=40.0 # Default: 45.0 per the parameter table; a lower value is useful for ERP studies
    )

    print("\n--- ICVision Processing Complete ---")
    print(f"Cleaned raw data channels: {raw_cleaned.info['nchan']}")
    print(f"Updated ICA components: {ica_updated.n_components_}")
    print(f"Number of components classified: {len(results_df)}")

    if not results_df.empty:
        print(f"Number of components marked for exclusion: {results_df['exclude_vision'].sum()}")
        print("\nClassification Summary:")
        print(results_df[['component_name', 'label', 'confidence', 'exclude_vision']].head())

    print(f"\nResults saved in: {OUTPUT_DIR.resolve()}")

except Exception as e:
    print(f"An error occurred: {e}")
```
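
The results CSV can also be inspected without MNE or pandas. The sketch below uses the column names from the example above; the sample rows are made up for illustration, and the assumption that `exclude_vision` is written as `True`/`False` text is not confirmed by the README.

```python
import csv
import io
from collections import Counter

# Made-up rows for illustration; real files are named {basename}_icvis_results.csv.
SAMPLE_CSV = """component_name,label,confidence,exclude_vision
IC0,brain,0.95,False
IC1,eye,0.92,True
IC2,muscle,0.70,False
IC3,eye,0.88,True
"""


def summarize(csv_file):
    """Count labels and list the components flagged for exclusion."""
    rows = list(csv.DictReader(csv_file))
    label_counts = Counter(row["label"] for row in rows)
    excluded = [
        row["component_name"]
        for row in rows
        if row["exclude_vision"].strip().lower() == "true"  # assumed text encoding
    ]
    return label_counts, excluded


label_counts, excluded = summarize(io.StringIO(SAMPLE_CSV))
print(dict(label_counts))
print(excluded)
```

To read a real file, pass an open file handle instead of the `io.StringIO` object.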

## 🔄 ICLabel Drop-in Replacement

ICVision can serve as a **drop-in replacement** for MNE-ICALabel with identical API and output format. This means you can upgrade existing ICLabel workflows to use OpenAI Vision API without changing any other code.

### Quick Migration

**Before (using MNE-ICALabel):**
```python
from mne_icalabel import label_components

# Classify components with ICLabel
result = label_components(raw, ica, method='iclabel')
print(result['labels']) # ['brain', 'eye blink', 'other', ...]
print(ica.labels_scores_.shape) # (n_components, 7)
```

**After (using ICVision):**
```python
from icvision.compat import label_components # <-- Only line that changes!

# Classify components with ICVision (same API!)
result = label_components(raw, ica, method='icvision')
print(result['labels']) # Same format: ['brain', 'eye blink', 'other', ...]
print(ica.labels_scores_.shape) # Same shape: (n_components, 7)
```

### What You Get

- **🎯 Identical API**: Same function signature, same return format
- **📊 Same Output**: Returns dict with `'y_pred_proba'` and `'labels'` keys
- **⚙️ Same ICA Modifications**: Sets `ica.labels_scores_` and `ica.labels_` exactly like ICLabel
- **🚀 Enhanced Intelligence**: OpenAI Vision API instead of fixed neural network
- **💡 Detailed Reasoning**: Each classification includes explanation (available in full API)

### Why Use ICVision over ICLabel?

| Feature | ICLabel | ICVision |
|---------|---------|----------|
| **Classification Method** | Fixed neural network (2019) | OpenAI Vision API (latest models) |
| **Accuracy** | Good on typical datasets | Enhanced with modern vision AI |
| **Reasoning** | No explanations | Detailed reasoning for each decision |
| **Customization** | Fixed model | Customizable prompts and models |
| **Updates** | Static model | Benefits from OpenAI improvements |
| **API Compatibility** | ✅ Original | ✅ Drop-in replacement |

### Integration Example

The compatibility layer works seamlessly with existing MNE workflows:

```python
def analyze_ica_components(raw, ica, method='icvision'):
    """Generic function that works with both ICLabel and ICVision"""

    if method == 'icvision':
        from icvision.compat import label_components
    else:
        from mne_icalabel import label_components

    # Same API for both!
    result = label_components(raw, ica, method=method)

    # Same return format for both
    print(f"Classified {len(result['labels'])} components")

    # Same ICA object modifications for both
    brain_components = ica.labels_['brain']
    artifact_components = [idx for key, indices in ica.labels_.items()
                           if key != 'brain' for idx in indices]

    print(f"Brain components: {brain_components}")
    print(f"Artifact components: {artifact_components}")

    return result

# Works with either classifier
result = analyze_ica_components(raw, ica, method='icvision')
```

### Two APIs, Same Power

ICVision provides **two complementary interfaces**:

1. **Original ICVision API**: Rich output with detailed results and file generation
```python
from icvision.core import label_components
raw_cleaned, ica_updated, results_df = label_components(...)
```

2. **ICLabel-Compatible API**: Simple output matching ICLabel exactly
```python
from icvision.compat import label_components
result = label_components(raw, ica, method='icvision')
```

Choose the API that best fits your workflow - both use the same underlying OpenAI Vision classification.

---

## Configuration Details

### Input File Support

**EEGLAB .set files:**
- **Raw data**: Supports EEGLAB `.set` files for raw EEG data
- **ICA data**: Now supports automatic ICA detection from `.set` files using `mne.preprocessing.read_ica_eeglab()`
- **Single file mode**: Use just a `.set` file when it contains both raw data and ICA decomposition

**MNE and other formats:**
- **Raw data**: `.fif`, `.edf`, `.raw`
- **ICA data**: `.fif` files containing MNE ICA objects

### Default Parameter Values

| Parameter | Default Value | Description |
|-----------|---------------|-------------|
| `model_name` | `"gpt-4.1"` | OpenAI model for classification (also supports `gpt-5.2`) |
| `base_url` | `None` | Custom API endpoint (uses `OPENAI_BASE_URL` env var if set) |
| `layout` | `"single"` | Classification mode: `"single"` or `"strip"` |
| `strip_size` | `9` | Components per strip image (when `layout="strip"`) |
| `reasoning_effort` | `None` | Reasoning effort for gpt-5.x: `none`, `low`, `medium`, `high` |
| `psd_fmax` | `45.0` | Maximum frequency for PSD plots (Hz) |
| `confidence_threshold` | `0.8` | Minimum confidence for auto-exclusion |
| `auto_exclude` | `True` | Automatically exclude artifact components |
| `labels_to_exclude` | `["eye", "muscle", "heart", "line_noise", "channel_noise", "other_artifact"]` | Labels to exclude (all non-brain) |
| `output_dir` | `"./autoclean_icvision_results"` | Output directory for results |
| `generate_report` | `True` | Generate PDF report |
| `batch_size` | `10` | Components per API request |
| `max_concurrency` | `4` | Maximum parallel API requests |
| `api_key` | `None` | Uses `OPENAI_API_KEY` environment variable |
| `custom_prompt` | `None` | Uses built-in classification prompt |

### Component Labels

The standard labels ICVision uses (and expects from the API) are:
- `brain` - Neural brain activity (retained)
- `eye` - Eye movement artifacts
- `muscle` - Muscle artifacts
- `heart` - Cardiac artifacts
- `line_noise` - Electrical line noise
- `channel_noise` - Channel-specific noise
- `other_artifact` - Other artifacts

These are defined in `src/icvision/config.py`.

### Output Files

ICVision creates organized output files with input filename prefixes to prevent overwrites when processing multiple datasets:

* `{basename}_icvis_classified_ica.fif`: MNE ICA object with labels and exclusions
* `{basename}_icvis_results.csv`: Detailed classification results per component
* `{basename}_icvis_cleaned_raw.{format}`: Cleaned EEG data with artifacts removed
* `{basename}_icvis_summary.txt`: Summary statistics by label type
* `{basename}_icvis_report_all_comps.pdf`: Comprehensive PDF report (if enabled)
* `component_IC{N}_vision_analysis.webp`: Individual component plots used for API classification

**Example**: Processing `sub-01_task-rest_eeg.set` creates files like:
- `sub-01_task-rest_eeg_icvis_results.csv`
- `sub-01_task-rest_eeg_icvis_classified_ica.fif`
- `sub-01_task-rest_eeg_icvis_cleaned_raw.set`

**Multi-file Processing**: All results are saved to the same `autoclean_icvision_results/` directory, with basename prefixes ensuring no conflicts:
```bash
autoclean_icvision_results/
├── sub-01_task-rest_eeg_icvis_results.csv
├── sub-01_task-rest_eeg_icvis_classified_ica.fif
├── sub-02_task-rest_eeg_icvis_results.csv
├── sub-02_task-rest_eeg_icvis_classified_ica.fif
└── pilot_data_icvis_results.csv
```

### Custom Classification Prompt

The default prompt is optimized for EEG component classification on EGI128 nets. You can customize it by:
- **CLI**: `--prompt-file /path/to/custom_prompt.txt`
- **Python API**: `custom_prompt="Your custom prompt here"`
- **View default**: Check `src/icvision/config.py`

### OpenAI API Costs

ICVision automatically tracks and estimates OpenAI API costs during processing:

**Typical Costs (2025-05-29 pricing)**:
- **gpt-4.1**: ~$0.0012 per component
- **gpt-4.1-mini**: ~$0.0002 per component (recommended)
- **gpt-4.1-nano**: ~$0.0001 per component (budget option)

**Example costs for full ICA analysis**:
- 10 components: $0.0006-0.012 depending on model
- 30 components: $0.002-0.036 depending on model
- 64 components: $0.004-0.077 depending on model

Cost estimates are automatically logged during processing. Use `--verbose` flag to see detailed per-component cost tracking.
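
For a rough budget before running a dataset, the per-component figures above can be multiplied out. The sketch uses the rounded rates listed in this section, so its results can differ slightly from the quoted ranges; `estimate_cost` is a hypothetical helper and does not model strip layout or token-level pricing.

```python
# Rounded per-component rates from the list above (2025-05-29 pricing).
COST_PER_COMPONENT = {
    "gpt-4.1": 0.0012,
    "gpt-4.1-mini": 0.0002,
    "gpt-4.1-nano": 0.0001,
}


def estimate_cost(n_components, model="gpt-4.1"):
    """Estimate the cost in USD of classifying n_components with one model."""
    return n_components * COST_PER_COMPONENT[model]


for model in COST_PER_COMPONENT:
    print(f"{model}: ${estimate_cost(64, model):.4f} for 64 components")
```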

### Logging and Verbosity

ICVision provides two logging modes for different use cases:

**Normal Mode** (Default - Clean output for researchers):
```bash
autoclean-icvision data.set
# Output:
# 2025-05-29 13:33:43 - INFO - Starting ICVision CLI v0.1.0
# 2025-05-29 13:33:44 - INFO - OpenAI classification complete. Processed 20/20 components
# 2025-05-29 13:33:45 - INFO - ICVision workflow completed successfully!
```

**Verbose Mode** (Detailed debugging information):
```bash
autoclean-icvision data.set --verbose
# Output:
# 2025-05-29 13:33:43 - icvision - INFO - Verbose logging enabled - showing module details
# 2025-05-29 13:33:44 - icvision.core - DEBUG - Loading and validating input data...
# 2025-05-29 13:33:45 - icvision.api - DEBUG - Response ID: resp_123..., Tokens: 400/50, Cost: $0.001200
# 2025-05-29 13:33:45 - icvision.plotting - DEBUG - Plotting progress: 10/20 components completed
```

**Verbose mode provides**:
- Module-level debugging information
- Detailed OpenAI API cost tracking per component
- Progress indicators for long-running operations
- External library logging (httpx, openai, etc.)
- Full error stack traces for troubleshooting

**Use verbose mode when**:
- Debugging processing issues
- Monitoring API costs in detail
- Contributing to development
- Troubleshooting unexpected behavior

## Development

Contributions are welcome! Please see `CONTRIBUTING.md` for guidelines.

## License

This project is licensed under the MIT License - see the `LICENSE` file for details.

## Citation

If you use ICVision in your research, please consider citing it (details to be added upon publication/DOI generation).

## Acknowledgements

* This project relies heavily on the [MNE-Python](https://mne.tools/) library.
* Utilizes the [OpenAI API](https://openai.com/api/).
