https://github.com/zeiss/things_eeg2_dataset
🧠⚡A CLI to access and process the THINGS-EEG2 dataset by Gifford et al. (2022)
https://github.com/zeiss/things_eeg2_dataset
cli eeg electroencephalography mne neuroscience
Last synced: 23 days ago
JSON representation
🧠⚡A CLI to access and process the THINGS-EEG2 dataset by Gifford et al. (2022)
- Host: GitHub
- URL: https://github.com/zeiss/things_eeg2_dataset
- Owner: ZEISS
- License: other
- Created: 2025-11-26T07:38:43.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2026-03-07T15:02:10.000Z (3 months ago)
- Last Synced: 2026-03-07T17:44:03.632Z (3 months ago)
- Topics: cli, eeg, electroencephalography, mne, neuroscience
- Language: Python
- Homepage:
- Size: 14.2 MB
- Stars: 8
- Watchers: 0
- Forks: 0
- Open Issues: 8
-
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
Awesome Lists containing this project
README

[![PyPI][pypi-badge]][pypi]
[![Conda Platform][conda-badge]][conda-url]
[![License][license-badge]][license-url]
[![CI Status][ci-badge]][ci-url]
[pypi-badge]: https://img.shields.io/pypi/v/things_eeg2_dataset?style=flat-square&label=PyPI
[pypi]: https://pypi.org/project/things-eeg2-dataset/
[license-badge]: https://img.shields.io/badge/License-CC%20BY--NC%204.0-yellow.svg?style=flat-square
[license-url]: LICENSE
[ci-badge]: https://img.shields.io/github/actions/workflow/status/zeiss/things_eeg2_dataset/ci.yml?branch=main&style=flat-square&label=CI
[ci-url]: https://github.com/zeiss/things_eeg2_dataset/actions/workflows/ci.yml
[conda-badge]: https://img.shields.io/conda/vn/conda-forge/things_eeg2_dataset?style=flat-square
[conda-url]: https://prefix.dev/channels/conda-forge/packages/things_eeg2_dataset
# Introduction
This package provides tools for downloading, preprocessing the raw THINGS-EEG2 data, and generating image embeddings using various vision models.
> [!WARNING]
> This repository builds upon the original data processing by [Gifford et al (2022)](https://github.com/gifale95/eeg_encoding).
> Please check out their original code and the [corresponding paper](https://www.sciencedirect.com/science/article/pii/S1053811922008758?via%3Dihub).
>
> We are in no way associated with the authors.
> Nonetheless we hope, that this makes things easier (pun intended) to use.
## Installation
### CLI-only
If you only need the CLI functionality, you can run it using one line of code:
#### Using the PyPI package (with uv)
```bash
uvx run --from things_eeg2_dataset things-eeg2
```
#### Using the conda package (with pixi)
```bash
pixi exec --with things_eeg2_dataset things-eeg2
```
### From GitHub
```bash
git clone git@github.com:ZEISS/things_eeg2_dataset.git
cd things_eeg2_dataset
uv sync
uv pip install --editable .
source .venv/bin/activate
things-eeg2 --help
things-eeg2 --install-completion
# Then restart your shell
# Example for zsh:
source ~/.zshrc
```
### From PyPI
```bash
# Using UV
uv init
uv add things_eeg2_dataset
source .venv/bin/activate
things-eeg2 --help
things-eeg2 --install-completion
# Then restart your shell
# Example for zsh:
source ~/.zshrc
```
### Using the conda package
```bash
# Using pixi
pixi init
pixi add things_eeg2_dataset
pixi shell
things-eeg2 --help
things-eeg2 --install-completion
# Then restart your shell
# Example for zsh:
source ~/.zshrc
```
## Usage


## Data Structure
You can understand the data structure that is created by the CLI by referring to [paths.py](src/things_eeg2_dataset//paths.py).
It contains the ground truth data structure used throughout the project.
### Embedding Generation (`embedding_processing/`)
The package supports multiple state-of-the-art vision models for generating image embeddings:
| Model | Embedder Class | Description |
|-------|----------------|-------------|
| `open-clip-vit-h-14` | `OpenClipViTH14Embedder` | OpenCLIP ViT-H/14 (SDXL image encoder) |
| `openai-clip-vit-l-14` | `OpenAIClipVitL14Embedder` | OpenAI CLIP ViT-L/14 |
| `dinov2` | `DinoV2Embedder` | DINOv2 with registers (self-supervised) |
| `ip-adapter` | `IPAdapterEmbedder` | IP-Adapter Plus projections |
Each embedder generates:
- **Pooled embeddings**: Single vector per image (e.g., `(1024,)` for ViT-H-14)
- **Full sequence embeddings**: All tokens (e.g., `(257, 1280)` for ViT-H-14)
- **Text embeddings**: Corresponding text features from image captions
**Output Files:**
```bash
embeddings/
├── ViT-H-14_features_training.safetensors # Pooled embeddings
├── ViT-H-14_features_training_full.safetensors # Full token sequences
├── ViT-H-14_features_test.safetensors
└── ViT-H-14_features_test_full.safetensors
```
### Using the dataloader
```python
from things_eeg2_dataset.dataloader import ThingsEEGDataset
dataset = ThingsEEGDataset(
image_model="ViT-H-14",
data_path="/path/to/processed_data",
img_directory_training="/path/to/images/train",
img_directory_test="/path/to/images/test",
embeddings_dir="/path/to/embeddings",
train=True,
time_window=(0.0, 1.0),
)
```
See `things_eeg2_dataloader/README.md` for detailed usage.
## References & Citation
We are happy users of the [THINGS-EEG2 dataset](https://things-initiative.org/), but not associated with the original authors.
If you use this code, please cite the [THINGS-EEG2 paper](https://www.sciencedirect.com/science/article/pii/S1053811922008758?via%3Dihub):
> Gifford, A. T., Lahner, B., Saba-Sadiya, S., Vilas, M. G., Lascelles, A., Oliva, A., ... & Cichy, R. M. (2022). The THINGS-EEG2 dataset. Scientific Data.