https://github.com/4freye/panelsplit
A tool for performing cross-validation with panel data
cross-validation pandas panel-data python sklearn-compatible time-series
- Host: GitHub
- URL: https://github.com/4freye/panelsplit
- Owner: 4Freye
- License: mit
- Created: 2024-01-11T16:31:44.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-04-04T17:09:19.000Z (about 1 year ago)
- Last Synced: 2025-09-02T18:36:48.670Z (9 months ago)
- Topics: cross-validation, pandas, panel-data, python, sklearn-compatible, time-series
- Language: Python
- Homepage:
- Size: 7.66 MB
- Stars: 20
- Watchers: 1
- Forks: 2
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
README

[DOI: 10.5281/zenodo.14933814](https://doi.org/10.5281/zenodo.14933814)
# panelsplit: a tool for panel data analysis
panelsplit is a Python package that facilitates time-series cross-validation when working with multiple entities observed over time (i.e., panel data). It can be used at several stages of the data pipeline, including feature engineering, hyperparameter tuning, and model estimation.
## Installation
panelsplit is tested for compatibility with Python 3.11 and later. You can install it using pip:
```bash
pip install panelsplit
```
---
## Documentation
The documentation is available at [4freye.github.io/panelsplit](https://4freye.github.io/panelsplit/panelsplit.html).
### Example Usage
```python
import numpy as np
import pandas as pd
from panelsplit.cross_validation import PanelSplit

# Generate example data: 2 countries observed over 2001-2003
num_countries = 2
years = range(2001, 2004)
num_years = len(years)
data_dict = {
    'country_id': [c for c in range(1, num_countries + 1) for _ in years],
    'year': [year for _ in range(num_countries) for year in years],
    'y': np.random.normal(0, 1, num_countries * num_years),
    'x1': np.random.normal(0, 1, num_countries * num_years),
    'x2': np.random.normal(0, 1, num_countries * num_years)
}
panel_data = pd.DataFrame(data_dict)

# Create 2 time-based splits over the years in panel_data
panel_split = PanelSplit(periods=panel_data.year, n_splits=2)
splits = panel_split.split()

for train_idx, test_idx in splits:
    # print() works in any script; in a Jupyter notebook, display() also works
    print("Train:")
    print(panel_data.loc[train_idx])
    print("Test:")
    print(panel_data.loc[test_idx])
```
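The key idea is that the time-series split is made over the periods, not over the rows. Applying scikit-learn's `TimeSeriesSplit` directly to the stacked rows above would split by row position, so country 1's 2003 row and country 2's 2001 row would land in the same test fold while training used later data. The sketch below illustrates the period-based approach under the assumption that it works by running `TimeSeriesSplit` on the sorted unique periods and mapping each cut-off back to every row. `period_splits` is a hypothetical helper for illustration, not part of panelsplit's API, and panelsplit's own implementation may differ.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit


def period_splits(periods: pd.Series, n_splits: int):
    """Hypothetical helper: yield (train_mask, test_mask) boolean arrays.

    TimeSeriesSplit runs over the sorted *unique* periods; each cut-off is
    then mapped back to every row observed in those periods, so all
    entities share the same train/test boundary.
    """
    periods = pd.Series(periods).reset_index(drop=True)
    unique_periods = np.sort(periods.unique())
    tss = TimeSeriesSplit(n_splits=n_splits)
    for train_pos, test_pos in tss.split(unique_periods):
        train_mask = periods.isin(unique_periods[train_pos]).to_numpy()
        test_mask = periods.isin(unique_periods[test_pos]).to_numpy()
        yield train_mask, test_mask


# Toy panel: 2 countries observed over 2001-2003 (stacked by country)
panel = pd.DataFrame({
    "country_id": [1, 1, 1, 2, 2, 2],
    "year": [2001, 2002, 2003, 2001, 2002, 2003],
})

for train_mask, test_mask in period_splits(panel["year"], n_splits=2):
    print("train years:", np.unique(panel.loc[train_mask, "year"]).tolist(),
          "| test years:", np.unique(panel.loc[test_mask, "year"]).tolist())
```

With this toy panel the loop prints train years `[2001]` with test year `[2002]`, then train years `[2001, 2002]` with test year `[2003]`; both countries always share the same cut-off.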
### Spatio-Temporal Cross-Validation
panelsplit can also combine temporal holdouts with group (e.g., spatial) holdouts. By taking entity hierarchies such as states or cities into account, it keeps held-out groups out of training and so prevents cluster-level leakage. This lets you validate simultaneously on unseen time periods *and* unseen groups:
```python
from sklearn.model_selection import StratifiedGroupKFold
from panelsplit.cross_validation import PanelSplit

# Combine temporal splits with group-level (spatial) holdouts.
# Note: StratifiedGroupKFold stratifies on class labels, so y must be binary
# or multiclass, and there must be enough groups to fill n_splits folds.
# The two-country toy panel above (continuous y) is too small for this;
# run it on a larger panel with a categorical target.
panel_split = PanelSplit(
    periods=panel_data.year,
    n_splits=2,
    groups=panel_data["country_id"],
    group_splitter=StratifiedGroupKFold(n_splits=3),  # any scikit-learn group-aware splitter
)

# Multi-column groups are also accepted: PanelSplit flattens them into a
# single composite group identifier before the group split, e.g.
# groups = panel_data[["country_id", "city_id"]]

# X and y are passed through to the group splitter (here StratifiedGroupKFold).
splits = panel_split.split(X=panel_data, y=panel_data["y"])
# Yields 6 splits in total (2 temporal splits x 3 group folds).
```
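One way to picture the combined holdout is as the intersection of a temporal split and a group split: train on earlier periods of the groups that are kept in, and test on later periods of the held-out groups, giving (temporal splits) x (group folds) splits in total. The sketch below assumes that scheme and uses scikit-learn's `TimeSeriesSplit` and `GroupKFold`; it is not panelsplit's implementation. `spatio_temporal_splits` and the `city_id` column are hypothetical, and the composite group id is built with pandas' `groupby(...).ngroup()` as one plausible way to flatten multi-column groups. Because `GroupKFold` does not stratify, it runs on continuous targets too.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupKFold, TimeSeriesSplit


def spatio_temporal_splits(periods, groups, n_time_splits, n_group_splits):
    """Hypothetical helper: yield (train_mask, test_mask) boolean arrays.

    train: earlier periods AND groups that are not held out
    test:  later periods AND the held-out groups
    """
    periods = pd.Series(periods).reset_index(drop=True)
    groups = pd.Series(groups).reset_index(drop=True)
    unique_periods = np.sort(periods.unique())

    tss = TimeSeriesSplit(n_splits=n_time_splits)
    gkf = GroupKFold(n_splits=n_group_splits)

    for train_p, test_p in tss.split(unique_periods):
        in_train_period = periods.isin(unique_periods[train_p]).to_numpy()
        in_test_period = periods.isin(unique_periods[test_p]).to_numpy()
        # GroupKFold keeps all rows of a group in the same fold
        for _, held_out_rows in gkf.split(np.zeros(len(groups)), groups=groups):
            held_out = np.zeros(len(groups), dtype=bool)
            held_out[held_out_rows] = True
            yield in_train_period & ~held_out, in_test_period & held_out


# Hypothetical panel: 2 countries x 2 cities, observed over 2001-2003
panel = pd.DataFrame({
    "country_id": [1] * 6 + [2] * 6,
    "city_id": [1, 1, 1, 2, 2, 2] * 2,
    "year": [2001, 2002, 2003] * 4,
})

# Flatten the two-level hierarchy into one composite group id (0..3)
composite = panel.groupby(["country_id", "city_id"]).ngroup()

splits = list(spatio_temporal_splits(panel["year"], composite,
                                     n_time_splits=2, n_group_splits=2))
print(len(splits))  # 4 (2 temporal splits x 2 group folds)
```

In every split the test rows come from later years and from cities that never appear in the corresponding training rows.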
For more examples and detailed usage instructions, refer to the [examples](examples) directory in this repository. Also feel free to check out [an introductory article on panelsplit](https://towardsdatascience.com/how-to-cross-validate-your-panel-data-in-python-9ad981ddd043).
## Background
Work on panelsplit started at [EconAI](https://www.linkedin.com/company/econ-ai/) in December 2023 and has been under active development since then.
## Contributing
Contributions to panelsplit are welcome! If you encounter any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request on GitHub.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.