![PyPI - Version](https://img.shields.io/pypi/v/panelsplit)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.14933814.svg)](https://doi.org/10.5281/zenodo.14933814)

# panelsplit: a tool for panel data analysis

panelsplit is a Python package for time-series cross-validation on data that track multiple entities over time (panel data).
It can be used at several stages of a data pipeline, including feature engineering, hyperparameter tuning, and model estimation.

## Installation

panelsplit is tested with Python 3.11 and later. You can install it with pip:

```bash
pip install panelsplit
```

---

## Documentation

The documentation is available [here](https://4freye.github.io/panelsplit/panelsplit.html).

### Example Usage

```python
import numpy as np
import pandas as pd
from panelsplit.cross_validation import PanelSplit

# Generate example data: 2 countries observed over the years 2001 to 2003
num_countries = 2
years = range(2001, 2004)
num_years = len(years)

data_dict = {
    'country_id': [c for c in range(1, num_countries + 1) for _ in years],
    'year': [year for _ in range(num_countries) for year in years],
    'y': np.random.normal(0, 1, num_countries * num_years),
    'x1': np.random.normal(0, 1, num_countries * num_years),
    'x2': np.random.normal(0, 1, num_countries * num_years)
}

panel_data = pd.DataFrame(data_dict)

# Split along the time dimension (the `year` column)
panel_split = PanelSplit(periods=panel_data.year, n_splits=2)

splits = panel_split.split()

for train_idx, test_idx in splits:
    print("Train:")
    print(panel_data.loc[train_idx])
    print("Test:")
    print(panel_data.loc[test_idx])
```
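To make the temporal part of the idea concrete, the following is a minimal sketch of period-based splitting built only on scikit-learn's `TimeSeriesSplit`. It is not panelsplit's implementation. It assumes expanding-window splits over the sorted unique periods, and the helper `period_splits` is hypothetical. The key point is that the unique periods are split rather than the rows, so every entity observed in a given period falls on the same side of each split.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit


def period_splits(periods, n_splits):
    """Hypothetical helper (not part of panelsplit).

    Yields (train_mask, test_mask) boolean arrays over the rows of a panel.
    The sorted unique periods are split with an expanding-window
    TimeSeriesSplit, so all rows of a period stay together and every
    training period precedes every test period.
    """
    periods = np.asarray(periods)
    unique_periods = np.unique(periods)  # np.unique returns sorted values
    for train_p, test_p in TimeSeriesSplit(n_splits=n_splits).split(unique_periods):
        train_mask = np.isin(periods, unique_periods[train_p])
        test_mask = np.isin(periods, unique_periods[test_p])
        yield train_mask, test_mask


# Two countries observed over three years, stored in long format
panel = pd.DataFrame({
    "country_id": [1, 1, 1, 2, 2, 2],
    "year": [2001, 2002, 2003, 2001, 2002, 2003],
})

for train_mask, test_mask in period_splits(panel["year"], n_splits=2):
    print("train years:", panel.loc[train_mask, "year"].unique().tolist(),
          "test years:", panel.loc[test_mask, "year"].unique().tolist())
# train years: [2001] test years: [2002]
# train years: [2001, 2002] test years: [2003]
```

Splitting the unique periods is what distinguishes this from applying `TimeSeriesSplit` directly to a long-format panel, where the rows of a single year could be divided between training and test.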
### Spatio-Temporal Cross-Validation

panelsplit can also combine temporal and spatial holdouts. By taking entity hierarchies (e.g., states or cities) into account, it prevents cluster-level leakage, where observations from the same group appear in both the training and test sets of a split. This lets you validate on unseen time periods *and* on groups that were not observed during training at the same time:

```python
from sklearn.model_selection import StratifiedGroupKFold

# Combine temporal splits with group-level (spatial) holdouts
panel_split = PanelSplit(
    periods=panel_data.year,
    n_splits=2,
    groups=panel_data["country_id"],
    group_splitter=StratifiedGroupKFold(n_splits=3)  # any scikit-learn group-aware splitter can be used
)

# Multi-column (nested) groups are also supported:
# PanelSplit flattens them internally into a single composite group identifier.
# e.g., groups = panel_data[["country_id", "city_id"]]  (assuming a city_id column exists)

# X and y are passed through to the group splitter.
# Note: StratifiedGroupKFold stratifies on a class-label (binary or multiclass) y
# and should have at least as many distinct groups as folds.
splits = panel_split.split(X=panel_data, y=panel_data["y"])
# Yields 6 splits in total (2 temporal cuts x 3 group folds)
```

For more examples and detailed usage instructions, see the [examples](examples) directory in this repository, as well as [an introductory article on panelsplit](https://towardsdatascience.com/how-to-cross-validate-your-panel-data-in-python-9ad981ddd043).

## Background

Work on panelsplit started at [EconAI](https://www.linkedin.com/company/econ-ai/) in December 2023 and has been under active development since then.

## Contributing

Contributions to panelsplit are welcome! If you encounter any issues or have suggestions for improvements, please open an issue or submit a pull request on GitHub.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.