{"id":37083646,"url":"https://github.com/crowdcent/numerblox","last_synced_at":"2026-01-14T10:12:28.993Z","repository":{"id":39042253,"uuid":"444391280","full_name":"crowdcent/numerblox","owner":"crowdcent","description":"Solid Numerai pipelines","archived":false,"fork":false,"pushed_at":"2025-09-11T23:38:19.000Z","size":68190,"stargazers_count":116,"open_issues_count":1,"forks_count":12,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-09-14T04:58:09.307Z","etag":null,"topics":["data-science","mlops","numerai"],"latest_commit_sha":null,"homepage":"https://crowdcent.github.io/numerblox","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/crowdcent.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"docs/contributing.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-01-04T11:19:43.000Z","updated_at":"2025-09-11T23:37:46.000Z","dependencies_parsed_at":"2024-01-02T14:01:47.310Z","dependency_job_id":"78c59f3a-8425-4f52-abec-1c1ecf1af485","html_url":"https://github.com/crowdcent/numerblox","commit_stats":{"total_commits":462,"total_committers":6,"mean_commits":77.0,"dds":0.06493506493506496,"last_synced_commit":"dd0800f7eb6306964722922c60eea8bd5bd789a5"},"previous_names":[],"tags_count":77,"template":false,"template_full_name":"fastai/nbdev_template","purl":"pkg:github/crowdcent/numerblox","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crowdcent%2Fnumerblox","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crowdcent%2Fnumerblox/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/crowdcent%2Fnumerblox/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crowdcent%2Fnumerblox/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/crowdcent","download_url":"https://codeload.github.com/crowdcent/numerblox/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crowdcent%2Fnumerblox/sbom","scorecard":{"id":309689,"data":{"date":"2025-08-11","repo":{"name":"github.com/crowdcent/numerblox","commit":"27be78480b570097527af9d57caf5dd3c2833b9a"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.8,"checks":[{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively 
maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Code-Review","score":0,"reason":"Found 0/16 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: .github/workflows/ci.yml:1","Warn: no topLevel permission defined: .github/workflows/codecov.yml:1","Warn: no topLevel permission defined: .github/workflows/deploy-mkdocs.yml:1","Warn: no topLevel permission defined: .github/workflows/ruff.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security 
policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/ci.yml:17: update your workflow using https://app.stepsecurity.io/secureworkflow/crowdcent/numerblox/ci.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/ci.yml:20: update your workflow using https://app.stepsecurity.io/secureworkflow/crowdcent/numerblox/ci.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: 
.github/workflows/codecov.yml:12: update your workflow using https://app.stepsecurity.io/secureworkflow/crowdcent/numerblox/codecov.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/codecov.yml:14: update your workflow using https://app.stepsecurity.io/secureworkflow/crowdcent/numerblox/codecov.yml/master?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/codecov.yml:34: update your workflow using https://app.stepsecurity.io/secureworkflow/crowdcent/numerblox/codecov.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/deploy-mkdocs.yml:13: update your workflow using https://app.stepsecurity.io/secureworkflow/crowdcent/numerblox/deploy-mkdocs.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/deploy-mkdocs.yml:16: update your workflow using https://app.stepsecurity.io/secureworkflow/crowdcent/numerblox/deploy-mkdocs.yml/master?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/deploy-mkdocs.yml:42: update your workflow using https://app.stepsecurity.io/secureworkflow/crowdcent/numerblox/deploy-mkdocs.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/ruff.yml:7: update your workflow using https://app.stepsecurity.io/secureworkflow/crowdcent/numerblox/ruff.yml/master?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/ruff.yml:8: update your workflow using https://app.stepsecurity.io/secureworkflow/crowdcent/numerblox/ruff.yml/master?enable=pin","Warn: downloadThenRun not pinned by hash: .github/workflows/ci.yml:26","Warn: downloadThenRun not pinned by hash: .github/workflows/codecov.yml:19","Warn: downloadThenRun not pinned by hash: .github/workflows/deploy-mkdocs.yml:22","Info:   0 out of   7 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   3 third-party GitHubAction dependencies 
pinned","Info:   0 out of   3 downloadThenRun dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 16 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-17T22:56:09.622Z","repository_id":39042253,"created_at":"2025-08-17T22:56:09.623Z","updated_at":"2025-08-17T22:56:09.623Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28416672,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T08:38:59.149Z","status":"ssl_error","status_checked_at":"2026-01-14T08:38:43.588Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","mlops","numerai"],"created_at":"2026-01-14T10:12:28.411Z","updated_at":"2026-01-14T10:12:28.986Z","avatar_url":"https://github.com/crowdcent.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![](https://img.shields.io/pypi/v/numerblox.png)\n![Python Version](https://img.shields.io/badge/dynamic/toml?url=https://raw.githubusercontent.com/crowdcent/numerblox/master/pyproject.toml\u0026query=%24.project%5B%22requires-python%22%5D\u0026label=python\u0026color=blue)\n![](https://img.shields.io/github/contributors/crowdcent/numerblox.png)\n[![uv](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json)](https://github.com/astral-sh/uv)\n[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)\n![](https://img.shields.io/codecov/c/gh/carlolepelaars/numerblox/master)\n![](https://img.shields.io/pypi/dm/numerblox)\n\n\n\n\n# NumerBlox\n\nNumerBlox offers components that help with developing strong Numerai models and inference pipelines. From downloading data to submitting predictions, NumerBlox has you covered.\n\nAll components can be used standalone and all processors are fully compatible to use within [scikit-learn](https://scikit-learn.org/) pipelines.  \n\n**Documentation:**\n[crowdcent.github.io/numerblox](https://crowdcent.github.io/numerblox)\n\n## 1. 
Installation\n\n### Recommended (using pip)\nSimply install numerblox from PyPI by running:\n\n```bash\npip install numerblox\n```\n\nIf you prefer to use [uv](https://github.com/astral-sh/uv), you can install numerblox with:\n\n```bash\nuv pip install numerblox\n```\n\n### Development\nTo install for development, clone the repository and use either pip or uv:\n\nUsing pip:\n```bash\ngit clone https://github.com/crowdcent/numerblox.git\ncd numerblox\npip install -e \".[test]\"\n```\n\nUsing [uv](https://github.com/astral-sh/uv):\n```bash\ngit clone https://github.com/crowdcent/numerblox.git\ncd numerblox\nuv venv\nuv pip install -e \".[test]\"\n```\n\nFor installation without dev dependencies, omit the `[test]` extra:\n\n```bash\npip install -e .\n```\nor\n```bash\nuv pip install -e .\n```\n\nTest your installation using one of the educational notebooks in\n[examples](https://github.com/crowdcent/numerblox/examples). Good places to start are [quickstart.ipynb](https://github.com/crowdcent/numerblox/examples/quickstart.ipynb) and [numerframe_tutorial.ipynb](https://github.com/crowdcent/numerblox/examples/numerframe_tutorial.ipynb). Run one of them in your notebook environment to quickly check that your installation succeeded. The documentation contains examples and explanations for each component of NumerBlox.\n\n## 2. 
Core functionality\n\nNumerBlox has the following features for both Numerai Classic and Signals:\n\n**[Data Download](https://crowdcent.github.io/numerblox/download/):** Automated retrieval of Numerai datasets.\n\n**[NumerFrame](https://crowdcent.github.io/numerblox/numerframe/):** A custom Pandas DataFrame for easier Numerai data manipulation.\n\n**[Preprocessors](https://crowdcent.github.io/numerblox/preprocessing/):** Customizable techniques for data preprocessing.\n\n**[Target Engineering](https://crowdcent.github.io/numerblox/targets/):** Tools for creating new target variables.\n\n**[Postprocessors](https://crowdcent.github.io/numerblox/neutralization/):** Ensembling, neutralization, and penalization.\n\n**[MetaPipeline](https://crowdcent.github.io/numerblox/meta/):** An era-aware pipeline extension of scikit-learn's Pipeline. Specifically designed to integrate with era-specific Postprocessors such as neutralization and ensembling. Can be optionally bypassed for custom implementations.\n\n**[MetaEstimators](https://crowdcent.github.io/numerblox/meta/):** Era-aware estimators that extend scikit-learn's functionality. Includes features like CrossValEstimator which allow for era-specific, multiple-folds fitting seamlessly integrated into the pipeline.\n\n**[Evaluation](https://crowdcent.github.io/numerblox/evaluation/):** Comprehensive metrics aligned with Numerai's evaluation criteria.\n\n**[Submitters](https://crowdcent.github.io/numerblox/submission/):** Facilitates secure and easy submission of predictions.\n\n**[Model Upload](https://crowdcent.github.io/numerblox/model_upload/):** Assists in the process of uploading trained models to Numerai for automated submissions.\n\nExample notebooks for each of these components can be found in the [examples](https://github.com/crowdcent/numerblox/examples). Also check out [the documentation](https://crowdcent.github.io/numerblox) for more information.\n\n\n## 3. 
Quick Start\n\nBelow are two examples of how NumerBlox can be used to train and do inference on Numerai data. For a full overview of all components check out the documentation. More advanced examples to leverage NumerBlox to the fullest can be found in the [End-To-End Example section](https://crowdcent.github.io/numerblox/end_to_end/).\n\n### 3.1. Simple example\n\nThe example below shows how NumerBlox can simplify the process of downloading, loading, training, evaluating, inferring and submitting data for Numerai Classic.\n\nNumerBlox is used here for easy downloading, data parsing, evaluation, inference and submission. You can experiment with this setup yourself in the example notebook [quickstart.ipynb](https://github.com/crowdcent/numerblox/examples/quickstart.ipynb).\n\n#### Downloading, loading, and training\n```python\nfrom numerblox.download import NumeraiClassicDownloader\nfrom numerblox.numerframe import create_numerframe\nfrom xgboost import XGBRegressor\n\ndownloader = NumeraiClassicDownloader(\"data\")\ndownloader.download_training_data(\"train_val\", version=\"5.0\")\ndf = create_numerframe(\"data/train_val/train.parquet\")\n\nX, y = df.get_feature_target_pair(multi_target=False)\nmodel = XGBRegressor()\nmodel.fit(X.values, y.values)\n```\n\n#### Evaluation\n```python\nfrom numerblox.prediction_loaders import ExamplePredictions\nfrom numerblox.evaluation import NumeraiClassicEvaluator\n\nval_df = create_numerframe(\"data/train_val/validation.parquet\")\nval_df['prediction'] = model.predict(val_df.get_feature_data)\nval_df['example_preds'] = ExamplePredictions(\"v5.0/validation_example_preds.parquet\").fit_transform(None)['prediction'].values\nevaluator = NumeraiClassicEvaluator()\nmetrics = evaluator.full_evaluation(val_df, \n                                    example_col=\"example_preds\", \n                                    pred_cols=[\"prediction\"], \n                                    target_col=\"target\")\n```\n\n#### Live 
Inference\n```python\ndownloader.download_live_data(\"current_round\", version=\"5.0\")\nlive_df = create_numerframe(file_path=\"data/current_round/live.parquet\")\nlive_X, live_y = live_df.get_feature_target_pair(multi_target=False)\npreds = model.predict(live_X)\n```\n\n#### Submission\n```python\nimport pandas as pd\nfrom numerblox.misc import Key\nfrom numerblox.submission import NumeraiClassicSubmitter\n\nNUMERAI_PUBLIC_ID = \"YOUR_PUBLIC_ID\"\nNUMERAI_SECRET_KEY = \"YOUR_SECRET_KEY\"\nkey = Key(pub_id=NUMERAI_PUBLIC_ID, secret_key=NUMERAI_SECRET_KEY)\nsubmitter = NumeraiClassicSubmitter(directory_path=\"sub_current_round\", key=key)\npred_dataf = pd.DataFrame(preds, index=live_df.index, columns=[\"prediction\"])\nsubmitter.full_submission(dataf=pred_dataf,\n                          cols=\"prediction\",\n                          file_name=\"submission.csv\",\n                          model_name=\"MY_MODEL_NAME\")\n```\n\n#### Model Upload\n```python\nfrom numerblox.submission import NumeraiModelUpload\n\nuploader = NumeraiModelUpload(key=key, max_retries=3, sleep_time=15, fail_silently=True)\nuploader.create_and_upload_model(model=model, \n                                 model_name=\"MY_MODEL_NAME\", \n                                 file_path=\"models/my_model.pkl\")\n```\n\n### 3.2. Advanced NumerBlox modeling\n\nBuilding on the simple example, this advanced setup showcases how to leverage NumerBlox's powerful components to create a sophisticated pipeline that can replace the \"simple\" XGBoost model in the example above. 
This advanced example creates an extensible scikit-learn pipeline with metadata routing that:\n\n- Approaches Numerai Classic as a classification problem\n- Uses cross-validation with multiple folds\n- Reduces classification probabilities to single values\n- Creates a weighted ensemble favoring recent folds\n- Applies neutralization to the predictions\n\n#### Creating the pipeline\n```python\nfrom xgboost import XGBClassifier\nfrom sklearn.model_selection import TimeSeriesSplit\nfrom numerblox.meta import CrossValEstimator, make_meta_pipeline\nfrom numerblox.ensemble import NumeraiEnsemble, PredictionReducer\nfrom numerblox.neutralizers import FeatureNeutralizer\n\nmodel = XGBClassifier()\ncrossval = CrossValEstimator(estimator=model, cv=TimeSeriesSplit(n_splits=5), predict_func='predict_proba')\npred_rud = PredictionReducer(n_models=5, n_classes=5)\nens = NumeraiEnsemble(donate_weighted=True)\nneut = FeatureNeutralizer(proportion=0.5)\nfull_pipe = make_meta_pipeline(crossval, pred_rud, ens, neut)\n```\n\n#### Training\n```python\n# ... Assume df is already defined as in the simple example ...\nX, y = df.get_feature_target_pair(multi_target=False)\ny_int = (y * 4).astype(int)  # Convert targets to integer classes for classification\nera_series = df.get_era_data\nfeatures = df.get_feature_data\nfull_pipe.fit(X, y_int, era_series=era_series)\n```\n\n#### Inference\n```python\nlive_eras = live_df.get_era_data\nlive_features = live_df.get_feature_data\npreds = full_pipe.predict(live_X, era_series=live_eras, features=live_features)\n```\n\nScikit-learn estimators, pipelines, and metadata routing are used to make sure we pass the correct era and feature information to estimators in the pipeline that require those parameters. 
It is worth familiarizing yourself with these concepts before using the advanced modeling features of NumerBlox: \n- [scikit-learn pipelines](https://scikit-learn.org/stable/modules/compose.html)\n- [scikit-learn metadata routing](https://scikit-learn.org/stable/metadata_routing.html)\n\n## 4. Contributing\n\nBe sure to read the [How To Contribute section](https://crowdcent.github.io/numerblox/contributing/) in the documentation for detailed instructions on contributing.\n\nIf you have questions or want to discuss new ideas for NumerBlox, please create a GitHub issue first.\n\n## 5. Crediting sources\n\nSome of the components in this library may be based on forum posts, notebooks or ideas made public by the Numerai community. We have done our best to ask all parties who posted a specific piece of code for their permission and credit their work in docstrings and documentation. If your code is public and used in this library without credits, please let us know, so we can add a link to your article/code. We want to always give credit where credit is due.\n\nIf you are contributing to NumerBlox and are using ideas posted earlier by someone else, make sure to credit them by posting a link to their article/code in docstrings and documentation.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcrowdcent%2Fnumerblox","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcrowdcent%2Fnumerblox","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcrowdcent%2Fnumerblox/lists"}