{"id":15469263,"url":"https://github.com/maxhalford/xam","last_synced_at":"2025-08-19T06:33:53.956Z","repository":{"id":87384670,"uuid":"79950780","full_name":"MaxHalford/xam","owner":"MaxHalford","description":":dart: Personal data science and machine learning toolbox","archived":false,"fork":false,"pushed_at":"2020-02-04T20:37:10.000Z","size":1174,"stargazers_count":365,"open_issues_count":3,"forks_count":76,"subscribers_count":20,"default_branch":"master","last_synced_at":"2025-05-19T23:05:14.298Z","etag":null,"topics":["data-science","machine-learning","preprocessing","python","stacking"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MaxHalford.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-01-24T20:24:58.000Z","updated_at":"2025-02-14T17:19:21.000Z","dependencies_parsed_at":null,"dependency_job_id":"9bba4051-75bd-4935-8e6c-a935db88ec14","html_url":"https://github.com/MaxHalford/xam","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/MaxHalford/xam","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaxHalford%2Fxam","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaxHalford%2Fxam/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaxHalford%2Fxam/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaxHalford%2Fxam/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MaxHalford","download_url":"https://codeload.github.com/M
axHalford/xam/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaxHalford%2Fxam/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271113403,"owners_count":24701609,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-19T02:00:09.176Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","machine-learning","preprocessing","python","stacking"],"created_at":"2024-10-02T01:58:19.535Z","updated_at":"2025-08-19T06:33:53.903Z","avatar_url":"https://github.com/MaxHalford.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# xam [![Build Status](https://travis-ci.org/MaxHalford/xam.svg?branch=master)](https://travis-ci.org/MaxHalford/xam)\n\n`xam` is my personal data science and machine learning toolbox. It is written in Python 3 and stands on the shoulders of giants (mainly [pandas](https://pandas.pydata.org/) and [scikit-learn](http://scikit-learn.org/)). 
It loosely follows scikit-learn's `fit`/`transform`/`predict` convention.\n\n## Installation\n\n- [Install Anaconda for Python 3.x \u003e= 3.5](https://www.continuum.io/downloads)\n- Run `pip install git+https://github.com/MaxHalford/xam --upgrade` in a terminal\n\n:warning: Because xam is a ***personal*** toolkit, the `--upgrade` flag will install the latest releases of each dependency (scipy, pandas, etc.). I like to stay up-to-date with the latest library versions.\n\n## Table of contents\n\nUsage examples are available in the [docs](docs) folder. Each example is tested with [doctest](https://pymotw.com/2/doctest/).\n\n- [Ensembling](docs/ensembling.md)\n  - [Groupby model](docs/ensembling.md#groupby-model)\n  - [LightGBM with CV](docs/ensembling.md#lightgbm-with-cv)\n  - [Stacking](docs/ensembling.md#stacking)\n  - [Stacking with bagged test predictions](docs/ensembling.md#stacking-with-bagged-test-predictions)\n- [Exploratory data analysis (EDA)](docs/eda.md)\n  - [Feature importance](docs/eda.md#feature-importance)\n- [Feature extraction](docs/feature-extraction.md)\n  - [Bayesian target encoding](docs/feature-extraction.md#bayesian-target-encoding)\n  - [Combining features](docs/feature-extraction.md#combining-features)\n  - [Count encoding](docs/feature-extraction.md#count-encoding)\n  - [Cyclic features](docs/feature-extraction.md#cyclic-features)\n- [Feature selection](docs/feature-selection.md)\n  - [Forward-backward selection](docs/feature-selection.md#forward-backward-selection)\n- [Linear models](docs/linear-models.md)\n  - [AUC regressor](docs/linear-models.md#auc-regressor)\n- [Model selection](docs/model-selection.md)\n  - [Ordered cross-validation](docs/model-selection.md#ordered-cross-validation)\n- [Natural Language Processing (NLP)](docs/nlp.md)\n  - [NB-SVM](docs/nlp.md#nb-svm)\n  - [Norvig spelling corrector](docs/nlp.md#norvig-spelling-corrector)\n  - [Top-terms classifier](docs/nlp.md#top-terms-classifier)\n- [Pipeline](docs/pipeline.md)\n  - 
[Column selection](docs/pipeline.md#column-selection)\n  - [Series transformer](docs/pipeline.md#series-transformer)\n  - [DataFrame transformer](docs/pipeline.md#dataframe-transformer)\n  - [Lambda transformer](docs/pipeline.md#lambda-transformer)\n- [Plotting](docs/plotting.md)\n  - [LaTeX-style figures](docs/plotting.md#latex-style-figures)\n- [Preprocessing](docs/preprocessing.md)\n  - [Binning](docs/preprocessing.md#binning)\n  - [Groupby transformer](docs/preprocessing.md#groupby-transformer)\n  - [One-hot encoding](docs/preprocessing.md#one-hot-encoding)\n  - [Resampling](docs/preprocessing.md#resampling)\n- [Time series analysis (TSA)](docs/tsa.md)\n  - [Exponentially weighted average](docs/tsa.md#ewm-optimization)\n  - [Exponential smoothing](docs/tsa.md#exponential-smoothing)\n  - [Frequency average forecasting](docs/tsa.md#frequency-average-forecasting)\n- [Various](docs/various.md)\n  - [Datetime range](docs/various.md#datetime-range)\n  - [Next day of the week](docs/various.md#next-day-of-the-week)\n  - [Subsequence lengths](docs/various.md#subsequence-lengths)\n  - [DataFrame to Vowpal Wabbit](docs/various.md#dataframe-to-vowpal-wabbit)\n  - [Normalized compression distance](docs/various.md#normalized-compression-distance)\n  - [Skyline querying](docs/various.md#skyline-querying)\n  - [Fuzzy duplicates](docs/various.md#fuzzy-duplicates)\n\n## Other Python data science and machine learning toolkits\n\n- [fastai/fastai](https://github.com/fastai/fastai)\n- [Laurae2/Laurae](https://github.com/Laurae2/Laurae)\n- [rasbt/mlxtend](https://github.com/rasbt/mlxtend)\n- [reiinakano/scikit-plot](https://github.com/reiinakano/scikit-plot)\n- [scikit-learn-contrib](https://github.com/scikit-learn-contrib)\n- [zygmuntz/phraug2](https://github.com/zygmuntz/phraug2)\n\n## License\n\nThe MIT License (MIT). 
Please see the [license file](LICENSE) for more information.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxhalford%2Fxam","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaxhalford%2Fxam","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxhalford%2Fxam/lists"}