{"id":19619591,"url":"https://github.com/lsorber/neo-ls-svm","last_synced_at":"2025-04-28T03:31:31.638Z","repository":{"id":215990363,"uuid":"739301655","full_name":"lsorber/neo-ls-svm","owner":"lsorber","description":"Neo LS-SVM is a modern Least-Squares Support Vector Machine implementation","archived":false,"fork":false,"pushed_at":"2024-04-01T19:57:37.000Z","size":329,"stargazers_count":19,"open_issues_count":10,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-11-09T00:52:52.722Z","etag":null,"topics":["conformal-prediction","gaussian-processes","kernel-methods","kernel-ridge-regression","ls-svm","machine-learning","prediction-intervals","python","support-vector-machines"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lsorber.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-01-05T08:36:40.000Z","updated_at":"2024-11-05T21:04:59.000Z","dependencies_parsed_at":"2024-04-01T19:56:40.577Z","dependency_job_id":null,"html_url":"https://github.com/lsorber/neo-ls-svm","commit_stats":null,"previous_names":["lsorber/neo-ls-svm"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lsorber%2Fneo-ls-svm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lsorber%2Fneo-ls-svm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lsorber%2Fneo-ls-svm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lsorb
er%2Fneo-ls-svm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lsorber","download_url":"https://codeload.github.com/lsorber/neo-ls-svm/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224091909,"owners_count":17254152,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["conformal-prediction","gaussian-processes","kernel-methods","kernel-ridge-regression","ls-svm","machine-learning","prediction-intervals","python","support-vector-machines"],"created_at":"2024-11-11T11:14:21.800Z","updated_at":"2024-11-11T11:14:23.204Z","avatar_url":"https://github.com/lsorber.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Open in Dev Containers](https://img.shields.io/static/v1?label=Dev%20Containers\u0026message=Open\u0026color=blue\u0026logo=visualstudiocode)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/lsorber/neo-ls-svm) [![Open in GitHub Codespaces](https://img.shields.io/static/v1?label=GitHub%20Codespaces\u0026message=Open\u0026color=blue\u0026logo=github)](https://github.com/codespaces/new?hide_repo_select=true\u0026ref=main\u0026repo=739301655)\n\n# Neo LS-SVM\n\nNeo LS-SVM is a modern [Least-Squares Support Vector Machine](https://en.wikipedia.org/wiki/Least-squares_support_vector_machine) implementation in Python that offers several benefits over sklearn's classic `sklearn.svm.SVC` classifier and `sklearn.svm.SVR` regressor:\n\n1. 
⚡ Linear complexity in the number of training examples with [Orthogonal Random Features](https://arxiv.org/abs/1610.09072).\n2. 🚀 Hyperparameter-free: zero-cost optimization of the [regularization parameter γ](https://en.wikipedia.org/wiki/Ridge_regression#Tikhonov_regularization) and [kernel parameter σ](https://en.wikipedia.org/wiki/Radial_basis_function_kernel).\n3. 🏔️ Adds a new tertiary objective that minimizes the complexity of the prediction surface.\n4. 🎁 Returns the leave-one-out residuals and error for free after fitting.\n5. 🌀 Learns an affine transformation of the feature matrix to optimally separate the target's bins.\n6. 🪞 Can solve the LS-SVM in both the primal and the dual space.\n7. 🌡️ Isotonically calibrated `predict_proba`.\n8. ✅ Conformally calibrated `predict_quantiles` and `predict_interval`.\n9. 🔔 Bayesian estimation of the predictive standard deviation with `predict_std`.\n10. 🐼 Pandas DataFrame output when the input is a pandas DataFrame.\n\n## Using\n\n### Installing\n\nFirst, install this package with:\n\n```bash\npip install neo-ls-svm\n```\n\n### Classification and regression\n\nThen, you can import `neo_ls_svm.NeoLSSVM` as an sklearn-compatible binary classifier and regressor. 
Example usage:\n\n```python\nfrom neo_ls_svm import NeoLSSVM\nfrom pandas import get_dummies\nfrom sklearn.datasets import fetch_openml\nfrom sklearn.model_selection import train_test_split\n\n# Binary classification example:\nX, y = fetch_openml(\"churn\", version=3, return_X_y=True, as_frame=True, parser=\"auto\")\nX_train, X_test, y_train, y_test = train_test_split(get_dummies(X), y, test_size=0.15, random_state=42)\nmodel = NeoLSSVM().fit(X_train, y_train)\nmodel.score(X_test, y_test)  # 93.1% (compared to sklearn.svm.SVC's 89.6%)\n\n# Regression example:\nX, y = fetch_openml(\"ames_housing\", version=1, return_X_y=True, as_frame=True, parser=\"auto\")\nX_train, X_test, y_train, y_test = train_test_split(get_dummies(X), y, test_size=0.15, random_state=42)\nmodel = NeoLSSVM().fit(X_train, y_train)\nmodel.score(X_test, y_test)  # 82.4% (compared to sklearn.svm.SVR's -11.8%)\n```\n\n### Predicting quantiles\n\nNeo LS-SVM implements conformal prediction with a Bayesian nonconformity estimate to compute quantiles and prediction intervals for both classification and regression. Example usage:\n\n```python\n# Predict the house prices and their quantiles.\nŷ_test = model.predict(X_test)\nŷ_test_quantiles = model.predict_quantiles(X_test, quantiles=(0.025, 0.05, 0.1, 0.9, 0.95, 0.975))\n```\n\nWhen the input data is a pandas DataFrame, the output is also a pandas DataFrame. 
For example, printing the head of `ŷ_test_quantiles` yields:\n\n|   house_id |    0.025 |     0.05 |      0.1 |      0.9 |     0.95 |    0.975 |\n|-----------:|---------:|---------:|---------:|---------:|---------:|---------:|\n|       1357 | 114283.0 | 124767.6 | 133314.0 | 203162.0 | 220407.5 | 245655.3 |\n|       2367 |  85518.3 |  91787.2 |  93709.8 | 107464.3 | 108472.6 | 114482.3 |\n|       2822 | 147165.9 | 157462.8 | 167193.1 | 243646.5 | 263324.4 | 291963.3 |\n|       2126 |  81788.7 |  88738.1 |  91367.4 | 111944.9 | 114800.7 | 122874.5 |\n|       1544 |  94507.1 | 108288.2 | 120184.3 | 222630.5 | 248668.2 | 283703.4 |\n\nLet's visualize the predicted quantiles on the test set:\n\n\u003cimg src=\"https://github.com/lsorber/neo-ls-svm/assets/4543654/cd24e739-e857-4045-8a70-07e92367a901\" width=\"512\"\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eExpand to see the code that generated the graph above\u003c/summary\u003e\n\n```python\nimport matplotlib.pyplot as plt\nimport matplotlib.ticker as ticker\n\n%config InlineBackend.figure_format = \"retina\"\nplt.rcParams[\"font.size\"] = 8\nidx = (-ŷ_test.sample(50, random_state=42)).sort_values().index\ny_ticks = list(range(1, len(idx) + 1))\nplt.figure(figsize=(4, 5))\nfor j in range(3):\n    end = ŷ_test_quantiles.shape[1] - 1 - j\n    coverage = round(100 * (ŷ_test_quantiles.columns[end] - ŷ_test_quantiles.columns[j]))\n    plt.barh(\n        y_ticks,\n        ŷ_test_quantiles.loc[idx].iloc[:, end] - ŷ_test_quantiles.loc[idx].iloc[:, j],\n        left=ŷ_test_quantiles.loc[idx].iloc[:, j],\n        label=f\"{coverage}% Prediction interval\",\n        color=[\"#b3d9ff\", \"#86bfff\", \"#4da6ff\"][j],\n    )\nplt.plot(y_test.loc[idx], y_ticks, \"s\", markersize=3, markerfacecolor=\"none\", markeredgecolor=\"#e74c3c\", label=\"Actual value\")\nplt.plot(ŷ_test.loc[idx], y_ticks, \"s\", color=\"blue\", markersize=0.6, label=\"Predicted value\")\nplt.xlabel(\"House price\")\nplt.ylabel(\"Test house 
index\")\nplt.xlim(0, 500e3)\nplt.yticks(y_ticks, y_ticks)\nplt.tick_params(axis=\"y\", labelsize=6)\nplt.grid(axis=\"x\", color=\"lightsteelblue\", linestyle=\":\", linewidth=0.5)\nplt.gca().xaxis.set_major_formatter(ticker.StrMethodFormatter(\"${x:,.0f}\"))\nplt.gca().spines[\"top\"].set_visible(False)\nplt.gca().spines[\"right\"].set_visible(False)\nplt.legend()\nplt.tight_layout()\nplt.show()\n```\n\u003c/details\u003e\n\n### Predicting intervals\n\nIn addition to quantile prediction, you can use `predict_interval` to predict conformally calibrated prediction intervals. Compared to quantiles, these focus on reliable coverage over quantile accuracy. Example usage:\n\n```python\n# Compute prediction intervals for the houses in the test set.\nŷ_test_interval = model.predict_interval(X_test, coverage=0.95)\n\n# Measure the coverage of the prediction intervals on the test set\ncoverage = ((ŷ_test_interval.iloc[:, 0] \u003c= y_test) \u0026 (y_test \u003c= ŷ_test_interval.iloc[:, 1])).mean()\nprint(coverage)  # 94.3%\n```\n\nWhen the input data is a pandas DataFrame, the output is also a pandas DataFrame. For example, printing the head of `ŷ_test_interval` yields:\n\n|   house_id |    0.025 |    0.975 |\n|-----------:|---------:|---------:|\n|       1357 | 114283.0 | 245849.2 |\n|       2367 |  85518.3 | 114411.4 |\n|       2822 | 147165.9 | 292179.2 |\n|       2126 |  81788.7 | 122838.1 |\n|       1544 |  94507.1 | 284062.6 |\n\n## Benchmarks\n\nWe select all binary classification and regression datasets below 1M entries from the [AutoML Benchmark](https://arxiv.org/abs/2207.12560). Each dataset is split into 85% for training and 15% for testing. We apply `skrub.TableVectorizer` as a preprocessing step for `neo_ls_svm.NeoLSSVM` and `sklearn.svm.SVC,SVR` to vectorize the pandas DataFrame training data into a NumPy array. 
Models are fitted only once on each dataset, with their default settings and no hyperparameter tuning.\n\n\u003cdetails open\u003e\n\u003csummary\u003eBinary classification\u003c/summary\u003e\n\nROC-AUC on 15% test set:\n\n|                          dataset |   LGBMClassifier |         NeoLSSVM |              SVC |\n|---------------------------------:|-----------------:|-----------------:|-----------------:|\n|                              ada |  🥈 90.9% (0.1s) |  🥇 90.9% (1.9s) |     83.1% (4.5s) |\n|                            adult |  🥇 93.0% (0.5s) | 🥈 89.0% (15.7s) |                / |\n|           amazon_employee_access |  🥇 85.6% (0.5s) |  🥈 64.5% (9.0s) |                / |\n|                           arcene |  🥈 78.0% (0.6s) |     70.0% (6.3s) |  🥇 82.0% (4.0s) |\n|                       australian |  🥇 88.3% (0.2s) |     79.9% (1.7s) |  🥈 81.9% (0.1s) |\n|                   bank-marketing |  🥇 93.5% (0.5s) | 🥈 91.0% (11.8s) |                / |\n| blood-transfusion-service-center |     62.0% (0.3s) |  🥇 71.0% (2.2s) |  🥈 69.7% (0.1s) |\n|                            churn |  🥇 91.7% (0.6s) |  🥈 81.0% (2.1s) |     70.6% (2.9s) |\n|           click_prediction_small |  🥇 67.7% (0.5s) | 🥈 66.6% (10.9s) |                / |\n|                          jasmine |  🥇 86.1% (0.3s) |     79.5% (1.9s) |  🥈 85.3% (7.4s) |\n|                              kc1 |  🥇 78.9% (0.3s) |  🥈 76.6% (1.4s) |     45.7% (0.6s) |\n|                         kr-vs-kp | 🥇 100.0% (0.6s) |     99.2% (1.6s) |  🥈 99.4% (2.3s) |\n|                         madeline |  🥇 93.1% (0.5s) |     65.6% (1.9s) | 🥈 82.5% (19.8s) |\n|                  ozone-level-8hr |  🥈 91.2% (0.4s) |  🥇 91.6% (1.7s) |     72.9% (0.6s) |\n|                              pc4 |  🥇 95.3% (0.3s) |  🥈 90.9% (1.5s) |     25.7% (0.3s) |\n|                 phishingwebsites |  🥇 99.5% (0.5s) |  🥈 98.9% (3.6s) |    98.7% (10.0s) |\n|                          phoneme |  🥇 95.6% (0.3s) |  🥈 93.5% (2.1s) |     91.2% (2.0s) |\n|    
                  qsar-biodeg |  🥇 92.7% (0.4s) |  🥈 91.1% (5.2s) |     86.8% (0.3s) |\n|                        satellite |  🥈 98.7% (0.2s) |  🥇 99.5% (1.9s) |     98.5% (0.4s) |\n|                          sylvine |  🥇 98.5% (0.2s) |  🥈 97.1% (2.0s) |     96.5% (3.8s) |\n|                             wilt |  🥈 99.5% (0.2s) |  🥇 99.8% (1.8s) |     98.9% (0.5s) |\n\n\u003c/details\u003e\n\n\u003cdetails open\u003e\n\u003csummary\u003eRegression\u003c/summary\u003e\n\nR² on 15% test set:\n\n|                       dataset |   LGBMRegressor |         NeoLSSVM |              SVR |\n|------------------------------:|----------------:|-----------------:|-----------------:|\n|                       abalone | 🥈 56.2% (0.1s) |  🥇 59.5% (2.5s) |     51.3% (0.7s) |\n|                        boston | 🥇 91.7% (0.2s) |  🥈 89.6% (1.1s) |     35.1% (0.0s) |\n|              brazilian_houses | 🥈 55.9% (0.3s) |  🥇 88.4% (3.7s) |      5.4% (7.0s) |\n|                      colleges | 🥇 58.5% (0.4s) |  🥈 42.2% (6.6s) |    40.2% (15.1s) |\n|                      diamonds | 🥇 98.2% (0.3s) | 🥈 95.2% (13.7s) |                / |\n|                     elevators | 🥇 87.7% (0.5s) |  🥈 82.6% (6.5s) |                / |\n|                     house_16h | 🥇 67.7% (0.4s) |  🥈 52.8% (6.0s) |                / |\n|          house_prices_nominal | 🥇 89.0% (0.3s) |  🥈 78.3% (2.1s) |     -2.9% (1.2s) |\n|                   house_sales | 🥇 89.2% (0.4s) |  🥈 77.8% (5.9s) |                / |\n|           mip-2016-regression | 🥇 59.2% (0.4s) |  🥈 34.9% (5.8s) |    -27.3% (0.4s) |\n|                     moneyball | 🥇 93.2% (0.3s) |  🥈 91.3% (1.1s) |      0.8% (0.2s) |\n|                           pol | 🥇 98.7% (0.3s) |  🥈 74.9% (4.6s) |                / |\n|                         quake |   -10.7% (0.2s) |  🥇 -1.0% (1.6s) | 🥈 -10.7% (0.1s) |\n| sat11-hand-runtime-regression | 🥇 78.3% (0.4s) |  🥈 61.7% (2.1s) |    -56.3% (5.1s) |\n|                       sensory | 🥇 29.2% (0.1s) |      3.0% (1.6s) |  🥈 
16.4% (0.0s) |\n|                        socmob | 🥇 79.6% (0.2s) |  🥈 72.5% (6.6s) |     30.8% (0.1s) |\n|                      space_ga | 🥇 70.3% (0.3s) |  🥈 43.6% (1.5s) |     35.9% (0.2s) |\n|                       tecator | 🥈 98.3% (0.1s) |  🥇 99.4% (0.9s) |     78.5% (0.0s) |\n|                      us_crime | 🥈 62.8% (0.6s) |  🥇 63.0% (2.3s) |      6.7% (0.8s) |\n|                  wine_quality | 🥇 45.6% (0.2s) |  🥈 36.5% (2.8s) |     16.4% (1.6s) |\n\n\u003c/details\u003e\n\n## Contributing\n\n\u003cdetails\u003e\n\u003csummary\u003ePrerequisites\u003c/summary\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e1. Set up Git to use SSH\u003c/summary\u003e\n\n1. [Generate an SSH key](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent#generating-a-new-ssh-key) and [add the SSH key to your GitHub account](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account).\n1. Configure SSH to automatically load your SSH keys:\n    ```sh\n    cat \u003c\u003c EOF \u003e\u003e ~/.ssh/config\n    Host *\n      AddKeysToAgent yes\n      IgnoreUnknown UseKeychain\n      UseKeychain yes\n    EOF\n    ```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e2. Install Docker\u003c/summary\u003e\n\n1. [Install Docker Desktop](https://www.docker.com/get-started).\n    - Enable _Use Docker Compose V2_ in Docker Desktop's preferences window.\n    - _Linux only_:\n        - Export your user's user id and group id so that [files created in the Dev Container are owned by your user](https://github.com/moby/moby/issues/3206):\n            ```sh\n            cat \u003c\u003c EOF \u003e\u003e ~/.bashrc\n            export UID=$(id --user)\n            export GID=$(id --group)\n            EOF\n            ```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e3. Install VS Code or PyCharm\u003c/summary\u003e\n\n1. 
[Install VS Code](https://code.visualstudio.com/) and [VS Code's Dev Containers extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers). Alternatively, install [PyCharm](https://www.jetbrains.com/pycharm/download/).\n2. _Optional:_ install a [Nerd Font](https://www.nerdfonts.com/font-downloads) such as [FiraCode Nerd Font](https://github.com/ryanoasis/nerd-fonts/tree/master/patched-fonts/FiraCode) and [configure VS Code](https://github.com/tonsky/FiraCode/wiki/VS-Code-Instructions) or [configure PyCharm](https://github.com/tonsky/FiraCode/wiki/Intellij-products-instructions) to use it.\n\n\u003c/details\u003e\n\n\u003c/details\u003e\n\n\u003cdetails open\u003e\n\u003csummary\u003eDevelopment environments\u003c/summary\u003e\n\nThe following development environments are supported:\n\n1. ⭐️ _GitHub Codespaces_: click on _Code_ and select _Create codespace_ to start a Dev Container with [GitHub Codespaces](https://github.com/features/codespaces).\n1. ⭐️ _Dev Container (with container volume)_: click on [Open in Dev Containers](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/lsorber/neo-ls-svm) to clone this repository in a container volume and create a Dev Container with VS Code.\n1. _Dev Container_: clone this repository, open it with VS Code, and run \u003ckbd\u003eCtrl/⌘\u003c/kbd\u003e + \u003ckbd\u003e⇧\u003c/kbd\u003e + \u003ckbd\u003eP\u003c/kbd\u003e → _Dev Containers: Reopen in Container_.\n1. _PyCharm_: clone this repository, open it with PyCharm, and [configure Docker Compose as a remote interpreter](https://www.jetbrains.com/help/pycharm/using-docker-compose-as-a-remote-interpreter.html#docker-compose-remote) with the `dev` service.\n1. 
_Terminal_: clone this repository, open it with your terminal, and run `docker compose up --detach dev` to start a Dev Container in the background, and then run `docker compose exec dev zsh` to open a shell prompt in the Dev Container.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eDeveloping\u003c/summary\u003e\n\n- This project follows the [Conventional Commits](https://www.conventionalcommits.org/) standard to automate [Semantic Versioning](https://semver.org/) and [Keep A Changelog](https://keepachangelog.com/) with [Commitizen](https://github.com/commitizen-tools/commitizen).\n- Run `poe` from within the development environment to print a list of [Poe the Poet](https://github.com/nat-n/poethepoet) tasks available to run on this project.\n- Run `poetry add {package}` from within the development environment to install a runtime dependency and add it to `pyproject.toml` and `poetry.lock`. Add `--group test` or `--group dev` to install a CI or development dependency, respectively.\n- Run `poetry update` from within the development environment to upgrade all dependencies to the latest versions allowed by `pyproject.toml`.\n- Run `cz bump` to bump the package's version, update `CHANGELOG.md`, and create a git tag.\n\n\u003c/details\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flsorber%2Fneo-ls-svm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flsorber%2Fneo-ls-svm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flsorber%2Fneo-ls-svm/lists"}