{"id":46765487,"url":"https://github.com/fidelity/selective","last_synced_at":"2026-03-09T22:30:07.679Z","repository":{"id":42036079,"uuid":"322388961","full_name":"fidelity/selective","owner":"fidelity","description":"[AMAI 2024] Selective: Feature Selection Library","archived":false,"fork":false,"pushed_at":"2025-09-09T21:13:32.000Z","size":189,"stargazers_count":68,"open_issues_count":0,"forks_count":19,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-10-28T14:55:45.001Z","etag":null,"topics":["feature-selection","supervised-feature-selection","unsupervised-feature-selection"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fidelity.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.txt","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-12-17T19:12:45.000Z","updated_at":"2025-09-09T21:13:30.000Z","dependencies_parsed_at":"2024-12-03T16:30:50.373Z","dependency_job_id":"f4926878-7a26-44f7-8e99-ae497ccad34d","html_url":"https://github.com/fidelity/selective","commit_stats":{"total_commits":42,"total_committers":8,"mean_commits":5.25,"dds":0.7142857142857143,"last_synced_commit":"d12b6a4a74768510e8eb45757e7e98b0dd3a44aa"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/fidelity/selective","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fidelity%2Fselective","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fidelity%2Fselective/tags","releases_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fidelity%2Fselective/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fidelity%2Fselective/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fidelity","download_url":"https://codeload.github.com/fidelity/selective/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fidelity%2Fselective/sbom","scorecard":{"id":399021,"data":{"date":"2025-08-11","repo":{"name":"github.com/fidelity/selective","commit":"7e07abc8894b7f26ec3fd44458171a3fc5ebdfbf"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":4,"checks":[{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Code-Review","score":5,"reason":"Found 10/20 approved changesets -- score normalized to 5","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 
0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: .github/workflows/ci.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/ci.yml:22: update your workflow using https://app.stepsecurity.io/secureworkflow/fidelity/selective/ci.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/ci.yml:24: update your workflow using https://app.stepsecurity.io/secureworkflow/fidelity/selective/ci.yml/master?enable=pin","Warn: pipCommand not pinned by hash: .github/workflows/ci.yml:31","Warn: pipCommand not pinned by hash: .github/workflows/ci.yml:32","Info:   0 out of   2 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   2 pipCommand dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build 
process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE.md:0","Info: FSF or OSI recognized license: Apache License 2.0: LICENSE.md:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Security-Policy","score":10,"reason":"security policy file detected","details":["Info: 
security policy file detected: github.com/fidelity/.github/SECURITY.md:1","Info: Found linked content: github.com/fidelity/.github/SECURITY.md:1","Info: Found disclosure, vulnerability, and/or timelines in security policy: github.com/fidelity/.github/SECURITY.md:1","Info: Found text in security policy: github.com/fidelity/.github/SECURITY.md:1"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Vulnerabilities","score":0,"reason":"13 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: PYSEC-2022-288 / GHSA-6hrg-qmvc-2xh8","Warn: Project is vulnerable to: PYSEC-2024-231","Warn: Project is vulnerable to: PYSEC-2018-34 / GHSA-2fc2-6r4j-p65h","Warn: Project is vulnerable to: PYSEC-2021-856 / GHSA-5545-2q6w-2gh6","Warn: Project is vulnerable to: PYSEC-2019-108 / GHSA-9fq2-x9r6-wfmf","Warn: Project is vulnerable to: PYSEC-2018-33 / GHSA-cw6w-4rcx-xphc","Warn: Project is vulnerable to: PYSEC-2021-857 / GHSA-f7c7-j99h-c22f","Warn: Project is vulnerable to: GHSA-fpfv-jqm9-f5jm","Warn: Project is vulnerable to: PYSEC-2017-1 / GHSA-frgw-fgh6-9g52","Warn: Project is vulnerable to: PYSEC-2020-73","Warn: Project is vulnerable to: PYSEC-2020-107 / GHSA-jjw5-xxj6-pcv5","Warn: Project is vulnerable to: PYSEC-2024-110 / GHSA-jw8x-6495-233v","Warn: Project is vulnerable to: PYSEC-2020-108"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 1 commits out of 24 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code 
analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-18T19:42:02.539Z","repository_id":42036079,"created_at":"2025-08-18T19:42:02.539Z","updated_at":"2025-08-18T19:42:02.539Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30314624,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-09T20:05:46.299Z","status":"ssl_error","status_checked_at":"2026-03-09T19:57:04.425Z","response_time":61,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["feature-selection","supervised-feature-selection","unsupervised-feature-selection"],"created_at":"2026-03-09T22:30:06.708Z","updated_at":"2026-03-09T22:30:07.624Z","avatar_url":"https://github.com/fidelity.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![ci](https://github.com/fidelity/selective/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/fidelity/selective/actions/workflows/ci.yml) [![PyPI version fury.io](https://badge.fury.io/py/selective.svg)](https://pypi.python.org/pypi/selective/) [![PyPI license](https://img.shields.io/pypi/l/selective.svg)](https://pypi.python.org/pypi/selective/) [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)](http://makeapullrequest.com) 
[![Downloads](https://static.pepy.tech/personalized-badge/selective?period=total\u0026units=international_system\u0026left_color=grey\u0026right_color=orange\u0026left_text=Downloads)](https://pepy.tech/project/selective)\n\n\n# Selective: Feature Selection Library\n**Selective** is a white-box feature selection library that supports supervised and unsupervised selection methods for classification and regression tasks. \n\nThe library provides:\n\n* Simple to complex selection methods: Variance, Correlation, Statistical, Linear, Tree-based, or Customized.\n* [Text-based selection](#text-based-selection) to maximize diversity in text embeddings and metadata coverage.\n* Interoperability with data frames as input.\n* Automated task detection. No need to know which feature selection method works with which machine learning task.\n* Benchmarking multiple selectors using cross-validation with built-in parallelization.\n* Inspection of the results and feature importance. \n\nSelective also provides optimized item selection based on diversity of text embeddings via [TextWiser](https://github.com/fidelity/textwiser) and \ncoverage of binary labels via multi-objective optimization ([AMAI'24](https://trebuchet.public.springernature.app/get_content/2c9eb6df-5c2b-42bc-89d6-4e3eb8bc8799?utm_source=rct_congratemailt\u0026utm_medium=email\u0026utm_campaign=nonoa_20240405\u0026utm_content=10.1007/s10472-024-09941-x), [CPAIOR'21](https://link.springer.com/chapter/10.1007/978-3-030-78230-6_27), [DSO@IJCAI'22](https://arxiv.org/abs/2112.03105)). This approach speeds up online experimentation and significantly boosts recommender systems, as presented at [NVIDIA GTC'22](https://www.youtube.com/watch?v=_v-B2nRy79w).  
\n\nSelective is developed by the Artificial Intelligence Center of Excellence at Fidelity Investments.\n\n## Quick Start\n```python\n# Import Selective and SelectionMethod\nfrom sklearn.datasets import fetch_california_housing\nfrom feature.utils import get_data_label\nfrom feature.selector import Selective, SelectionMethod\n\n# Data\ndata, label = get_data_label(fetch_california_housing())\n\n# Feature selectors from simple to more complex\nselector = Selective(SelectionMethod.Variance(threshold=0.0))\nselector = Selective(SelectionMethod.Correlation(threshold=0.5, method=\"pearson\"))\nselector = Selective(SelectionMethod.Statistical(num_features=3, method=\"anova\"))\nselector = Selective(SelectionMethod.Linear(num_features=3, regularization=\"none\"))\nselector = Selective(SelectionMethod.TreeBased(num_features=3))\n\n# Feature reduction\nsubset = selector.fit_transform(data, label)\nprint(\"Reduction:\", list(subset.columns))\nprint(\"Scores:\", list(selector.get_absolute_scores()))\n```\n\n\n## Available Methods\n\n|                                                           Method                                                           |                                                                                                                                                                                                                                                                                                                                                                                                                                        Options                                                                                                                                                                                                                                                                                                                                                                                                                     
                    |\n|:--------------------------------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|\n| [Variance per Feature](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.VarianceThreshold.html) |                                                                                                                                                                                                                                                                                                                                                                                                                                      `threshold`                                                                                                                                                                                                                                                                                                                                                                                                                                       |\n|   
[Correlation pairwise Features](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.corr.html)   |                                                                                                                                                                                                                                                                     [Pearson Correlation Coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient) \u003cbr\u003e [Kendall Rank Correlation Coefficient](https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient) \u003cbr\u003e [Spearman's Rank Correlation Coefficient](https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient) \u003cbr\u003e                                                                                                                                                                                                                                                                      |\n|    [Statistical Analysis](https://scikit-learn.org/stable/modules/feature_selection.html#univariate-feature-selection)     |                                                                                                             [ANOVA F-test Classification](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.f_classif.html) \u003cbr\u003e [F-value Regression](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.f_regression.html) \u003cbr\u003e [Chi-Square](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.chi2.html) \u003cbr\u003e [KL Divergence](https://en.wikipedia.org/wiki/Kullback–Leibler_divergence) \u003cbr\u003e [Mutual Information Classification](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.mutual_info_classif.html) \u003cbr\u003e [Variance Inflation 
Factor](https://www.statsmodels.org/stable/generated/statsmodels.stats.outliers_influence.variance_inflation_factor.html)                                                                                                               |\n|                             [Linear Methods](https://en.wikipedia.org/wiki/Linear_regression)                              |                                                                                                   [Linear Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html?highlight=linear%20regression#sklearn.linear_model.LinearRegression) \u003cbr\u003e [Logistic Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html?highlight=logistic%20regression#sklearn.linear_model.LogisticRegression) \u003cbr\u003e [Lasso Regularization](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html#sklearn.linear_model.Lasso) \u003cbr\u003e [Ridge Regularization](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge) \u003cbr\u003e                                                                                                    |\n|                          [Tree-based Methods](https://scikit-learn.org/stable/modules/tree.html)                           | [Decision Tree](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier) \u003cbr\u003e [Random Forest](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html?highlight=random%20forest#sklearn.ensemble.RandomForestClassifier) \u003cbr\u003e [Extra Trees Classifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html) \u003cbr\u003e [XGBoost](https://xgboost.readthedocs.io/en/latest/) \u003cbr\u003e [LightGBM](https://lightgbm.readthedocs.io/en/latest/) 
\u003cbr\u003e [AdaBoost](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html) \u003cbr\u003e [CatBoost](https://github.com/catboost)\u003cbr\u003e [Gradient Boosting Tree](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html) \u003cbr\u003e |\n|  [Text-based Methods](https://link.springer.com/chapter/10.1007/978-3-030-78230-6_27)  |                                                                                                                                                                                                                                                                                                                                              `featurization_method` = [TextWiser](https://github.com/fidelity/textwiser) \u003cbr\u003e `optimization_method = [\"exact\", \"greedy\", \"kmeans\", \"random\"]` \u003cbr\u003e `cost_metric = [\"unicost\", \"diverse\"]`                                                                                                                                                                                                                                                                                                                                              |\n\n\n\n## Benchmarking\n\n```python\n# Imports\nfrom sklearn.datasets import fetch_california_housing\nfrom feature.utils import get_data_label\nfrom xgboost import XGBClassifier, XGBRegressor\nfrom feature.selector import SelectionMethod, benchmark, calculate_statistics\n\n# Data\ndata, label = get_data_label(fetch_california_housing())\n\n# Selectors\ncorr_threshold = 0.5\nnum_features = 3\ntree_params = {\"n_estimators\": 50, \"max_depth\": 5, \"random_state\": 111, \"n_jobs\": 4}\nselectors = {\n\n  # Correlation methods\n  \"corr_pearson\": SelectionMethod.Correlation(corr_threshold, method=\"pearson\"),\n  \"corr_kendall\": SelectionMethod.Correlation(corr_threshold, 
method=\"kendall\"),\n  \"corr_spearman\": SelectionMethod.Correlation(corr_threshold, method=\"spearman\"),\n  \n  # Statistical methods\n  \"stat_anova\": SelectionMethod.Statistical(num_features, method=\"anova\"),\n  \"stat_chi_square\": SelectionMethod.Statistical(num_features, method=\"chi_square\"),\n  \"stat_kl_divergence\": SelectionMethod.Statistical(num_features, method=\"kl_divergence\"),\n  \"stat_mutual_info\": SelectionMethod.Statistical(num_features, method=\"mutual_info\"),\n  \n  # Linear methods\n  \"linear\": SelectionMethod.Linear(num_features, regularization=\"none\"),\n  \"lasso\": SelectionMethod.Linear(num_features, regularization=\"lasso\", alpha=1000),\n  \"ridge\": SelectionMethod.Linear(num_features, regularization=\"ridge\", alpha=1000),\n  \n  # Non-linear tree-based methods\n  \"random_forest\": SelectionMethod.TreeBased(num_features),\n  \"xgboost_classif\": SelectionMethod.TreeBased(num_features, estimator=XGBClassifier(**tree_params)),\n  \"xgboost_regress\": SelectionMethod.TreeBased(num_features, estimator=XGBRegressor(**tree_params))\n}\n\n# Benchmark (sequential)\nscore_df, selected_df, runtime_df = benchmark(selectors, data, label, cv=5)\nprint(score_df, \"\\n\\n\", selected_df, \"\\n\\n\", runtime_df)\n\n# Benchmark (in parallel)\nscore_df, selected_df, runtime_df = benchmark(selectors, data, label, cv=5, n_jobs=4)\nprint(score_df, \"\\n\\n\", selected_df, \"\\n\\n\", runtime_df)\n\n# Get benchmark statistics by feature\nstats_df = calculate_statistics(score_df, selected_df)\nprint(stats_df)\n```\n\n## Text-based Selection\nThis example shows how to use text-based selection. In this scenario, we would like to select a subset of articles that is most diverse in the text embedding space and covers a range of topics. 
\n\n```python\n# Import Selective and TextWiser\nimport pandas as pd\nfrom feature.selector import Selective, SelectionMethod\nfrom textwiser import TextWiser, Embedding, Transformation\n\n# Data with the text content of each article\ndata = pd.DataFrame({\"article_1\": [\"article text here\"],\n                     \"article_2\": [\"article text here\"],\n                     \"article_3\": [\"article text here\"],\n                     \"article_4\": [\"article text here\"],\n                     \"article_5\": [\"article text here\"]})\n\n# Labels to denote 0/1 coverage metadata for each article \n# across four labels, e.g., sports, international, entertainment, science    \nlabels = pd.DataFrame({\"article_1\": [1, 1, 0, 1],\n                       \"article_2\": [0, 1, 0, 0],\n                       \"article_3\": [0, 0, 1, 0],\n                       \"article_4\": [0, 0, 1, 1],\n                       \"article_5\": [1, 1, 1, 0]},\n                      index=[\"label_1\", \"label_2\", \"label_3\", \"label_4\"])\n\n# TextWiser featurization method to create text embeddings\ntextwiser = TextWiser(Embedding.TfIdf(), Transformation.NMF(n_components=20))\n\n# Text-based selection\n# The goal is to select a subset of articles \n# that is most diverse in the text embedding space of articles\n# and covers the most labels in each topic\nselector = Selective(SelectionMethod.TextBased(num_features=2, featurization_method=textwiser))\n\n# Feature reduction\nsubset = selector.fit_transform(data, labels)\nprint(\"Reduction:\", list(subset.columns))\n```\n\n## Visualization\n\n```python\nimport pandas as pd\nfrom sklearn.datasets import fetch_california_housing\nfrom feature.utils import get_data_label\nfrom feature.selector import SelectionMethod, Selective, plot_importance\n\n# Data\ndata, label = get_data_label(fetch_california_housing())\n\n# Feature Selector\nselector = Selective(SelectionMethod.Linear(num_features=8, regularization=\"none\"))\nsubset = 
selector.fit_transform(data, label)\n\n# Plot Feature Importance\ndf = pd.DataFrame(selector.get_absolute_scores(), index=data.columns)\nplot_importance(df)\n```\n\n## Installation\n\nSelective requires **Python 3.8+** and can be installed from PyPI using ``pip install selective``.\n\n## Source \n\nAlternatively, you can build a wheel package on your platform from scratch using the source code:\n\n```bash\ngit clone https://github.com/fidelity/selective.git\ncd selective\npip install setuptools wheel # if wheel is not installed\npython setup.py sdist bdist_wheel\npip install dist/selective-X.X.X-py3-none-any.whl\n```\n\n## Test your setup\n\n```\ncd selective\npython -m unittest discover tests\n```\n\n## Citation\n\nIf you use Selective in a publication, please cite it as:\n\n```bibtex\n    @article{DBLP:journals/amai/HaDVH98,\n    author       = {Kad\\i{}o\\u{g}lu, Serdar and Kleynhans, Bernard and Wang, Xin},\n    title        = {Integrating optimized item selection with active learning for continuous exploration in recommender systems},\n    journal      = {Ann. Math. Artif. Intell.},\n    year         = {2024},\n    url          = {https://doi.org/10.1007/s10472-024-09941-x},\n    doi          = {10.1007/s10472-024-09941-x},\n    }\n```\n\n## Support\n\nPlease submit bug reports and feature requests as [Issues](https://github.com/fidelity/selective/issues).\n\n## License\nSelective is licensed under [Apache 2.0](https://github.com/fidelity/selective/blob/master/LICENSE.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffidelity%2Fselective","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffidelity%2Fselective","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffidelity%2Fselective/lists"}