{"id":15288146,"url":"https://github.com/microsoft/nimbusml","last_synced_at":"2025-10-07T03:30:21.570Z","repository":{"id":54565721,"uuid":"153771517","full_name":"microsoft/NimbusML","owner":"microsoft","description":"Python machine learning package providing simple interoperability between ML.NET and scikit-learn components.","archived":true,"fork":false,"pushed_at":"2020-07-16T21:02:38.000Z","size":3960,"stargazers_count":284,"open_issues_count":81,"forks_count":63,"subscribers_count":2192,"default_branch":"master","last_synced_at":"2025-10-02T00:38:47.144Z","etag":null,"topics":["data-science","machine-learning","ml","mlnet","nimbusml","python","scikit-learn"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/microsoft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-10-19T11:17:39.000Z","updated_at":"2025-09-16T02:41:04.000Z","dependencies_parsed_at":"2022-08-13T20:00:15.880Z","dependency_job_id":null,"html_url":"https://github.com/microsoft/NimbusML","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/microsoft/NimbusML","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FNimbusML","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FNimbusML/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FNimbusML/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FNimbusML/manifests","owner_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/owners/microsoft","download_url":"https://codeload.github.com/microsoft/NimbusML/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FNimbusML/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278715508,"owners_count":26033296,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-07T02:00:06.786Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","machine-learning","ml","mlnet","nimbusml","python","scikit-learn"],"created_at":"2024-09-30T15:44:23.238Z","updated_at":"2025-10-07T03:30:20.050Z","avatar_url":"https://github.com/microsoft.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# NimbusML\n\n`nimbusml` is a Python module that provides Python bindings for [ML.NET](https://github.com/dotnet/machinelearning). \n\nML.NET was originally developed in Microsoft Research and is used across many product groups in Microsoft like Windows, Bing, PowerPoint, Excel, and others. `nimbusml` was built to enable data science teams that are more familiar with Python to take advantage of ML.NET's functionality and performance. \n\n`nimbusml` enables training ML.NET pipelines or integrating ML.NET components directly into [scikit-learn](https://scikit-learn.org/stable/) pipelines. 
It adheres to existing `scikit-learn` conventions, allowing simple interoperability between `nimbusml` and `scikit-learn` components, while adding a suite of fast, highly optimized, and scalable algorithms, transforms, and components written in C++ and C\\#.\n\nSee examples below showing interoperability with `scikit-learn`. A more detailed example in the [documentation](https://docs.microsoft.com/en-us/nimbusml/tutorials/b_c-sentiment-analysis-3-combining-nimbusml-and-scikit-learn) shows how to use a `nimbusml` component in a `scikit-learn` pipeline, and how to create a pipeline using only `nimbusml` components.\n\n`nimbusml` supports `numpy.ndarray`, `scipy.sparse.csr_matrix`, and `pandas.DataFrame` as inputs. `nimbusml` can also stream data from files with `FileDataStream` without loading the dataset into memory, which allows training on data that significantly exceeds available memory.\n\nDocumentation can be found [here](https://docs.microsoft.com/en-us/NimbusML/overview) and additional notebook samples can be found [here](https://github.com/Microsoft/NimbusML-Samples).\n\n## Installation\n\n`nimbusml` runs on Windows, Linux, and macOS. \n\n`nimbusml` requires a 64-bit version of Python **2.7**, **3.5**, **3.6**, or **3.7**.\n\nInstall `nimbusml` using `pip` with:\n\n```\npip install nimbusml\n```\n\n`nimbusml` has been reported to work on Windows 10, macOS 10.13, Ubuntu 14.04, Ubuntu 16.04, Ubuntu 18.04, CentOS 7, and RHEL 7.\n\n## Examples\n\nHere is an example of how to train a model to predict sentiment from text samples (based on [this](https://github.com/dotnet/machinelearning/blob/master/README.md) ML.NET example). 
The full code for this example is [here](https://github.com/Microsoft/NimbusML-Samples/blob/master/samples/2.1%20%5BText%5D%20Sentiment%20Analysis%201%20-%20Data%20Loading%20with%20Pandas.ipynb).\n\n```python\nfrom nimbusml import Pipeline, FileDataStream\nfrom nimbusml.datasets import get_dataset\nfrom nimbusml.ensemble import FastTreesBinaryClassifier\nfrom nimbusml.feature_extraction.text import NGramFeaturizer\n\ntrain_file = get_dataset('gen_twittertrain').as_filepath()\ntest_file = get_dataset('gen_twittertest').as_filepath()\n\ntrain_data = FileDataStream.read_csv(train_file, sep='\\t')\ntest_data = FileDataStream.read_csv(test_file, sep='\\t')\n\npipeline = Pipeline([ # nimbusml pipeline\n    NGramFeaturizer(columns={'Features': ['Text']}),\n    FastTreesBinaryClassifier(feature=['Features'], label='Label')\n])\n\n# fit and predict\npipeline.fit(train_data)\nresults = pipeline.predict(test_data)\n```\n\nInstead of creating a `nimbusml` pipeline, you can also integrate `nimbusml` components into `scikit-learn` pipelines:\n\n```python\nfrom sklearn.pipeline import Pipeline\nfrom nimbusml.datasets import get_dataset\nfrom nimbusml.ensemble import FastTreesBinaryClassifier\nfrom sklearn.feature_extraction.text import TfidfVectorizer\nimport pandas as pd\n\ntrain_file = get_dataset('gen_twittertrain').as_filepath()\ntest_file = get_dataset('gen_twittertest').as_filepath()\n\ntrain_data = pd.read_csv(train_file, sep='\\t')\ntest_data = pd.read_csv(test_file, sep='\\t')\n\npipeline = Pipeline([ # sklearn pipeline\n    ('tfidf', TfidfVectorizer()), # sklearn transform\n    ('clf', FastTreesBinaryClassifier()) # nimbusml learner\n])\n\n# fit and predict\npipeline.fit(train_data[\"Text\"], train_data[\"Label\"])\nresults = pipeline.predict(test_data[\"Text\"])\n```\n\n\n\nMany additional examples and tutorials can be found in the [documentation](https://docs.microsoft.com/en-us/NimbusML/overview).\n\n\n## Building\n\nTo build `nimbusml` from source, please see our [developer 
guide](docs/developers/developer-guide.md).\n\n## Contributing\n\nThe contributions guide can be found [here](CONTRIBUTING.md). \n\n## Support\n\nIf you have an idea for a new feature or encounter a problem, please open an [issue](https://github.com/Microsoft/NimbusML/issues/new) in this repository or ask your question on Stack Overflow.\n\n## License\n\nNimbusML is licensed under the [MIT license](LICENSE).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2Fnimbusml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmicrosoft%2Fnimbusml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2Fnimbusml/lists"}