{"id":13445592,"url":"https://github.com/curiosity-ai/catalyst","last_synced_at":"2026-01-17T00:44:06.028Z","repository":{"id":41251431,"uuid":"200471228","full_name":"curiosity-ai/catalyst","owner":"curiosity-ai","description":"🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the box support for training word and document embeddings, and flexible entity recognition models.","archived":false,"fork":false,"pushed_at":"2025-10-19T17:40:17.000Z","size":14051,"stargazers_count":825,"open_issues_count":47,"forks_count":83,"subscribers_count":39,"default_branch":"master","last_synced_at":"2025-12-28T01:10:53.307Z","etag":null,"topics":["ai","artificial-intelligence","csharp","embeddings","machine-learning","natural-language-processing","natural-language-understanding","nlp"],"latest_commit_sha":null,"homepage":"","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/curiosity-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2019-08-04T09:00:12.000Z","updated_at":"2025-12-15T10:00:13.000Z","dependencies_parsed_at":"2025-10-19T19:23:08.711Z","dependency_job_id":"ca1bebcf-3df0-4452-8f45-abf0eb521d7c","html_url":"https://github.com/curiosity-ai/catalyst","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/curiosity-ai/catalyst","repository_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/repositories/curiosity-ai%2Fcatalyst","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/curiosity-ai%2Fcatalyst/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/curiosity-ai%2Fcatalyst/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/curiosity-ai%2Fcatalyst/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/curiosity-ai","download_url":"https://codeload.github.com/curiosity-ai/catalyst/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/curiosity-ai%2Fcatalyst/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28490523,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T23:55:29.509Z","status":"ssl_error","status_checked_at":"2026-01-16T23:55:29.108Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","artificial-intelligence","csharp","embeddings","machine-learning","natural-language-processing","natural-language-understanding","nlp"],"created_at":"2024-07-31T05:00:36.304Z","updated_at":"2026-01-17T00:44:06.009Z","avatar_url":"https://github.com/curiosity-ai.png","language":"C#","funding_links":[],"categories":["Frameworks, Libraries and Tools","框架, 库和工具","NLP Tools, Libraries, 
and Frameworks","C# #","Uncategorized","Machine Learning and Data Science","nlp","机器学习和数据科学","Parsing","ai","Natural Language Processing"],"sub_categories":["Machine Learning and Data Science","机器学习和科学研究","Uncategorized","GUI - other"],"readme":"\n[![Nuget](https://img.shields.io/nuget/v/Catalyst.svg?maxAge=0\u0026colorB=brightgreen)](https://www.nuget.org/packages/Catalyst/) [![Build Status](https://dev.azure.com/curiosity-ai/mosaik/_apis/build/status/catalyst?branchName=master)](https://dev.azure.com/curiosity-ai/mosaik/_build/latest?definitionId=10\u0026branchName=master)\n\n\u003cimg src=\"https://raw.githubusercontent.com/curiosity-ai/catalyst/master/Catalyst/catalyst.png?token=ACDCOAYAIML2KGJTHTJP27C5KGCEC\"/\u003e\n\n\u003ca href=\"https://curiosity.ai\"\u003e\u003cimg src=\"https://curiosity.ai/media/cat.color.square.svg\" width=\"100\" height=\"100\" align=\"right\" /\u003e\u003c/a\u003e\n\n_**catalyst**_ is a C# Natural Language Processing library built for speed. Inspired by [spaCy's design](https://spacy.io/), it brings pre-trained models, out-of-the-box support for training word and document embeddings, and flexible entity recognition models.\n\n[![Gitter](https://badges.gitter.im/curiosityai/catalyst.svg)](https://gitter.im/curiosityai/catalyst?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge)\n\n## ⚡ Features\n- Fast, modern pure-C# NLP library, supporting [.NET Standard 2.0](https://docs.microsoft.com/en-us/dotnet/standard/net-standard)\n- Cross-platform, runs anywhere [.NET Core](https://dotnet.microsoft.com/download) is supported - Windows, Linux, macOS and even ARM\n- Non-destructive [tokenization](https://github.com/curiosity-ai/catalyst/blob/master/Catalyst/src/Models/Base/FastTokenizer.cs), \u003e99.9% [RegEx-free](https://blog.codinghorror.com/regex-performance/), \u003e1M tokens/s on a modern CPU\n- Named Entity Recognition 
([gazetteer](https://github.com/curiosity-ai/catalyst/blob/master/Catalyst/src/Models/EntityRecognition/Spotter.cs), [rule-based](https://github.com/curiosity-ai/catalyst/blob/master/Catalyst/src/Models/EntityRecognition/PatternSpotter.cs) \u0026 [perceptron-based](https://github.com/curiosity-ai/catalyst/blob/master/Catalyst/src/Models/EntityRecognition/AveragePerceptronEntityRecognizer.cs))\n- Pre-trained models based on the [Universal Dependencies](https://universaldependencies.org/) project\n- Custom models for learning [Abbreviations](https://github.com/curiosity-ai/catalyst/blob/master/Catalyst/src/Models/Special/AbbreviationCapturer.cs) \u0026 [Senses](https://github.com/curiosity-ai/catalyst/blob/master/Catalyst/src/Models/EntityRecognition/Spotter.cs#L214)\n- Out-of-the-box support for training [FastText](https://fasttext.cc/) and [StarSpace](https://github.com/facebookresearch/StarSpace) embeddings (pre-trained models coming soon)\n- Part-of-speech tagging\n- Language detection using [FastText](https://github.com/curiosity-ai/catalyst/blob/master/Catalyst/src/Models/Special/FastTextLanguageDetector.cs) or [cld3](https://github.com/curiosity-ai/catalyst/blob/master/Catalyst/src/Models/Special/LanguageDetector.cs)\n- Efficient binary serialization based on [MessagePack](https://github.com/neuecc/MessagePack-CSharp/)\n- Pre-built models for [language packages](https://www.nuget.org/packages?q=catalyst.models) ✨\n- Lemmatization ✨ (using lookup tables ported from [spaCy](https://github.com/explosion/spacy-lookups-data))\n\n\n## Language Packages ✨\nAll language-specific data and models are provided as NuGet packages; you can find them all [here](https://www.nuget.org/packages?q=catalyst.models). 
\n\nThe new models are trained on the latest release of [Universal Dependencies v2.7](https://universaldependencies.org/).\n\nWe've also added the option to store and load models using streams:\n`````csharp\n// Creates and stores the model\nvar isApattern = new PatternSpotter(Language.English, 0, tag: \"is-a-pattern\", captureTag: \"IsA\");\nisApattern.NewPattern(\n    \"Is+Noun\",\n    mp =\u003e mp.Add(\n        new PatternUnit(P.Single().WithToken(\"is\").WithPOS(PartOfSpeech.VERB)),\n        new PatternUnit(P.Multiple().WithPOS(PartOfSpeech.NOUN, PartOfSpeech.PROPN, PartOfSpeech.AUX, PartOfSpeech.DET, PartOfSpeech.ADJ))\n));\nusing(var f = File.OpenWrite(\"my-pattern-spotter.bin\"))\n{\n    await isApattern.StoreAsync(f);\n}\n\n// Load the model back from disk\nvar isApattern2 = new PatternSpotter(Language.English, 0, tag: \"is-a-pattern\", captureTag: \"IsA\");\n\nusing(var f = File.OpenRead(\"my-pattern-spotter.bin\"))\n{\n    await isApattern2.LoadAsync(f);\n}\n`````\n\n\n## ✨ Getting Started\n\nUsing _**catalyst**_ is as simple as installing its [NuGet Package](https://www.nuget.org/packages/Catalyst), and setting the storage to use our online repository. This way, models will be lazy loaded either from disk or downloaded from our online repository. 
**Also check out the [sample projects](https://github.com/curiosity-ai/catalyst/tree/master/samples)** for more examples of how to use _**catalyst**_.\n\n\n```csharp\nCatalyst.Models.English.Register(); //You need to pre-register each language (and install the respective NuGet Packages)\n\nStorage.Current = new DiskStorage(\"catalyst-models\");\nvar nlp = await Pipeline.ForAsync(Language.English);\nvar doc = new Document(\"The quick brown fox jumps over the lazy dog\", Language.English);\nnlp.ProcessSingle(doc);\nConsole.WriteLine(doc.ToJson());\n```\n\nYou can also take advantage of C# lazy evaluation and native multi-threading support to process a large number of documents in parallel:\n\n```csharp\nvar docs = GetDocuments();\nvar parsed = nlp.Process(docs);\nDoSomething(parsed);\n\nIEnumerable\u003cIDocument\u003e GetDocuments()\n{\n    //Generates a few documents, to demonstrate multi-threading \u0026 lazy evaluation\n    for(int i = 0; i \u003c 1000; i++)\n    {\n        yield return new Document(\"The quick brown fox jumps over the lazy dog\", Language.English);\n    }\n}\n\nvoid DoSomething(IEnumerable\u003cIDocument\u003e docs)\n{\n    foreach(var doc in docs)\n    {\n        Console.WriteLine(doc.ToJson());\n    }\n}\n```\n\nTraining a new [FastText](https://fasttext.cc/) [word2vec](https://en.wikipedia.org/wiki/Word2vec) embedding model is as simple as this:\n\n```csharp\nvar nlp = await Pipeline.ForAsync(Language.English);\nvar ft = new FastText(Language.English, 0, \"wiki-word2vec\");\nft.Data.Type = FastText.ModelType.CBow;\nft.Data.Loss = FastText.LossType.NegativeSampling;\nft.Train(nlp.Process(GetDocs()));\nawait ft.StoreAsync();\n```\n\nFor fast embedding search, we have also released a C# version of the [\"Hierarchical Navigable Small World\" (HNSW)](https://arxiv.org/abs/1603.09320) algorithm on [NuGet](https://www.nuget.org/packages/HNSW/), based on our fork of Microsoft's [HNSW.Net](https://github.com/curiosity-ai/hnsw.net). 
We have also released a C# version of the \"Uniform Manifold Approximation and Projection\" ([UMAP](https://umap-learn.readthedocs.io/en/latest/how_umap_works.html)) algorithm for dimensionality reduction on [GitHub](https://github.com/curiosity-ai/umap-csharp) and on [NuGet](https://www.nuget.org/packages/UMAP/).\n\n\n\n## 📖 Links\n\n| Documentation     |                                                           |\n| ----------------- | --------------------------------------------------------- |\n| [Contribute]      | How to contribute to _**catalyst**_ codebase.             |\n| [Samples]         | Sample projects demonstrating _**catalyst**_ capabilities |\n| [![Gitter](https://badges.gitter.im/curiosityai/catalyst.svg)](https://gitter.im/curiosityai/catalyst?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge)  | Join our gitter channel                                    |\n\n[Contribute]: https://github.com/curiosity-ai/catalyst/blob/master/CONTRIBUTING.md\n[Samples]: https://github.com/curiosity-ai/catalyst/tree/master/samples\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcuriosity-ai%2Fcatalyst","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcuriosity-ai%2Fcatalyst","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcuriosity-ai%2Fcatalyst/lists"}