https://github.com/soenneker/soenneker.utils.string.cosinesimilarity
A utility library for comparing strings via Cosine Similarity
comparison cosine cosinesimilarity cosinesimilaritystringutil csharp dotnet fuzzy matching similarity string tf-idf utils vector
- Host: GitHub
- URL: https://github.com/soenneker/soenneker.utils.string.cosinesimilarity
- Owner: soenneker
- License: mit
- Created: 2023-12-31T18:17:11.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-09-16T22:26:12.000Z (21 days ago)
- Last Synced: 2025-09-17T00:42:52.290Z (21 days ago)
- Topics: comparison, cosine, cosinesimilarity, cosinesimilaritystringutil, csharp, dotnet, fuzzy, matching, similarity, string, tf-idf, utils, vector
- Language: C#
- Homepage: https://soenneker.com
- Size: 1.15 MB
- Stars: 5
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: .github/CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
- Security: .github/SECURITY.md
README
# Soenneker.Utils.String.CosineSimilarity
### A utility library for comparing strings via Cosine Similarity

## Installation
```
dotnet add package Soenneker.Utils.String.CosineSimilarity
```

## Why?
Imagine you have two sentences or documents. Cosine similarity helps you figure out how similar they are by looking at the **words** they share. Here's why it's handy:

### Easy to Understand:
Cosine similarity is easy to understand. For word-count vectors it is a number between 0 and 1 that represents how similar two documents are. The closer to 1, the more similar they are. This library reports the result as a percentage.

### Not Bothered by Length:
Whether a text is long or short doesn't throw off cosine similarity. It depends on the proportions in which words appear, not on the total number of words.

### Meaning, Not Just Frequency:
It looks at which terms two texts have in common, not just how often words show up. So, even if one document has a lot more words than another, they can still be considered similar if they share important terms.

### Efficient for Big Tasks:
When you're dealing with lots of documents or a ton of text, cosine similarity is efficient. Each comparison needs only a dot product and two vector lengths, making it a practical choice for large datasets.

## Usage
```csharp
var text1 = "This is a test";
var text2 = "This is another test";

double result = CosineSimilarityStringUtil.CalculateSimilarityPercentage(text1, text2); // 75
```
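
To make the 75 above less of a black box, here is a minimal sketch of cosine similarity over plain word counts. It is not the library's source: the class name `CosineSketch`, the lowercasing, and the whitespace tokenization are all assumptions made for illustration, and the README does not say whether the library tokenizes this way or applies TF-IDF weighting.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch only (hypothetical class); not the library's implementation.
public static class CosineSketch
{
    // Counts how often each lowercased, whitespace-separated word occurs.
    // Assumption: simple whitespace tokenization, no stemming or weighting.
    private static Dictionary<string, int> TermCounts(string text)
    {
        return text.ToLowerInvariant()
                   .Split(new[] { ' ', '\t', '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries)
                   .GroupBy(word => word)
                   .ToDictionary(group => group.Key, group => group.Count());
    }

    // Returns the cosine of the angle between the two word-count vectors, times 100.
    public static double SimilarityPercentage(string first, string second)
    {
        Dictionary<string, int> a = TermCounts(first);
        Dictionary<string, int> b = TermCounts(second);

        // Dot product: only words present in both texts contribute.
        double dot = a.Where(pair => b.ContainsKey(pair.Key))
                      .Sum(pair => (double)pair.Value * b[pair.Key]);

        // Euclidean length of each word-count vector.
        double normA = Math.Sqrt(a.Values.Sum(count => (double)count * count));
        double normB = Math.Sqrt(b.Values.Sum(count => (double)count * count));

        // An empty text has no direction, so this sketch treats it as 0% similar.
        if (normA == 0 || normB == 0)
            return 0;

        return dot / (normA * normB) * 100;
    }
}
```

Under these assumptions, `CosineSketch.SimilarityPercentage("This is a test", "This is another test")` shares three words between two four-word texts, so the result is 3 / (2 × 2) = 0.75, or 75. Comparing `"cat dog"` with `"cat dog cat dog"` gives approximately 100, because repeating a text changes the length of its vector but not its direction.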