https://github.com/soenneker/soenneker.utils.string.cosinesimilarity

A utility library for comparing strings via Cosine Similarity

Host: GitHub
URL: https://github.com/soenneker/soenneker.utils.string.cosinesimilarity
Owner: soenneker
License: mit
Created: 2023-12-31T18:17:11.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2025-09-16T22:26:12.000Z (21 days ago)
Last Synced: 2025-09-17T00:42:52.290Z (21 days ago)
Topics: comparison, cosine, cosinesimilarity, cosinesimilaritystringutil, csharp, dotnet, fuzzy, matching, similarity, string, tf-idf, utils, vector
Language: C#
Homepage: https://soenneker.com
Size: 1.15 MB
Stars: 5
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: .github/CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
- Security: .github/SECURITY.md

README

[![](https://img.shields.io/nuget/v/soenneker.utils.string.cosinesimilarity.svg?style=for-the-badge)](https://www.nuget.org/packages/soenneker.utils.string.cosinesimilarity/)

[![](https://img.shields.io/github/actions/workflow/status/soenneker/soenneker.utils.string.cosinesimilarity/publish-package.yml?style=for-the-badge)](https://github.com/soenneker/soenneker.utils.string.cosinesimilarity/actions/workflows/publish-package.yml)

[![](https://img.shields.io/nuget/dt/soenneker.utils.string.cosinesimilarity.svg?style=for-the-badge)](https://www.nuget.org/packages/soenneker.utils.string.cosinesimilarity/)

# ![](https://user-images.githubusercontent.com/4441470/224455560-91ed3ee7-f510-4041-a8d2-3fc093025112.png) Soenneker.Utils.String.CosineSimilarity

### A utility library for comparing strings via Cosine Similarity

## Installation

```
dotnet add package Soenneker.Utils.String.CosineSimilarity
```

## Why?

Imagine you have two sentences or documents. Cosine similarity helps you figure out how similar they are by looking at the **words** they share. Here's why it's handy:

### Easy to Understand:

Cosine similarity is easy to interpret. For text, it is a number between 0 and 1 that represents how similar two documents are: 0 means they share no words, and the closer the score is to 1, the more similar they are.

### Not Bothered by Length:

Whether a text is long or short doesn't throw off cosine similarity. It measures the angle between the two word vectors rather than their magnitude, so the score depends on the relative proportions of the words, not on the total number of words.

### Meaning, Not Just Frequency:

It looks at which terms two documents have in common, not just how often each word shows up. So, even if one document has a lot more words than another, they can still be considered similar if they share important terms.

### Efficient for Big Tasks:

When you're dealing with lots of documents or a ton of text, cosine similarity is efficient. The calculation is just a dot product and two vector norms, making it a practical choice for large datasets.
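
### A Minimal Sketch:

To make the points above concrete, here is a rough sketch of cosine similarity over plain word counts. It is for illustration only and is not the library's implementation: the `CosineSketch` class, its method names, and the whitespace-only, case-insensitive tokenization are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of Soenneker.Utils.String.CosineSimilarity.
public static class CosineSketch
{
    // Builds a term-frequency vector: lowercased word -> number of occurrences.
    // Assumption: words are separated by whitespace only.
    private static Dictionary<string, int> CountWords(string text)
    {
        return text.ToLowerInvariant()
                   .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
                   .GroupBy(word => word)
                   .ToDictionary(group => group.Key, group => group.Count());
    }

    // Returns cos(theta) = (a . b) / (|a| * |b|), or 0 when either text has no words.
    public static double Similarity(string first, string second)
    {
        Dictionary<string, int> a = CountWords(first);
        Dictionary<string, int> b = CountWords(second);

        // Only words present in both texts contribute to the dot product.
        double dot = 0;
        foreach (KeyValuePair<string, int> pair in a)
        {
            if (b.TryGetValue(pair.Key, out int other))
                dot += pair.Value * other;
        }

        double normA = Math.Sqrt(a.Values.Sum(v => (double)v * v));
        double normB = Math.Sqrt(b.Values.Sum(v => (double)v * v));

        if (normA == 0 || normB == 0)
            return 0;

        return dot / (normA * normB);
    }
}
```

With this sketch, `CosineSketch.Similarity("This is a test", "This is another test")` evaluates to 0.75: three of the four words match, and each vector has length 2. Repeating the first text twice gives the same score, which is the length-independence described above. The library's own tokenization and weighting may differ.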

## Usage

```csharp
var text1 = "This is a test";
var text2 = "This is another test";

double result = CosineSimilarityStringUtil.CalculateSimilarityPercentage(text1, text2); // 75
```
