https://github.com/mehrandvd/simila
A project for string similarities.
- Host: GitHub
- URL: https://github.com/mehrandvd/simila
- Owner: mehrandvd
- Created: 2015-02-25T22:27:43.000Z (almost 11 years ago)
- Default Branch: master
- Last Pushed: 2023-10-04T10:38:52.000Z (over 2 years ago)
- Last Synced: 2025-04-02T19:47:00.631Z (9 months ago)
- Topics: c-sharp, string-distance, string-matching, string-similarity
- Language: C#
- Size: 2.2 MB
- Stars: 13
- Watchers: 4
- Forks: 5
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# Installing via NuGet
```powershell
Install-Package Bit.Simila
```
# What is Simila?
Are **Color** and **Colour** equal? No!
```c#
if ("Color" == "Colour")
{
    // Never runs: the comparison is always false
}
if ("The Candy Shop" == "The Kandi Schap")
{
    // Never runs: the comparison is always false
}
```
But they **are Similar** in **Simila**!
```c#
if (simila.AreSimilar("Color", "Colour"))
{
    // It's true now!
}
if (simila.AreSimilar("The Candy Shop", "The Kandi Schap"))
{
    // It's true now!
}
```
# How to use
```c#
var simila = new Simila();
// Comparing Words
simila.AreSimilar("Lamborghini", "Lanborgini"); // True
// Comparing Expressions
simila.AreSimilar("Lamborghini is some great car", "Lanborgini is some graet kar"); // True
```
## Customizing Simila
### Threshold
You set the sensitivity of the similarity check with the `Treshold` property. If it is not set, the default value is `0.6`, which means two strings are considered similar when they are at least `60%` similar.
```c#
// Strings are similar if they are at least 50% similar.
var similaEasy = new Simila()
{
    Treshold = 0.5
};
// Considered similar.
similaEasy.AreSimilar("Lamborghini", "Lanborgni"); // True, they are at least 50% similar.
// Strings are similar only if they are at least 80% similar.
var similaTough = new Simila()
{
    Treshold = 0.8
};
// Considered NOT similar!
similaTough.AreSimilar("Lamborghini", "Lanborgni"); // False, not 80% similar.
```
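To make the threshold idea concrete, here is a minimal sketch of how a similarity score could be turned into a yes/no answer. It is not Simila's actual implementation. The class `ThresholdSketch`, its method names, and the normalization (one minus the edit distance divided by the length of the longer string) are all assumptions made for illustration.

```c#
using System;

// Illustrative only: NOT Simila's implementation. All names here are hypothetical.
public static class ThresholdSketch
{
    // Classic Levenshtein edit distance (insertions, deletions, substitutions),
    // computed with two rolling rows.
    public static int EditDistance(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int substitution = previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
                int insertion = current[j - 1] + 1;
                int deletion = previous[j] + 1;
                current[j] = Math.Min(substitution, Math.Min(insertion, deletion));
            }
            // The row just filled becomes the "previous" row of the next iteration.
            (previous, current) = (current, previous);
        }
        return previous[b.Length];
    }

    // Assumed normalization: 1 - distance / length of the longer string.
    public static double Similarity(string a, string b)
    {
        int longest = Math.Max(a.Length, b.Length);
        return longest == 0 ? 1.0 : 1.0 - (double)EditDistance(a, b) / longest;
    }

    // Two strings count as similar when their score reaches the threshold.
    public static bool AreSimilar(string a, string b, double threshold) =>
        Similarity(a, b) >= threshold;
}
```

Under this assumed normalization, `"Lamborghini"` and `"Lanborgni"` are 3 edits apart over 11 characters, so the score is about `0.73`: above a threshold of `0.5` and below `0.8`.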
### Similarity Resolver
Similarity resolvers are the different **algorithms** which Simila can use for similarity checking.
Each algorithm works well when it is used in its proper scenario.
There are 3 types of similarity resolvers available in Simila:
- **Levenshtein (Default)**: Works well when the strings need to **look similar**. You can read more about Levenshtein here: [Levenshtein Algorithm](https://en.wikipedia.org/wiki/Levenshtein_distance)
- **Soundex:** Works well when the strings need to **sound similar**. You can read more about Soundex here: [Soundex Algorithm](https://en.wikipedia.org/wiki/Soundex)
- **SharedPair:** Works well when the strings need to be **structured similarly**.
You can configure Simila to use a specific algorithm. We call these algorithms resolvers.
#### Using Soundex Resolver
```c#
var similaSoundex = new Simila()
{
    Resolver = new SoundexSimilarityResolver()
};
```
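For readers who have not met Soundex before, the following is a small sketch of the standard American Soundex encoding: words that sound alike end up with the same four-character code. It only illustrates the algorithm. How `SoundexSimilarityResolver` uses such codes internally is not described here, and the class name `SoundexSketch` is hypothetical.

```c#
using System;
using System.Text;

// Illustrative only: standard American Soundex, NOT Simila's resolver.
public static class SoundexSketch
{
    // Soundex digit for each letter A..Z; '0' marks letters that are not coded
    // (vowels plus H, W and Y).
    private const string Codes = "01230120022455012623010202";

    public static string Encode(string word)
    {
        var result = new StringBuilder();
        char lastCode = '0';
        foreach (char letter in word.ToUpperInvariant())
        {
            if (letter < 'A' || letter > 'Z') continue; // ignore non-letters
            char code = Codes[letter - 'A'];
            if (result.Length == 0)
            {
                // The first letter is kept as-is, but its code still
                // suppresses an immediately repeated code.
                result.Append(letter);
                lastCode = code;
            }
            else if (letter != 'H' && letter != 'W')
            {
                // Append a digit unless it repeats the previous code.
                // A vowel resets lastCode, so a code repeated after a vowel is kept;
                // H and W are skipped entirely and do not reset it.
                if (code != '0' && code != lastCode) result.Append(code);
                lastCode = code;
            }
            if (result.Length == 4) break;
        }
        // Pad short codes with zeros to the fixed length of 4.
        return result.ToString().PadRight(4, '0');
    }
}
```

With this encoding, `SoundexSketch.Encode("Color")` and `SoundexSketch.Encode("Colour")` both give `C460`. Note that Soundex keeps the first letter, so words starting with different letters never share a code under the plain algorithm.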
#### Using SharedPair Resolver
```c#
var similaSharedPair = new Simila()
{
    Resolver = new SharedPairSimilarityResolver()
};
```
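The README does not define what a "shared pair" is. One plausible reading, used in the sketch below, is that two strings are compared by the adjacent character pairs (bigrams) they have in common, scored with the Dice coefficient. This is an assumption, not a description of `SharedPairSimilarityResolver`, and the class `SharedPairSketch` is hypothetical.

```c#
using System;
using System.Collections.Generic;

// Illustrative only: bigram Dice coefficient, an ASSUMED reading of "shared pairs".
public static class SharedPairSketch
{
    // Splits text into adjacent character pairs, ignoring case: "Color" -> CO, OL, LO, OR.
    private static List<string> Pairs(string text)
    {
        var pairs = new List<string>();
        string upper = text.ToUpperInvariant();
        for (int i = 0; i + 1 < upper.Length; i++)
            pairs.Add(upper.Substring(i, 2));
        return pairs;
    }

    // Score = 2 * shared pairs / (pairs in a + pairs in b), always between 0 and 1.
    public static double Similarity(string a, string b)
    {
        var left = Pairs(a);
        var right = Pairs(b);
        int total = left.Count + right.Count;
        if (total == 0)
        {
            // Neither string is long enough to form a pair: fall back to equality.
            return string.Equals(a, b, StringComparison.OrdinalIgnoreCase) ? 1.0 : 0.0;
        }

        int shared = 0;
        foreach (string pair in left)
        {
            // Removing the match ensures each pair of b is counted at most once.
            if (right.Remove(pair)) shared++;
        }
        return 2.0 * shared / total;
    }
}
```

For example, `"night"` and `"nacht"` share only the pair `HT` out of 4 + 4 pairs, which gives `2 * 1 / 8 = 0.25`.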
#### Using Levenshtein Resolver
Levenshtein is even more configurable. You can set the accepted mistakes at both the character level and the word level.
In this example, we tell Simila to consider the words `color` and `colour` similar.
```c#
var simila = new Simila()
{
    Resolver = new PhraseSimilarityResolver(
        new WordSimilarityResolver(
            new MistakeRepository(new Mistake[]
            {
                ("color", "colour", 1)
            })
        )
    )
};
```
You can also add **character level accepted mistakes**.
In this example, we tell Simila not only to consider `color` and `colour` similar, but also to consider `c` and `k` similar.
```c#
var simila = new Simila()
{
    Resolver = new PhraseSimilarityResolver(
        new WordSimilarityResolver(
            new MistakeRepository(new Mistake[]
            {
                ("color", "colour", 1)
            }),
            new CharacterSimilarityResolver(
                new MistakeRepository(new Mistake[]
                {
                    ('c', 'k', 1)
                })
            )
        )
    )
};
```
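To give a feel for what a character level accepted mistake could do to an edit distance, here is a rough sketch. It assumes the third value in `('c', 'k', 1)` is a similarity between 0 and 1, so that `1` means "treat these two characters as the same". That meaning is a guess, and the class `AcceptedMistakeSketch` and its members are hypothetical, not part of Simila.

```c#
using System;
using System.Collections.Generic;

// Illustrative only: Levenshtein with a custom substitution cost.
// ASSUMPTION: an accepted mistake carries a similarity in [0, 1].
public static class AcceptedMistakeSketch
{
    // Hypothetical table of accepted character mistakes and their similarity.
    private static readonly Dictionary<(char, char), double> Accepted =
        new Dictionary<(char, char), double> { [('c', 'k')] = 1.0 };

    // Cost of replacing x with y: 0 for equal characters,
    // (1 - similarity) for an accepted mistake, 1 otherwise.
    private static double SubstitutionCost(char x, char y)
    {
        x = char.ToLowerInvariant(x);
        y = char.ToLowerInvariant(y);
        if (x == y) return 0.0;
        if (Accepted.TryGetValue((x, y), out double similarity) ||
            Accepted.TryGetValue((y, x), out similarity))
        {
            return 1.0 - similarity;
        }
        return 1.0;
    }

    // Edit distance where insertions and deletions cost 1
    // and substitutions use the table above.
    public static double Distance(string a, string b)
    {
        var d = new double[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                double substitution = d[i - 1, j - 1] + SubstitutionCost(a[i - 1], b[j - 1]);
                double deletion = d[i - 1, j] + 1;
                double insertion = d[i, j - 1] + 1;
                d[i, j] = Math.Min(substitution, Math.Min(deletion, insertion));
            }
        }
        return d[a.Length, b.Length];
    }
}
```

In this sketch, `"candy"` and `"kandy"` are at distance `0` because `c` and `k` are an accepted mistake, while `"cat"` and `"bat"` stay at distance `1`.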