Automatically exported from code.google.com/p/fuzzycomp
https://github.com/slott56/fuzzycomp

About
=====
*Fuzzycomp* is a pure-Python package for comparing sequences or strings.
Some algorithms work equally well on strings and on any other iterable;
others are for string comparison only.

Platforms
=========
*Fuzzycomp* has been tested with the following versions of Python:

* Python 2.4
* Python 2.5
* Python 2.6
* Python 2.7

Algorithms
==========
*Fuzzycomp* implements the following algorithms.

Comparison
----------
* Levenshtein Distance
* Jaccard Distance
* Hamming Distance
* Jaro Distance
* Jaro Winkler Distance
* Dice Coefficient
* Longest common subsequence
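
To illustrate how some of these measures apply to any sequence and not only
to strings, here is a minimal sketch of the Hamming and Jaccard distances.
The function names are hypothetical and chosen for this example; this is not
*fuzzycomp*'s own code or API::

    def hamming_distance_sketch(a, b):
        """Count the positions at which two equal-length sequences differ."""
        if len(a) != len(b):
            raise ValueError("Hamming distance needs sequences of equal length")
        return sum(1 for x, y in zip(a, b) if x != y)


    def jaccard_distance_sketch(a, b):
        """Return 1 - |A & B| / |A | B| for the sets of items in a and b."""
        set_a, set_b = set(a), set(b)
        union = set_a | set_b
        if not union:
            return 0.0  # assumption: two empty inputs count as identical
        return 1.0 - float(len(set_a & set_b)) / len(union)

Both functions accept strings, lists, or tuples, because they rely only on
iteration, ``len()``, and equality of the items. The Jaccard sketch also
needs the items to be hashable.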

Phonetic
--------
* American Soundex
* New York State Identification and Intelligence System (NYSIIS)
* Metaphone
* Cologne Phonetic (Kölner Phonetik)
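
As an illustration of how a phonetic algorithm encodes a name, here is a
minimal sketch of American Soundex. It is an independent example under the
usual Soundex rules, with a hypothetical function name, and not the
implementation shipped with *fuzzycomp*::

    # Map each coded consonant to its Soundex digit.
    _SOUNDEX_CODES = {}
    for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                             ("L", "4"), ("MN", "5"), ("R", "6")):
        for _letter in _letters:
            _SOUNDEX_CODES[_letter] = _digit


    def soundex_sketch(name):
        """Return the four-character American Soundex code for name."""
        letters = [c for c in name.upper() if "A" <= c <= "Z"]
        if not letters:
            return ""
        digits = []
        # The first letter is kept as-is, but its code still suppresses
        # an immediately following letter with the same code.
        previous = _SOUNDEX_CODES.get(letters[0], "")
        for letter in letters[1:]:
            if letter in "HW":
                continue  # H and W do not separate letters with the same code
            code = _SOUNDEX_CODES.get(letter, "")
            if code and code != previous:
                digits.append(code)
            previous = code  # vowels reset this, so a repeat after a vowel counts
        # Pad with zeros and keep the letter plus three digits.
        return (letters[0] + "".join(digits) + "000")[:4]

With this sketch, ``soundex_sketch("Alfred")`` gives ``'A416'`` and
``soundex_sketch("Robert")`` gives ``'R163'``.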

Background
==========
There are several major reasons for developing and publishing *fuzzycomp*,
even though other packages already implement most of the algorithms it
contains.

#. There is an astonishing number of ways to classify how similar two
   strings are once you leave the domain of perfect matching. It started
   when I needed a way to tell how well a search phrase matched the results
   returned by an external API. Once I started searching, I was amazed by
   the subject and by the many alternatives, so I wanted to learn more by
   implementing the algorithms in Python.
#. This is my first Python package developed with the intent of being
   distributed and used by others. I wanted to learn how to structure and
   develop a Python package that could be released to the public and that
   could be useful in some way.
#. Over the last couple of years I have read more and more about unit
   testing and test-driven development (TDD), almost all of it positive. I
   had not had the opportunity to use unit tests in any previous project,
   so this is my learning ground for TDD and writing unit tests.

Install
=======
The package is not yet available on PyPI, so the best ways to install
*fuzzycomp* at the moment are:

Using *pip*, replacing ``X.Y.Z`` with the version number that you want to
install::

    pip install http://fuzzycomp.googlecode.com/files/fuzzycomp-X.Y.Z.tar.gz

Or by downloading and unpacking the desired version and then running the
following from the console::

    python setup.py install

Usage
=====
Some example usage of *fuzzycomp*::

    >>> from fuzzycomp import fuzzycomp
    >>> fuzzycomp.levenshtein_distance("Hello", "world")
    4

    >>> fuzzycomp.soundex("Alfred")
    'A416'
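
For readers curious about what an edit-distance function computes, the
following is a minimal sketch of the classic dynamic-programming Levenshtein
distance. The name ``levenshtein_sketch`` is hypothetical, and the code is an
independent illustration, not *fuzzycomp*'s implementation::

    def levenshtein_sketch(a, b):
        """Return the minimum number of single-item edits turning a into b."""
        # previous[j] holds the distance between the items of a consumed
        # so far and the first j items of b.
        previous = list(range(len(b) + 1))
        for i, x in enumerate(a, 1):
            current = [i]  # deleting the first i items of a yields an empty b
            for j, y in enumerate(b, 1):
                current.append(min(
                    previous[j] + 1,               # delete x
                    current[j - 1] + 1,            # insert y
                    previous[j - 1] + (x != y),    # substitute, free on a match
                ))
            previous = current
        return previous[-1]

For example, ``levenshtein_sketch("kitten", "sitting")`` returns ``3``. The
sketch keeps only two rows of the table, so memory grows with the length of
``b`` while time grows with the product of both lengths.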

Alternatives
============
If speed is of utmost importance to you, or you find yourself comparing very
long sequences, you should probably consider some of the available
alternatives. They implement most of the algorithms in C, so they should be
considerably faster.

Some, but by no means all, alternatives are:

* python-Levenshtein
* Fuzzy
* jellyfish

Contact
=======
For bugs or feature requests, please use the issue tracker on the project page.

To get in contact with me regarding the project,
please email [email protected] or follow me on Twitter:
@fuzzycode.