https://github.com/davidmogar/cucco
Text normalization library for Python
- Host: GitHub
- URL: https://github.com/davidmogar/cucco
- Owner: davidmogar
- License: mit
- Created: 2015-05-06T13:59:39.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2018-03-26T09:04:54.000Z (over 6 years ago)
- Last Synced: 2024-10-02T08:13:33.698Z (about 1 month ago)
- Topics: cucco, language, manipulation, normalization, punctuation, python, python-library, text
- Language: Python
- Homepage:
- Size: 188 KB
- Stars: 202
- Watchers: 10
- Forks: 27
- Open Issues: 8
Metadata Files:
- Readme: README.rst
- Changelog: CHANGELOG.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- starred-awesome - cucco - Text normalization library for Python (Python)
README
=================================================
cucco |Build Status| |codecov| |patreon| |gitter|
=================================================

Is that... is that a cucco? Sure it is!

Cucco is here to help you normalize those nasty texts. Removing extra white spaces is not that hard, right? What about stop words? They're no good... oh, and don't even mention emojis!

This little friend will do the hard work for you. Just set it up and let it peck all over your text.

Oh please, shut up and show me where can I grab a cucco!
--------------------------------------------------------

The easiest way to get a cucco is by using pip:

::

    $ pip install cucco

But sometimes... sometimes you want to go wild and get the biggest... No, the best!... No, THE MIGHTIEST cucco!

To do so, you may use Git. Clone the repository from GitHub and do it all the hard way:

::

    $ git clone https://github.com/davidmogar/cucco.git
    $ cd cucco
    $ python setup.py install

Got it. How do I use it?
------------------------

Now that you have a cucco, I'll let it give you all the details.

    Cucuco, cuco cuco cucucuco, CUCCO!

    -- Cucco

So true... so true... [tears falling down my face]. Just allow me to add some insight.

There are two ways of using cucco. The first one is through its CLI. You can get more info on it by running the following command:

::

    $ cucco --help

The second way is to use cucco inside your code. The following example shows how to normalize a short text:

.. code:: python

    from cucco import Cucco

    cucco = Cucco()
    print(cucco.normalize('Who let the cucco out?'))

This would apply all normalizations to the text ``Who let the cucco out?``. The output of these normalizations would be:

::

    cucco

It's also possible to send a list of normalizations to apply, which will be executed in order.

.. code:: python

    from cucco import Cucco

    cucco = Cucco()

    normalizations = [
        'remove_extra_white_spaces',
        ('replace_punctuation', {'replacement': ' '})
    ]

    print(cucco.normalize('Who let the cucco out?', normalizations))

This is the output:

::

    Who let the cucco out

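
To see why the order of the list matters, here is a minimal stand-alone sketch in plain Python. It does not use cucco and is not how cucco is implemented: the two functions only borrow the normalization names from the example above, and ``STEPS`` and ``apply_in_order`` are hypothetical helpers made up for illustration.

```python
import re
import string


def remove_extra_white_spaces(text):
    """Collapse runs of whitespace into one space and trim both ends."""
    return re.sub(r'\s+', ' ', text).strip()


def replace_punctuation(text, replacement=''):
    """Swap every ASCII punctuation character for `replacement`."""
    table = str.maketrans({char: replacement for char in string.punctuation})
    return text.translate(table)


# Hypothetical registry mapping step names to the functions above.
STEPS = {
    'remove_extra_white_spaces': remove_extra_white_spaces,
    'replace_punctuation': replace_punctuation,
}


def apply_in_order(text, normalizations):
    """Run each step in list order; a step is a name or a (name, kwargs) tuple."""
    for step in normalizations:
        name, kwargs = (step, {}) if isinstance(step, str) else step
        text = STEPS[name](text, **kwargs)
    return text


normalizations = [
    'remove_extra_white_spaces',
    ('replace_punctuation', {'replacement': ' '}),
]

# Note the doubled space after "Who" in the input.
result = apply_in_order('Who  let the cucco out?', normalizations)
print(repr(result))
```

In this sketch the question mark turns into a trailing space, because the whitespace cleanup has already run by the time the punctuation is replaced. Reversing the list would trim that space away, which is the kind of difference to keep in mind when ordering a list of normalizations.
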

For more information on how to use cucco you can `check its website `_, which will be ready cucco-soon.

Supported languages
-------------------

You never know when a cucco will learn a new trick. Currently, they can remove stop words for 50 languages. The complete list can be `checked here `_. If you are looking for the source you can find it in this `GitHub repository `_ which uses ``json`` for the stop words files.

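
As a rough illustration of what stop word removal involves, the sketch below filters tokens against a list loaded from JSON. Everything in it is an assumption: the flat-array JSON layout, the tiny word list, and the ``remove_stop_words`` helper are made up for the example and are not taken from cucco or from the stop words repository.

```python
import json

# Made-up stand-in for one language's stop word file.
# The layout (a flat JSON array of words) is an assumption.
STOP_WORDS_JSON = '["the", "a", "an", "who", "let", "out"]'


def remove_stop_words(text, stop_words):
    """Drop whitespace-separated tokens that are stop words (case-insensitive)."""
    kept = [token for token in text.split() if token.lower() not in stop_words]
    return ' '.join(kept)


stop_words = set(json.loads(STOP_WORDS_JSON))
result = remove_stop_words('Who let the cucco out', stop_words)
print(result)
```

The sketch only splits on whitespace, so a token such as ``out?`` would survive unless punctuation is dealt with first.
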

Can I contribute?
-----------------

Are you a breeder? No? Don't worry, you can still help.

Maybe you have a good new feature to add. Maybe it is not even good. It doesn't matter! It is always good to share ideas, isn't it? Just go for it! Pull requests are warmly welcomed.

Not in the mood to implement it yourself? You can still create an issue and comment about it there. Feedback is always great!

.. |Build Status| image:: https://travis-ci.org/davidmogar/cucco.svg?branch=master
   :target: https://travis-ci.org/davidmogar/cucco
.. |codecov| image:: https://codecov.io/gh/davidmogar/cucco/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/davidmogar/cucco
.. |patreon| image:: https://img.shields.io/badge/support%20on-patreon-red.svg
   :target: https://www.patreon.com/davidmogar
.. |gitter| image:: https://img.shields.io/gitter/room/nwjs/nw.js.svg
   :target: https://gitter.im/davidmogar/cucco