https://github.com/vzhong/vocab
Vocabulary objects for natural language processing
- Host: GitHub
- URL: https://github.com/vzhong/vocab
- Owner: vzhong
- License: MIT
- Created: 2017-04-19T21:18:42.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2020-06-01T19:40:55.000Z (over 4 years ago)
- Last Synced: 2024-09-24T17:46:14.246Z (about 2 months ago)
- Topics: machine-learning, nlp
- Language: Python
- Size: 19.5 KB
- Stars: 14
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.rst
- License: LICENSE
Vocab
=====

.. image:: https://readthedocs.org/projects/vocab/badge/?version=latest
    :target: http://vocab.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status

.. image:: https://travis-ci.org/vzhong/vocab.svg?branch=master
    :target: https://travis-ci.org/vzhong/vocab

Vocab is a Python package that provides vocabulary objects for natural language processing.

Installation
------------

.. code-block:: sh

    # install the released package
    pip install vocab
    # or install the current version from GitHub
    pip install git+https://github.com/vzhong/vocab.git

Usage
-----

.. code-block:: python

    >>> from vocab import Vocab, UnkVocab
    >>> v = Vocab()
    >>> v.word2index('hello', train=True)
    0
    >>> v.word2index(['hello', 'world'], train=True)
    [0, 1]
    >>> v.index2word([1, 0])
    ['world', 'hello']
    >>> v.index2word(1)
    'world'
    >>> small = v.prune_by_count(2)
    >>> small.to_dict()
    {'counts': {'hello': 2}, 'index2word': ['hello']}
    >>> u = UnkVocab()
    >>> u.word2index(['hello', 'world'], train=True)
    [1, 2]
    >>> u.word2index('hello friend !'.split())
    [1, 0, 0]
    >>> u.index2word(0)
    ''
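
To make the behaviour in the session above easier to follow, here is a minimal, self-contained sketch of how such vocabulary objects could work. It is not the package's implementation: the class names ``SketchVocab`` and ``SketchUnkVocab``, their attributes, and the ``<unk>`` placeholder string are all assumptions made for illustration. The sketch only mirrors what the session shows: training lookups assign new indices and count occurrences, pruning keeps words seen at least a given number of times, and an unknown-aware vocabulary reserves index 0 for unseen words.

```python
from collections import Counter


class SketchVocab:
    """Toy word/index mapping (hypothetical; not the vocab package's code)."""

    def __init__(self):
        self.words = []           # index -> word
        self.indices = {}         # word -> index
        self.counts = Counter()   # word -> times seen with train=True

    def _lookup(self, word, train):
        if train:
            if word not in self.indices:
                # New words get the next free index.
                self.indices[word] = len(self.words)
                self.words.append(word)
            self.counts[word] += 1
        # Raises KeyError for an unseen word when train=False.
        return self.indices[word]

    def word2index(self, word, train=False):
        # Accept either a single word or a list of words.
        if isinstance(word, (list, tuple)):
            return [self._lookup(w, train) for w in word]
        return self._lookup(word, train)

    def index2word(self, index):
        # Accept either a single index or a list of indices.
        if isinstance(index, (list, tuple)):
            return [self.words[i] for i in index]
        return self.words[index]

    def prune_by_count(self, cutoff):
        """Return a new vocabulary keeping words seen at least `cutoff` times."""
        pruned = type(self)()
        for word in self.words:
            if self.counts[word] >= cutoff:
                pruned.word2index(word, train=True)
                pruned.counts[word] = self.counts[word]
        return pruned


class SketchUnkVocab(SketchVocab):
    """Variant that reserves index 0 for unknown words."""

    def __init__(self, unk="<unk>"):  # placeholder string is an assumption
        super().__init__()
        self.indices[unk] = 0
        self.words.append(unk)

    def _lookup(self, word, train):
        if not train and word not in self.indices:
            return 0  # unseen words map to the reserved unknown index
        return super()._lookup(word, train)


v = SketchVocab()
v.word2index(["hello", "hello", "world"], train=True)  # [0, 0, 1]
small = v.prune_by_count(2)                            # keeps only 'hello'
```

Keeping the counts next to the index tables is what lets ``prune_by_count`` build a smaller vocabulary without re-reading the training text; the pruned copy reassigns indices from zero, so indices from the original vocabulary should not be reused with it.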