https://github.com/ikegami-yukino/shellinford-python
Wavelet Matrix/Tree succinct data structure for full text search (based on shellinford C++ library)
- Host: GitHub
- URL: https://github.com/ikegami-yukino/shellinford-python
- Owner: ikegami-yukino
- License: bsd-3-clause
- Created: 2014-03-10T15:34:50.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2019-02-08T13:54:18.000Z (over 6 years ago)
- Last Synced: 2025-03-18T03:07:59.204Z (7 months ago)
- Topics: fm-index, fulltext-search
- Language: C++
- Homepage:
- Size: 471 KB
- Stars: 22
- Watchers: 4
- Forks: 3
- Open Issues: 2
Metadata Files:
- Readme: README.rst
- Changelog: CHANGES.rst
- License: LICENSE
shellinford
===========

|travis| |coveralls| |pyversion| |version| |license|

Shellinford is an implementation of a Wavelet Matrix/Tree succinct data structure for document retrieval.
It is based on the `shellinford`_ C++ library.

.. _shellinford: https://github.com/echizentm/shellinford

NOTE: This module requires a C++11 compiler.
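
To give a rough idea of what an FM-index does, the sketch below counts pattern occurrences with backward search over a Burrows-Wheeler transform in pure Python. It is an illustration only, not the code of this module or of the bundled C++ library: ``build_bwt`` and ``count_occurrences`` are hypothetical names, the suffix sort is naive, and ``rank`` scans the string linearly, which is the kind of query a wavelet matrix or tree is designed to answer quickly.

.. code:: python

    def build_bwt(text):
        """Return the Burrows-Wheeler transform of text (naive suffix sort)."""
        text += "\0"  # sentinel that sorts before every other character
        suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
        # The BWT is the character preceding each sorted suffix.
        return "".join(text[i - 1] for i in suffix_array)


    def count_occurrences(bwt, pattern):
        """Count (possibly overlapping) occurrences of pattern via backward search."""
        # c_table[ch]: number of characters in the text strictly smaller than ch.
        c_table = {}
        for i, ch in enumerate(sorted(bwt)):
            c_table.setdefault(ch, i)

        def rank(ch, i):
            # Occurrences of ch in bwt[:i]; linear here, fast with a wavelet matrix.
            return bwt.count(ch, 0, i)

        lo, hi = 0, len(bwt)  # half-open range of matching suffixes
        for ch in reversed(pattern):
            if ch not in c_table:
                return 0
            lo = c_table[ch] + rank(ch, lo)
            hi = c_table[ch] + rank(ch, hi)
            if lo >= hi:
                return 0
        return hi - lo

Calling ``count_occurrences(build_bwt("abracadabra"), "abra")`` walks the pattern from its last character to its first and narrows the suffix range at each step.
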

Installation
============

::

    $ pip install shellinford


Usage
=====

Create a new FM-index instance
-------------------------------

.. code:: python

    >>> import shellinford
    >>> fm = shellinford.FMIndex()

- shellinford.Shellinford([use_wavelet_tree=True, filename=None])

  - When given a filename, Shellinford loads FM-index data from the file


Build FM-index
-----------------------------

.. code:: python

    >>> fm.build(['Milky Holmes', 'Sherlock "Sheryl" Shellingford', 'Milky'], 'milky.fm')

- build([docs, filename])

  - When given a filename, Shellinford stores FM-index data to the file


Search word from FM-index
---------------------------------

.. code:: python

    >>> for doc in fm.search('Milky'):
    ...     print('doc_id:', doc.doc_id)
    ...     print('count:', doc.count)
    ...     print('text:', doc.text)
    doc_id: 0
    count: [1]
    text: Milky Holmes
    doc_id: 2
    count: [1]
    text: Milky

    >>> for doc in fm.search(['Milky', 'Holmes']):
    ...     print('doc_id:', doc.doc_id)
    ...     print('count:', doc.count)
    ...     print('text:', doc.text)
    doc_id: 0
    count: [1]
    text: Milky Holmes

- search(query, [_or=False, ignores=[]])

  - If `_or` = True, then "OR" search is executed, else "AND" search
  - Given `ignores`, "NOT" search is also executed
  - NOTE: The search function is available after FM-index is built or loaded

Count word from FM-index
---------------------------------

.. code:: python

    >>> fm.count('Milky')
    2

    >>> fm.count(['Milky', 'Holmes'])
    1

- count(query, [_or=False])

  - If `_or` = True, then "OR" search is executed, else "AND" search
  - NOTE: The count function is available after FM-index is built or loaded
  - This function is slightly faster than the search function

Add a document
---------------------------------

.. code:: python

    >>> fm.push_back('Baritsu')

- push_back(doc)

  - NOTE: A document added by this method cannot be searched until the FM-index is built again

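
The "AND", "OR", and "NOT" behaviour that the ``search`` options describe can be modelled with plain substring matching. The sketch below is only a model of those semantics, not the library's implementation: ``naive_search`` and ``SearchResult`` are hypothetical names, one count per query term is an assumption, and ``str.count`` counts non-overlapping matches, which may differ from what an FM-index reports.

.. code:: python

    from collections import namedtuple

    # Hypothetical result type mirroring the doc_id / count / text fields shown above.
    SearchResult = namedtuple("SearchResult", "doc_id count text")


    def naive_search(docs, query, _or=False, ignores=()):
        """Model AND/OR/NOT document search with plain substring matching."""
        terms = [query] if isinstance(query, str) else list(query)
        results = []
        for doc_id, text in enumerate(docs):
            # NOT: drop any document that contains an ignored string.
            if any(ignored in text for ignored in ignores):
                continue
            # Assumption: one count per query term, in query order.
            counts = [text.count(term) for term in terms]
            # OR: at least one term matches; AND: every term matches.
            matched = any(counts) if _or else all(counts)
            if matched:
                results.append(SearchResult(doc_id, counts, text))
        return results

With the three documents from the build example, a query for ``'Milky'`` with ``ignores=['Holmes']`` keeps only the document whose text is exactly ``Milky``.
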

Read FM-index from a binary file
---------------------------------

.. code:: python

    >>> fm.read('milky_holmes.fm')

- read(path)


Write FM-index binary to a file
---------------------------------

.. code:: python

    >>> fm.write('milky_holmes.fm')

- write(path)

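
One way to combine building and reading is to reuse a saved index file when it already exists. The helper below is a hypothetical convenience, not part of the module: ``load_or_build`` is an invented name, and it relies only on the ``read(path)`` and ``build([docs, filename])`` methods described in this README.

.. code:: python

    import os


    def load_or_build(index, docs, path):
        """Load the index from path if the file exists, else build it and save it."""
        if os.path.exists(path):
            # Reuse the previously written binary file.
            index.read(path)
        else:
            # Passing a filename to build() also stores the index to that file.
            index.build(docs, path)
        return index

    # Possible usage, assuming the module is installed:
    # fm = load_or_build(shellinford.FMIndex(), ['Milky Holmes', 'Milky'], 'milky.fm')
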

Check Whether FM-Index contains string
---------------------------------------

.. code:: python

    >>> 'baritsu' in fm


License
=======

- Wrapper code is licensed under the New BSD License.
- Bundled `shellinford`_ C++ library (c) 2012 echizen_tm is licensed under the New BSD License.

.. |travis| image:: https://travis-ci.org/ikegami-yukino/shellinford-python.svg?branch=master
    :target: https://travis-ci.org/ikegami-yukino/shellinford-python
    :alt: travis-ci.org

.. |coveralls| image:: https://coveralls.io/repos/ikegami-yukino/shellinford-python/badge.svg?branch=master&service=github
    :target: https://coveralls.io/github/ikegami-yukino/shellinford-python?branch=master
    :alt: coveralls.io

.. |pyversion| image:: https://img.shields.io/pypi/pyversions/shellinford.svg

.. |version| image:: https://img.shields.io/pypi/v/shellinford.svg
    :target: http://pypi.python.org/pypi/shellinford/
    :alt: latest version

.. |license| image:: https://img.shields.io/pypi/l/shellinford.svg
    :target: http://pypi.python.org/pypi/shellinford/
    :alt: license