https://github.com/laysakura/search-fulltext

CPAN module supporting fulltext search

# NAME

Search::Fulltext - Fulltext search module

# SYNOPSIS

    use Test::More;
    use Search::Fulltext;

    my @docs = (
        'I like beer the best',
        'Wine makes people satisfied',  # does not include beer
        'Beer makes people happy',
    );

    my $fts = Search::Fulltext->new({
        docs => \@docs,
    });
    my $results = $fts->search('beer');
    is_deeply($results, [0, 2]);        # 1st & 3rd doc include 'beer'
    $results = $fts->search('beer AND happy');
    is_deeply($results, [2]);           # 3rd doc includes both 'beer' & 'happy'
    done_testing;

# DESCRIPTION

[Search::Fulltext](http://search.cpan.org/perldoc?Search::Fulltext) is a fulltext search module. It takes only a few steps to use: build an index from an array of documents, then run queries against it.

[Search::Fulltext](http://search.cpan.org/perldoc?Search::Fulltext) has a __pluggable tokenizer__ feature, which makes fulltext search possible for potentially any language.
Currently, __English__ and __Japanese__ fulltext search are officially supported,
although any other language that separates words with spaces can also be used.
See the [CUSTOM TOKENIZERS](#CUSTOM\_TOKENIZERS) section to learn how to search non-English languages.

__SQLite__'s __FTS4__ is used as the indexer.
The queries supported by FTS4 (`AND`, `OR`, `NEAR`, ...) are all available.
See the ["QUERIES"](#QUERIES) section for details.

# METHODS

## Search::Fulltext->new

Creates a fulltext index for the given documents.

- `@param docs` __\[required\]__

    Reference to an array of the documents to be searched.

- `@param index_file` __\[optional\]__

    File path to write the fulltext index to. By default, an in-memory index is used.

- `@param tokenizer` __\[optional\]__

    Tokenizer name to use. `simple` (default) and `porter` are always supported.
    `icu` and `unicode61` can be used if the SQLite library used via the [DBD::SQLite](http://search.cpan.org/perldoc?DBD::SQLite) module supports them.
    See [http://www.sqlite.org/fts3.html\#tokenizer](http://www.sqlite.org/fts3.html\#tokenizer) for more details on FTS4 tokenizers.

    The Japanese tokenizer `perl 'Search::Fulltext::Tokenizer::MeCab::tokenizer'` is also available after you install the
    [Search::Fulltext::Tokenizer::MeCab](http://search.cpan.org/perldoc?Search::Fulltext::Tokenizer::MeCab) module.

    See the [CUSTOM TOKENIZERS](#CUSTOM\_TOKENIZERS) section for developing other tokenizers.
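The optional parameters above can be combined in one call. The following is a minimal sketch, not taken from the module's own documentation: the temporary directory and the file name `fulltext.db` are assumptions made for illustration, and it assumes the module creates the index file at the given path.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use Search::Fulltext;

my @docs = (
    'I like beer the best',
    'Wine makes people satisfied',
    'Beer makes people happy',
);

# Throwaway directory for the index; removed when the program exits.
# The file name 'fulltext.db' is an arbitrary choice for this sketch.
my $dir        = tempdir(CLEANUP => 1);
my $index_file = File::Spec->catfile($dir, 'fulltext.db');

my $fts = Search::Fulltext->new({
    docs       => \@docs,
    index_file => $index_file,  # on-disk index instead of the in-memory default
    tokenizer  => 'porter',     # one of the two tokenizers that are always available
});

my $results = $fts->search('beer');
print "matched doc indexes: @$results\n";
```

Leaving out `index_file` and `tokenizer` gives the defaults described above: an in-memory index and the `simple` tokenizer.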

## Search::Fulltext->search

Searches the documents using a query.

- `@returns`

    Reference to an array of indexes into the `docs` passed to `Search::Fulltext->new`, one for each document that matches `query`.

- `@param query`

    Query to run against the documents.
    See the ["QUERIES"](#QUERIES) section for types of queries.
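Because `search` returns indexes rather than the documents themselves, callers usually map the result back onto their own array. A minimal sketch of that step, reusing the documents from the SYNOPSIS (the variable names `$hits` and `@matched` are illustrative only):

```perl
use strict;
use warnings;
use Search::Fulltext;

my @docs = (
    'I like beer the best',
    'Wine makes people satisfied',
    'Beer makes people happy',
);

my $fts = Search::Fulltext->new({ docs => \@docs });

# search() returns a reference to an array of indexes into @docs.
my $hits = $fts->search('beer');

# An array slice turns those indexes back into the matching documents.
my @matched = @docs[@$hits];
print "$_\n" for @matched;
```

Keeping the original array around is the caller's responsibility in this sketch; the example relies only on the returned indexes.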

# QUERIES

The simplest query is a single term.

    my $results = $fts->search('beer');

The queries below, and combinations of them, can also be used.

    my $results = $fts->search('beer AND happy');  # both terms
    $results = $fts->search('satisfied OR happy'); # either term
    $results = $fts->search('people NOT beer');    # first term without the second
    $results = $fts->search('make*');              # prefix match
    $results = $fts->search('"makes people"');     # exact phrase
    $results = $fts->search('beer NEAR happy');    # terms close to each other
    $results = $fts->search('beer NEAR/1 happy');  # at most 1 term in between

See [http://www.sqlite.org/fts3.html\#section\_3](http://www.sqlite.org/fts3.html\#section\_3) for an explanation of each type of query.

__NOTE:__ Some custom tokenizers might not support all of the queries above.
Check each tokenizer's documentation before using complex queries.
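One way to see what a given tokenizer and SQLite build actually return is to run a handful of queries and print the matched indexes. This is a small sketch using the default `simple` tokenizer; the `%hits` hash and the output format are choices made here, not part of the module.

```perl
use strict;
use warnings;
use Search::Fulltext;

my @docs = (
    'I like beer the best',
    'Wine makes people satisfied',
    'Beer makes people happy',
);

my $fts = Search::Fulltext->new({ docs => \@docs });

my @queries = (
    'beer AND happy',
    'satisfied OR happy',
    'people NOT beer',
    'make*',
    '"makes people"',
    'beer NEAR happy',
);

# Run every query and remember the matched document indexes.
my %hits;
for my $query (@queries) {
    $hits{$query} = $fts->search($query);
    printf "%-20s => [%s]\n", $query, join(', ', @{ $hits{$query} });
}
```

Swapping in another tokenizer via the `tokenizer` parameter and rerunning the loop shows which query types it handles.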

# CUSTOM TOKENIZERS

Custom tokenizers can be implemented in pure Perl thanks to ["Perl\_tokenizers" in DBD::SQLite](http://search.cpan.org/perldoc?DBD::SQLite#Perl\_tokenizers).
[Search::Fulltext::Tokenizer::MeCab](http://search.cpan.org/perldoc?Search::Fulltext::Tokenizer::MeCab) is an example of a custom tokenizer.

See ["Perl\_tokenizers" in DBD::SQLite](http://search.cpan.org/perldoc?DBD::SQLite#Perl\_tokenizers) and the [Search::Fulltext::Tokenizer::MeCab](http://search.cpan.org/perldoc?Search::Fulltext::Tokenizer::MeCab) module to learn how to develop custom tokenizers.
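As a rough illustration of what such a tokenizer looks like, the sketch below follows the closure-based shape described in the DBD::SQLite "Perl tokenizers" documentation. The package name `My::Tokenizer::Lower` is hypothetical, and the example assumes ASCII input so that character offsets and byte offsets coincide. It exercises the tokenizer directly, without SQLite.

```perl
use strict;
use warnings;

package My::Tokenizer::Lower;   # hypothetical package name

# Factory: returns a closure that takes the string to tokenize and
# returns an iterator. Each call to the iterator yields
# ($term, $length, $start, $end, $term_index), or an empty list
# once the string is exhausted.
sub tokenizer {
    return sub {
        my $string     = shift;
        my $term_index = 0;

        return sub {
            $string =~ /\w+/g or return;   # next word, or no more tokens
            my ($start, $end) = ($-[0], $+[0]);
            my $len  = $end - $start;
            my $term = lc substr($string, $start, $len);
            return ($term, $len, $start, $end, $term_index++);
        };
    };
}

package main;

# Drive the iterator by hand to see the terms it produces.
my $next = My::Tokenizer::Lower::tokenizer()->('Beer makes people happy');
my @terms;
while (my @token = $next->()) {
    push @terms, $token[0];
}
print join('|', @terms), "\n";   # prints "beer|makes|people|happy"
```

By analogy with the MeCab tokenizer name shown under `Search::Fulltext->new`, such a function would presumably be selected with `tokenizer => "perl 'My::Tokenizer::Lower::tokenizer'"`; treat that string as an assumption and check it against the DBD::SQLite documentation.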

# SUPPORTS

Bug reports and pull requests are welcome at [https://github.com/laysakura/Search-Fulltext](https://github.com/laysakura/Search-Fulltext)!

# VERSION

Version 1.03

# AUTHOR

Sho Nakatani, a.k.a. @laysakura