https://github.com/laysakura/search-fulltext
CPAN module supporting fulltext search
- Host: GitHub
- URL: https://github.com/laysakura/search-fulltext
- Owner: laysakura
- License: other
- Created: 2013-10-08T09:02:40.000Z (about 11 years ago)
- Default Branch: master
- Last Pushed: 2013-10-20T15:55:35.000Z (about 11 years ago)
- Last Synced: 2024-06-19T18:08:26.425Z (5 months ago)
- Language: Perl
- Size: 316 KB
- Stars: 6
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: Changes
- License: LICENSE
# NAME
Search::Fulltext - Fulltext search module
# SYNOPSIS

    use Search::Fulltext;
    use Test::More;

    my @docs = (
        'I like beer the best',
        'Wine makes people satisfied',  # does not include beer
        'Beer makes people happy',
    );
    my $fts = Search::Fulltext->new({
        docs => \@docs,
    });

    my $results = $fts->search('beer');
    is_deeply($results, [0, 2]);  # 1st & 3rd docs include 'beer'

    $results = $fts->search('beer AND happy');
    is_deeply($results, [2]);     # 3rd doc includes both 'beer' & 'happy'

    done_testing;

# DESCRIPTION
[Search::Fulltext](http://search.cpan.org/perldoc?Search::Fulltext) is a fulltext search module that can be set up in a few steps.

It has a __pluggable tokenizer__ feature, which makes fulltext search possible for potentially any language.
Currently, __English__ and __Japanese__ fulltext search are officially supported,
although any other language that separates words with spaces can also be used.
See the [CUSTOM TOKENIZERS](#CUSTOM\_TOKENIZERS) section to learn how to search non-English languages.

__SQLite__'s __FTS4__ is used as the indexer.
The queries supported by FTS4 (`AND`, `OR`, `NEAR`, ...) are all available.
See the ["QUERIES"](#QUERIES) section for details.

# METHODS
## Search::Fulltext->new
Creates a fulltext index for documents.

- `@param docs` __\[required\]__

    Reference to an array whose elements are the documents to be searched.

- `@param index_file` __\[optional\]__

    File path to write the fulltext index to. By default, an in-memory index is used.

- `@param tokenizer` __\[optional\]__

    Name of the tokenizer to use. `simple` (default) and `porter` are always supported.
    `icu` and `unicode61` can be used if the SQLite library used via the [DBD::SQLite](http://search.cpan.org/perldoc?DBD::SQLite) module supports them.
    See [http://www.sqlite.org/fts3.html\#tokenizer](http://www.sqlite.org/fts3.html\#tokenizer) for more details on FTS4 tokenizers.

    The Japanese tokenizer `perl 'Search::Fulltext::Tokenizer::MeCab::tokenizer'` is also available after you install the
    [Search::Fulltext::Tokenizer::MeCab](http://search.cpan.org/perldoc?Search::Fulltext::Tokenizer::MeCab) module.

    See the [CUSTOM TOKENIZERS](#CUSTOM\_TOKENIZERS) section for developing other tokenizers.

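
As a rough illustration of the optional parameters above, the sketch below builds an index with the `porter` tokenizer and writes it to a file. The temporary directory and the `index.sqlite` file name are assumptions made for this example only. The expected match relies on Porter stemming treating `makes` and `make` as the same term.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use Search::Fulltext;

my @docs = (
    'I like beer the best',
    'Wine makes people satisfied',
    'Beer makes people happy',
);

# Hypothetical location for the index; any writable path should do.
my $dir        = tempdir(CLEANUP => 1);
my $index_file = File::Spec->catfile($dir, 'index.sqlite');

my $fts = Search::Fulltext->new({
    docs       => \@docs,
    index_file => $index_file,  # omit to keep the index in memory
    tokenizer  => 'porter',     # stems English words before indexing
});

# With stemming, 'make' should also match 'makes' (2nd and 3rd docs).
my $stemmed = $fts->search('make');
print "porter: @$stemmed\n";
```

With the default `simple` tokenizer the same query would be expected to find nothing, since no document contains the exact term `make`.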
## Search::Fulltext->search
Searches the documents with a query.

- `@returns`

    Reference to an array of the indexes of the `docs` (as passed to `Search::Fulltext->new`) that match `query`.

- `@param query`

    Query to run against the documents.
    See the ["QUERIES"](#QUERIES) section for the types of queries.

# QUERIES
The simplest query is a single term.

    my $results = $fts->search('beer');

The queries below, and combinations of them, can also be used.

    $results = $fts->search('beer AND happy');
    $results = $fts->search('satisfied OR happy');
    $results = $fts->search('people NOT beer');
    $results = $fts->search('make*');
    $results = $fts->search('"makes people"');
    $results = $fts->search('beer NEAR happy');
    $results = $fts->search('beer NEAR/1 happy');

See [http://www.sqlite.org/fts3.html\#section\_3](http://www.sqlite.org/fts3.html\#section\_3) for an explanation of each type of query.
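
To make these operators more concrete, here is a small sketch that runs a few of them against three sample documents and maps the returned indexes back to the document text. The `run_query` helper is hypothetical and exists only for this example. The default `simple` tokenizer is assumed.

```perl
use strict;
use warnings;
use Search::Fulltext;

my @docs = (
    'I like beer the best',
    'Wine makes people satisfied',
    'Beer makes people happy',
);
my $fts = Search::Fulltext->new({ docs => \@docs });

# Hypothetical helper: run a query and return the matching documents.
sub run_query {
    my ($query) = @_;
    my $indexes = $fts->search($query);  # array reference of indexes into @docs
    return @docs[@$indexes];
}

for my $query ('people NOT beer', 'make*', '"makes people"', 'beer NEAR/1 happy') {
    my @hits = run_query($query);
    printf "%-20s => %d hit(s)\n", $query, scalar @hits;
    print "    $_\n" for @hits;
}
```

Here `beer NEAR/1 happy` should find nothing, because two words separate `beer` and `happy` in the third document, whereas a plain `beer NEAR happy` should match it.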

__NOTE:__ Some custom tokenizers might not support all of the queries above.
Check the documentation of each tokenizer before using complex queries.

# CUSTOM TOKENIZERS
Custom tokenizers can be implemented in pure Perl thanks to ["Perl\_tokenizers" in DBD::SQLite](http://search.cpan.org/perldoc?DBD::SQLite#Perl\_tokenizers).
[Search::Fulltext::Tokenizer::MeCab](http://search.cpan.org/perldoc?Search::Fulltext::Tokenizer::MeCab) is an example of a custom tokenizer.

See ["Perl\_tokenizers" in DBD::SQLite](http://search.cpan.org/perldoc?DBD::SQLite#Perl\_tokenizers) and the [Search::Fulltext::Tokenizer::MeCab](http://search.cpan.org/perldoc?Search::Fulltext::Tokenizer::MeCab) module to learn how to develop custom tokenizers.
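
As a rough sketch of what such a tokenizer can look like, the example below follows the closure-based interface described in the DBD::SQLite documentation. The named function returns a closure that takes a string and returns an iterator, and each iterator call yields `($term, $length, $start, $end, $index)`. The name `main::lowercase_word_tokenizer` is hypothetical, and passing it as `perl '...'` is an assumption modeled on the MeCab tokenizer name shown under `Search::Fulltext->new`. The offsets are only correct for ASCII text in this simplified version.

```perl
use strict;
use warnings;
use Search::Fulltext;

# Hypothetical tokenizer: splits on runs of word characters and lowercases terms.
sub lowercase_word_tokenizer {
    return sub {
        my ($string) = @_;
        my $term_index = 0;

        # Iterator: returns one token per call, or an empty list when done.
        return sub {
            $string =~ /\w+/g or return;
            my ($start, $end) = ($-[0], $+[0]);
            my $term = lc substr($string, $start, $end - $start);
            return ($term, length($term), $start, $end, $term_index++);
        };
    };
}

my $fts = Search::Fulltext->new({
    docs => [
        'I like beer the best',
        'Wine makes people satisfied',
        'Beer makes people happy',
    ],
    # Assumption: the tokenizer string is handed to FTS4 as-is.
    tokenizer => "perl 'main::lowercase_word_tokenizer'",
});

my $results = $fts->search('beer');
print "custom tokenizer: @$results\n";
```

Because the same tokenizer is applied to both documents and queries, lowercasing the terms is what lets `beer` match `Beer` in this sketch.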
# SUPPORTS
Bug reports and pull requests are welcome at [https://github.com/laysakura/Search-Fulltext](https://github.com/laysakura/Search-Fulltext)!
# VERSION
Version 1.03
# AUTHOR
Sho Nakatani, a.k.a. @laysakura