Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/stewid/xapr
R bindings to the Xapian search engine
https://github.com/stewid/xapr
Last synced: 3 months ago
JSON representation
R bindings to the Xapian search engine
- Host: GitHub
- URL: https://github.com/stewid/xapr
- Owner: stewid
- License: gpl-2.0
- Created: 2014-09-12T16:59:37.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2021-04-17T19:11:25.000Z (almost 4 years ago)
- Last Synced: 2024-06-12T18:16:46.180Z (7 months ago)
- Language: R
- Size: 102 KB
- Stars: 4
- Watchers: 4
- Forks: 1
- Open Issues: 2
-
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
Awesome Lists containing this project
README
[![Build Status](https://travis-ci.org/stewid/xapr.png)](https://travis-ci.org/stewid/xapr)
# Introduction
`xapr` is an R package that provides an interface to the
[Xapian](http://xapian.org/) search engine from R, allowing both
indexing and retrieval operations. A great introduction to
[Xapian](http://xapian.org/) is the
[Getting Started with Xapian](http://getting-started-with-xapian.readthedocs.org/en/latest/).## Indexing
Index the content of a `data.frame` to documents with the Xapian
search engine A `document` is the data returned from a search.The index plan is specified symbolically. An index plan has the form
`data ~ terms` where `data` is the blob of data returned from a search
and the `terms` are the basis for a search in Xapian. A first order
term index the text in the column as free text. A specification of the
form `prefix:term` indicates that the text in `term` should be
indexed with the prefix `prefix`.The prefix is a short string at the beginning of the term to indicate
which field the term indexes. Valid prefixes are: 'A' ,'D', 'E', 'G',
'H', 'I', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V',
'X', 'Y' and 'Z'. See http://xapian.org/docs/omega/termprefixes for a
list of conventional prefixes.The specification `prefix*term` is the same as `term +
prefix:term`. The prefix `X` will create a user defined prefix by
appending the uppercase `term` to `X`. The prefix `Q` will use data
in the `term` column as a unique identifier for the document. `NA`
values in indexed columns are skipped.No response e.g. `~ term + prefix:term` writes the row number as
data to the document.The specification `~X*.` creates prefix terms with all columns plus
free text.If the response contains one or more columns, e.g. `col_1 + col_2 ~
X*.` the response is first converted to `JSON`. A compact form to
convert all fields to `JSON` is to use `. ~ terms`. It is also
possible to drop response fields e.g. `. - col_1 - col_2 ~ X*.` to
include all fields in the response except `col_1` and `col_2`.### Example
This is an `R` version of the `Python` example in the
[Getting Started with Xapian](http://getting-started-with-xapian.readthedocs.org/en/latest/practical_example/index.html)We are going to build a simple search system based on museum catalogue
data released under the Creative Commons Attribution-NonCommercial-
ShareAlike license (http://creativecommons.org/licenses/by-nc-sa/3.0/)
by the Science Museum in London, UK.
(http://api.sciencemuseum.org.uk/documentation/collections/)```{r, index, message = FALSE, comment = "#>"}
library(xapr)## The first 100 rows of the museum catalogue data is distributed with
## the 'xapr' package
filename <- system.file("extdata/NMSI_100.csv", package="xapr")
nmsi <- read.csv(filename, as.is = TRUE)## Create a temporary directory to hold the database
path <- tempfile(pattern="xapr-")
dir.create(path)## Index the 'TITLE' and 'DESCRIPTION' fields with both a suitable
## prefix and without a prefix for general search. Use the 'id_NUMBER'
## as unique identifier. Store all the fields as JSON for display
## purposes.
db <- xindex(. ~ S*TITLE + X*DESCRIPTION + Q:id_NUMBER, nmsi, path)## Display a summary of the Xapian database
summary(db)## Run a search and display docid (rowname) and TITLE from each match
xsearch(db, "watch", TITLE ~ .)## Run a search with multiple words
xsearch(db, "Dent watch", TITLE ~ .)## Run a search with prefix
xsearch(db, "title:sunwatch", TITLE ~ title:S)## Run a search with multiple prefixes
xsearch(db,
"description:\"leather case\" AND title:sundial",
TITLE ~ title:S + description:XDESCRIPTION)
```## Installation
The development files for the `Xapian` search engine must be
installed.```
$ sudo apt-get install libxapian-dev
```To install the development version of `xapr`, it's easiest to use the
devtools package:```{r, install, eval = FALSE}
# install.packages("devtools")
library(devtools)
install_github("stewid/xapr")
```Another alternative is to use `git` and `make`
```
$ git clone https://github.com/stewid/xapr.git
$ cd xapr
$ make install
```**NOTE:** The package is in a very early development phase. Functions
and documentation may be incomplete and subject to
change. Suggestions, bugs, forks and pull requests are
appreciated. Get in touch.# License
The `xapr` package is licensed under the GPL (>= 2). See these files
for additional details:- [LICENSE](LICENSE) - `xapr` package license