https://github.com/Neamar/document-highlighter

Context-aware highlighting for search queries in HTML documents.

Content-aware document highlighter
=======================
![Build Status](https://travis-ci.org/Neamar/document-highlighter.png)
![Coverage Status](https://coveralls.io/repos/Neamar/document-highlighter/badge.png?branch=master)

## What is `document highlighter`?
Adds highlights to a plain-text or HTML document for a given query. Handles Unicode, stopwords and punctuation.
Generates HTML-compliant highlights, even for complex markup.

## Samples
### Plain text
#### Simple case
The following text:

> The index analysis module acts as a configurable registry of Analyzers that can be used in order to both break indexed (analyzed) fields when a document is indexed and process query strings. It maps to the Lucene Analyzer.

Highlighted for the query `The index analysis string`, it becomes:

> **The index analysis** module acts as a configurable registry of Analyzers that can be used in order to both break indexed (analyzed) fields when a document is indexed and process query **strings**. It maps to the Lucene Analyzer.

Note that the generated markup is minimal: one highlight per match, not one per word.
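
The merging behind "one highlight per match" can be sketched as follows. This is an illustration under assumptions, not the library's implementation: word-level matches are taken as already-computed `[start, end)` character ranges, and `mergeRanges` is a hypothetical helper that joins ranges separated only by whitespace.

```javascript
// Hypothetical helper: merge word-level match ranges [start, end) into larger
// ranges whenever only whitespace separates them in the source text.
function mergeRanges(text, ranges) {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...ranges].sort((a, b) => a[0] - b[0]);
  const merged = [];
  for (const [start, end] of sorted) {
    const last = merged[merged.length - 1];
    // An empty or all-whitespace gap means both words belong to one highlight.
    if (last && /^\s*$/.test(text.slice(last[1], start))) {
      last[1] = Math.max(last[1], end);
    } else {
      merged.push([start, end]);
    }
  }
  return merged;
}

const sample = 'The index analysis module acts as a configurable registry';
// Word-level matches for "The", "index" and "analysis".
const words = [[0, 3], [4, 9], [10, 18]];
const result = mergeRanges(sample, words);
console.log(result.map(([s, e]) => sample.slice(s, e)));
// [ 'The index analysis' ]
```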

#### Stopwords
Document highlighter handles stopwords and punctuation according to the language specified. For instance, the following text:

> Install this library, and start using it.

Highlighted for the query `install library`, it becomes:

> **Install this library**, and start using it.
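
A rough sketch of how stopwords can be bridged: a query word starts or extends a highlight, a stopword such as `this` is absorbed only when another query word follows it, and punctuation or any other word ends the highlight. The stopword list and `highlightRanges` are hypothetical, the library's language-dependent rules may differ, and this sketch does no stemming (the library also matches `string` against `strings`).

```javascript
// Hypothetical, tiny English stopword list used only for this illustration.
const STOPWORDS = new Set(['a', 'an', 'and', 'in', 'it', 'of', 'some', 'the', 'this']);

// Returns [start, end) ranges covering runs of query words. A stopword stays
// inside a run only if another query word follows it; punctuation or any
// non-stopword word in between ends the run. No stemming is attempted.
function highlightRanges(text, query) {
  const terms = new Set(query.toLowerCase().split(/\s+/).filter(Boolean));
  const ranges = [];
  let run = null;        // [start, end) of the run being built
  let canExtend = false; // true while only spaces and stopwords follow the run
  let prevEnd = 0;

  // \p{L} and \p{N} match Unicode letters and digits.
  for (const m of text.matchAll(/[\p{L}\p{N}]+/gu)) {
    const word = m[0].toLowerCase();
    const start = m.index;
    const end = start + m[0].length;
    // Anything but whitespace since the previous word (e.g. a comma) breaks the run.
    if (!/^\s*$/.test(text.slice(prevEnd, start))) canExtend = false;
    prevEnd = end;

    if (terms.has(word)) {
      if (run && canExtend) {
        run[1] = end; // bridge over the stopwords in between
      } else {
        if (run) ranges.push(run);
        run = [start, end];
      }
      canExtend = true;
    } else if (!STOPWORDS.has(word)) {
      canExtend = false; // a regular word ends the run
    }
    // Stopwords leave canExtend unchanged: they are kept only if a match follows.
  }
  if (run) ranges.push(run);
  return ranges;
}

const sentence = 'Install this library, and start using it.';
const found = highlightRanges(sentence, 'install library');
console.log(found.map(([s, e]) => sentence.slice(s, e)));
// [ 'Install this library' ]
```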

### HTML
This also works for HTML documents. For instance, the following document:

> This document contains _italics_ and stuff.

Highlighted for the query `it contains some italic empty`, it becomes:

> This document **contains _italics_** and stuff.

Document highlighter preserves the original markup and adds wrapping tags as needed.
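
One simple way to keep the output well-formed is to never let a highlight element cross a tag: close it before each tag inside the range and reopen it afterwards. The `wrapTextRange` helper below is a hypothetical sketch of that idea (offsets count text characters only, and entities are not handled). It produces more elements than the library does in the sample above, which appears to wrap the whole `<em>` element.

```javascript
// Hypothetical helper: wrap the text-content range [start, end) of an HTML
// fragment in `before`/`after`. Offsets count text characters only (tags are
// skipped). The highlight is closed before every tag and reopened after it,
// so it never overlaps an element boundary. Entities and comments are not handled.
function wrapTextRange(html, start, end, before, after) {
  let out = '';
  let textPos = 0;  // index into the text content, ignoring tags
  let open = false; // is a highlight element currently open?
  let i = 0;
  while (i < html.length) {
    if (html[i] === '<') {
      // Copy the whole tag verbatim, closing the highlight first if needed.
      const close = html.indexOf('>', i);
      const tagEnd = close === -1 ? html.length : close + 1;
      if (open) {
        out += after;
        open = false;
      }
      out += html.slice(i, tagEnd);
      i = tagEnd;
      continue;
    }
    const inRange = textPos >= start && textPos < end;
    if (inRange && !open) {
      out += before;
      open = true;
    } else if (!inRange && open) {
      out += after;
      open = false;
    }
    out += html[i];
    textPos += 1;
    i += 1;
  }
  if (open) out += after;
  return out;
}

// In the text content, "contains italics" spans offsets 14 to 30.
const doc = 'This document contains <em>italics</em> and stuff.';
console.log(wrapTextRange(doc, 14, 30, '<strong>', '</strong>'));
// This document <strong>contains </strong><em><strong>italics</strong></em> and stuff.
```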

## Usage
### Highlight plain text documents
```javascript
var highlighter = require('document-highlighter');

var hl = highlighter.text(
'In JavaScript, you can define a callback handler in regex string replace operations',
'callback handler in operations'
);

console.log(hl.text);
// "In JavaScript, you can define a <strong>callback handler in</strong> regex string replace <strong>operations</strong>"

console.log(hl.indices);
// [
// { startIndex: 32, endIndex: 51, content: 'callback handler in' },
// { startIndex: 73, endIndex: 83, content: 'operations' }
// ]
```
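
The `indices` array lets you render highlights yourself. Counting characters in the example shows that `endIndex` is exclusive: `text.slice(32, 51)` is `'callback handler in'`. `applyIndices` below is a hypothetical helper, and it assumes the ranges do not overlap.

```javascript
// Input text and indices as printed in the example above (endIndex is exclusive).
const text = 'In JavaScript, you can define a callback handler in regex string replace operations';
const indices = [
  { startIndex: 32, endIndex: 51, content: 'callback handler in' },
  { startIndex: 73, endIndex: 83, content: 'operations' },
];

// Hypothetical helper: insert markup around each range, working from the end
// of the string so earlier offsets stay valid after each insertion.
function applyIndices(source, ranges, before, after) {
  let out = source;
  const sorted = [...ranges].sort((a, b) => b.startIndex - a.startIndex);
  for (const { startIndex, endIndex } of sorted) {
    out = out.slice(0, startIndex) + before + out.slice(startIndex, endIndex) + after + out.slice(endIndex);
  }
  return out;
}

console.log(applyIndices(text, indices, '<mark>', '</mark>'));
// In JavaScript, you can define a <mark>callback handler in</mark> regex string replace <mark>operations</mark>
```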

### Highlight HTML documents
```javascript
var highlighter = require('document-highlighter');

var hl = highlighter.html(
'Eat drink and be merry for tomorrow we die',
'merry for tomorrow'
);

console.log(hl.html);
// Eat drink and be merry for tomorrow we die

console.log(hl.text);
// Eat drink and be merry for tomorrow we die
```

### Customize highlight markup
```javascript
var highlighter = require('document-highlighter');

var hl = highlighter.text(
'In JavaScript, you can define a callback handler in regex string replace operations',
'callback handler in operations',
{
before: '<mark>', // example markup: any string can be used
after: '</mark>'
}
);

console.log(hl.text);
// "In JavaScript, you can define a <mark>callback handler in</mark> regex string replace <mark>operations</mark>"
```

> Note: in HTML mode, a highlight may be split into multiple items to preserve your existing markup (block-level elements interrupt inline highlighting). By default, these additional items get a `.secondary` class; you can override this with the `beforeSecond` key in the options.
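
The split-highlight behavior can be pictured with the sketch below, which assumes that the first piece of a highlight opens with `before` and every later piece opens with `beforeSecond`. The option names come from this README; the exact strings and the `wrapPieces` helper are assumptions for illustration.

```javascript
// Assumed option values: the names `before`, `after` and `beforeSecond` come
// from this README, but these particular strings are only an illustration.
const options = {
  before: '<strong>',
  beforeSecond: '<strong class="secondary">',
  after: '</strong>',
};

// Hypothetical helper: wrap every piece of one logical highlight. The first
// piece opens with `before`; each later piece (created when block-level
// markup interrupts the highlight) opens with `beforeSecond` instead.
function wrapPieces(pieces, opts) {
  return pieces.map((piece, i) => (i === 0 ? opts.before : opts.beforeSecond) + piece + opts.after);
}

// A highlight spanning two paragraphs cannot be a single inline element.
const [first, second] = wrapPieces(['end of one paragraph', 'start of the next'], options);
console.log(`<p>${first}</p><p>${second}</p>`);
// <p><strong>end of one paragraph</strong></p><p><strong class="secondary">start of the next</strong></p>
```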

In some cases, you may want to customize highlighting for every call to the highlighter. To do so, update the `defaultOptions` object. Note that you cannot replace it with a new object; you need to update its keys one by one.

```javascript
var highlighter = require('document-highlighter');
highlighter.defaultOptions.before = '<mark>'; // example markup: any string can be used
highlighter.defaultOptions.after = '</mark>';

var hl = highlighter.text(
'In JavaScript, you can define a callback handler in regex string replace operations',
'callback handler in operations'
);

console.log(hl.text);
// "In JavaScript, you can define a <mark>callback handler in</mark> regex string replace <mark>operations</mark>"
```
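
A plausible reason why `defaultOptions` must be updated key by key (an assumption, not confirmed against the library source) is that the module reads its defaults through its own reference to the original object. The self-contained sketch below uses a hypothetical `createHighlighter` stand-in to show the effect: reassigning the property changes nothing, while mutating keys does.

```javascript
// Hypothetical stand-in for a module that keeps a private reference to its
// defaults object and exposes that same object as `defaultOptions`.
function createHighlighter() {
  const defaults = { before: '<strong>', after: '</strong>' };
  return {
    defaultOptions: defaults,
    wrap(word, options = {}) {
      // Per-call options take precedence over the captured defaults.
      const opts = { ...defaults, ...options };
      return opts.before + word + opts.after;
    },
  };
}

// Replacing the exported property only rebinds that property; the module
// still reads its original `defaults` object, so this has no effect.
const demo = createHighlighter();
demo.defaultOptions = { before: '<em>', after: '</em>' };
console.log(demo.wrap('word')); // <strong>word</strong>

// Mutating keys of the shared object does take effect.
const fresh = createHighlighter();
fresh.defaultOptions.before = '<mark>';
fresh.defaultOptions.after = '</mark>';
console.log(fresh.wrap('word')); // <mark>word</mark>

// Object.assign also mutates in place, so several keys can be set at once.
Object.assign(fresh.defaultOptions, { before: '<b>', after: '</b>' });
console.log(fresh.wrap('word')); // <b>word</b>
```

If the library does behave like this stand-in, `Object.assign(highlighter.defaultOptions, { ... })` should work as well, since it updates the existing object rather than replacing it; check this against the version you use.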