Lucene Twitter Tokenizer [![Build Status](https://travis-ci.org/wetneb/lucene-twitter.svg?branch=master)](https://travis-ci.org/wetneb/lucene-twitter)
========================

This package provides a Lucene tokenizer and filter for Twitter data:
* The `TwitterTokenizer` keeps Twitter usernames and hashtags as single tokens, including
the leading `@` and `#` signs. It is based on Lucene's `ClassicTokenizer` and otherwise
behaves identically.
* The `TwitterLowercaseFilter` lowercases all Twitter usernames and hashtags, since they are
case-insensitive.
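
To make the behaviour described above concrete, here is a rough Python sketch of the idea. It is not the plugin's code and does not use Lucene: the regular expression and the names `tokenize` and `lowercase_twitter_tokens` are illustrative assumptions, and the word pattern is far simpler than `ClassicTokenizer`'s grammar. It also assumes that tokens other than usernames and hashtags keep their original case.

```python
import re

# Simplified stand-in for the tokenizer (an assumption, not the plugin's grammar):
# a mention or hashtag is tried first so that "@" and "#" stay attached to the
# token; anything else falls back to a plain word pattern.
TOKEN_PATTERN = re.compile(r"[@#]\w+|\w+")


def tokenize(text):
    """Split text into tokens, keeping @usernames and #hashtags whole."""
    return TOKEN_PATTERN.findall(text)


def lowercase_twitter_tokens(tokens):
    """Lowercase only tokens starting with "@" or "#"; leave the rest as-is."""
    return [t.lower() if t[0] in "@#" else t for t in tokens]


tokens = tokenize("Thanks @WetNeb for #OpenData on Lucene!")
print(tokens)
# ['Thanks', '@WetNeb', 'for', '#OpenData', 'on', 'Lucene']
print(lowercase_twitter_tokens(tokens))
# ['Thanks', '@wetneb', 'for', '#opendata', 'on', 'Lucene']
```

With this normalisation, a query for `#opendata` would match a tweet containing `#OpenData`, which is the reason for lowercasing these tokens at index and query time.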

Installing
----------

Create a `server/solr/lib` folder in your Solr install if it does not already exist,
and download [the .jar for this plugin](https://github.com/wetneb/lucene-twitter/releases/download/v0.0.1/lucene-twitter-0.0.1.jar) into it.

Usage
-----

In a Solr schema, you can analyze fields with the tokenizer and filter, for instance:
```xml








```

Released under the Apache-2.0 license.