https://github.com/wetneb/lucene-twitter
Lucene tokenizer and filter for Twitter data
Last synced: 29 days ago
- Host: GitHub
- URL: https://github.com/wetneb/lucene-twitter
- Owner: wetneb
- License: apache-2.0
- Created: 2019-04-06T10:20:45.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-04-06T15:47:18.000Z (over 5 years ago)
- Last Synced: 2024-10-13T14:15:17.681Z (2 months ago)
- Language: Java
- Size: 27.3 KB
- Stars: 0
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
Lucene Twitter Tokenizer [![Build Status](https://travis-ci.org/wetneb/lucene-twitter.svg?branch=master)](https://travis-ci.org/wetneb/lucene-twitter)
========================

This package provides a Lucene tokenizer and filter for Twitter data.

* The `TwitterTokenizer` keeps Twitter usernames and hashtags as single tokens (including
  the `@` and `#` signs). It is based on Lucene's `ClassicTokenizer` and otherwise behaves
  identically to it.
* The `TwitterLowercaseFilter` lowercases all Twitter usernames and hashtags, as they are
  case-insensitive.

Installing
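The two rules above can be illustrated with a small, self-contained Java sketch. It is not the plugin's code: the real `TwitterTokenizer` builds on Lucene's `ClassicTokenizer`, whereas this sketch uses a simple regular expression, and the class `TwitterTokenRuleSketch` and its methods are hypothetical names chosen for illustration. It assumes a username or hashtag is a `@` or `#` followed by word characters, and it lowercases only those tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the tokenizer and filter rules; not the plugin's API.
public class TwitterTokenRuleSketch {
    // Either "@name" / "#tag" (sign kept) or a plain run of word characters.
    // Simplification: ClassicTokenizer's handling of e-mails, acronyms, etc.
    // and Twitter's exact handle/hashtag rules are not modelled here.
    private static final Pattern TOKEN = Pattern.compile("[@#]\\w+|\\w+");

    /** Splits text into tokens, keeping usernames and hashtags whole, signs included. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    /** Lowercases usernames and hashtags; other tokens pass through unchanged. */
    public static List<String> lowercaseTwitterTokens(List<String> tokens) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String t : tokens) {
            boolean twitterToken = t.startsWith("@") || t.startsWith("#");
            out.add(twitterToken ? t.toLowerCase(Locale.ROOT) : t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("Hello @JaneDoe, see #SolrTips today");
        System.out.println(tokens);
        System.out.println(lowercaseTwitterTokens(tokens));
    }
}
```

Running `main` prints `[Hello, @JaneDoe, see, #SolrTips, today]` followed by `[Hello, @janedoe, see, #solrtips, today]`. Because the signs stay attached, the token `#solrtips` in this sketch remains distinct from a plain word `solrtips`.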
----------

Create a `server/solr/lib` folder in your Solr installation if it does not already exist,
and download [the .jar for this plugin](https://github.com/wetneb/lucene-twitter/releases/download/v0.0.1/lucene-twitter-0.0.1.jar) into it.

Usage
-----

In a Solr schema, you can analyze fields using the tokenizer and filter, for instance like this:
```xml
```
Released under the Apache-2.0 license.